wgtmac commented on code in PR #519:
URL: https://github.com/apache/iceberg-cpp/pull/519#discussion_r2708138306
##########
src/iceberg/test/CMakeLists.txt:
##########
@@ -109,6 +109,7 @@ add_iceberg_test(util_test
SOURCES
bucket_util_test.cc
config_test.cc
+ data_file_set_test.cc
Review Comment:
Please also add it to the Meson tests.
##########
src/iceberg/update/snapshot_update.h:
##########
@@ -103,25 +107,75 @@ class ICEBERG_EXPORT SnapshotUpdate : public PendingUpdate {
/// \brief Write data manifests for the given data files
///
- /// \param data_files The data files to write
+ /// \tparam Iterator Iterator type that dereferences to std::shared_ptr<DataFile>
+ /// \param begin Iterator to the beginning of the data files range
+ /// \param end Iterator to the end of the data files range
/// \param spec The partition spec to use
/// \param data_sequence_number Optional data sequence number for the files
/// \return A vector of manifest files
- /// TODO(xxx): Change signature to accept iterator begin/end instead of vector to avoid
- /// intermediate vector allocations (e.g., from DataFileSet)
+ // TODO(xxx): write manifests in parallel
+ template <typename Iterator>
Result<std::vector<ManifestFile>> WriteDataManifests(
- const std::vector<std::shared_ptr<DataFile>>& data_files,
Review Comment:
Originally I suggested using iterators because calling this function required an
extra conversion. Now I see that `DataFileSet` also uses `std::vector` internally.
Do you think it would be better to take `std::span<const std::shared_ptr<DataFile>>`
as the input here and add a `std::span<const std::shared_ptr<DataFile>>
DataFileSet::as_span() const` method (or something similar)? With this approach,
we can still hide the implementation in the source file.
##########
src/iceberg/util/data_file_set.h:
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#pragma once
+
+/// \file iceberg/util/data_file_set.h
Review Comment:
Please also install this header in meson.build.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.