Re: [PR] feat: add FastAppend [iceberg-cpp]

via GitHub Tue, 20 Jan 2026 00:14:04 -0800


zhjwpku commented on code in PR #516:
URL: https://github.com/apache/iceberg-cpp/pull/516#discussion_r2707239801



##########
src/iceberg/util/content_file_util.h:
##########
@@ -35,6 +38,113 @@
 
 namespace iceberg {
 
+/// \brief Hash functor for std::shared_ptr<DataFile> based on file path.
+struct ICEBERG_EXPORT DataFilePtrHash {
+  size_t operator()(const std::shared_ptr<DataFile>& file) const {
+    if (!file) {
+      return 0;
+    }
+    return std::hash<std::string>{}(file->file_path);
+  }
+};
+
+/// \brief Equality functor for std::shared_ptr<DataFile> based on file path.
+struct ICEBERG_EXPORT DataFilePtrEqual {
+  bool operator()(const std::shared_ptr<DataFile>& left,
+                  const std::shared_ptr<DataFile>& right) const {
+    if (left == right) {
+      return true;
+    }
+    if (!left || !right) {
+      return false;
+    }
+    return left->file_path == right->file_path;
+  }
+};
+
+/// \brief A set of DataFile pointers, deduplicated by file path.
+///
+/// This preserves insertion order, which is important for row ID assignment in v3
+/// manifests. Similar to Java's DataFileSet which uses LinkedHashSet to maintain
+/// insertion order.
+/// insertion order.
+class ICEBERG_EXPORT DataFileSet {

Review Comment:
   This is more performant and cleaner, so I changed it to this approach.
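
For readers following the thread, here is a minimal sketch of one way to build a set that deduplicates by file path while preserving insertion order, as the doc comment above describes. It is not the implementation from this PR: `OrderedDataFileSet`, `Insert`, and the one-field `DataFile` stand-in are hypothetical names used only for illustration, and the sketch assumes a file's path does not change after insertion.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for iceberg::DataFile; only the path matters here.
struct DataFile {
  std::string file_path;
};

// Illustrative only, not the class added in the PR.
// A vector keeps insertion order; a hash set of paths rejects duplicates.
class OrderedDataFileSet {
 public:
  // Returns true if the file was added, false for a null pointer or a
  // path that is already present.
  bool Insert(std::shared_ptr<DataFile> file) {
    if (!file) {
      return false;
    }
    if (!seen_paths_.insert(file->file_path).second) {
      return false;  // duplicate path: keep the first occurrence
    }
    files_.push_back(std::move(file));
    return true;
  }

  // Files in the order they were first inserted.
  const std::vector<std::shared_ptr<DataFile>>& files() const { return files_; }

  std::size_t size() const { return files_.size(); }

 private:
  std::vector<std::shared_ptr<DataFile>> files_;
  std::unordered_set<std::string> seen_paths_;
};
```

Iterating over `files()` yields entries in first-insertion order, which is the property the doc comment ties to row ID assignment in v3 manifests. The trade-off in this sketch is that each path is stored twice (once in the file, once in the index) in exchange for average constant-time duplicate checks.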



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

