Re: [PR] Core, Parquet: Allow for Writing Parquet/Avro Manifests in V4 [iceberg]

via GitHub Fri, 17 Apr 2026 03:57:51 -0700


nastra commented on code in PR #15634:
URL: https://github.com/apache/iceberg/pull/15634#discussion_r3099390523



##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -581,9 +584,33 @@ private static <K, V> Map<K, V> copyMap(Map<K, V> map, Set<K> keys) {
 
   private static Map<Integer, ByteBuffer> copyByteBufferMap(
       Map<Integer, ByteBuffer> map, Set<Integer> keys) {
-    return SerializableByteBufferMap.wrap(copyMap(map, keys));
+    if (map == null) {
+      return null;
+    }
+
+    // This is required as long as we have Map<Integer, ByteBuffer> in the API since Parquet is
+    // re-using buffers.
+    Map<Integer, ByteBuffer> deepCopy = Maps.newHashMapWithExpectedSize(map.size());

Review Comment:
   nit: maybe worth putting this into a `deepCopyMap` method
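For illustration, a minimal sketch of what such a `deepCopyMap` helper could look like. It assumes that a `null` key set means "copy every entry" (the way `copyMap` is used here suggests that, but it is an assumption), and it leaves out the `SerializableByteBufferMap.wrap(...)` call so it stays self-contained. The class name `DeepCopyMaps` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class DeepCopyMaps {
  private DeepCopyMaps() {}

  // Hypothetical helper: copies the entries selected by `keys` (all entries when keys is null)
  // and gives every value its own backing array, so a reader that later reuses the source
  // buffers cannot change the copy.
  static Map<Integer, ByteBuffer> deepCopyMap(Map<Integer, ByteBuffer> map, Set<Integer> keys) {
    if (map == null) {
      return null;
    }

    Map<Integer, ByteBuffer> copy = new HashMap<>();
    for (Map.Entry<Integer, ByteBuffer> entry : map.entrySet()) {
      if (keys == null || keys.contains(entry.getKey())) {
        copy.put(entry.getKey(), copyBuffer(entry.getValue()));
      }
    }

    return copy;
  }

  private static ByteBuffer copyBuffer(ByteBuffer buffer) {
    if (buffer == null) {
      return null;
    }

    // duplicate() shares the bytes but has its own position, so the source is left untouched
    ByteBuffer source = buffer.duplicate();
    // duplicate() resets the byte order to BIG_ENDIAN, so take the order from the original
    ByteBuffer copy = ByteBuffer.allocate(source.remaining()).order(buffer.order());
    copy.put(source);
    copy.flip();
    return copy;
  }
}
```

Copying only the `remaining()` bytes mirrors how the bounds buffers are normally read (position to limit); if some callers rely on bytes outside that window, the copy would need to cover the whole backing array instead.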



##########
core/src/main/java/org/apache/iceberg/V4Metadata.java:
##########
@@ -278,28 +279,35 @@ static Schema wrapFileSchema(Types.StructType fileSchema) {
   }
 
   static Types.StructType fileType(Types.StructType partitionType) {
-    return Types.StructType.of(
-        DataFile.CONTENT.asRequired(),
-        DataFile.FILE_PATH,
-        DataFile.FILE_FORMAT,
-        required(
-            DataFile.PARTITION_ID, DataFile.PARTITION_NAME, partitionType, DataFile.PARTITION_DOC),
-        DataFile.RECORD_COUNT,
-        DataFile.FILE_SIZE,
-        DataFile.COLUMN_SIZES,
-        DataFile.VALUE_COUNTS,
-        DataFile.NULL_VALUE_COUNTS,
-        DataFile.NAN_VALUE_COUNTS,
-        DataFile.LOWER_BOUNDS,
-        DataFile.UPPER_BOUNDS,
-        DataFile.KEY_METADATA,
-        DataFile.SPLIT_OFFSETS,
-        DataFile.EQUALITY_IDS,
-        DataFile.SORT_ORDER_ID,
-        DataFile.FIRST_ROW_ID,
-        DataFile.REFERENCED_DATA_FILE,
-        DataFile.CONTENT_OFFSET,
-        DataFile.CONTENT_SIZE);
+    List<Types.NestedField> fields = Lists.newArrayList();

Review Comment:
   minor: it might be worth using a list builder instead of adding elements one by one, which causes the list to grow/expand
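A small sketch of the growth concern. In Iceberg the natural choice would probably be the relocated Guava `ImmutableList.builder()`; to stay dependency-free, this version instead pre-sizes a JDK `ArrayList` to the largest possible field count, so the adds never resize it, and returns an unmodifiable view. The field names, the `includeConditional` flag, and `FieldListSketch` are all hypothetical stand-ins, since the excerpt does not show which fields the new `fileType` presumably adds conditionally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FieldListSketch {
  private FieldListSketch() {}

  // Hypothetical stand-ins for Types.NestedField: fields that are always present ...
  private static final List<String> ALWAYS =
      List.of("content", "file_path", "file_format", "partition", "record_count");
  // ... and fields that are only added under some (hypothetical) condition.
  private static final List<String> CONDITIONAL = List.of("first_row_id", "referenced_data_file");

  static List<String> fileFields(boolean includeConditional) {
    // Size the backing array once for the largest possible result, so the addAll calls
    // below never trigger an ArrayList resize.
    List<String> fields = new ArrayList<>(ALWAYS.size() + CONDITIONAL.size());
    fields.addAll(ALWAYS);
    if (includeConditional) {
      fields.addAll(CONDITIONAL);
    }

    return Collections.unmodifiableList(fields);
  }
}
```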



##########
core/src/test/java/org/apache/iceberg/TestSnapshotProducer.java:
##########
@@ -228,6 +229,10 @@ public TableMetadata refresh() {
 
   @TestTemplate
   public void testDefaultManifestCompression() throws IOException {
+    assumeThat(formatVersion)
+        .as("V4 uses Parquet manifests by default; Avro codec checks do not apply")
+        .isLessThan(TableMetadata.MIN_FORMAT_VERSION_PARQUET_MANIFESTS);

Review Comment:
   nit: maybe let's rename the test method to something like `testDefaultAvroManifestCompression()` to make it clear that it only covers Avro manifests
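To make the rename concrete: the guard only lets the test run when the default manifest format is Avro. A minimal sketch of that gate, assuming `MIN_FORMAT_VERSION_PARQUET_MANIFESTS` is 4 (inferred from the assumption message, not confirmed by the diff); `ManifestFormatSketch`, `ManifestFormat`, and both helper methods are hypothetical.

```java
final class ManifestFormatSketch {
  private ManifestFormatSketch() {}

  // Assumed value: the assumption message says V4 uses Parquet manifests by default.
  static final int MIN_FORMAT_VERSION_PARQUET_MANIFESTS = 4;

  enum ManifestFormat {
    AVRO,
    PARQUET
  }

  // Hypothetical helper: the default manifest format implied by the test's version guard.
  static ManifestFormat defaultManifestFormat(int formatVersion) {
    return formatVersion < MIN_FORMAT_VERSION_PARQUET_MANIFESTS
        ? ManifestFormat.AVRO
        : ManifestFormat.PARQUET;
  }

  // Avro codec checks only make sense when the default manifest format is Avro,
  // which is what an Avro-specific test name would spell out.
  static boolean avroCodecChecksApply(int formatVersion) {
    return defaultManifestFormat(formatVersion) == ManifestFormat.AVRO;
  }
}
```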



##########
core/src/test/java/org/apache/iceberg/TestSnapshotProducer.java:
##########
@@ -236,6 +241,10 @@ public void testDefaultManifestCompression() throws IOException {
 
   @TestTemplate
   public void testManifestCompressionFromTableProperty() throws IOException {
+    assumeThat(formatVersion)
+        .as("V4 uses Parquet manifests by default; Avro codec checks do not apply")
+        .isLessThan(TableMetadata.MIN_FORMAT_VERSION_PARQUET_MANIFESTS);

Review Comment:
   same as above



##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java:
##########
@@ -847,7 +849,8 @@ protected List<E> newListData(List<E> reuse) {
       }
 
       if (reuse != null) {
-        this.lastList = reuse;
+        // reuse containers may come from a different reader (e.g. Avro) with incompatible types
+        this.lastList = reuse instanceof ArrayList ? reuse : null;

Review Comment:
   I think the check here and below is too specific. Maybe we should check for known immutable / unmodifiable collections instead?
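One way the suggested check could look, as a sketch only: keep any reuse container unless its runtime class is on a deny-list of JDK lists known to reject `add()`. `ReuseCheck`, `isKnownUnmodifiable`, and `reusableOrNull` are hypothetical names, and the list only covers JDK factories; other immutable types (for example Guava's `ImmutableList`) would presumably need to be added in the real code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

final class ReuseCheck {
  private ReuseCheck() {}

  // Hypothetical deny-list: runtime classes of JDK lists that do not support add()
  // (unmodifiable or fixed-size). A HashSet is used because some factories may share
  // a runtime class, and Set.of would reject the duplicates.
  private static final Set<Class<?>> KNOWN_UNMODIFIABLE =
      new HashSet<>(
          Arrays.<Class<?>>asList(
              List.of().getClass(),
              List.of(1).getClass(),
              List.of(1, 2).getClass(),
              List.of(1, 2, 3).getClass(),
              Arrays.asList().getClass(),
              Collections.emptyList().getClass(),
              Collections.singletonList(1).getClass(),
              Collections.unmodifiableList(new ArrayList<>()).getClass(),
              Collections.unmodifiableList(new LinkedList<>()).getClass()));

  static boolean isKnownUnmodifiable(List<?> list) {
    return KNOWN_UNMODIFIABLE.contains(list.getClass());
  }

  // Keep any reuse container that is not known to be read-only, instead of only ArrayList.
  static <E> List<E> reusableOrNull(List<E> reuse) {
    return reuse != null && !isKnownUnmodifiable(reuse) ? reuse : null;
  }
}
```

The trade-off: `reuse instanceof ArrayList` fails closed (anything unexpected gets a fresh list), while a deny-list fails open (an unlisted read-only type would still be reused and fail on the first mutation), so whichever is chosen probably needs a test covering the Avro-produced containers.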



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
