HonahX commented on code in PR #12177:
URL: https://github.com/apache/iceberg/pull/12177#discussion_r1944246239


##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueWriters.java:
##########
@@ -386,6 +390,20 @@ public void write(int repetitionLevel, UUID value) {
     }
   }
 
+  private static class UnknownWriter extends PrimitiveWriter<Object> {
+    private UnknownWriter(ColumnDescriptor desc) {
+      super(desc);
+    }
+
+    @Override
+    public void write(int repetitionLevel, Object value) {}
+
+    @Override
+    public List<TripleWriter<?>> columns() {
+      return ImmutableList.of();
+    }
+  }

Review Comment:
   I keep getting stuck on the following dilemma. Suppose the schema 
is (int, string, unknown, string); the record received by the writer is (1, 
"test1", null, "test2"). We must skip the null at index 2 when writing to 
Parquet. 
   
https://github.com/apache/iceberg/blob/998102546d5bef26402b068649f0e8490b9f6495/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueWriters.java#L644-L650
   That leaves us with two options:
   
   1. Use a dummy writer (`UnknownWriter` here) for the unknown column so that 
nothing is written for it, while keeping the one-to-one correspondence between 
the record’s fields and the writer array.
   
   2. Change the `StructWriter`/`RecordWriter` to skip over the unknown 
column’s value. This would require maintaining additional mapping information 
between the record's full schema and the list of writers, which would involve a 
more extensive refactoring.
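
   To make the two options concrete, here is a minimal, self-contained sketch. It is not Iceberg's actual `StructWriter` or `PrimitiveWriter`: `FieldWriter`, `APPENDING`, `NO_OP`, `writePositional`, and `writeMapped` are hypothetical names, and a plain list stands in for the Parquet column output. It only illustrates how a no-op writer preserves positional alignment, versus carrying an explicit index mapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnknownColumnSketch {
  /** Hypothetical stand-in for a per-field value writer. */
  interface FieldWriter {
    void write(Object value, List<Object> out);
  }

  /** Appends the value to the output; stands in for a real column writer. */
  static final FieldWriter APPENDING = (value, out) -> out.add(value);

  /** Option 1: a dummy writer that accepts a value and writes nothing. */
  static final FieldWriter NO_OP = (value, out) -> {};

  /** Option 1: one writer per record field, so record position i uses writer i. */
  static List<Object> writePositional(List<FieldWriter> writers, Object[] record) {
    List<Object> out = new ArrayList<>();
    for (int i = 0; i < writers.size(); i++) {
      writers.get(i).write(record[i], out);
    }
    return out;
  }

  /** Option 2: writers only for physical columns, plus a writer-to-record position map. */
  static List<Object> writeMapped(
      List<FieldWriter> writers, int[] recordPositions, Object[] record) {
    List<Object> out = new ArrayList<>();
    for (int i = 0; i < writers.size(); i++) {
      writers.get(i).write(record[recordPositions[i]], out);
    }
    return out;
  }

  public static void main(String[] args) {
    // Schema (int, string, unknown, string) with the unknown field always null.
    Object[] record = {1, "test1", null, "test2"};

    List<Object> viaNoOp =
        writePositional(Arrays.asList(APPENDING, APPENDING, NO_OP, APPENDING), record);
    List<Object> viaMap =
        writeMapped(Arrays.asList(APPENDING, APPENDING, APPENDING), new int[] {0, 1, 3}, record);

    System.out.println(viaNoOp); // [1, test1, test2]
    System.out.println(viaMap);  // [1, test1, test2]
  }
}
```

   Both variants emit the same values; the difference is that option 2 needs the extra `recordPositions` mapping to be built and kept in sync with the schema, which is the refactoring cost described above.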
   
   Given the potential impact of refactoring the `StructWriter`, I chose option 
1 (the dummy writer) because it maintains the field ordering without introducing 
significant changes. Am I missing anything? If so, I'd greatly appreciate any 
suggestions. Thanks in advance!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

