HonahX commented on code in PR #12177:
URL: https://github.com/apache/iceberg/pull/12177#discussion_r1944246239
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueWriters.java:
##########
@@ -386,6 +390,20 @@ public void write(int repetitionLevel, UUID value) {
     }
   }
+  private static class UnknownWriter extends PrimitiveWriter<Object> {
+    private UnknownWriter(ColumnDescriptor desc) {
+      super(desc);
+    }
+
+    @Override
+    public void write(int repetitionLevel, Object value) {}
+
+    @Override
+    public List<TripleWriter<?>> columns() {
+      return ImmutableList.of();
+    }
+  }
Review Comment:
I'm stuck on the following dilemma. Suppose the schema is
(int, string, unknown, string) and the record received by the writer is (1,
"test1", null, "test2"). We must skip the null at index 2 when writing to
Parquet, because an unknown column has no data to write.
https://github.com/apache/iceberg/blob/998102546d5bef26402b068649f0e8490b9f6495/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueWriters.java#L644-L650
That leaves us with two options:
1. Use a dummy writer (`UnknownWriter` here) for the unknown column so that
nothing is written for it, while keeping the one-to-one correspondence between
the record’s fields and the writer array.
2. Change the `StructWriter`/`RecordWriter` to skip over the unknown
column’s value. This would require maintaining additional mapping information
between the record's full schema and the list of writers, which would involve a
more extensive refactoring.
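To make the trade-off concrete, here is a minimal, self-contained sketch of both options. It uses simplified stand-ins, not Iceberg's actual classes: `FieldWriter`, `writeWithDummy`, `writeWithMapping`, and the `fieldPositions` array are hypothetical names invented for illustration, and the "sink" list stands in for the Parquet column output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT Iceberg's real classes.
final class UnknownColumnSketch {
  /** Hypothetical minimal writer interface: appends a field value to the sink. */
  interface FieldWriter {
    void write(Object value, List<Object> sink);
  }

  /** Stands in for the ordinary int/string writers: writes the value as-is. */
  static final FieldWriter VALUE = (value, sink) -> sink.add(value);

  /** Option 1: a no-op writer that occupies the unknown column's slot. */
  static final FieldWriter UNKNOWN = (value, sink) -> {};

  /** Option 1: writers.get(i) always handles record.get(i); no index mapping is needed. */
  static List<Object> writeWithDummy(List<Object> record, List<FieldWriter> writers) {
    List<Object> sink = new ArrayList<>();
    for (int i = 0; i < record.size(); i++) {
      writers.get(i).write(record.get(i), sink);
    }
    return sink;
  }

  /**
   * Option 2: only real writers exist, so the struct writer must carry an extra
   * mapping: fieldPositions[j] is the record index handled by writers.get(j).
   */
  static List<Object> writeWithMapping(
      List<Object> record, List<FieldWriter> writers, int[] fieldPositions) {
    List<Object> sink = new ArrayList<>();
    for (int j = 0; j < writers.size(); j++) {
      writers.get(j).write(record.get(fieldPositions[j]), sink);
    }
    return sink;
  }

  public static void main(String[] args) {
    // Arrays.asList is used because List.of rejects null elements.
    List<Object> record = Arrays.asList(1, "test1", null, "test2");
    // Both calls print [1, test1, test2]: the unknown column's null is never written.
    System.out.println(writeWithDummy(record, List.of(VALUE, VALUE, UNKNOWN, VALUE)));
    System.out.println(writeWithMapping(record, List.of(VALUE, VALUE, VALUE), new int[] {0, 1, 3}));
  }
}
```

In this sketch, option 1 only adds one no-op writer, whereas option 2 changes the signature and the loop of the struct-level write path.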
Given the potential impact of refactoring the `StructWriter`, I chose option
1 (the dummy writer) because it preserves the field ordering without
introducing significant changes. Am I missing anything? If so, I'd greatly
appreciate any suggestions. Thanks in advance!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]