fengjiajie commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1361854341
##########
flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/data/FlinkParquetReaders.java:
##########
@@ -262,7 +262,11 @@ public ParquetValueReader<?> primitive(
switch (primitive.getPrimitiveTypeName()) {
case FIXED_LEN_BYTE_ARRAY:
case BINARY:
- return new ParquetValueReaders.ByteArrayReader(desc);
+ if (expected.typeId() == Types.StringType.get().typeId()) {
+ return new StringReader(desc);
+ } else {
+ return new ParquetValueReaders.ByteArrayReader(desc);
+ }
Review Comment:
> I am concerned about the backward compatibility of this change. Someone
might already depend on reading them as binary, and this change would break
their use-case
This modification only applies when the Iceberg schema defines the field as `string` while the Parquet column is stored as `binary`. Previously, such cases failed with the following exception (a unit test reproduces it):
```
java.lang.ClassCastException: [B cannot be cast to org.apache.flink.table.data.StringData
	at org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169)
	at org.apache.iceberg.flink.data.TestFlinkParquetReader.testReadBinaryFieldAsString(TestFlinkParquetReader.java:137)
```
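To make the shape of the fix concrete, here is a minimal, self-contained sketch (all names here are hypothetical, not the actual Iceberg/Flink classes) of the dispatch in the diff: when the expected Iceberg type is `string`, a binary-backed Parquet column must be decoded to a string value rather than handed back as raw bytes, which is what previously caused the engine-side cast to `StringData` to fail.

```java
import java.nio.charset.StandardCharsets;

public class ReaderDispatch {
    // Stand-in for the expected Iceberg type id.
    enum TypeId { STRING, BINARY }

    interface ValueReader { Object read(byte[] column); }

    // Decodes UTF-8 bytes into a String, analogous to the PR's StringReader.
    static final ValueReader STRING_READER =
        bytes -> new String(bytes, StandardCharsets.UTF_8);

    // Returns the raw bytes unchanged, analogous to ByteArrayReader.
    static final ValueReader BYTE_ARRAY_READER = bytes -> bytes;

    // The fix: choose the reader from the *expected* logical type, not only
    // from the physical Parquet type (FIXED_LEN_BYTE_ARRAY / BINARY).
    static ValueReader forBinaryColumn(TypeId expected) {
        return expected == TypeId.STRING ? STRING_READER : BYTE_ARRAY_READER;
    }

    public static void main(String[] args) {
        byte[] column = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(forBinaryColumn(TypeId.STRING).read(column));                   // hello
        System.out.println(forBinaryColumn(TypeId.BINARY).read(column) instanceof byte[]); // true
    }
}
```

Existing readers of true `binary` columns still get the byte-array path, which is why the change should be backward compatible for them.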
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]