fengjiajie commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1361854341
##########
flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/data/FlinkParquetReaders.java:
##########
@@ -262,7 +262,11 @@ public ParquetValueReader<?> primitive(
switch (primitive.getPrimitiveTypeName()) {
case FIXED_LEN_BYTE_ARRAY:
case BINARY:
- return new ParquetValueReaders.ByteArrayReader(desc);
+ if (expected.typeId() == Types.StringType.get().typeId()) {
+ return new StringReader(desc);
+ } else {
+ return new ParquetValueReaders.ByteArrayReader(desc);
+ }
Review Comment:
> I am concerned about the backward compatibility of this change. Someone
might already depend on reading them as binary, and this change would break
their use-case
This modification only applies when the Iceberg schema defines the field as `string` while the Parquet column is stored as `binary`. Previously, such cases failed with the following exception (a unit test reproduces it):
```
java.lang.ClassCastException: [B cannot be cast to org.apache.flink.table.data.StringData
	at org.apache.flink.table.data.GenericRowData.getString(GenericRowData.java:169)
	at org.apache.iceberg.flink.data.TestFlinkParquetReader.testReadBinaryFieldAsString(TestFlinkParquetReader.java:137)
```
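To make the shape of the fix concrete, here is a minimal, self-contained sketch (all names here are hypothetical, not the actual Iceberg/Flink classes) of the dispatch in the diff: when the expected Iceberg type is `string`, a binary-backed Parquet column must be decoded to a string value rather than handed back as raw bytes, which is what previously caused the engine-side cast to `StringData` to fail.

```java
import java.nio.charset.StandardCharsets;

public class ReaderDispatch {
    // Stand-in for the expected Iceberg type id.
    enum TypeId { STRING, BINARY }

    interface ValueReader { Object read(byte[] column); }

    // Decodes UTF-8 bytes into a String, analogous to the PR's StringReader.
    static final ValueReader STRING_READER =
        bytes -> new String(bytes, StandardCharsets.UTF_8);

    // Returns the raw bytes unchanged, analogous to ByteArrayReader.
    static final ValueReader BYTE_ARRAY_READER = bytes -> bytes;

    // The fix: choose the reader from the *expected* logical type, not only
    // from the physical Parquet type (FIXED_LEN_BYTE_ARRAY / BINARY).
    static ValueReader forBinaryColumn(TypeId expected) {
        return expected == TypeId.STRING ? STRING_READER : BYTE_ARRAY_READER;
    }

    public static void main(String[] args) {
        byte[] column = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(forBinaryColumn(TypeId.STRING).read(column));                   // hello
        System.out.println(forBinaryColumn(TypeId.BINARY).read(column) instanceof byte[]); // true
    }
}
```

Existing readers of true `binary` columns still get the byte-array path, which is why the change should be backward compatible for them.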
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]