fengjiajie commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1771924888
> how are we guaranteed that the binary is parsable as UTF8 bytes?

@RussellSpitzer Thank you for participating in the review. If a column is not encoded in UTF-8, it should not be defined as a string type in the Iceberg metadata. The type used for reading should be determined by the column type defined in the Iceberg metadata, not by the column type defined in the Parquet file. An imperfect analogy is reading a CSV file: the column type is determined by the table's structural metadata at read time, rather than by any type defined in the CSV file itself.
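To make the argument concrete, here is a minimal Python sketch of the idea, not Iceberg's actual reader code. The helper `read_values` and the type names `"string"` and `"binary"` are hypothetical and exist only for illustration. The sketch assumes the reader picks the interpretation from the table-level type and decodes strictly, so bytes that are not valid UTF-8 raise an error instead of being silently accepted.

```python
from typing import List, Union


def read_values(raw: List[bytes], table_type: str) -> List[Union[str, bytes]]:
    """Interpret raw file values according to the table-level type.

    Hypothetical helper: the file only supplies bytes; the table schema
    decides how those bytes are presented to the reader.
    """
    if table_type == "string":
        # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
        return [value.decode("utf-8") for value in raw]
    if table_type == "binary":
        # No interpretation: hand the bytes back unchanged.
        return list(raw)
    raise ValueError(f"unsupported table type: {table_type}")


# The same physical bytes, read under two different table definitions.
raw_column = ["héllo".encode("utf-8"), b"iceberg"]
as_strings = read_values(raw_column, "string")
as_binary = read_values(raw_column, "binary")
```

Under this assumption, a column that is declared as a string but holds non-UTF-8 bytes fails loudly at read time, which points back to the table definition as the thing to fix.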
