nandorKollar commented on issue #14547: URL: https://github.com/apache/iceberg/issues/14547#issuecomment-4027202558
> Hi [@nandorKollar](https://github.com/nandorKollar), I was going through this today and I think it does throw the exception if the schema has some types which are not supported: https://github.com/apache/iceberg/blob/main/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/ArrowReader.java#L256 Let me know if I am missing something, or how I can reproduce this. Thanks

I think the problem is not there. If I recall correctly, the problem is how the Arrow reader interprets unsigned Parquet types. Say there's a Parquet file that was not written via Iceberg and uses unsigned 64-bit integers. I'm afraid the vectorized reader will simply allocate a vector for a signed long type [here](https://github.com/apache/iceberg/blob/main/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java#L594), and possibly read the value incorrectly as signed from the Parquet file. Compare this with [BaseParquetReaders](https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java#L197), which correctly throws an exception when it encounters an unsigned int64. This is an edge case, and fixing it in the Arrow reader is probably simple; what's more challenging is writing a test case for this scenario.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
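To make the suspected failure mode concrete, here is a minimal, self-contained sketch (not Iceberg code; the class and method names are hypothetical). Parquet stores an unsigned 64-bit value in the same eight bytes as a signed `int64`, so if the reader copies those bits into a signed long slot, any value above `Long.MAX_VALUE` appears negative unless the bits are reinterpreted as unsigned:

```java
import java.math.BigInteger;

// Illustrative sketch: why reading a Parquet unsigned int64 into a
// signed long vector misinterprets large values.
public class UnsignedInt64Demo {

    // Reinterpret the raw 64 bits as the unsigned value they encode.
    static BigInteger asUnsigned(long rawBits) {
        return new BigInteger(Long.toUnsignedString(rawBits));
    }

    public static void main(String[] args) {
        // Raw bits 0xFFFFFFFF_FFFFFFFF: the largest unsigned 64-bit value.
        long rawBits = -1L;

        // A signed-long view of the same bits reports -1 ...
        System.out.println("signed view:   " + rawBits);
        // ... while the correct unsigned interpretation is 2^64 - 1.
        System.out.println("unsigned view: " + asUnsigned(rawBits));
    }
}
```

This is also why the signed/unsigned confusion is silent rather than a crash: the bytes decode without error, only the interpretation is wrong, which is what makes throwing early (as `BaseParquetReaders` does) the safer behavior.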
