nandorKollar commented on issue #14547:
URL: https://github.com/apache/iceberg/issues/14547#issuecomment-4027202558

   > Hi [@nandorKollar](https://github.com/nandorKollar) I was going through 
this today and I think it does throw the exception if schema has some types 
which are not supported 
https://github.com/apache/iceberg/blob/main/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/ArrowReader.java#L256
 Let me know if I am missing something or how can I reproduce this. Thanks
   
   I think the problem is not there. If I recall correctly, the problem is 
how the Arrow reader interprets unsigned Parquet types. Say there's a Parquet 
file that was not written via Iceberg and uses unsigned 64-bit integers. I'm 
afraid the vectorized reader will simply allocate a vector for a signed 
long type 
[here](https://github.com/apache/iceberg/blob/main/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedArrowReader.java#L594),
 and possibly read the values from the Parquet file incorrectly as signed. Compare 
[BaseParquetReaders](https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java#L197),
 which correctly throws an exception when it encounters an unsigned int64.
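   To illustrate why this matters: the raw 64 bits stored by a Parquet writer are the same either way, so nothing fails at read time; the value is just silently reinterpreted. A minimal, self-contained sketch (the raw value here is illustrative, not taken from the linked code):

```java
// Demonstrates how an unsigned 64-bit value above Long.MAX_VALUE is
// corrupted when it lands in a signed long (e.g. an Arrow BigIntVector).
public class UnsignedReadDemo {
    public static void main(String[] args) {
        // Raw bits of the unsigned value 2^63, as a Parquet file would store them.
        long raw = 0x8000000000000000L;

        // Signed reinterpretation -- what a signed-long vector would report:
        System.out.println(raw);                        // -9223372036854775808

        // The value the writer actually intended:
        System.out.println(Long.toUnsignedString(raw)); // 9223372036854775808
    }
}
```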
   
   This is an edge case; fixing it is probably simple in the Arrow reader. 
What's more challenging is writing a test case for this scenario.
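   The simple fix could mirror the guard BaseParquetReaders already applies. A hedged sketch of that shape, with illustrative types (this is not Iceberg's actual API; a real patch would inspect Parquet's `LogicalTypeAnnotation` before allocating the vector):

```java
// Hypothetical model of an integer logical type, standing in for the
// signedness/bit-width information Parquet's logical type annotations carry.
final class IntTypeInfo {
    final int bitWidth;
    final boolean signed;

    IntTypeInfo(int bitWidth, boolean signed) {
        this.bitWidth = bitWidth;
        this.signed = signed;
    }
}

final class UnsignedGuard {
    // Reject unsigned int64 up front instead of silently allocating a
    // signed long vector, matching BaseParquetReaders' behavior.
    static void checkSupported(IntTypeInfo type) {
        if (!type.signed && type.bitWidth == 64) {
            throw new UnsupportedOperationException(
                "Unsupported logical type: unsigned int64");
        }
    }
}
```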


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
