ndrluis commented on issue #2372:
URL: 
https://github.com/apache/iceberg-python/issues/2372#issuecomment-3226679593

   @Fokko @kevinjqliu I started working on a fix, and while preparing to write 
a test, I noticed we already have a 
[test](https://github.com/apache/iceberg-python/blob/06b9467f1db2b71c51d4c1be5f3a5144afa52880/tests/integration/test_reads.py#L597)
 that should cover this scenario. Through my testing, I found that the issue 
isn't with reading but with writing. Parquet files written by PySpark 
(parquet-mr version 1.15.2) work fine, but those written by PyIceberg don't. I 
tried updating PyArrow to the latest version thinking it might be a missing 
logical type issue, but that didn't fix it. There seems to be something 
different about how parquet-mr handles UUID writing.
   
   Given this, I think we should revert to `fixed[16]`, though we'll likely run 
into the same problem @Fokko fixed in #2007. If I'm understanding correctly, 
that was a logical type issue resolved in PyArrow 21.0.0. So reverting to 
`fixed[16]` and updating to PyArrow 21.0.0 should solve this for now.
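
   For context on what `fixed[16]` means for a UUID value, here is a minimal standard-library sketch. It assumes the 16-byte big-endian layout that `uuid.UUID.bytes` produces. The helper names `uuid_to_fixed16` and `fixed16_to_uuid` are hypothetical and are not PyIceberg or PyArrow APIs, and the sketch does not reproduce the writer behaviour discussed above.

```python
import uuid


def uuid_to_fixed16(value: uuid.UUID) -> bytes:
    """Return the 16 big-endian bytes a fixed[16] field would hold (assumed layout)."""
    return value.bytes


def fixed16_to_uuid(raw: bytes) -> uuid.UUID:
    """Rebuild a UUID from exactly 16 bytes, rejecting any other length."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return uuid.UUID(bytes=raw)


# Arbitrary example value, chosen only for illustration.
original = uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")
raw = uuid_to_fixed16(original)
restored = fixed16_to_uuid(raw)

print(len(raw), restored == original)
```

   The bytes are the same whether or not the column carries a UUID logical type annotation. What differs is the metadata a reader sees, which is where the PyArrow version comes in.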
   
   The catch is that Bodo has a [hard 
requirement](https://github.com/bodo-ai/Bodo/blob/e7a04824431e0b8bcb37748fbe98a1e48e7ddb83/pyproject.toml#L9)
 for PyArrow 19.0. I think this deserves its own discussion, but I'd love to 
hear your thoughts. Maybe we should start a thread on the mailing list about 
these kinds of dependency constraints. While I get that having `to_bodo()` just 
work out of the box is nice for users, I feel like Bodo should be the one 
supporting Iceberg, not PyIceberg bending over backwards to support Bodo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


