Re: [PR] fix: Use binary(16) for UUID type to ensure Spark compatibility [iceberg-python]

via GitHub Mon, 05 Jan 2026 12:32:48 -0800


Fokko commented on code in PR #2881:
URL: https://github.com/apache/iceberg-python/pull/2881#discussion_r2662723462



##########
pyiceberg/io/pyarrow.py:
##########
@@ -789,7 +789,7 @@ def visit_string(self, _: StringType) -> pa.DataType:
         return pa.large_string()
 
     def visit_uuid(self, _: UUIDType) -> pa.DataType:
-        return pa.uuid()
+        return pa.binary(16)

Review Comment:
   Which comment? I would prefer to keep it UUID and fix this on the Java side.
   
   > Python and Rust Arrow implementations don't recognize Java's UUID metadata.
   
   Most implementations don't really look at the Arrow/Parquet/etc. logical 
annotations, so they take `uuid` (which is a `fixed[16]` with a UUID label on it) 
and cast it to a type that's compatible with the query engine. Spark has proven 
to be problematic because it doesn't have a native UUID type and instead handles 
UUIDs internally as strings.
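
To illustrate the point about a UUID being 16 fixed bytes that an engine may cast to a string, here is a minimal sketch using only Python's standard `uuid` module. The helper names `uuid_to_fixed16` and `fixed16_to_string` are hypothetical and are not part of PyIceberg, PyArrow, or Spark; they only mimic the kind of conversion described above.

```python
import uuid


def uuid_to_fixed16(value: uuid.UUID) -> bytes:
    """Return the 16-byte big-endian form, i.e. what a fixed[16] column stores."""
    return value.bytes


def fixed16_to_string(raw: bytes) -> str:
    """Cast 16 raw bytes to the canonical 36-character UUID string.

    Hypothetical helper: this mirrors how an engine without a native UUID
    type could represent the value as a string.
    """
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw))


# Round trip: UUID -> fixed[16] bytes -> string -> UUID
example = uuid.UUID("12345678-1234-5678-1234-567812345678")
raw = uuid_to_fixed16(example)
text = fixed16_to_string(raw)
round_tripped = uuid.UUID(text)
```

The round trip is lossless, so the label on the 16 bytes matters only to readers that actually inspect the logical annotation.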



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]