Re: [PR] PyArrow: Avoid buffer-overflow by avoid doing a sort [iceberg-python]

via GitHub Wed, 22 Jan 2025 12:11:25 -0800


Fokko commented on code in PR #1555:
URL: https://github.com/apache/iceberg-python/pull/1555#discussion_r1925902812



##########
pyiceberg/partitioning.py:
##########
@@ -413,8 +413,10 @@ def partition_record_value(partition_field: PartitionField, value: Any, schema:
     the final partition record value.
     """
     iceberg_type = schema.find_field(name_or_id=partition_field.source_id).field_type
-    iceberg_typed_value = _to_partition_representation(iceberg_type, value)
-    transformed_value = partition_field.transform.transform(iceberg_type)(iceberg_typed_value)
+    if not isinstance(value, int):
+        # When adding files, it can be that we still need to convert from logical types to physical types
+        value = _to_partition_representation(iceberg_type, value)
+    transformed_value = partition_field.transform.transform(iceberg_type)(value)

Review Comment:
   Yes, so I got to the bottom of it. It has to do with the return types of the
transforms. E.g., when we apply the `bucket` transform, the result is always an
int, which is great. The problem is with the identity transform, where the
destination type is equal to the source type. So when a date goes in, a date
also comes out.
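   The type behaviour described above can be illustrated with a small,
dependency-free sketch. `identity_transform` and `bucket_transform` are
hypothetical stand-ins, not pyiceberg's `Transform` classes, and the bucket
stand-in hashes with `zlib.crc32` instead of the 32-bit Murmur3 hash the
Iceberg spec prescribes, so its bucket numbers will not match real Iceberg
output. It only shows the return types: identity hands a `date` straight back,
while the bucket function takes the physical int (days since 1970-01-01) and
always returns an int.

```python
import datetime
import zlib
from typing import Any, Callable

EPOCH = datetime.date(1970, 1, 1)


def identity_transform() -> Callable[[Any], Any]:
    # Identity: the result type is the source type, so a date goes in and a date comes out.
    return lambda value: value


def bucket_transform(num_buckets: int) -> Callable[[int], int]:
    # Hypothetical stand-in for a bucket transform. Iceberg specifies a 32-bit
    # Murmur3 hash; zlib.crc32 is used here only to avoid extra dependencies.
    def apply(value: int) -> int:
        # Hashes the 8-byte little-endian encoding of the physical (integer) value,
        # so the result is always an int in [0, num_buckets).
        digest = zlib.crc32(value.to_bytes(8, "little", signed=True))
        return digest % num_buckets

    return apply


day = datetime.date(2025, 1, 22)
days_since_epoch = (day - EPOCH).days  # physical representation of the date

print(type(identity_transform()(day)).__name__)  # date
print(type(bucket_transform(16)(days_since_epoch)).__name__)  # int

try:
    # This stand-in only accepts the physical int, so passing the logical date fails.
    bucket_transform(16)(day)
except AttributeError as exc:
    print(f"bucket on a date fails: {exc}")
```

   In this sketch only the identity path can return a logical `date`, which is
the case the extra conversion in the diff has to cover.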
   
   I think in the end it would be better to remove `_to_partition_representation`
and see if we can consolidate this conversion in one place, but that's for a different PR.
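   One way such a consolidation might look, sketched under stated assumptions:
a hypothetical `to_physical` helper (not pyiceberg's `_to_partition_representation`)
maps `date`, `datetime`, and `time` values to the integer encodings from the
Iceberg spec (days since 1970-01-01, microseconds since the epoch, microseconds
since midnight) and returns everything else unchanged. Because ints pass
through untouched, the helper can be applied unconditionally, so a call site
like the hypothetical `apply_partition_transform` would not need the
`isinstance(value, int)` guard. Decimals, UUIDs, and nanosecond timestamps are
deliberately left out of this sketch.

```python
import datetime
from typing import Any, Callable

EPOCH_DATE = datetime.date(1970, 1, 1)
EPOCH_NAIVE = datetime.datetime(1970, 1, 1)
EPOCH_UTC = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def _micros(delta: datetime.timedelta) -> int:
    # Exact integer microseconds (avoids float rounding from total_seconds()).
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def to_physical(value: Any) -> Any:
    """Hypothetical helper: map logical values to Iceberg's integer encodings."""
    # datetime subclasses date, so it must be checked first.
    if isinstance(value, datetime.datetime):
        epoch = EPOCH_UTC if value.tzinfo is not None else EPOCH_NAIVE
        return _micros(value - epoch)  # microseconds since 1970-01-01 00:00:00
    if isinstance(value, datetime.date):
        return (value - EPOCH_DATE).days  # days since 1970-01-01
    if isinstance(value, datetime.time):
        # Microseconds since midnight.
        seconds = (value.hour * 60 + value.minute) * 60 + value.second
        return seconds * 1_000_000 + value.microsecond
    # Ints (already physical) and any other types pass through unchanged.
    return value


def apply_partition_transform(transform: Callable[[Any], Any], value: Any) -> Any:
    # Hypothetical call site: convert unconditionally, then apply the transform.
    return transform(to_physical(value))


print(to_physical(datetime.date(2025, 1, 22)))  # 20110
print(to_physical(20110))  # 20110
```

   Making the conversion idempotent keeps the type check in one function
instead of spreading `isinstance` guards across every caller.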



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

