dingo4dev opened a new issue, #2002:
URL: https://github.com/apache/iceberg-python/issues/2002
### Apache Iceberg version
0.9.0 (latest release)
### Please describe the bug 🐞
## Description
When a `UUIDType` column is partitioned with `BucketTransform`, table
operations such as upsert fail. The partition value appears to be converted
from int to str, which causes a type mismatch when the Avro encoder attempts
to write an integer.
## Steps to Reproduce
1. Create a table with a `UUIDType` column
2. Configure the table to use BucketTransform on that column for partitioning
3. Attempt to upsert data into the table
## Current Behavior
The operation fails with a TypeError as the system attempts to perform
integer operations on a string value.
## Expected Behavior
The operation should properly handle `UUIDType` columns when used with
`BucketTransform` partitioning. The UUID bucket partition value should be the
integer `1` rather than the string `"1"`.
## Error Stack Trace
```python
Traceback (most recent call last):
  File "test_upsert.py", line 248, in <module>
    result = table.upsert(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv\Lib\site-packages\pyiceberg\table\__init__.py", line 1216, in upsert
    tx.append(rows_to_insert)
  File ".venv\Lib\site-packages\pyiceberg\table\__init__.py", line 470, in append
    with self._append_snapshot_producer(snapshot_properties) as append_files:
  File ".venv\Lib\site-packages\pyiceberg\table\update\__init__.py", line 71, in __exit__
    self.commit()
  File ".venv\Lib\site-packages\pyiceberg\table\update\__init__.py", line 67, in commit
    self._transaction._apply(*self._commit())
    ^^^^^^^^^^^^^^
  File ".venv\Lib\site-packages\pyiceberg\table\update\snapshot.py", line 242, in _commit
    new_manifests = self._manifests()
    ^^^^^^^^^^^^^^^^^
  File ".venv\Lib\site-packages\pyiceberg\table\update\snapshot.py", line 201, in _manifests
    return self._process_manifests(added_manifests.result() + delete_manifests.result() + existing_manifests.result())
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\Python312\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
    ^^^^^^^^^^^^^^^^^^^
  File "~\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "~\Python312\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv\Lib\site-packages\pyiceberg\table\update\snapshot.py", line 159, in _write_added_manifest
    writer.add(
  File ".venv\Lib\site-packages\pyiceberg\manifest.py", line 847, in add
    self.add_entry(self._reused_entry_wrapper._wrap_append(self._snapshot_id, None, entry.data_file))
  File ".venv\Lib\site-packages\pyiceberg\manifest.py", line 840, in add_entry
    self._writer.write_block([self.prepare_entry(entry)])
  File ".venv\Lib\site-packages\pyiceberg\avro\file.py", line 281, in write_block
    self.writer.write(block_content_encoder, obj)
    writer.write(encoder, val[pos] if pos is not None else None)
  File ".venv\Lib\site-packages\pyiceberg\avro\writer.py", line 176, in write
    writer.write(encoder, val[pos] if pos is not None else None)
    writer.write(encoder, val[pos] if pos is not None else None)
  File ".venv\Lib\site-packages\pyiceberg\avro\writer.py", line 176, in write
    writer.write(encoder, val[pos] if pos is not None else None)
  File ".venv\Lib\site-packages\pyiceberg\avro\writer.py", line 66, in write
    encoder.write_int(val)
  File ".venv\Lib\site-packages\pyiceberg\avro\encoder.py", line 45, in write_int
    datum = (integer << 1) ^ (integer >> 63)
```
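The last frame of the trace above is the zigzag step that precedes Avro's variable-length integer encoding. A minimal standalone sketch of that expression (the function name `zigzag_encode` is made up for illustration and is not pyiceberg's API) shows why an int bucket value encodes while a str one raises:

```python
def zigzag_encode(integer: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (zigzag step)."""
    # Same expression as the failing line in the trace.
    return (integer << 1) ^ (integer >> 63)


# An int partition value passes through the bit shifts without trouble.
print(zigzag_encode(1))

# A str partition value does not support `<<`, so Python raises TypeError.
try:
    zigzag_encode("1")
except TypeError as exc:
    print(f"TypeError: {exc}")
```

This only reproduces the final failing expression in isolation; it does not exercise the partitioning code that produces the str value.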
## Potential Fix
The issue appears to be in the type handling of the `partition_record_value`
function when a `PartitionKey` is initialized with its `PartitionFieldValue`.
https://github.com/apache/iceberg-python/blob/996a7ba4dbf4afdb3d46689f1715206b1c355f2a/pyiceberg/partitioning.py#L385-L406
One option is to widen the type of `value` to a Union so that it also accepts the **transformed** value.
https://github.com/apache/iceberg-python/blob/996a7ba4dbf4afdb3d46689f1715206b1c355f2a/pyiceberg/partitioning.py#L469-L471
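As a rough illustration of the Union idea, the sketch below uses a hypothetical helper, `uuid_partition_representation`, which is not part of pyiceberg. It assumes the UUID branch currently stringifies whatever it receives, and shows a variant that passes an already-transformed bucket ordinal through unchanged:

```python
import uuid
from typing import Optional, Union


def uuid_partition_representation(
    value: Optional[Union[uuid.UUID, int]],
) -> Optional[Union[str, int]]:
    """Hypothetical helper: not the library's actual implementation."""
    if value is None:
        return None
    if isinstance(value, uuid.UUID):
        # Untransformed source value: keep the string form (assumed behavior).
        return str(value)
    if isinstance(value, int):
        # Already-transformed value, e.g. a bucket ordinal: keep it an int.
        return value
    raise TypeError(f"Unsupported partition value type: {type(value).__name__}")


print(uuid_partition_representation(1))
print(uuid_partition_representation(uuid.UUID(int=1)))
```

With this shape, a bucket result such as `1` stays an int and reaches the Avro int writer unchanged.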
### Willingness to contribute
- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time