ypsah opened a new issue, #1937:
URL: https://github.com/apache/iceberg-python/issues/1937
### Apache Iceberg version
0.9.0 (latest release)
### Please describe the bug 🐞
Hi, thanks for writing `pyiceberg`.
The bug is pretty much described in the title: `table.scan(row_filter="x IN (0,
1)")` omits the rows for which `x=0` when `x` is a `DoubleType` partition
column.
Here is a reproducer:
```bash
pip install "pyiceberg[sql-sqlite,pyarrow]"
```
```python
from pathlib import Path
from tempfile import TemporaryDirectory
import pyarrow
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import DoubleType, NestedField
from pyiceberg.partitioning import PartitionSpec, PartitionField
schema = Schema(
    NestedField(field_id=1, name="x", field_type=DoubleType()),
    NestedField(field_id=2, name="y", field_type=DoubleType()),
)
partition_spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1001, transform=IdentityTransform(), name="x")
)
with TemporaryDirectory() as tmpdir:
    catalog = SqlCatalog(
        "local",
        uri=f"sqlite:///{tmpdir}/catalog.db",
        warehouse=f"file://{tmpdir}/warehouse",
    )
    catalog.create_namespace("test")
    table = catalog.create_table(
        "test.test", schema=schema, partition_spec=partition_spec
    )
    data = pyarrow.table(
        {
            "x": [0.0, 1.0, 2.0],
            "y": [0.0, 0.0, 0.0],
        }
    )
    table.overwrite(data)
    print("=== no filter ===")
    print(table.scan().to_arrow())
    print("=== x IN (0) ===")
    print(table.scan(row_filter="x IN (0)").to_arrow())
    print("=== x IN (0, 1, 2) ===")
    print(table.scan(row_filter="x IN (0, 1, 2)").to_arrow())
```
Output:
```
/tmp/tmp.l2MLQFjC7C-05duO9h5/lib/python3.13/site-packages/pyiceberg/table/__init__.py:686:
UserWarning: Delete operation did not match any records
warnings.warn("Delete operation did not match any records")
=== no filter ===
pyarrow.Table
x: double
y: double
----
x: [[0],[1],[2]]
y: [[0],[0],[0]]
=== x IN (0) ===
pyarrow.Table
x: double
y: double
----
x: [[0]]
y: [[0]]
=== x IN (0, 1, 2) ===
pyarrow.Table
x: double
y: double
----
x: [[1],[2]]
y: [[0],[0]]
```
I expect the output for `x IN (0, 1, 2)` to match that of the `no filter` scan.
Note that I could not reproduce this when `x` is a `LongType` instead of a
`DoubleType`.
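To make the expectation concrete, here is a minimal plain-Python sketch of the `IN` semantics I would expect on the reproducer data. It is only a reference model and does not use or mirror `pyiceberg` internals; the names `ROWS` and `filter_in` are hypothetical and exist just for this illustration.

```python
# Reference model of the expected IN-filter semantics (not pyiceberg code).
# ROWS mirrors the data written in the reproducer above.
ROWS = [
    {"x": 0.0, "y": 0.0},
    {"x": 1.0, "y": 0.0},
    {"x": 2.0, "y": 0.0},
]


def filter_in(rows, column, values):
    """Keep the rows whose `column` value equals any of `values`."""
    wanted = set(values)
    # In Python, 0.0 == 0 and both hash identically, so integer literals
    # in the filter match double values in the data.
    return [row for row in rows if row[column] in wanted]


if __name__ == "__main__":
    # Expected: all three rows, including the one with x == 0.0.
    print(filter_in(ROWS, "x", (0, 1, 2)))
```

Under this model, `x IN (0, 1, 2)` keeps every row, which is the result I expected from the table scan as well.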
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time