srilman opened a new issue, #1778:
URL: https://github.com/apache/iceberg-python/issues/1778
### Apache Iceberg version
0.9.0 (latest release)
### Please describe the bug 🐞
When attempting to apply a filter, such as null / not-null, to a top-level struct column, an error occurs. For example:
```py
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StructType, IntegerType, StringType
import pyiceberg.expressions as pe
import pyarrow as pa

catalog = SqlCatalog("sql_catalog", uri="sqlite:///:memory:")
catalog.create_namespace("ns")

schema = Schema(
    NestedField(1, "structs", StructType(
        NestedField(2, "id", IntegerType(), required=True),
        NestedField(3, "name", StringType(), required=True),
    )),
)
table = catalog.create_table("ns.struct_table", schema, "/tmp/wh/ns/struct_table")

df = pa.Table.from_pydict({
    "structs": [
        {"id": 1, "name": "a"},
        {"id": 2, "name": "b"},
        {"id": 3, "name": "c"},
    ]
}, schema=schema.as_arrow())
table.append(df)

print(list(table.scan(row_filter=pe.NotNull("structs")).plan_files()))
```
```
Traceback (most recent call last):
  File "/Users/slade/bodo/mono/develop/test.py", line 27, in <module>
    print(list(table.scan(row_filter=pe.NotNull("structs")).plan_files()))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 1697, in plan_files
    if manifest_evaluators[manifest_file.partition_spec_id](manifest_file)
       ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ...
  File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/expressions/__init__.py", line 201, in bind
    accessor = schema.accessor_for_field(field.field_id)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/slade/bodo/mono/develop/.pixi/envs/default/lib/python3.12/site-packages/pyiceberg/schema.py", line 280, in accessor_for_field
    raise ValueError(f"Could not find accessor for field with id: {field_id}")
ValueError: Could not find accessor for field with id: 1
```
It looks like the cause is an intentional feature of the field-to-accessor map for schemas: struct fields themselves get no accessor, only their children do. See the docstring of class `_BuildPositionAccessors`:
```py
class _BuildPositionAccessors(SchemaVisitor[Dict[Position, Accessor]]):
    """A schema visitor for generating a field ID to accessor index.

    Example:
        >>> from pyiceberg.schema import Schema
        >>> from pyiceberg.types import *
        >>> schema = Schema(
        ...     NestedField(field_id=2, name="id", field_type=IntegerType(), required=False),
        ...     NestedField(field_id=1, name="data", field_type=StringType(), required=True),
        ...     NestedField(
        ...         field_id=3,
        ...         name="location",
        ...         field_type=StructType(
        ...             NestedField(field_id=5, name="latitude", field_type=FloatType(), required=False),
        ...             NestedField(field_id=6, name="longitude", field_type=FloatType(), required=False),
        ...         ),
        ...         required=True,
        ...     ),
        ...     schema_id=1,
        ...     identifier_field_ids=[1],
        ... )
        >>> result = build_position_accessors(schema)
        >>> expected = {
        ...     2: Accessor(position=0, inner=None),
        ...     1: Accessor(position=1, inner=None),
        ...     5: Accessor(position=2, inner=Accessor(position=0, inner=None)),
        ...     6: Accessor(position=2, inner=Accessor(position=1, inner=None))
        ... }
        >>> result == expected
        True
    """
```
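To make the behavior in that docstring concrete, here is a minimal, self-contained model of it in plain Python. `Field`, `Accessor`, `build_leaf_accessors` and `accessor_for_field` are hypothetical simplifications written for illustration only (primitives and structs are modeled, lists and maps are not); they are not pyiceberg's API, although the error message mirrors the one in the traceback. Using the docstring's schema, the struct's own id (3 here, like `structs`/1 in the reproducer) never gets an entry, so looking it up fails.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Accessor:
    """Simplified stand-in: a position in the row plus an optional accessor into a nested struct."""

    position: int
    inner: Optional["Accessor"] = None

    def get(self, row):
        # Index into the row, then recurse into the nested struct if there is an inner accessor.
        value = row[self.position]
        if value is None or self.inner is None:
            return value
        return self.inner.get(value)


@dataclass(frozen=True)
class Field:
    """Hypothetical minimal field: `children` is None for primitives and a list for structs."""

    field_id: int
    name: str
    children: Optional[List["Field"]] = None


def build_leaf_accessors(fields: List[Field]) -> Dict[int, Accessor]:
    """Model of the documented behavior: only non-struct fields get an accessor."""
    result: Dict[int, Accessor] = {}
    for position, field in enumerate(fields):
        if field.children is None:
            result[field.field_id] = Accessor(position)
        else:
            # Wrap each child accessor in one for the struct's position;
            # the struct's own field id is never added to the map.
            for child_id, inner in build_leaf_accessors(field.children).items():
                result[child_id] = Accessor(position, inner)
    return result


def accessor_for_field(accessors: Dict[int, Accessor], field_id: int) -> Accessor:
    # Same failure mode as the traceback: a missing id raises ValueError.
    if field_id not in accessors:
        raise ValueError(f"Could not find accessor for field with id: {field_id}")
    return accessors[field_id]


# The schema from the docstring: id, data, location{latitude, longitude}.
schema = [
    Field(2, "id"),
    Field(1, "data"),
    Field(3, "location", [Field(5, "latitude"), Field(6, "longitude")]),
]
accessors = build_leaf_accessors(schema)
print(accessors[5].get((10, "x", (1.5, 2.5))))  # 1.5
try:
    accessor_for_field(accessors, 3)
except ValueError as err:
    print(err)  # Could not find accessor for field with id: 3
```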
But I'm not sure why struct fields are excluded. Looking at all of its uses, I don't see a reason why the `id_to_accessor` map shouldn't include top-level structs. Is there a reason for this, or is it just a bug? If it's just a bug, I think this is a 1-2 line fix in `_BuildPositionAccessors`.
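One plausible shape for such a small fix is to register every field's own id at its position, in addition to wrapping its children's accessors. The sketch below shows that idea on a simplified, self-contained model; `Field`, `Accessor` and `build_all_accessors` are hypothetical names, not pyiceberg's API, and this is not the actual patch. It assumes a struct value can be returned as a whole (here a tuple); other code that consumes the bound reference, such as the manifest or metrics evaluators, may also need to handle struct-typed values, which is not covered here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Accessor:
    """Simplified stand-in: a position in the row plus an optional accessor into a nested struct."""

    position: int
    inner: Optional["Accessor"] = None

    def get(self, row):
        # Index into the row, then recurse into the nested struct if there is an inner accessor.
        value = row[self.position]
        if value is None or self.inner is None:
            return value
        return self.inner.get(value)


@dataclass(frozen=True)
class Field:
    """Hypothetical minimal field: `children` is None for primitives and a list for structs."""

    field_id: int
    name: str
    children: Optional[List["Field"]] = None


def build_all_accessors(fields: List[Field]) -> Dict[int, Accessor]:
    """Hypothetical fix: every field, struct or not, gets an accessor for its own position."""
    result: Dict[int, Accessor] = {}
    for position, field in enumerate(fields):
        # The extra step: register the field itself, even when it is a struct.
        result[field.field_id] = Accessor(position)
        if field.children is not None:
            # Children stay reachable by wrapping their accessors in this position.
            for child_id, inner in build_all_accessors(field.children).items():
                result[child_id] = Accessor(position, inner)
    return result


schema = [
    Field(2, "id"),
    Field(1, "data"),
    Field(3, "location", [Field(5, "latitude"), Field(6, "longitude")]),
]
accessors = build_all_accessors(schema)
row = (10, "x", (1.5, 2.5))
print(accessors[3].get(row))  # (1.5, 2.5): the whole struct value
print(accessors[6].get(row))  # 2.5
```

With the struct registered, a `NotNull` check on the struct column would only need to test whether `accessors[3].get(row)` is `None`.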
### Willingness to contribute
- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time