cgbur commented on issue #716:
URL: https://github.com/apache/iceberg-python/issues/716#issuecomment-2101251170
Ah, confusingly, there appear to be writer differences that cause the issue.
My Rust pyarrow implementation matches when polars has `use_pyarrow=True`.
```python
import polars as pl
import pyarrow.parquet as pq

df = pl.DataFrame(
    {
        "a": [[{"a": 1}, {"a": 2}], [{"a": 3}]],
    }
)


def print_schema_path(path, col_name):
    metadata = pq.read_metadata(path)
    for group_number in range(metadata.num_row_groups):
        row_group = metadata.row_group(group_number)
        for column_number in range(row_group.num_columns):
            column = row_group.column(column_number)
            if column.path_in_schema.startswith(col_name):
                print(f"path_in_schema: {column.path_in_schema}")


df.write_parquet("example.parquet", use_pyarrow=False)
print("with polars")
print(pq.read_schema("example.parquet"))
print_schema_path("example.parquet", "a")

df.write_parquet("example.parquet", use_pyarrow=True)
print("with pyarrow")
print(pq.read_schema("example.parquet"))
print_schema_path("example.parquet", "a")
```
```
with polars
a: large_list<item: struct<a: int64>>
  child 0, item: struct<a: int64>
      child 0, a: int64
path_in_schema: a.list.item.a
with pyarrow
a: large_list<element: struct<a: int64>>
  child 0, element: struct<a: int64>
      child 0, a: int64
path_in_schema: a.list.element.a
```
Perhaps the visitor is not respecting the list element name used in the
file's schema (`item` vs. `element`)? Or is there a mismatch in how that
name is derived on the Iceberg side versus the Parquet side?
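
To make the mismatch concrete, here is a minimal sketch that compares the two `path_in_schema` values from the output above while treating `item` and `element` as interchangeable list child names. The helper `normalize_list_path` and the set `LIST_CHILD_NAMES` are hypothetical names for illustration only. They are not part of pyiceberg or pyarrow, and this is not necessarily how the visitor should resolve the name.

```python
# Hypothetical helper, not part of pyiceberg or pyarrow.
# Assumption: the repeated group of a list column is named "list" and its
# child is named either "item" (polars' native writer in the output above)
# or "element" (the pyarrow writer).
LIST_CHILD_NAMES = {"item", "element"}


def normalize_list_path(path_in_schema: str) -> str:
    """Rewrite every `list.item` segment pair to `list.element`."""
    parts = path_in_schema.split(".")
    for i in range(1, len(parts)):
        if parts[i - 1] == "list" and parts[i] in LIST_CHILD_NAMES:
            parts[i] = "element"
    return ".".join(parts)


polars_path = "a.list.item.a"
pyarrow_path = "a.list.element.a"

# A plain string comparison sees two different columns.
print(polars_path == pyarrow_path)  # False

# After normalizing the list child name, the paths agree.
print(normalize_list_path(polars_path) == normalize_list_path(pyarrow_path))  # True
```

One caveat with this sketch: a struct field that is literally named `list` with a child named `item` would also be rewritten, so a lookup based on the actual schema types would be more robust than string matching.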