nicor88 opened a new issue, #6647:
URL: https://github.com/apache/iceberg/issues/6647
### Apache Iceberg version
None
### Query engine
None
### Please describe the bug 🐞
I'm trying to read an Iceberg table written by Athena (engine v3); I'm not
sure which Iceberg version Athena uses.
When running this code:
```
from pyiceberg import catalog
from pyiceberg.expressions import GreaterThanOrEqual

# Connect to the Glue catalog and check that the namespace and tables are visible
glue_catalog = catalog.load_glue(name='glue', conf={})
glue_catalog.list_namespaces()
glue_catalog.list_tables('silver_marketing')

# Load the table and plan a filtered scan
table = glue_catalog.load_table("silver_marketing.performance_kpis")
scan = table.scan(
    row_filter=GreaterThanOrEqual("report_date", "2023-01-01")
)

# Planning the data files works
files = [task.file.file_path for task in scan.plan_files()]
print(files)

# Reading the data is where it fails
df_iceberg = scan.to_pandas()
print(len(df_iceberg))
```
It fails on `df_iceberg = scan.to_pandas()` (I also tried
`scan.to_arrow()`).
The error, here from the `scan.to_arrow()` variant, is the following:
```
Traceback (most recent call last):
  File "/Users/nicor88/deng-swiss-knife/icerberg/get_data.py", line 31, in <module>
    df_iceberg = scan.to_arrow()
  File "/Users/nicor88/deng-swiss-knife/venv/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 341, in to_arrow
    return project_table(
  File "/Users/nicor88/deng-swiss-knife/venv/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py", line 508, in project_table
    schema_raw = parquet_schema.metadata.get(ICEBERG_SCHEMA)
AttributeError: 'NoneType' object has no attribute 'get'
```
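The traceback suggests that `parquet_schema.metadata` was `None` at the failing line, which is what pyarrow reports for a schema that carries no key-value metadata, so the chained `.get(...)` has nothing to call. Below is a minimal sketch of a None-safe lookup. The helper name `iceberg_schema_json` is hypothetical, the key `b"iceberg.schema"` is an assumption about what `ICEBERG_SCHEMA` holds, and this is not presented as pyiceberg's actual code or fix:

```python
from typing import Mapping, Optional

# Assumption: the metadata key pyiceberg looks up (pyarrow exposes keys as bytes).
ICEBERG_SCHEMA = b"iceberg.schema"


def iceberg_schema_json(
    metadata: Optional[Mapping[bytes, bytes]],
) -> Optional[bytes]:
    """Hypothetical helper: return the embedded schema JSON, or None if absent.

    `metadata` stands in for `parquet_schema.metadata`, which is None when
    the file was written without any key-value metadata.
    """
    if metadata is None:
        # No metadata at all: the case that raises AttributeError above.
        return None
    # Metadata exists but may still lack the Iceberg schema entry.
    return metadata.get(ICEBERG_SCHEMA)


# No metadata: returns None instead of raising.
print(iceberg_schema_json(None))
# Metadata with the key present: returns the stored bytes.
print(iceberg_schema_json({ICEBERG_SCHEMA: b"{}"}))
```

With a guard like this, a caller could fall back to the table's own schema when the files do not embed one; whether that is the right behaviour for pyiceberg is a separate question.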
An example table can be created like this:
```
create table
  data_engineering.iceberg_example_1
with (
  table_type='iceberg',
  is_external=false,
  location='s3://xxxx/iceberg_1',
  partitioning=ARRAY['creation_date', 'bucket(user_id, 5)'],
  format='parquet',
  vacuum_max_snapshot_age_seconds=86400,
  optimize_rewrite_delete_file_threshold=2
)
as
with data as (
  select
    1 as user_id,
    'pi' as user_name,
    'active' as status,
    17.89 as cost,
    1 as quantity,
    100000000 as quantity_big,
    cast(cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as date) as creation_date,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as created_at,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as updated_at
  union all
  select
    2 as user_id,
    'beta' as user_name,
    'inactive' as status,
    3 as cost,
    5 as quantity,
    100000000 as quantity_big,
    cast(cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as date) as creation_date,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as created_at,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as updated_at
)
select
  user_id,
  user_name,
  status,
  cost,
  quantity,
  quantity_big,
  creation_date,
  created_at,
  cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as inserted_at
from data
```