jkleinkauff opened a new issue, #1032:
URL: https://github.com/apache/iceberg-python/issues/1032
### Question
Hey, thanks for this very convenient library.
This is not a bug; I just want to better understand something.
I have a question about performance, specifically the time it takes to query
a table with calls like the ones below.
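```python
from pyiceberg.catalog.sql import SqlCatalog

if __name__ == "__main__":
    catalog = SqlCatalog(
        "default",
        **{
            "uri": "postgresql+psycopg2://postgres:Password1@localhost/postgres",
        },
    )
    table = catalog.load_table("bronze.curitiba_starts_june")
    # Build a scan limited to 100 rows; no data files are read yet.
    df = table.scan(limit=100)
    # Materialize the scan as a PyArrow table; this is the slow step.
    pa_table = df.to_arrow()
```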
The code above runs fine. My question is about the last call: `to_arrow()`
takes roughly 50 s to execute. I believe this is mostly because of the
network itself?
The execution time stays roughly the same with different row limits (10,
100, 1000).
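
To check how the time varies with `limit`, each call can be timed separately. Below is a minimal sketch using only the standard library. `timed` is a hypothetical helper, not part of pyiceberg, and the commented usage assumes the `table` object from the snippet above.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical usage with the table loaded above:
# for limit in (10, 100, 1000):
#     pa_table, seconds = timed(lambda: table.scan(limit=limit).to_arrow())
#     print(f"limit={limit}: {pa_table.num_rows} rows in {seconds:.1f}s")
```

If the three timings come out nearly identical, the cost is probably dominated by something that does not depend on the number of rows returned.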
Querying the same table in MotherDuck, using `iceberg_scan`, is faster:
<img width="836" alt="image"
src="https://github.com/user-attachments/assets/21a05d45-ebcd-4323-ba31-2689d2d12fe7">
When running the same query locally, without MotherDuck, the execution
time is similar to what pyiceberg takes; actually it is a little bit
slower. That's why I think this is mostly a network "issue". Can you help
me understand what's happening? Thank you!
#### Table Data
The table has two Parquet files (110 MB and 127 MB).
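
As a rough sanity check on the network hypothesis, the file sizes can be turned into an effective throughput. This sketch assumes that both files are downloaded in full during the roughly 50 s call, which is an assumption and not something confirmed here. `effective_throughput_mb_s` is a hypothetical helper name.

```python
def effective_throughput_mb_s(file_sizes_mb, elapsed_s):
    """Average MB/s if every listed file is read in full within elapsed_s."""
    return sum(file_sizes_mb) / elapsed_s


# Assumption: both Parquet files are fetched completely in about 50 s.
total_mb = 110 + 127  # 237 MB across both files
rate = effective_throughput_mb_s([110, 127], 50)
print(f"{total_mb} MB in 50 s is about {rate:.2f} MB/s ({rate * 8:.1f} Mbit/s)")
```

Under that assumption the rate works out to a little under 5 MB/s, which is plausible for a home or office connection to remote object storage.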