[GitHub] [iceberg] Fokko commented on issue #7067: Polars Based Compute Engine

via GitHub Wed, 05 Apr 2023 01:51:43 -0700


Fokko commented on issue #7067:
URL: https://github.com/apache/iceberg/issues/7067#issuecomment-1497141806


   Thanks @chitralverma for chiming in here.
   
   > I was looking to do this integration over the weekend. It will be a quick 
addition because py-iceberg already allows a table to be converted to a pyarrow 
table which can be fed to Polars' eager read API. No need to rely on to_pandas 
which may incur additional overhead.
   
   That sounds like a great first step. The important part is that we push down 
the predicate from Polars into PyIceberg. Iceberg is designed to work with 
large tables, and not being able to prune files would result in very poor 
performance.
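   To illustrate why the pushdown matters: if the filter stays in Polars, every data file has to be read and only then filtered, whereas if it reaches PyIceberg, files whose column statistics rule out a match can be skipped during planning. Below is a minimal, stdlib-only sketch of that min/max pruning idea, assuming a single `column >= value` predicate. `DataFile`, `may_contain_ge`, `plan_files` and the `ts` column are hypothetical stand-ins, not PyIceberg's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for an Iceberg data file with per-column bounds."""
    path: str
    lower: dict  # column name -> minimum value in this file
    upper: dict  # column name -> maximum value in this file


def may_contain_ge(f: DataFile, column: str, value) -> bool:
    """Return False only when the file's stats prove no row has column >= value."""
    upper = f.upper.get(column)
    # Without statistics we cannot rule the file out, so it has to be kept.
    return upper is None or upper >= value


def plan_files(files, column, value):
    """Keep only the files that might contain matching rows (file pruning)."""
    return [f.path for f in files if may_contain_ge(f, column, value)]


files = [
    DataFile("a.parquet", {"ts": 0}, {"ts": 99}),
    DataFile("b.parquet", {"ts": 100}, {"ts": 199}),
    DataFile("c.parquet", {"ts": 200}, {"ts": 299}),
]

print(plan_files(files, "ts", 150))  # ['b.parquet', 'c.parquet']
```

   Here `a.parquet` is never opened. In recent PyIceberg versions the equivalent step is passing the filter as `row_filter` to `Table.scan()` before calling `to_arrow()`, so the pruning happens before any Arrow data reaches Polars.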
   
   > However, it would be great to support the lazy scan API as well, because 
most internal optimisations take place there.
   
   I fully agree. I think that would be a great second step, but it would 
probably be a bit more complex. We don't yet integrate with Arrow in the way 
that would be ideal; we're working on this, but it will probably take some 
time. It would require that, whenever an action is performed on a dataset, 
Polars calls PyIceberg to do the planning (and apply all the Iceberg 
optimizations).
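   The lazy integration described above could look roughly like the following: the lazy frame only records filters, and the Iceberg planner is invoked once, with all accumulated predicates, when an action such as `collect()` runs. This is a stdlib-only sketch under stated assumptions; `LazyIcebergScan`, `fake_planner` and the `(column, lower_bound)` predicate encoding are hypothetical. The method names only mirror the shape of Polars' `LazyFrame.filter()`/`collect()` and do not reflect how either library actually implements this.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical predicate encoding: (column, lower_bound) means "column >= lower_bound".
Predicate = Tuple[str, int]


@dataclass
class LazyIcebergScan:
    """Hypothetical lazy frame: records filters and defers planning until collect()."""
    planner: Callable[[List[Predicate]], List[str]]
    predicates: List[Predicate] = field(default_factory=list)

    def filter(self, column: str, lower_bound: int) -> "LazyIcebergScan":
        # No I/O or planning happens here; a new scan with one more predicate is returned.
        return LazyIcebergScan(self.planner, self.predicates + [(column, lower_bound)])

    def collect(self) -> List[str]:
        # Only when an action runs is the planner (PyIceberg's role) called,
        # with every predicate gathered so far, so it can prune files up front.
        return self.planner(self.predicates)


planner_calls = []  # records each planning call, to show planning is deferred


def fake_planner(predicates: List[Predicate]) -> List[str]:
    """Stand-in for Iceberg scan planning over per-file (min, max) column bounds."""
    planner_calls.append(list(predicates))
    files = {
        "a.parquet": {"ts": (0, 99)},
        "b.parquet": {"ts": (100, 199)},
        "c.parquet": {"ts": (200, 299)},
    }
    # Keep a file only if its max value could satisfy every predicate.
    return [
        path
        for path, stats in files.items()
        if all(stats[col][1] >= bound for col, bound in predicates)
    ]


scan = LazyIcebergScan(fake_planner).filter("ts", 150).filter("ts", 250)
print(planner_calls)   # [] : nothing has been planned yet
print(scan.collect())  # ['c.parquet']
```

   Deferring planning this way lets the planner see the combined filter, which is what allows Iceberg to prune as many files as possible before anything is read.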
   
   I'm happy to help, but I'm less familiar with Polars, so it would be awesome 
if you could work on the integration on that side 🚀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

