Re: [I] Scan Iceberg table sorted on partition key without sort order [iceberg-python]

via GitHub Wed, 31 Jul 2024 12:59:43 -0700


kevinjqliu commented on issue #966:
URL: https://github.com/apache/iceberg-python/issues/966#issuecomment-2261342573


   Taking a stab at this,
   
   > support reading a PyArrow Batch Reader
   
   See the code for [`to_arrow_batch_reader`](https://github.com/apache/iceberg-python/blob/3809708074480a0a5d3a02738a76aafe2b3e3eb5/pyiceberg/table/__init__.py#L2025).
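
For illustration, here is a minimal sketch of streaming through that reader one batch at a time. The helper name `count_rows_streaming` is hypothetical, and `table` is assumed to be an already loaded PyIceberg table.

```python
def count_rows_streaming(table) -> int:
    """Count rows batch by batch instead of loading the whole table."""
    # `table` is assumed to be a pyiceberg Table; the scan yields a
    # RecordBatchReader, which is iterated one RecordBatch at a time.
    reader = table.scan().to_arrow_batch_reader()
    total = 0
    for batch in reader:
        total += batch.num_rows
    return total
```

Only one batch is held by this loop at any point, which is the property the request above is after.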
   
   > fetches batches according to the partition key
   
   I believe this can be done in the [`plan_files`](https://github.com/apache/iceberg-python/blob/3809708074480a0a5d3a02738a76aafe2b3e3eb5/pyiceberg/table/__init__.py#L1941) function by passing a filter on a partition field as the scan's `row_filter`.
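
As a rough sketch of that idea, the helper below lists the data files a filtered scan would touch. The name `planned_file_paths` and the column `dt` in the usage note are hypothetical, and `table` is assumed to be an already loaded PyIceberg table.

```python
def planned_file_paths(table, row_filter: str) -> list[str]:
    """Return the data file paths selected by a filtered scan."""
    # The filter is applied during planning; this function itself
    # does not open any of the data files.
    tasks = table.scan(row_filter=row_filter).plan_files()
    return [task.file.file_path for task in tasks]
```

With a real table, a call might look like `planned_file_paths(table, "dt = '2024-07-31'")`, assuming `dt` is the partition column.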
   
   > sorts those batches in-memory by another table column provided to the client
   
   It's possible to sort the batches in memory. However, I think all the data needs to be read into memory first in order to sort by another table column.
   Sorting by the partition field, by contrast, can be done without reading all the data into memory, since we can just work off the table metadata.
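
To make the trade-off concrete, here is a small pure-Python sketch, not PyIceberg code. Planned files are modelled as hypothetical `(partition_value, file_path)` pairs, and `read_rows` is a stand-in for whatever reads a file. Files are ordered by partition value from metadata alone, then one partition at a time is loaded and sorted by the other column. Note that this gives "partition first, then column" ordering; a global sort on the other column would still need all the data.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Iterable, Iterator

def scan_sorted(
    planned_files: Iterable[tuple[Any, str]],
    read_rows: Callable[[str], Iterable[dict]],
    sort_column: str,
) -> Iterator[dict]:
    """Yield rows ordered by partition value, then by sort_column."""
    # Step 1: order the files by partition value. Only metadata is used.
    ordered = sorted(planned_files, key=itemgetter(0))
    # Step 2: load a single partition at a time and sort it in memory.
    for _, files in groupby(ordered, key=itemgetter(0)):
        rows = [row for _, path in files for row in read_rows(path)]
        rows.sort(key=itemgetter(sort_column))
        yield from rows

# Hypothetical in-memory "storage" standing in for data files.
fake_storage = {
    "b.parquet": [{"day": 2, "id": 9}, {"day": 2, "id": 3}],
    "a.parquet": [{"day": 1, "id": 5}],
    "c.parquet": [{"day": 1, "id": 1}],
}
planned = [(2, "b.parquet"), (1, "a.parquet"), (1, "c.parquet")]
result = list(scan_sorted(planned, fake_storage.__getitem__, "id"))
```

Peak memory here is bounded by the largest partition rather than by the whole table.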
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

