kevinjqliu commented on issue #966:
URL: https://github.com/apache/iceberg-python/issues/966#issuecomment-2261342573

   Taking a stab at this,
   
   > support reading a PyArrow Batch Reader
   
   Looking at the code for [`to_arrow_batch_reader`](https://github.com/apache/iceberg-python/blob/3809708074480a0a5d3a02738a76aafe2b3e3eb5/pyiceberg/table/__init__.py#L2025)
   
   > fetches batches according to the partition key
   
   I believe this can be done in the [`plan_files`](https://github.com/apache/iceberg-python/blob/3809708074480a0a5d3a02738a76aafe2b3e3eb5/pyiceberg/table/__init__.py#L1941) function by specifying a filter on a partition field as the `row_filter` of the scan.
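
A minimal sketch of that idea, assuming `table` is a PyIceberg `Table` and that `row_filter` is an expression (or filter string) on a partition field. The helper name `iter_filtered_batches` is hypothetical and not part of PyIceberg:

```python
def iter_filtered_batches(table, row_filter):
    """Yield Arrow record batches for the rows matching row_filter.

    Hypothetical helper: `table` is assumed to be a pyiceberg Table and
    `row_filter` a filter on a partition field.
    """
    # scan() takes the filter; file planning can then skip data files
    # whose partition values cannot match it.
    scan = table.scan(row_filter=row_filter)
    # to_arrow_batch_reader() returns a pyarrow.RecordBatchReader, which
    # can be iterated one batch at a time instead of loading everything.
    for batch in scan.to_arrow_batch_reader():
        yield batch
```

For example, `row_filter` could be `EqualTo("event_date", "2024-07-01")` from `pyiceberg.expressions`, where `event_date` is a made-up partition column used only for illustration.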
   
   > sorts those batches in-memory by another table column provided to the client
   
   It's possible to sort the batches in memory. However, I think all the data
needs to be read into memory in order to sort by another table column.
   Sorting by a partition field can be done without reading all the data into
memory, since we can just work off the table metadata.
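
To make the contrast concrete, here is a rough sketch under the same assumptions (`table` is a PyIceberg `Table`). Both function names are hypothetical. The first drains the batch reader into one in-memory Arrow table before sorting. The second only orders the planned file tasks, which I'd expect to need metadata alone:

```python
def read_sorted(table, row_filter, sort_column):
    """Return the matching rows as one Arrow table sorted by sort_column."""
    reader = table.scan(row_filter=row_filter).to_arrow_batch_reader()
    # read_all() collects every batch into a single pyarrow.Table; this is
    # the step that requires all matching data to fit in memory.
    full_table = reader.read_all()
    return full_table.sort_by(sort_column)


def plan_tasks_in_partition_order(table, row_filter, key):
    """Return the scan's file tasks ordered by `key`, without reading rows.

    `key` is supplied by the caller. Something like
    `lambda task: task.file.partition[0]` is an assumption about how the
    partition value is exposed on each task, not a confirmed recipe.
    """
    tasks = table.scan(row_filter=row_filter).plan_files()
    return sorted(tasks, key=key)
```

Filtering on the partition field first keeps the in-memory sort bounded by the size of one partition rather than the whole table.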
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

