kevinjqliu commented on issue #966: URL: https://github.com/apache/iceberg-python/issues/966#issuecomment-2261342573
Taking a stab at this,

> support reading a PyArrow Batch Reader

Looking at the code for [`to_arrow_batch_reader`](https://github.com/apache/iceberg-python/blob/3809708074480a0a5d3a02738a76aafe2b3e3eb5/pyiceberg/table/__init__.py#L2025)

> fetches batches according to the partition key

I believe this can be done in the [`plan_files`](https://github.com/apache/iceberg-python/blob/3809708074480a0a5d3a02738a76aafe2b3e3eb5/pyiceberg/table/__init__.py#L1941) function by specifying a partition field as a `row_filter` in the scan.

> sorts those batches in-memory by another table column provided to the client

It's possible to sort the batches in memory. However, I think all the data needs to be read into memory in order to sort by another table column. Sorting by a partition field can be done without reading all the data into memory, since we can just work off the table metadata.
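A rough sketch of how the two steps above could fit together, in case it helps the discussion. It assumes `table` is a loaded pyiceberg `Table`, that `row_filter` is accepted as a string expression, and that the reader returned by `to_arrow_batch_reader()` is a `pyarrow.RecordBatchReader`. The helper name and the column names in the usage note are hypothetical and not part of pyiceberg.

```python
def read_partition_sorted(table, partition_filter, sort_column):
    """Scan one partition, then sort the result in memory.

    Hypothetical helper, not a pyiceberg API.
    `partition_filter` is a row-filter expression on a partition field,
    e.g. "region = 'eu'" (made-up column name).
    """
    # Filtering on a partition field lets file planning skip data files
    # that cannot match, using table metadata only.
    scan = table.scan(row_filter=partition_filter)

    # Stream the matching files as Arrow record batches.
    reader = scan.to_arrow_batch_reader()

    # Sorting by a non-partition column needs every matching row,
    # so the whole stream is materialized here before sorting.
    return reader.read_all().sort_by(sort_column)
```

For example, `read_partition_sorted(table, "region = 'eu'", "event_ts")` would return a single `pyarrow.Table` for that partition, sorted by `event_ts`. Peak memory is bounded by the size of the filtered partition rather than the whole table, but the streaming benefit of the batch reader is lost at the `read_all()` call.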
