kevinjqliu commented on code in PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#discussion_r1716133239
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1194,6 +1194,7 @@ def _task_to_record_batches(
case_sensitive: bool,
name_mapping: Optional[NameMapping] = None,
use_large_types: bool = True,
+ limit: Optional[int] = None,
Review Comment:
I like the #1043 way where `_task_to_record_batches` is kept as is; its just
generating more batches. The function which calls `_task_to_record_batches` can
keep track of the `limit`
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1366,6 +1373,7 @@ def project_table(
case_sensitive,
table_metadata.name_mapping(),
use_large_types,
+ limit,
Review Comment:
I think this change is enough to "pushdown the limit". `_task_to_table` will
get batches from `_task_to_record_batches` and check the limit, returning a
pyarrow Table when the limit is met.
If the limit is not met, `_task_to_table` will just return all the batches
as a pyarrow Table
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1265,6 +1270,7 @@ def _task_to_table(
case_sensitive: bool,
name_mapping: Optional[NameMapping] = None,
use_large_types: bool = True,
+ limit: Optional[int] = None,
) -> Optional[pa.Table]:
batches = list(
Review Comment:
similar to the comment above, i think we can process the `limit` here
instead of passing it down to `_task_to_record_batches`
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1194,6 +1194,7 @@ def _task_to_record_batches(
case_sensitive: bool,
name_mapping: Optional[NameMapping] = None,
use_large_types: bool = True,
+ limit: Optional[int] = None,
Review Comment:
"technically" we can also push down the limit to the
`ds.Scanner.from_fragment`'s `batch_size` parameter, but that's currently out
of scope
https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html#pyarrow.dataset.Scanner.from_fragment
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]