pitrou commented on code in PR #13799:
URL: https://github.com/apache/arrow/pull/13799#discussion_r958493238
##########
cpp/src/arrow/dataset/scanner.h:
##########
@@ -384,6 +381,24 @@ class ARROW_DS_EXPORT ScannerBuilder {
/// This option provides a control limiting the memory owned by any RecordBatch.
Status BatchSize(int64_t batch_size);
+ /// \brief Set the number of batches to read ahead within a fragment.
+ ///
+ /// \param[in] batch_readahead How many batches to read ahead within a fragment
+ /// \returns an error if this number is less than 0.
+ ///
+ /// This option provides a control on the RAM vs I/O tradeoff.
+ /// It might not be supported by all file formats, in which case it will
+ /// simply be ignored.
+ Status BatchReadahead(int32_t batch_readahead);
+
+ /// \brief Set the number of fragments to read ahead
+ ///
+ /// \param[in] fragment_readahead How many fragments to read ahead
+ /// \returns an error if this number is less than 0.
+ ///
+ /// This option provides a control on the RAM vs IO tradeoff.
Review Comment:
Nit :-)
```suggestion
/// This option provides a control on the RAM vs I/O tradeoff.
```
##########
python/pyarrow/_dataset.pyx:
##########
@@ -2254,6 +2259,12 @@ cdef class Scanner(_Weakrefable):
The maximum row count for scanned record batches. If scanned
record batches are overflowing memory then this method can be
called to reduce their size.
+ batch_readahead : int, default 16
+ The number of batches to read ahead in a file. Increasing this number
+ will increase RAM usage but could also improve IO utilization.
+ fragment_readahead : int, default 4
+ The number of files to read ahead. Increasing this number will increase
+ RAM usage but could also improve IO utilization.
Review Comment:
Could you add a simple Python test passing these parameters?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.