[GitHub] [arrow] westonpace commented on pull request #13799: ARROW-17299: [C++][Python] Expose the Scanner kDefaultBatchReadahead and kDefaultFragmentReadahead parameters

GitBox Tue, 09 Aug 2022 09:13:29 -0700


westonpace commented on PR #13799:
URL: https://github.com/apache/arrow/pull/13799#issuecomment-1209587810


   Yes, I hope the scanner builder is on its way out as part of #13782 (or, 
more likely, a follow-up).  For now it still serves a small purpose: the 
projection option is a little hard to specify by hand, and it is something of a 
thorn when it comes to augmented fields.
   
   I also agree with your other point.  At one point we spent considerable 
effort making various things look like a dataset because datasets were the 
primary interface to the compute engine (e.g. filtering & projection).  The 
record batch reader is a good example.  I'd even go so far as to say that 
InMemoryDataset is probably superfluous and that a better option in the future 
would be a "table_source" node.  The scanner should be reserved for the case 
where you have multiple sources of data with the same (or devolved versions of 
the same) schema.
   
   All that being said, I don't think readahead is going away.  However, in the 
near future (again, #13782) I have been wondering whether we should reframe 
readahead as "roughly how many bytes of data should the scanner attempt to read 
ahead" instead of "batch readahead and fragment readahead".
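
   To make the byte-based framing concrete, here is a rough sketch of how the 
two views could relate.  It assumes that batch readahead applies per fragment, 
so the data in flight is roughly batch readahead x fragment readahead x the 
average batch size.  The function names and that simple model are hypothetical 
and are not part of the Arrow API.

```python
def estimated_readahead_bytes(batch_readahead: int,
                              fragment_readahead: int,
                              est_batch_bytes: int) -> int:
    """Rough upper bound on bytes held by readahead.

    Assumption (not taken from Arrow): batch readahead applies to each
    fragment being read ahead, so the two settings multiply.
    """
    if min(batch_readahead, fragment_readahead, est_batch_bytes) <= 0:
        raise ValueError("all arguments must be positive")
    return batch_readahead * fragment_readahead * est_batch_bytes


def batch_readahead_for_budget(byte_budget: int,
                               fragment_readahead: int,
                               est_batch_bytes: int) -> int:
    """Pick a per-fragment batch readahead that stays within a byte budget.

    Hypothetical helper: inverts the estimate above and never returns
    less than 1, so a scan can always make progress.
    """
    if min(byte_budget, fragment_readahead, est_batch_bytes) <= 0:
        raise ValueError("all arguments must be positive")
    per_fragment = byte_budget // (fragment_readahead * est_batch_bytes)
    return max(1, per_fragment)


if __name__ == "__main__":
    mib = 1 << 20
    # A 64 MiB budget, 4 fragments read ahead, ~1 MiB batches.
    print(batch_readahead_for_budget(64 * mib, 4, mib))
```

   The point of the sketch is that a single byte budget is enough to derive 
the other settings once a batch-size estimate is available, whereas the two 
count-based settings say nothing about memory on their own.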
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

