westonpace commented on PR #13799: URL: https://github.com/apache/arrow/pull/13799#issuecomment-1209587810
Yes, scanner builder is on its way out, I hope, as part of #13782 (well, probably a follow-up). At the moment it still serves a slight purpose, in that the projection option is a little hard to specify and is something of a thorn when it comes to augmented fields.

I also agree with your other point. We spent considerable effort at one point making various things look like a dataset because datasets were the primary interface to the compute engine (e.g. filtering & projection). The record batch reader is a good example. I'd even go so far as to say the InMemoryDataset is probably superfluous and a better option in the future would be a "table_source" node. The scanner should be reserved for the case where you have multiple sources of data with the same (or devolved versions of the same) schema.

All that being said, I don't think readahead is going away. However, in the near future (again, #13782) I was pondering whether we should reframe readahead as "roughly how many bytes of data should the scanner attempt to read ahead" instead of "batch readahead and fragment readahead".
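To make the byte-based framing of readahead concrete, here is a rough sketch in plain Python. It is not Arrow code: `readahead_by_bytes` and `byte_budget` are hypothetical names, the "batches" are just `bytes` objects, and everything is single-threaded with no real I/O. It only illustrates the idea of buffering ahead until roughly N bytes are held, rather than counting batches or fragments.

```python
from collections import deque
from typing import Iterable, Iterator


def readahead_by_bytes(batches: Iterable[bytes], byte_budget: int) -> Iterator[bytes]:
    """Yield batches in order, buffering ahead up to roughly byte_budget bytes.

    Hypothetical illustration only; this is not an Arrow API.
    """
    source = iter(batches)
    buffered = deque()
    buffered_bytes = 0
    exhausted = False
    while True:
        # Read ahead until the buffer holds about byte_budget bytes.
        # Always hold at least one batch so a tiny budget still makes progress.
        while not exhausted and (not buffered or buffered_bytes < byte_budget):
            try:
                batch = next(source)
            except StopIteration:
                exhausted = True
                break
            buffered.append(batch)
            buffered_bytes += len(batch)
        if not buffered:
            return
        # Hand the oldest batch to the consumer and release its bytes.
        batch = buffered.popleft()
        buffered_bytes -= len(batch)
        yield batch
```

The budget is deliberately "rough": the last batch read may push the buffer past the limit, which matches the "roughly how many bytes" wording above. Small batches lead to many being buffered and large batches to few, with no separate batch and fragment knobs to tune.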
