westonpace opened a new pull request #11616: URL: https://github.com/apache/arrow/pull/11616
**This is still very much a WIP**

This PR attempts to address several issues:

* Memory mapped IPC reads always call WillNeed on the data and the user has no way to avoid this
* Projection pushdown is only available in the synchronous API
* Coalescing / readahead is only available via the generators API
* There is a lot of duplicate code in the generators path

It adds two new methods to RecordBatchFileReader:

```
/// \brief Begin loading metadata for the desired batches into memory.
///
/// This method will also begin loading all dictionary messages into memory.
///
/// For a regular file this will immediately begin disk I/O in the background on a
/// thread on the IOContext's thread pool. If the file is memory mapped this will
/// ensure the memory needed for the metadata is paged from disk into memory.
///
/// \param indices Indices of the batches to prefetch
///                If empty then all batches will be prefetched.
virtual Status WillNeedMetadata(const std::vector<int>& indices) = 0;

/// \brief Begin loading metadata for the desired batches into memory and indicate
/// that the data itself should be prefetched when it is requested
///
/// This method should not be called in combination with WillNeedMetadata. If you want
/// to prefetch the data then use this method. If you do not want to prefetch the data
/// (because you are only accessing a small # of items in the batch's arrays) then you
/// should use WillNeedMetadata.
///
/// This method will immediately start the I/O for the metadata and dictionaries.
///
/// This method will not immediately start the I/O for the data. The data I/O will be
/// started when you call ReadRecordBatch.
///
/// If you want to read multiple batches in parallel then you can make concurrent calls
/// to ReadRecordBatch or ReadRecordBatchAsync.
/// \param indices
/// \return
virtual Status WillNeedBatches(const std::vector<int>& indices) = 0;
```
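To make the intended call order concrete, here is a toy model of the contract described in the doc comments. It is not Arrow code and does no I/O: `ToyFileReader`, `MetadataRequested` and `DataLoaded` are hypothetical names, and `ReadRecordBatch` returns a `bool` purely for illustration. Treating an empty index list as "all batches" in `WillNeedBatches` is also an assumption, since the comment only states that rule for `WillNeedMetadata`.

```cpp
// Toy model only: NOT Arrow code. It records which requests were made so the
// difference between the two proposed methods is easy to see.
#include <cassert>
#include <set>
#include <vector>

class ToyFileReader {
 public:
  explicit ToyFileReader(int num_batches) : num_batches_(num_batches) {}

  // Models WillNeedMetadata: metadata loading starts now; the data is never
  // marked for prefetch.
  void WillNeedMetadata(const std::vector<int>& indices) {
    for (int i : Expand(indices)) metadata_requested_.insert(i);
  }

  // Models WillNeedBatches: metadata loading starts now; data loading is only
  // marked as wanted and is deferred until ReadRecordBatch is called.
  void WillNeedBatches(const std::vector<int>& indices) {
    for (int i : Expand(indices)) {
      metadata_requested_.insert(i);
      prefetch_data_on_read_.insert(i);
    }
  }

  // Models ReadRecordBatch: returns true if this read triggered a data
  // prefetch for batch i (hypothetical return type, for illustration).
  bool ReadRecordBatch(int i) {
    const bool prefetch = prefetch_data_on_read_.count(i) > 0;
    if (prefetch) data_loaded_.insert(i);
    return prefetch;
  }

  bool MetadataRequested(int i) const { return metadata_requested_.count(i) > 0; }
  bool DataLoaded(int i) const { return data_loaded_.count(i) > 0; }

 private:
  // An empty index list means "all batches" (stated for WillNeedMetadata;
  // assumed here for WillNeedBatches as well).
  std::vector<int> Expand(const std::vector<int>& indices) const {
    if (!indices.empty()) return indices;
    std::vector<int> all;
    for (int i = 0; i < num_batches_; ++i) all.push_back(i);
    return all;
  }

  int num_batches_;
  std::set<int> metadata_requested_;
  std::set<int> prefetch_data_on_read_;
  std::set<int> data_loaded_;
};
```

In this model, calling `WillNeedBatches({1, 2})` marks metadata for batches 1 and 2 immediately, while `DataLoaded(1)` stays false until `ReadRecordBatch(1)` runs. After `WillNeedMetadata`, a read never triggers a data prefetch, which is the case the PR wants for callers that touch only a few items per batch.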