westonpace commented on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-961598217
I think the work I'm doing on ARROW-14577 / #11616 will help here. I am adding two new methods to RecordBatchFileReader: `WillNeedMetadata(vector<int> indices)` and `WillNeedBatches(vector<int> indices)`. The first tells the reader to preload the metadata but not the data blocks. The second tells the reader to preload both the metadata and the data blocks. Once those are in, we can remove the automatic WILLNEED advice from MemoryMappedFile. Instead, `WillNeedBatches` will issue WILLNEED advice on the data ranges before loading them, and `WillNeedMetadata` will not (nor will the default). So if you want to hop randomly through a file (e.g. for a binary search), simply use `WillNeedMetadata`. If you want to load entire blocks of data efficiently, call `WillNeedBatches`. I'm still a few days away from finishing, so any feedback on the approach now would be appreciated.
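
To make the proposed split concrete, here is a minimal, self-contained sketch of the intended behaviour. It is a toy model and not the Arrow implementation: `ToyReader`, `Block`, `Range`, `HasMetadata`, and `advised` are hypothetical names invented for illustration, and the two method names simply mirror the ones proposed above. Instead of calling the OS, the toy only records which byte ranges it would advise; a real reader would presumably hand those ranges to something like `madvise` with `MADV_WILLNEED`.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical layout of one record batch inside the file.
struct Block {
  int64_t metadata_offset;
  int64_t metadata_length;
  int64_t body_offset;
  int64_t body_length;
};

// A byte range that the reader would pass to the OS as WILLNEED advice.
struct Range {
  int64_t offset;
  int64_t length;
};

// Toy stand-in for a file reader (assumption: not the real Arrow class).
class ToyReader {
 public:
  explicit ToyReader(std::vector<Block> blocks) : blocks_(std::move(blocks)) {}

  // Preload metadata for the given batches. No advice is issued for the
  // data ranges, so random access (e.g. a binary search) does not pull
  // whole batch bodies into memory.
  void WillNeedMetadata(const std::vector<int>& indices) {
    for (int i : indices) {
      (void)blocks_.at(i);  // bounds check; throws std::out_of_range
      cached_metadata_.insert(i);
    }
  }

  // Preload metadata and record WILLNEED advice for each batch body.
  void WillNeedBatches(const std::vector<int>& indices) {
    WillNeedMetadata(indices);
    for (int i : indices) {
      const Block& b = blocks_.at(i);
      advised_.push_back({b.body_offset, b.body_length});
    }
  }

  bool HasMetadata(int i) const { return cached_metadata_.count(i) > 0; }
  const std::vector<Range>& advised() const { return advised_; }

 private:
  std::vector<Block> blocks_;
  std::set<int> cached_metadata_;
  std::vector<Range> advised_;
};
```

In this sketch, calling `WillNeedMetadata({0, 2})` leaves `advised()` empty, while a later `WillNeedBatches({1})` adds exactly one range covering the body of batch 1.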