westonpace commented on pull request #11588:
URL: https://github.com/apache/arrow/pull/11588#issuecomment-961598217


   I think the work I'm doing on ARROW-14577 / #11616 will help here.  I am 
adding two new methods to `RecordBatchFileReader`: `WillNeedMetadata(vector<int> 
indices)` and `WillNeedBatches(vector<int> indices)`.  The first tells the 
reader to preload the metadata for the given indices but not the data blocks.  
The second tells the reader to preload both the metadata and the data blocks.
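
   To make the intended contract concrete, here is a toy sketch of how the two 
hints relate.  `SketchReader` and its two query methods are hypothetical names 
made up for illustration; this is not the Arrow implementation and it does no 
I/O.  It only records which indices each hint would cover, assuming that 
`WillNeedBatches` implies everything `WillNeedMetadata` does.

```cpp
#include <set>
#include <vector>

// Toy stand-in for the proposed reader hints. This is NOT Arrow's
// RecordBatchFileReader; it only records what each hint would preload.
class SketchReader {
 public:
  // Hint: preload metadata for these indices, leave the data blocks alone.
  void WillNeedMetadata(const std::vector<int>& indices) {
    metadata_.insert(indices.begin(), indices.end());
  }

  // Hint: preload both metadata and data blocks for these indices.
  void WillNeedBatches(const std::vector<int>& indices) {
    WillNeedMetadata(indices);
    data_.insert(indices.begin(), indices.end());
  }

  // Hypothetical query helpers, present only so the sketch can be inspected.
  bool MetadataPreloaded(int i) const { return metadata_.count(i) > 0; }
  bool DataPreloaded(int i) const { return data_.count(i) > 0; }

 private:
  std::set<int> metadata_;  // indices whose metadata was requested
  std::set<int> data_;      // indices whose data blocks were requested
};
```

   After `WillNeedMetadata({0, 1, 2})` only the metadata set is populated; a 
later `WillNeedBatches({2})` marks the data for index 2 as well.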
   
   Once those are in, we can remove the automatic WILLNEED advice from 
`MemoryMappedFile`.  Instead, `WillNeedBatches` will issue WILLNEED advice on 
the data ranges before loading them; `WillNeedMetadata` will not, and neither 
will the default behavior.
   
   So if you want to hop randomly through a file (e.g. for a binary search), 
simply call `WillNeedMetadata`.  If you want to efficiently load entire blocks 
of data, call `WillNeedBatches`.
   
   I'm still a few days away from finishing, so any feedback on the approach 
now would be appreciated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

