pitrou commented on pull request #11588:
URL: https://github.com/apache/arrow/pull/11588#issuecomment-960614235


   I'd like to reboot the discussion and stop debating flag combinations 
without regard for the original issue.
   
   Here is the complaint:
   
   > In my test, if the access pattern is random access (binary searching an 
array in a memory mapped arrow IPC file in my case), I find OS (Linux) will 
prefetch data, and lots of IO are wasted (90% in my test), page cache is full 
of never used data as well.
   
   So, to sum it up:
   * the Arrow IPC layer issues `madvise` calls for record batches that are 
read by the user, so that the OS prefetches them in the background
   * here, the user doesn't _want_ the record batches to be prefetched, because 
they are only doing very sparse reads and ignoring most of the remaining data
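   
   To make the access pattern concrete, here is a minimal sketch of the scenario, not Arrow code and not what this PR proposes. It assumes a plain file of sorted little-endian int64 values rather than an Arrow IPC file, and it uses Python's `mmap.madvise` with `MADV_RANDOM` (Python 3.8+, on platforms that expose it) as one possible way to tell the OS that readahead is unwanted. The helper names (`write_sorted_int64`, `advise_random`, `binary_search`, `demo`) are made up for illustration.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<q")  # one little-endian int64 per element


def write_sorted_int64(path, count):
    """Write the sorted values 0, 2, 4, ... to a flat binary file."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(ITEM.pack(2 * i))


def advise_random(mm):
    """Best-effort hint that accesses will be random; True if it was applied."""
    # Assumption: madvise/MADV_RANDOM are only available on some platforms.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM"):
        mm.madvise(mmap.MADV_RANDOM)
        return True
    return False


def binary_search(mm, target):
    """Return the index of target in the mapped sorted array, or -1."""
    count = len(mm) // ITEM.size
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        # Each probe touches a single page, far away from the previous one.
        (value,) = ITEM.unpack_from(mm, mid * ITEM.size)
        if value < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < count and ITEM.unpack_from(mm, lo * ITEM.size)[0] == target:
        return lo
    return -1


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sorted.bin")
        write_sorted_int64(path, 100_000)
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            advise_random(mm)  # sparse reads: prefetching would be wasted
            return binary_search(mm, 2 * 777), binary_search(mm, 3)
```

   A binary search over N elements touches only about log2(N) pages, so any readahead around each probe mostly loads data that is never used, which matches the complaint quoted above.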
   
   @niyue Am I right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

