niyue commented on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-958612133
> Thanks for the PR.
> I'm not sure if exposing madvise options is beneficial. If we expose `RANDOM`, then why not `SEQUENTIAL`?
> I prefer don't expose these access pattern related options, just let OS do the job.
> `WILLNEED` is a useful option IMO, I suppose OS won't invent a reading without hints.

I think someone else will probably need `SEQUENTIAL` as well. In my test, where the access pattern is random (binary searching an array in a memory-mapped Arrow IPC file), I found that the OS (Linux) performs read-ahead, so a lot of IO is wasted (90% in my test) and the page cache fills up with data that is never used.

I wrote a program that accesses an array in a memory-mapped record batch at exponentially growing indexes (`arr[1], arr[2], arr[4], arr[8], ...`) to visualize what I found. Before applying the random advice, the page cache for this file looks like this:

[image]

After applying the random advice, the page cache for this file looks like this:

[image]

This is probably not a problem for fast storage (SSD), but IO will become the bottleneck if storage bandwidth is limited.
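For readers who want to reproduce the access pattern, here is a minimal sketch in Python. It is not the program used for the test above, and it does not use Arrow: it assumes a plain file of little-endian int64 values standing in for the array, and the helper names (`write_int64_file`, `exponential_indexes`, `read_sparse`) are made up for illustration. It relies on `mmap.madvise` and `mmap.MADV_RANDOM`, which exist in Python 3.8+ on platforms that support them, and skips the advice where they are unavailable.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<q")  # assumption: one little-endian int64 per element


def write_int64_file(n):
    """Write the values 0..n-1 to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for v in range(n):
            f.write(ITEM.pack(v))
    return path


def exponential_indexes(n):
    """Return the indexes 1, 2, 4, 8, ... that are smaller than n."""
    out = []
    i = 1
    while i < n:
        out.append(i)
        i *= 2
    return out


def read_sparse(path, n, advise_random=True):
    """Memory-map the file and read only the exponentially spaced elements."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Hint that accesses are random so the kernel can limit read-ahead.
            # Guarded because madvise/MADV_RANDOM are not available everywhere.
            if advise_random and hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM"):
                mm.madvise(mmap.MADV_RANDOM)
            return [
                ITEM.unpack_from(mm, i * ITEM.size)[0]
                for i in exponential_indexes(n)
            ]
```

On Linux, page cache residency for the file before and after the call can then be inspected with a tool built on `mincore(2)`. For the difference to be visible, the file has to be much larger than the read-ahead window and must not already be cached.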
