dinoocch commented on PR #13721: URL: https://github.com/apache/pinot/pull/13721#issuecomment-2261087393
> But we can't change the default without a wide spectrum of consequences and I'd discourage that. Though it's obviously good to have this feature and make it configurable.

Good point. Let's keep the current behavior for now to avoid surprises.

> But they dropped this in Lucene 9 (AFAICT) and they are now using MemorySegment.

Seems like they wrote some blog posts regarding this:

https://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
https://www.elastic.co/search-labs/blog/lucene-and-java-moving-forward-together

The new version of Lucene uses the Panama APIs, which offer a lot of interesting potential once support for Java < 21 is dropped: https://github.com/apache/lucene/blob/main/lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java

> Logically, a reasonably high read ahead should be quite useful in most cases.

From my limited understanding, read ahead is extremely useful in systems that benefit from large read operations (for example NFS) or, more practically, managed disks like those in [azure](https://learn.microsoft.com/en-us/azure/virtual-machines/disks-types), where the max throughput can only be reached by batching I/O operations into a single read.

In an ideal world, there's a lot we could potentially do (and also a lot of limitations that the page cache and mmap currently impose on us). Some examples:

* Smartly madvise buffers based on their size: "medium" sized indexes which use binary search might benefit from NORMAL, while very large or very small such indexes would likely prefer RANDOM (I would guess)
* I think there's some potential for WILLNEED to be useful to start async reads of pages, preemptively reducing the chance of page faults

I am honestly a bit more interested in whether we would benefit more from direct I/O and managing the cache internally, though...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org
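
A rough sketch of the size-based madvise idea from the comment above, in Java since Pinot is a Java project. Everything named here is hypothetical: the `MadviseHeuristic` class, the `Advice` enum, and both thresholds are illustrative assumptions, not part of Pinot or Lucene, and the cut-offs would need benchmarking. The sketch only picks a hint; it does not call madvise, which would need a native binding (for example through the Panama APIs mentioned in the comment).

```java
// Hypothetical helper: picks a madvise-style hint from a buffer's size.
// Not a Pinot or Lucene API; thresholds are assumptions for illustration only.
final class MadviseHeuristic {

    /** Subset of the POSIX madvise hints discussed in the comment. */
    enum Advice { NORMAL, RANDOM }

    // Assumed cut-offs; real values would have to come from benchmarks.
    static final long SMALL_BYTES = 1L << 20; // 1 MiB
    static final long LARGE_BYTES = 1L << 30; // 1 GiB

    /**
     * "Medium" buffers keep the default read-ahead (NORMAL); very small or
     * very large buffers get RANDOM, following the guess in the comment.
     */
    static Advice choose(long bufferSizeBytes) {
        if (bufferSizeBytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        if (bufferSizeBytes < SMALL_BYTES || bufferSizeBytes > LARGE_BYTES) {
            return Advice.RANDOM;
        }
        return Advice.NORMAL;
    }

    public static void main(String[] args) {
        long[] sizes = {64L << 10, 256L << 20, 8L << 30};
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + choose(size));
        }
    }
}
```

Keeping the decision separate from the native call would let the thresholds be made configurable and unit-tested without touching mmap at all.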