Hi Alexander,

I understand that NIOFSDirectory also uses the FS cache, but doesn't MMapDirectory tend to fill up the cache with unnecessary data for random access patterns due to sequential read-ahead? Our concern is that it can potentially lead to evicting hot pages used by another process on the same host, affecting its performance.
No, this is not the case (at least not on Linux or Solaris). There is no difference between a read() and a page fault triggered by mmap: both read the same pages and put them into the cache. mmap won't read more pages. What gets read depends on fadvise or madvise, which Lucene does not change (the OS decides).
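To make the two access paths concrete, here is a minimal sketch in plain Java (no Lucene involved). The class and method names are made up for illustration. It reads the same byte range once with a positional read() on a FileChannel and once through a read-only memory mapping. It does not measure read-ahead; it only shows that both paths return the same bytes from the same file, served through the OS page cache.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper class, not part of Lucene.
final class ReadVsMmap {

    // Positional read on a FileChannel: the read()-style path.
    static byte[] readViaChannel(Path file, long pos, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            long offset = pos;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, offset);
                if (n < 0) {
                    throw new IOException("unexpected end of file");
                }
                offset += n;
            }
            return buf.array();
        }
    }

    // Read-only mapping of the same range: the mmap path.
    // The range [pos, pos + len) must lie inside the file.
    static byte[] readViaMmap(Path file, long pos, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            byte[] out = new byte[len];
            map.get(out); // touching the mapped pages is what triggers the page faults
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-vs-mmap", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[64 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        Files.write(file, data);

        byte[] viaRead = readViaChannel(file, 4096, 512);
        byte[] viaMmap = readViaMmap(file, 4096, 512);
        System.out.println("same bytes: " + Arrays.equals(viaRead, viaMmap));
    }
}
```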
As far as I can see, Elasticsearch also avoids using MMap for everything by default, e.g. stored fields and term vectors are not MMAPed.

Elasticsearch's reason is different: it does this because current kernels limit the number of mappings available. Elasticsearch clusters tend to have many indexes, and they do this to avoid too many mappings. It has nothing to do with caching.

Stored fields and term vectors are valid candidates for not being mmapped if you are under pressure on the number of mappings. Their access pattern is completely different. So what Elasticsearch does is a valid thing to do. If you really want to spare mappings, use the stored fields / term vectors approach. But then you also need to disable CFS files, which is counterproductive, as it raises the number of mappings and file handles.

Does it make sense or am I missing something? Is my understanding correct that it still makes sense to avoid MMAPing files with a random access pattern on the most recent Lucene and JVM versions?

Who said this? This is simply not true! Myths....

One last word: With the next Lucene version released after Java 19 comes out, you will be able to work around the "too many mappings" problem for huge clouds of Elasticsearch clusters, thanks to a new MMAP implementation that is chosen via a multi-release lucene-core.jar file. This will allow them to mmap everything when Java 19+ is used (and Java's preview features are enabled). It works by using larger blocks of virtual memory per mapping (currently limited to 1 gigabyte per mapping) => https://github.com/apache/lucene/pull/912
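A rough back-of-the-envelope sketch of why the chunk size matters, assuming each file needs one mapping per chunk (ceiling division of file size by chunk size). The class and method names are made up for illustration, and the 16 GiB chunk size used in main is only a hypothetical "larger block" value for comparison; see the pull request above for what the new implementation actually uses. For context, Linux caps mappings per process via vm.max_map_count, which defaults to 65530.

```java
// Hypothetical helper, not part of Lucene: estimates how many mmap regions
// a set of index files needs for a given maximum chunk size.
final class MappingCount {

    // Mappings needed for one file when each mapping covers at most chunkBytes.
    static long mappingsForFile(long fileBytes, long chunkBytes) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + chunkBytes - 1) / chunkBytes; // ceiling division
    }

    // Sum over all files of an index (or of all indexes on a node).
    static long totalMappings(long[] fileSizes, long chunkBytes) {
        long total = 0;
        for (long size : fileSizes) {
            total += mappingsForFile(size, chunkBytes);
        }
        return total;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long[] files = {5 * gib, 12 * gib, 300L << 20}; // made-up file sizes

        // 1 GiB chunks (the current limit mentioned above) vs. an assumed larger chunk.
        System.out.println("1 GiB chunks:  " + totalMappings(files, gib));
        System.out.println("16 GiB chunks: " + totalMappings(files, 16 * gib));
    }
}
```

With many indexes per node, each with many segment files, the per-file counts add up quickly under the 1 GiB limit, which is where the "too many mappings" problem comes from.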

Uwe


Thank you,
Alex


On Fri, Aug 19, 2022 at 2:42 AM Robert Muir <rcm...@gmail.com> wrote:

    On Thu, Aug 18, 2022 at 1:47 PM Alexander Lukyanchikov
    <alexanderlukyanchi...@gmail.com> wrote:

    >
    > Currently we are trying to avoid switching to MMAP because there
    > is another process running on the same host that extensively
    > utilizes the FS cache.
    >

    This makes no sense: NIOFSDirectory uses the FS cache the exact same
    way as mmap. It just uses the read() interface instead.

    A self-created problem!

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
    For additional commands, e-mail: dev-h...@lucene.apache.org

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de
