[ https://issues.apache.org/jira/browse/LUCENE-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726773#comment-16726773 ]

Uwe Schindler commented on LUCENE-8618:
---------------------------------------

IMHO, this 2 MB read-ahead is operating-system specific and may change
depending on the sysctl settings.

As there is an easy way to use FileSwitchDirectory for these specific update
use cases, I don't think this is something we need to change in Lucene's lower
levels; it's more of an operating system configuration issue. Using
FileSwitchDirectory has always been an option if you know the file extensions.
Maybe we should add a getter on all codecs that returns the file extensions
used by the codec as a Set, so you can easily build your FileSwitchDirectory.

If we get access to madvise/fadvise at some point, we could use IOContext for
this, but that's not possible yet.
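As a rough illustration of what "using IOContext for this" could mean once madvise is reachable from Java, the sketch below maps a per-file access-pattern hint to a madvise(2) advice value. Everything here is hypothetical: AccessHint and toMadvise do not exist in Lucene, and no native call is made. The numeric constants are the Linux values from <sys/mman.h>.

```java
/**
 * Hypothetical sketch: translating an IOContext-style access hint into a
 * madvise(2) advice value. No such API exists in Lucene; Linux constant values.
 */
public class AdviceMapping {

    // Hypothetical hint a codec could attach when opening a file.
    enum AccessHint { NORMAL, SEQUENTIAL, RANDOM }

    // Linux values from <sys/mman.h>.
    static final int MADV_NORMAL = 0;
    static final int MADV_RANDOM = 1;
    static final int MADV_SEQUENTIAL = 2;

    static int toMadvise(AccessHint hint) {
        switch (hint) {
            case RANDOM:
                return MADV_RANDOM;     // e.g. stored fields at search/update time: read-ahead is wasted
            case SEQUENTIAL:
                return MADV_SEQUENTIAL; // e.g. merges: aggressive read-ahead helps
            default:
                return MADV_NORMAL;     // keep the kernel's default heuristics
        }
    }

    public static void main(String[] args) {
        for (AccessHint h : AccessHint.values()) {
            System.out.println(h + " -> " + toMadvise(h));
        }
    }
}
```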

> MMapDirectory's read ahead on random-access files might trash the OS cache
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-8618
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8618
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> At Elastic, we were told about a case that runs significantly slower with
> MMapDirectory than with NIOFSDirectory. After a long analysis, we discovered
> that it was due to MMapDirectory's 2 MB read-ahead, which doesn't help
> and even trashes the OS cache on stored fields and term vectors files, which
> have a fully random access pattern (except at merge time).
> The particular use case that exhibits the slowdown is performing updates,
> i.e. we first look up a document by its id, fetch its stored fields, compute
> new stored fields (e.g. after adding or changing the value of a field) and add
> the document back to the index. We were able to reproduce the workload that 
> this Elasticsearch user described and measured a median throughput of 3600 
> updates/s with MMapDirectory and 5000 updates/s with NIOFSDirectory. It even 
> goes up to 5600 updates/s if you configure a FileSwitchDirectory to use 
> MMapDirectory for the terms dictionary and NIOFSDirectory for stored fields 
> (postings files are not relevant here since postings are inlined in the terms 
> dict when docFreq=1 and indexOptions=DOCS).
> While it is possible to work around this issue on top of Lucene, maybe this 
> is something we could improve directly in Lucene, e.g. by propagating
> information about the expected access pattern and avoiding mmap on files that 
> have a fully random access pattern (until Java exposes madvise in some way)?
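To put the reported median throughputs in perspective, the snippet below computes the relative gains over MMapDirectory and the inverse throughput (average time per update). It only reworks the three numbers from the issue description; inverse throughput is not a latency measurement if updates run concurrently.

```java
/** Relative gains and inverse throughput computed from the reported medians. */
public class UpdateThroughput {

    // Median updates/s as reported in the issue description.
    static final double MMAP = 3600, NIOFS = 5000, FILE_SWITCH = 5600;

    // Percentage gain of `candidate` over `baseline`.
    static double gainPercent(double baseline, double candidate) {
        return (candidate / baseline - 1.0) * 100.0;
    }

    // Inverse of throughput in microseconds per update (not a latency figure).
    static double microsPerUpdate(double updatesPerSecond) {
        return 1_000_000.0 / updatesPerSecond;
    }

    public static void main(String[] args) {
        System.out.printf("NIOFS vs MMap:      +%.1f%%%n", gainPercent(MMAP, NIOFS));
        System.out.printf("FileSwitch vs MMap: +%.1f%%%n", gainPercent(MMAP, FILE_SWITCH));
        System.out.printf("us/update: %.1f / %.1f / %.1f%n",
                microsPerUpdate(MMAP), microsPerUpdate(NIOFS), microsPerUpdate(FILE_SWITCH));
    }
}
```

This works out to roughly +39% for NIOFSDirectory and +56% for the FileSwitchDirectory setup, i.e. about 278, 200 and 179 microseconds per update.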



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)