Hi all,

While trying to identify bottlenecks in our application, I found that every search that involves multiple indexes performs a lot of mmap()/open() syscalls. This is a natural consequence of using MMapDirectory. So even if the file system caches are properly warmed, this can add a couple of seconds (depending on the operating system or virtualization technology) to the request handling time, especially when the number of searched indexes is in the hundreds (see https://github.com/OpenGrok/OpenGrok/issues/1116 for the gory details).

I was wondering if we can amortize the syscall load by caching IndexReader objects. The search (which is done in the webapp) looks like this:


https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/search/SearchEngine.java#L203

and the idea would be to reuse each IndexReader until the next refresh of the index it belongs to. This would avoid the syscalls incurred each time an index is opened through MMapDirectory.
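
A rough sketch of the caching idea, in plain Java so it stands on its own. All names here (ReaderCache, Opener, Stamp) are hypothetical and are not OpenGrok or Lucene APIs. The assumption is that the opener would wrap whatever call opens the IndexReader, and that the stamp is some value that changes whenever the index is refreshed. Lucene's own DirectoryReader.openIfChanged() and SearcherManager may well cover the same ground.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-index reader cache; a sketch, not OpenGrok or Lucene code. */
final class ReaderCache<R extends Closeable> {

    /** Opens a reader for the named index (assumed to wrap the real open call). */
    interface Opener<R> {
        R open(String indexName) throws IOException;
    }

    /** Returns a value that changes whenever the named index is refreshed. */
    interface Stamp {
        long current(String indexName) throws IOException;
    }

    /** A cached reader together with the stamp it was opened at. */
    private static final class Entry<R> {
        final R reader;
        final long stamp;

        Entry(R reader, long stamp) {
            this.reader = reader;
            this.stamp = stamp;
        }
    }

    private final Map<String, Entry<R>> entries = new HashMap<>();
    private final Opener<R> opener;
    private final Stamp stamp;

    ReaderCache(Opener<R> opener, Stamp stamp) {
        this.opener = opener;
        this.stamp = stamp;
    }

    /** Returns the cached reader, reopening only if the index stamp changed. */
    synchronized R get(String indexName) throws IOException {
        long now = stamp.current(indexName);
        Entry<R> cached = entries.get(indexName);
        if (cached != null && cached.stamp == now) {
            return cached.reader; // no new open()/mmap() calls on this path
        }
        R fresh = opener.open(indexName);
        entries.put(indexName, new Entry<>(fresh, now));
        if (cached != null) {
            // Assumption: no search is still using the old reader.
            // Real code would need reference counting before closing.
            cached.reader.close();
        }
        return fresh;
    }
}
```

With something like this, a search over hundreds of indexes would pay the open cost once per index per refresh rather than once per request. The sketch ignores in-flight searches that still hold the stale reader.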

My worry is what happens if the indexer runs and writes to the index files while they are mmap'ed in memory. Could this lead to corrupted search results?

The reindex work is visible here:


https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/index/IndexDatabase.java#L341

Documents are added or removed in the call to indexDown(), which is basically a recursive traversal of the directory tree. The commit happens only after the traversal is done.

The IndexWriter is set up with CREATE_OR_APPEND, and I am not sure whether that is what we want if readers are to be reused. If we can prevent existing index files from being written to while reindexing (or at least make sure they are only appended to), I think this should make the reuse possible.

Any comments are welcome,


v.
