Hi all,
When trying to identify bottlenecks in our application, I found that
each search that involves multiple indexes performs lots of
mmap()/open() syscalls. This is a natural consequence of using
MMapDirectory. So even if the file system caches are properly warmed,
this might add a couple of seconds (depending on the operating system or
virtualization technology) to the request handling time, especially when
the number of searched indexes is in the hundreds (see
https://github.com/OpenGrok/OpenGrok/issues/1116 for the gory details).
I was wondering if we can amortize the syscall load by caching
IndexReader objects. The search (which is done in the webapp) looks like
this:
https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/search/SearchEngine.java#L203
and the idea would be to reuse each IndexReader until the next refresh
of the corresponding index. This would avoid the syscalls during
MMapDirectory.open().
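
A minimal sketch of such a cache, in plain Java with no Lucene
dependency, could look like the following. ReaderCache, Opener and
versionOf are hypothetical names, not OpenGrok or Lucene API; versionOf
stands for any value the indexer bumps after a refresh of the given
index, and the Opener is assumed to wrap the expensive reader open.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical sketch: this class is not part of OpenGrok or Lucene.
class ReaderCache<R extends Closeable> {

    /** Opens a reader for the named index (assumed to be the expensive step). */
    interface Opener<R> {
        R open(String indexName) throws IOException;
    }

    /** A cached reader plus the index version it was opened against. */
    private static final class Entry<R> {
        final long version;
        final R reader;

        Entry(long version, R reader) {
            this.version = version;
            this.reader = reader;
        }
    }

    private final Map<String, Entry<R>> entries = new HashMap<>();
    private final Opener<R> opener;
    private final ToLongFunction<String> versionOf;

    ReaderCache(Opener<R> opener, ToLongFunction<String> versionOf) {
        this.opener = opener;
        this.versionOf = versionOf;
    }

    /** Returns the cached reader, reopening only if the index version changed. */
    synchronized R get(String indexName) {
        long current = versionOf.applyAsLong(indexName);
        Entry<R> cached = entries.get(indexName);
        if (cached != null && cached.version == current) {
            return cached.reader; // reuse: no new open()/mmap() calls
        }
        try {
            R fresh = opener.open(indexName);
            entries.put(indexName, new Entry<>(current, fresh));
            if (cached != null) {
                // Assumption: no in-flight search still uses the stale reader.
                // Real code would need reference counting before closing.
                cached.reader.close();
            }
            return fresh;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch closes the stale reader right away, which is only safe if no
search is still using it. Lucene's SearcherManager already combines
reuse, refresh and reference counting for a single index, so keeping one
per index may be a better building block than a hand-written cache.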
My worry is what happens if the indexer runs and writes to the index
files while they are mmap'ed in memory - could this lead to corrupted
search results?
The reindex work is visible here:
https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/index/IndexDatabase.java#L341
The documents are added or removed in the call to indexDown(), which is
basically a recursive traversal of the directory tree. The commit happens
only after the traversal is done.
The IndexWriter is set up with CREATE_OR_APPEND, which I am not sure is
what we want if readers are reused. If we can avoid existing index files
being overwritten (or at least make sure they are only appended to)
while reindexing, this should make the reuse possible, I think.
Any comments are welcome,
v.