Marcel Reutegger wrote:
2. Lucene uses FieldCaches to speed up sorting and range queries,
which is exactly what we are after. Those FieldCaches are kept per
IndexReader.
Jackrabbit uses an IndexSearcher that searches on a single
IndexReader, which is most likely an instance of
CachingMultiReader. So whenever a search builds up a FieldCache,
that FieldCache instance is associated with that particular
CachingMultiReader instance. Successive queries which operate on the
same CachingMultiReader get a tremendous speedup if they can reuse
the associated FieldCache instances.
The problem is that Jackrabbit creates a new CachingMultiReader
_every time_ one of the underlying indexes is modified. This means if
you just change _one_ item in the repository you will need to rebuild
all those FieldCaches because the existing FieldCaches are associated
with the old instance of CachingMultiReader.
This not only leads to slow search response times for queries
which contain range queries or are sorted by a field, but also leads
to massive memory consumption (depending on the size of your indexes),
because multiple instances of CachingMultiReader may be in use at the
same time if a lot of queries and item modifications are executed
concurrently.
As far as I understand, the solution is to use a MultiSearcher which
uses multiple IndexReaders. Since most of the indexes are stable due
to the merging strategy, the FieldCaches can be used for a
much longer time.
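To make the difference concrete, here is a minimal, self-contained
sketch. It does not use any Lucene or Jackrabbit classes: Reader and
Cache are hypothetical stand-ins, and the segment sizes are made up. It
only counts how many per-document values have to be loaded when the
cache is keyed by one composite reader versus by the individual
segment readers.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class ReaderKeyedCacheSketch {

    /** Hypothetical stand-in for an index reader (NOT Lucene's IndexReader). */
    static final class Reader {
        final int numDocs;
        Reader(int numDocs) { this.numDocs = numDocs; }
    }

    /** Cache keyed by reader identity; tracks how many values were loaded. */
    static final class Cache {
        private final Map<Reader, int[]> entries = new IdentityHashMap<>();
        long docsLoaded = 0;

        int[] get(Reader reader) {
            int[] values = entries.get(reader);
            if (values == null) {
                // Cache miss: pretend to read one field value per document.
                values = new int[reader.numDocs];
                docsLoaded += reader.numDocs;
                entries.put(reader, values);
            }
            return values;
        }
    }

    /** The cache is keyed by a single reader spanning all segments. */
    static long costWithCompositeReader(int[] segmentSizes, int changedSegmentSize) {
        Cache cache = new Cache();
        int total = 0;
        for (int s : segmentSizes) total += s;
        cache.get(new Reader(total)); // first query fills the cache

        // One item changes: the last segment is replaced, and a NEW
        // composite reader is created, so the old entry cannot be reused.
        int last = segmentSizes.length - 1;
        int newTotal = total - segmentSizes[last] + changedSegmentSize;
        cache.get(new Reader(newTotal));
        return cache.docsLoaded;
    }

    /** The cache is keyed by each segment reader individually. */
    static long costWithPerSegmentReaders(int[] segmentSizes, int changedSegmentSize) {
        Cache cache = new Cache();
        Reader[] segments = new Reader[segmentSizes.length];
        for (int i = 0; i < segments.length; i++) {
            segments[i] = new Reader(segmentSizes[i]);
            cache.get(segments[i]); // first query fills the cache
        }

        // One item changes: only the last segment reader is replaced.
        segments[segments.length - 1] = new Reader(changedSegmentSize);
        for (Reader r : segments) cache.get(r); // stable segments hit the cache
        return cache.docsLoaded;
    }

    public static void main(String[] args) {
        int[] sizes = {100000, 10000, 10};
        System.out.println(costWithCompositeReader(sizes, 11));   // 220021
        System.out.println(costWithPerSegmentReaders(sizes, 11)); // 110021
    }
}
```

With these made-up numbers, the composite-keyed variant reloads every
value after a one-item change, while the per-segment variant only loads
the 11 values of the replaced segment.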
Using a MultiSearcher means that you must be able to execute a query on
each of the index segments independently. This is not possible because
hierarchy information is always spread across multiple segments. For
example, a node in one segment may reference a parent in another segment.
I just created an issue [1] to which I attached an initial patch which works
quite well for us. It doesn't use MultiSearcher but extends
SharedFieldSortComparator to be aware of the underlying index segments. Could
you please review the patch?
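To illustrate the general idea (this is not the code from the patch,
just a rough sketch with made-up names and plain arrays instead of
Lucene classes): a comparator can stay on the composite reader but
resolve each global document number to its segment, and then read the
sort value from that segment's own cached array. The class name, the
starts array and the sample values below are all assumptions.

```java
import java.util.Arrays;

final class SegmentAwareLookupSketch {

    // starts[i] is the first global document number of segment i.
    private final int[] starts;
    private final int totalDocs;
    // One cached value array per segment (stand-in for a per-segment cache).
    private final String[][] perSegmentValues;

    SegmentAwareLookupSketch(String[][] perSegmentValues) {
        this.perSegmentValues = perSegmentValues;
        this.starts = new int[perSegmentValues.length];
        int offset = 0;
        for (int i = 0; i < perSegmentValues.length; i++) {
            starts[i] = offset;
            offset += perSegmentValues[i].length;
        }
        this.totalDocs = offset;
    }

    /** Maps a global document number to the index of its segment. */
    int segmentOf(int doc) {
        if (doc < 0 || doc >= totalDocs) {
            throw new IllegalArgumentException("doc out of range: " + doc);
        }
        int idx = Arrays.binarySearch(starts, doc);
        if (idx < 0) {
            // Not an exact start: take the segment before the insertion point.
            idx = -idx - 2;
        }
        // Skip empty segments that share the same start offset.
        while (idx + 1 < starts.length && starts[idx + 1] == starts[idx]) {
            idx++;
        }
        return idx;
    }

    /** Reads the sort value from the cache of the segment that owns the doc. */
    String value(int doc) {
        int seg = segmentOf(doc);
        return perSegmentValues[seg][doc - starts[seg]];
    }

    /** Compares two global document numbers by their cached sort values. */
    int compare(int docA, int docB) {
        return value(docA).compareTo(value(docB));
    }

    public static void main(String[] args) {
        SegmentAwareLookupSketch s = new SegmentAwareLookupSketch(new String[][] {
            {"pear", "apple"}, {"fig"}, {"kiwi", "date", "plum"}
        });
        System.out.println(s.segmentOf(4));  // 2
        System.out.println(s.value(4));      // date
        System.out.println(s.compare(1, 4) < 0); // true ("apple" sorts before "date")
    }
}
```

Because queries still run against the single composite reader, the
hierarchy lookups across segments are unaffected; only the sort values
are cached per segment.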
Cheers,
Christoph
[1] http://issues.apache.org/jira/browse/JCR-974