Marcel Reutegger (JIRA) wrote:
Here's what I've done so far:
- Introduced a MultiIndexReader interface that allows access to the sub index
readers.
- CachingMultiReader and SearchIndex.CombinedIndexReader now implement
MultiIndexReader
- Created a MultiScorer which spans multiple sub scorers and combines them. The
MultiScorer exposes the sub scorers as if there were just a single scorer.
- Changed MatchAllWeight to create an individual scorer for each sub IndexReader
contained in a MultiIndexReader and finally combine them into a MultiScorer.
- Introduced a BitSet cache in MatchAllScorer
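Roughly, the MultiScorer idea can be sketched like this. This is not the actual Jackrabbit/Lucene code; SubScorer and ListScorer are simplified stand-ins, and the key point is only the re-basing of segment-local doc numbers into one continuous doc id space:

```java
import java.util.Arrays;

// Simplified sketch: each sub scorer matches documents within one index
// segment; the MultiScorer re-bases the segment-local doc numbers so that
// callers see a single continuous doc id space.
public class MultiScorerSketch {

    /** Stand-in for a per-segment scorer: yields segment-local doc numbers. */
    interface SubScorer {
        /** Advance to the next matching doc; return -1 when exhausted. */
        int nextDoc();
    }

    /** Combines sub scorers as if there were just a single scorer. */
    static class MultiScorer implements SubScorer {
        private final SubScorer[] subScorers;
        private final int[] starts; // doc number base of each segment
        private int current = 0;

        MultiScorer(SubScorer[] subScorers, int[] starts) {
            this.subScorers = subScorers;
            this.starts = starts;
        }

        public int nextDoc() {
            while (current < subScorers.length) {
                int doc = subScorers[current].nextDoc();
                if (doc != -1) {
                    return starts[current] + doc; // re-base into global space
                }
                current++; // this segment is exhausted, move to the next one
            }
            return -1;
        }
    }

    /** Trivial sub scorer over a fixed list of segment-local doc numbers. */
    static class ListScorer implements SubScorer {
        private final int[] docs;
        private int pos = -1;
        ListScorer(int... docs) { this.docs = docs; }
        public int nextDoc() {
            return ++pos < docs.length ? docs[pos] : -1;
        }
    }

    public static void main(String[] args) {
        // Two segments: the first holds docs 0..9, the second starts at 10.
        MultiScorer scorer = new MultiScorer(
                new SubScorer[] { new ListScorer(1, 3), new ListScorer(0, 2) },
                new int[] { 0, 10 });
        int[] result = new int[4];
        for (int i = 0; i < result.length; i++) {
            result[i] = scorer.nextDoc();
        }
        System.out.println(Arrays.toString(result)); // [1, 3, 10, 12]
        System.out.println(scorer.nextDoc()); // -1
    }
}
```

Because each sub scorer only walks its own segment, no merge sort across segments is needed.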
Great. Thanks a lot!
I then conducted the following tests:
Setup:
- 50'000 nodes
- resultFetchSize: 50
- respectDocumentOrder: false
100 queries: //element(*, nt:unstructured)[@foo != '...']
(only size of NodeIterator is read, no node access)
Results:
1) with jackrabbit 1.2.3:
82078 ms
2) with MatchAllScorer per index segment
combined with MultiScorer without caching:
10297 ms
3) with MatchAllScorer per index segment
combined with MultiScorer with caching:
6156 ms
My conclusion is that the Lucene MultiTermDocs implementation adds
significant cost when a single MatchAllScorer is used, as in test scenario 1).
This actually makes sense: if a single MatchAllScorer is used, Lucene has to
merge sort the @foo terms of several index segments, while in test
scenarios 2) and 3) no merge sort is needed for the @foo terms.
With these changes the query performance seems good enough even without caching.
I'm tempted to check in only the changes without caching because the additional
performance improvement with caching does not seem to warrant the memory
consumption of the cache: 2) decreases the query time compared to the current
implementation by 87%, while 3) decreases it by 92%.
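For reference, those percentages follow directly from the timings above:

```java
// Quick sanity check on the quoted reductions relative to the 1.2.3 baseline.
public class SpeedupCheck {
    public static void main(String[] args) {
        double baseline = 82078; // 1) jackrabbit 1.2.3
        double noCache  = 10297; // 2) MultiScorer without caching
        double cached   =  6156; // 3) MultiScorer with caching
        long r2 = Math.round((1 - noCache / baseline) * 100);
        long r3 = Math.round((1 - cached / baseline) * 100);
        System.out.println(r2 + "% / " + r3 + "%"); // 87% / 92%
    }
}
```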
The effect of caching should increase if you use queries which test an attribute
more than once, like:
//element(*, nt:unstructured)[@foo!='1' or @foo!='2' or @foo!='3']
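The caching idea can be sketched roughly like this. This is not the actual patch; PropertyIndex and CachingPropertyIndex are hypothetical names, and the sketch only shows why a query testing @foo three times would scan the term docs once:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the BitSet cache idea: the set of documents that
// carry a given property is computed once per (reader, property) and then
// reused, so a query with several predicates on the same property scans
// the term docs only once instead of once per predicate.
public class MatchAllCacheSketch {

    /** Stand-in for the per-segment "which docs have this property" lookup. */
    interface PropertyIndex {
        BitSet readMatches(String propertyName); // the expensive scan
    }

    static class CachingPropertyIndex implements PropertyIndex {
        private final PropertyIndex delegate;
        private final Map<String, BitSet> cache = new HashMap<String, BitSet>();
        int misses = 0; // counts how often the expensive scan actually ran

        CachingPropertyIndex(PropertyIndex delegate) {
            this.delegate = delegate;
        }

        public BitSet readMatches(String propertyName) {
            BitSet bits = cache.get(propertyName);
            if (bits == null) {
                misses++;
                bits = delegate.readMatches(propertyName);
                cache.put(propertyName, bits);
            }
            return (BitSet) bits.clone(); // callers may modify their copy
        }
    }

    public static void main(String[] args) {
        PropertyIndex slow = new PropertyIndex() {
            public BitSet readMatches(String propertyName) {
                BitSet bits = new BitSet();
                bits.set(0, 5); // pretend docs 0..4 have the property
                return bits;
            }
        };
        CachingPropertyIndex cached = new CachingPropertyIndex(slow);
        cached.readMatches("foo");
        cached.readMatches("foo");
        cached.readMatches("foo");
        System.out.println(cached.misses); // 1 scan for 3 lookups
    }
}
```

The memory cost is one BitSet per cached property (one bit per document), which is exactly the trade-off discussed above.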
Maybe we can add a configuration option to SearchIndex which allows enabling
caching? This way, one can choose whether the focus is on memory or on processing
time. We have a situation, for example, where a lot of memory is available but
processing time is the bottleneck.
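Such an option could follow the existing SearchIndex parameter style in workspace.xml. The parameter name enableMatchAllCache below is purely hypothetical; resultFetchSize and respectDocumentOrder are the existing parameters used in the test setup above:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="resultFetchSize" value="50"/>
  <param name="respectDocumentOrder" value="false"/>
  <!-- hypothetical switch for the BitSet cache in MatchAllScorer -->
  <param name="enableMatchAllCache" value="true"/>
</SearchIndex>
```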
Would you mind sharing a patch for the caching you implemented? Do you maybe
even have a testcase which generates this test repository? I could do some
further tests here.
Cheers,
Christoph