David Johnson wrote:

Out of the Jackrabbit code,
DescendantSelfAxisQuery.DescendantSelfAxisScorer.next()
is now taking the most time while executing my query suite, accounting for
68% of the time. Within it, calls to
DescendantSelfAxisQuery.DescendantSelfAxisScorer.calculateSubHits() take
the majority of the time (basically all of it), and within those, calls to
BooleanScorer2.score(HitCollector) - back in Lucene code - take the
majority of the time. If more specific profiling data is desired, please
feel free to ask. I can also share the profile data in the form of a
NetBeans profiler snapshot.

To my understanding, calculateSubHits() can be divided into two parts:

- The first part queries all nodes that are directly addressed by your XPath (for /foo/bar//* this will be /foo/bar[1], /foo/bar[2], ...). This query is quite fast in my experience.
- The second part does the actual work, i.e. the Lucene query on the node attributes. I don't think there is much potential for improvement here unless you dig into Lucene itself.

Contrast this with DescendantSelfAxisScorer.next(). This method takes the result of part two (subHits) and filters out all nodes that are neither part of the result of part one (contextHits) nor a descendant of one of the nodes in contextHits. To filter these nodes, a lot of parent-child relations have to be resolved. I think there is some caching potential for contextHits here if you use the same basis, like /foo/bar//*, for a lot of queries. But such a cache would only be valid for a particular IndexReader, that is to say it would only be beneficial if your repository is quite stable.
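The filtering described above could be sketched roughly like this. This is a minimal, self-contained illustration, not Jackrabbit's actual code: the names DescendantFilter, subHits, contextHits and parentOf are my own, and the parent lookup is reduced to a plain function returning -1 at the root. The inner walk up the parent chain is exactly the per-node resolution cost that a contextHits cache would help amortize.

```java
import java.util.BitSet;
import java.util.function.IntUnaryOperator;

public class DescendantFilter {
    // Keep a sub-hit if it is itself a context hit or has an ancestor that
    // is one, walking parent links until the root (parentOf returns -1).
    static BitSet filter(BitSet subHits, BitSet contextHits, IntUnaryOperator parentOf) {
        BitSet result = new BitSet();
        for (int doc = subHits.nextSetBit(0); doc >= 0; doc = subHits.nextSetBit(doc + 1)) {
            for (int n = doc; n >= 0; n = parentOf.applyAsInt(n)) {
                if (contextHits.get(n)) {
                    result.set(doc);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy tree: parent[child]; -1 means root.
        int[] parent = {-1, 0, 1, 1, 0};
        BitSet context = new BitSet();
        context.set(1);                  // say /foo/bar matched doc 1
        BitSet sub = new BitSet();
        sub.set(2);                      // under doc 1 -> kept
        sub.set(4);                      // under the root only -> dropped
        System.out.println(filter(sub, context, n -> parent[n])); // prints {2}
    }
}
```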

I was digging a bit into Jackrabbit today and found another place where some caching provides a substantial performance gain for queries that check one attribute for more than one value (like /foo/*[@foo:bar='john' or @foo:bar='doe']). The BitSet in calculateDocFilter() is currently created twice for the query above. On large repositories this takes about 200ms per BitSet on my machine for a particular field. Caching these BitSets per IndexReader and field in a WeakHashMap, with the IndexReader as the key, gave me some real improvements. But this caching, too, is only beneficial for repositories that are not changing heavily, as changes lead to the creation of new IndexReaders and invalidate the cache.
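The per-reader, per-field cache could look roughly like this. Again a minimal sketch of the idea, not the actual patch: DocFilterCache is my own name, and a plain Object stands in for the IndexReader key. Because the reader is only weakly held, reopening the index after a change drops the stale entries automatically.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

public class DocFilterCache {
    // Outer key: the IndexReader instance (weakly held, so a reopened index
    // lets its old entries be garbage-collected); inner key: the field name.
    private final Map<Object, Map<String, BitSet>> cache = new WeakHashMap<>();

    public synchronized BitSet get(Object reader, String field, Supplier<BitSet> compute) {
        return cache.computeIfAbsent(reader, r -> new HashMap<>())
                    .computeIfAbsent(field, f -> compute.get());
    }

    public static void main(String[] args) {
        DocFilterCache cache = new DocFilterCache();
        Object reader = new Object();        // stands in for an IndexReader
        int[] built = {0};
        Supplier<BitSet> expensive = () -> { // the ~200ms BitSet construction
            built[0]++;
            return new BitSet(1024);
        };
        // Two value tests on the same field reuse one BitSet
        // instead of building it twice.
        BitSet first = cache.get(reader, "foo:bar", expensive);
        BitSet second = cache.get(reader, "foo:bar", expensive);
        System.out.println(first == second); // prints true
        System.out.println(built[0]);        // prints 1
    }
}
```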

As both of the caches mentioned rely heavily on IndexReader reuse, it would probably be better to have caches per index segment, as someone mentioned in the thread about using Lucene filters, since segments are relatively stable.

That's what I've found out so far. I'll do some more research over the next few days, as we definitely need to improve query performance for our application.

I would like to hear some comments from the Jackrabbit gurus - and feel free to correct me, I just started ;)

Cheers,
Christoph
