Any pointers and thoughts from the developers who have worked on the
LuceneQueryBuilder would be much appreciated.  As an idea, I was thinking of
running the query AST through an optimization pass before it is passed to the
query builder, perhaps in
org.apache.jackrabbit.core.query.lucene.QueryImpl.execute(), right before the
LuceneQueryBuilder.createQuery call.
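
As a rough sketch of the kind of pass I have in mind (the Node, Term, And and
AstOptimizer classes below are made up for illustration and are not
Jackrabbit's query tree classes), the optimizer would rewrite the tree and hand
the simplified tree to the builder.  This one only flattens nested ANDs and
drops duplicate clauses:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, minimal query AST; these are NOT Jackrabbit's query node classes.
abstract class Node {
    abstract String render();
}

class Term extends Node {
    final String field;
    final String value;

    Term(String field, String value) {
        this.field = field;
        this.value = value;
    }

    @Override
    String render() {
        return field + ":" + value;
    }
}

class And extends Node {
    final List<Node> children;

    And(List<Node> children) {
        this.children = children;
    }

    @Override
    String render() {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(" AND ");
            }
            sb.append(children.get(i).render());
        }
        return sb.append(")").toString();
    }
}

class AstOptimizer {
    // Rewrites the tree before it reaches the query builder: nested ANDs are
    // flattened into one level and duplicate term clauses are dropped.
    static Node optimize(Node root) {
        if (!(root instanceof And)) {
            return root;
        }
        Set<String> seen = new HashSet<>();
        List<Node> flat = new ArrayList<>();
        collect((And) root, seen, flat);
        return flat.size() == 1 ? flat.get(0) : new And(flat);
    }

    private static void collect(And and, Set<String> seen, List<Node> out) {
        for (Node child : and.children) {
            if (child instanceof And) {
                collect((And) child, seen, out);
            } else if (seen.add(child.render())) {
                out.add(child);
            }
        }
    }
}
```

With this, (a:1 AND (a:1 AND b:2)) comes out as (a:1 AND b:2).  A real pass
would need to work on the actual query tree and know which rewrites are safe.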

Has anyone done any profiling on queries?  I have gathered some data with the
NetBeans profiler that I could share if anyone is interested.  Some
highlights:

org.apache.lucene.search.Searcher.search(...) and its children take 96% of
the time.

Of those children, the first "hit" in Jackrabbit code is
org.apache.jackrabbit.core.query.lucene.SharedFieldSortComparator.newComparator(...)
with 58% of the time; its child,
org.apache.jackrabbit.core.query.lucene.SharedFieldCache.getStringIndex(...),
accounts for all of that time.

From that point on, the biggest child is
org.apache.lucene.index.MultiTermDocs.next(), which takes the majority of the
time.

Any pointers/thoughts on either writing an optimizer for Lucene, alternate
indexing engines or even how to optimize queries would be appreciated.
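
To make the Filter idea from my earlier message (quoted below) a bit more
concrete, here is a toy sketch.  It uses plain java.util collections as a
stand-in for the index, so ToyIndex and rangeFilter are hypothetical names and
not Lucene API.  The point is only that a range can be answered by OR-ing the
doc-id bit sets of the terms in the range, rather than by expanding the range
into one query clause per term:

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy stand-in for an inverted index: sorted term -> set of doc ids.
class ToyIndex {
    private final NavigableMap<String, BitSet> postings = new TreeMap<>();

    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new BitSet()).set(docId);
    }

    // Returns the docs having any term in [lower, upper] (both inclusive).
    // Cost grows with the number of terms in the range, but no per-term
    // query clause is created, so there is no clause limit to run into.
    // Assumes lower <= upper; subMap throws IllegalArgumentException otherwise.
    BitSet rangeFilter(String lower, String upper) {
        BitSet bits = new BitSet();
        for (BitSet docs : postings.subMap(lower, true, upper, true).values()) {
            bits.or(docs);
        }
        return bits;
    }
}
```

The resulting bit set could then be intersected with the hits of the rest of
the query; a node type match could be handled the same way, with one bit set
per node type.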

-Dave

On 3/1/07, Christoph Kiehl <[EMAIL PROTECTED]> wrote:

David Johnson wrote:

> Digging into the internals of Jackrabbit, we have noticed that there is an
> implementation of RangeQuery that essentially walks the results if the # of
> query terms is greater than what Lucene can handle.  Reading the Lucene
> documentation, it looks like Filters are the recommended method of
> implementing "large" range queries, and also seem like a natural for
> matching node types - i.e., select * from Column

As we are expecting to reach a count of 1.000.000+ nodes in one of our
repositories, I'm always interested in any performance improvements. Is
anyone investigating this proposal? Or could anyone at least tell me if it's
worth investigating? ;)

Cheers,
Christoph

