Any pointers and thoughts from the developers who have worked on the LuceneQueryBuilder would be much appreciated. As an idea, I was thinking of running the query AST through an optimization pass before it is passed to the query builder, perhaps in org.apache.jackrabbit.core.query.lucene.QueryImpl.execute(), right before the LuceneQueryBuilder.createQuery call.
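To make the idea a bit more concrete, here is a rough sketch of what such a pass might look like. The Node, Term and And classes are hypothetical toy types invented for illustration, not Jackrabbit's real query tree classes, and flattening nested ANDs is just one example of a rewrite, not necessarily the one that would pay off.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QueryRewriteSketch {
    /** Hypothetical AST node; stands in for whatever the real query tree uses. */
    static abstract class Node {}

    /** Leaf constraint such as field = value. */
    static final class Term extends Node {
        final String field, value;
        Term(String field, String value) { this.field = field; this.value = value; }
        @Override public String toString() { return field + ":" + value; }
    }

    /** Conjunction of child constraints. */
    static final class And extends Node {
        final List<Node> children;
        And(List<Node> children) { this.children = children; }
        @Override public String toString() { return "AND" + children; }
    }

    /** Flattens AND(a, AND(b, c)) into AND(a, b, c) and unwraps single-child ANDs. */
    static Node optimize(Node node) {
        if (!(node instanceof And)) return node;
        List<Node> flat = new ArrayList<>();
        for (Node child : ((And) node).children) {
            Node opt = optimize(child);               // rewrite bottom-up
            if (opt instanceof And) flat.addAll(((And) opt).children);
            else flat.add(opt);
        }
        return flat.size() == 1 ? flat.get(0) : new And(flat);
    }

    public static void main(String[] args) {
        Node nested = new And(Arrays.asList(
                new Term("a", "1"),
                new And(Arrays.asList(new Term("b", "2"), new Term("c", "3")))));
        System.out.println(optimize(nested)); // prints AND[a:1, b:2, c:3]
    }
}
```

In this picture the optimized tree, rather than the original one, would be what gets handed to the query builder.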
Has anyone done any profiling on queries? I have some data gathered with the NetBeans profiler that I could share if anyone is interested. Some highlights:

- org.apache.lucene.search.Searcher.search(...) and its children take 96% of the time.
- Among those children, the first "hit" in Jackrabbit code is org.apache.jackrabbit.core.query.lucene.SharedFieldSortComparator.newComparator(...) with 58% of the time; its child org.apache.jackrabbit.core.query.lucene.SharedFieldCache.getStringIndex(...) accounts for all of that.
- From that point on, the biggest child is org.apache.lucene.index.MultiTermDocs.next(), which takes the majority of the time.

Any pointers/thoughts on writing an optimizer for Lucene, on alternate indexing engines, or even on how to optimize queries would be appreciated.

-Dave

On 3/1/07, Christoph Kiehl <[EMAIL PROTECTED]> wrote:
David Johnson wrote:
> Digging into the internals of Jackrabbit, we have noticed that there is an
> implementation of RangeQuery that essentially walks the results if the # of
> query terms is greater than what Lucene can handle. Reading the Lucene
> documentation, it looks like Filters are the recommended method of
> implementing "large" range queries, and also seem like a natural for
> matching node types - i.e., select * from Column

As we are expecting to reach a count of 1,000,000+ nodes in one of our repositories, I'm always interested in any performance improvements. Is anyone investigating this proposal? Or could anyone at least tell me if it's worth investigating? ;)

Cheers,
Christoph
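
For reference, the Filter idea quoted above can be sketched without any Lucene dependency. The class below is a hypothetical toy index (a sorted map from term to document ids), not Lucene's or Jackrabbit's API; it only illustrates the technique of walking the terms in a range once and recording matches in a java.util.BitSet, instead of expanding the range into one query clause per term.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for an index: sorted term -> ids of documents containing it. */
final class RangeFilterSketch {
    private final TreeMap<String, int[]> postings = new TreeMap<>();
    private int maxDoc = 0;

    /** Registers a term and the documents it occurs in. */
    void add(String term, int... docIds) {
        postings.put(term, docIds);
        for (int id : docIds) maxDoc = Math.max(maxDoc, id + 1);
    }

    /** One bit per document, set when the document has a term in [lower, upper]. */
    BitSet rangeBits(String lower, String upper) {
        BitSet bits = new BitSet(maxDoc);
        // Walk only the terms inside the range; no per-term query clause is built.
        for (Map.Entry<String, int[]> e : postings.subMap(lower, true, upper, true).entrySet()) {
            for (int id : e.getValue()) bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        RangeFilterSketch index = new RangeFilterSketch();
        index.add("2007-01-15", 0, 3);
        index.add("2007-02-01", 1);
        index.add("2007-03-10", 2, 3);
        index.add("2007-04-22", 4);
        System.out.println(index.rangeBits("2007-02-01", "2007-03-31")); // prints {1, 2, 3}
    }
}
```

The cost of building the bit set grows with the number of terms in the range, but the result is a single fixed-size structure, so there is no clause limit to hit. A second bit set for a node type constraint could be intersected with it using BitSet.and.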