Re: DescendantSelfAxisWeight ChildAxisQuery performance

Marcel Reutegger Fri, 30 Nov 2007 02:55:37 -0800

Ard Schrijvers wrote:

Ard Schrijvers wrote:
Query q = qm.createQuery("stuff//[EMAIL PROTECTED]", Query.XPATH);
if (q instanceof QueryImpl) {
    // limit the result set
    ((QueryImpl) q).setLimit(1);
}
Since my "stuff//[EMAIL PROTECTED]" query gives me 1,200,000 hits, I think it makes perfect sense to users that retrieving them all would be slow, even with our patches and a working cache. But if I set the limit to 1 or 10, I would expect good performance (certainly when you have not implemented any AccessManager).
But if I set the limit to 1, why would we have to check all 1,200,000 parents to see whether the path is correct?
I'm not quite sure if this is a valid/common use case. I can't imagine doing a query like this without using an "order by" clause, because without an "order by" you will just get a random node. But if you use an "order by", you need to get all nodes first anyway.
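To make the "you need to get all nodes first" point concrete: a minimal sketch of a bounded top-k selection in plain Java, not Jackrabbit or Lucene code. Even with a limit of 1 the best node may be the last candidate produced, so the loop cannot stop early and every candidate is examined. The names OrderByTopK, Result and topK, and the integer sort keys, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: an ordered, limited result must still read every candidate.
public class OrderByTopK {

    /** The selected sort keys plus how many candidates had to be read. */
    static final class Result {
        final List<Integer> keys;
        final int examined;

        Result(List<Integer> keys, int examined) {
            this.keys = keys;
            this.examined = examined;
        }
    }

    /** Returns the {@code limit} smallest sort keys in ascending order. */
    static Result topK(int[] sortKeys, int limit) {
        // max-heap: its head is the worst of the best keys seen so far
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        int examined = 0;
        for (int key : sortKeys) {
            examined++;                 // no early exit: any later key may be better
            heap.offer(key);
            if (heap.size() > limit) {
                heap.poll();            // evict the current worst key
            }
        }
        List<Integer> keys = new ArrayList<>(heap);
        keys.sort(null);                // natural (ascending) order
        return new Result(keys, examined);
    }

    public static void main(String[] args) {
        int[] sortKeys = {42, 17, 99, 8, 63, 5};
        Result r = topK(sortKeys, 1);
        System.out.println(r.keys + " after examining " + r.examined + " candidates");
    }
}
```

Here the smallest key (5) is the last of six candidates, so `topK(sortKeys, 1)` returns `[5]` only after examining all 6.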


see my comments below.

This is not my point. Whether you have an order by or not, Lucene will
compute the score of all hits anyway. So no order by does not mean
that Lucene does not order: it orders on score (but of course you already
know that :-) )

So my point holds with and without an order by.

WRT Lucene this is correct, but the same is not true for JCR. If there is no order by, the implementation is free to return the nodes in any order.

I did a quick test and wrote a custom IndexSearcher (see below), which collects only the first n matching documents. The test query then executed much faster because the number of DescendantSelfAxisScorer.isValid() calls dropped drastically.
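A rough way to see why collecting only the first n matches helps: a self-contained simulation, not Lucene code, assuming each candidate document costs one expensive validity check that stands in for DescendantSelfAxisScorer.isValid(). The names FirstNCollector, CountingCheck and firstN, and the assumption that every 10th candidate lies below the context path, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical simulation of early termination when no ordering is required.
public class FirstNCollector {

    /** Wraps a validity check and counts how often it runs. */
    static final class CountingCheck implements IntPredicate {
        private final IntPredicate check;
        int calls;

        CountingCheck(IntPredicate check) {
            this.check = check;
        }

        @Override
        public boolean test(int doc) {
            calls++;
            return check.test(doc);
        }
    }

    /** Visits candidates in document order and stops once limit hits are collected. */
    static List<Integer> firstN(int numCandidates, int limit, IntPredicate isValid) {
        List<Integer> hits = new ArrayList<>();
        // the limit is tested before the check, so no check runs after the last hit
        for (int doc = 0; doc < numCandidates && hits.size() < limit; doc++) {
            if (isValid.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int candidates = 1_200_000;
        // hypothetical: every 10th candidate (9, 19, 29, ...) is below the context path
        CountingCheck limited = new CountingCheck(doc -> doc % 10 == 9);
        System.out.println(firstN(candidates, 1, limited) + " with " + limited.calls + " checks");

        CountingCheck all = new CountingCheck(doc -> doc % 10 == 9);
        int total = firstN(candidates, Integer.MAX_VALUE, all).size();
        System.out.println(total + " hits with " + all.calls + " checks");
    }
}
```

With a limit of 1 the check runs 10 times; without a limit it runs 1,200,000 times to find 120,000 hits. This only pays off because, without an order by, any n matches are an acceptable answer.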

There is one drawback though: you don't know the total number of results. In this case it might be OK to return -1 for RangeIterator.getSize().
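javax.jcr.RangeIterator.getSize() is specified to return -1 when the size is unavailable. A minimal sketch of a result iterator that follows that contract, assuming the searcher can tell whether collection stopped at the limit. It does not implement the JCR interface itself, and the class name LimitedResultIterator and the truncated flag are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical result iterator mirroring the RangeIterator.getSize() contract.
public class LimitedResultIterator<T> implements Iterator<T> {

    private final List<T> hits;
    private final boolean truncated; // true if collection stopped at the limit
    private int position;

    LimitedResultIterator(List<T> hits, boolean truncated) {
        this.hits = hits;
        this.truncated = truncated;
    }

    /** Exact count when every match was collected, otherwise -1 (unknown). */
    public long getSize() {
        return truncated ? -1 : hits.size();
    }

    @Override
    public boolean hasNext() {
        return position < hits.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return hits.get(position++);
    }

    public static void main(String[] args) {
        LimitedResultIterator<String> limited =
                new LimitedResultIterator<>(Arrays.asList("/stuff/a"), true);
        LimitedResultIterator<String> complete =
                new LimitedResultIterator<>(Arrays.asList("/stuff/a", "/stuff/b"), false);
        System.out.println(limited.getSize() + " " + complete.getSize());
    }
}
```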

The order by is more difficult to solve. What we could try is to order the result of the sub query first and then run the descendant axis test against the context nodes. Since DescendantSelfAxisQuery does not add nodes to the sub query but only limits the set, subsequent ordering can be skipped. This requires that we pass along ordering information with the scorer, e.g. index order, relevance, or property.
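A minimal sketch of the "order first, then test the descendant axis" idea, in plain Java rather than scorer code. Because the descendant test only removes nodes, the survivors keep the sort order and the loop can stop after limit hits; the sort itself still covers the whole sub query result, but the per-node descendant test does not. Node, its created property, the path-prefix test and the name SortThenFilter are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: sort sub query hits once, then filter and stop early.
public class SortThenFilter {

    static final class Node {
        final String path;
        final long created; // hypothetical property used for the "order by"

        Node(String path, long created) {
            this.path = path;
            this.created = created;
        }
    }

    static List<String> query(List<Node> subQueryHits, Comparator<Node> orderBy,
                              Predicate<Node> isDescendant, int limit) {
        List<Node> sorted = new ArrayList<>(subQueryHits);
        sorted.sort(orderBy);                 // ordering happens once, up front
        List<String> result = new ArrayList<>();
        for (Node n : sorted) {
            if (result.size() >= limit) {
                break;                        // filtering kept the order: no re-sort
            }
            if (isDescendant.test(n)) {       // the (expensive) axis test
                result.add(n.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> hits = new ArrayList<>();
        hits.add(new Node("/stuff/a", 30));
        hits.add(new Node("/other/b", 10));
        hits.add(new Node("/stuff/c", 20));
        Comparator<Node> byCreated = Comparator.comparingLong(n -> n.created);
        Predicate<Node> underStuff = n -> n.path.startsWith("/stuff/");
        System.out.println(query(hits, byCreated, underStuff, 1)); // [/stuff/c]
    }
}
```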


In any case we should create a JIRA issue for it.

regards
 marcel


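import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.Weight;

public class JackrabbitIndexSearcher extends IndexSearcher {

    private final IndexReader reader;

    public JackrabbitIndexSearcher(IndexReader r) {
        super(r);
        this.reader = r;
    }

    // Collects only the first nDocs matches instead of scoring every hit.
    // TopDocs.totalHits is therefore not the total number of matches.
    public TopDocs search(Weight weight, Filter filter, int nDocs)
            throws IOException {
        if (filter != null) {
            // the shortcut below ignores filters, so use the default path
            return super.search(weight, filter, nDocs);
        }
        TopDocCollector collector = new TopDocCollector(nDocs);
        Scorer scorer = weight.scorer(reader);
        if (scorer != null) {
            // test the limit before next() so the scorer is not advanced
            // (and isValid() not called) once more than needed
            while (nDocs-- > 0 && scorer.next()) {
                collector.collect(scorer.doc(), scorer.score());
            }
        }
        return collector.topDocs();
    }
}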