Re: DescendantSelfAxisWeight ChildAxisQuery performance

Christoph Kiehl Fri, 30 Nov 2007 01:00:22 -0800

Ard Schrijvers wrote:

Query q = qm.createQuery("stuff//[EMAIL PROTECTED]", Query.XPATH);
if (q instanceof QueryImpl) {
    // limit the result set
    ((QueryImpl) q).setLimit(1);
}


Since my "stuff//[EMAIL PROTECTED]" gives me 1.200.000 hits, I think it
makes perfect sense to users that, even with our patches and a working
cache, retrieving them all would be slow. But if I set the limit
to 1 or 10, I would expect good performance (certainly when you
have not implemented any AccessManager).

But if I set the limit to 1, why would we have to check all 1.200.000
parents to see whether the path is correct?

I'm not quite sure if this is a valid/common use case. I can't imagine
doing a query like this without using an "order by" clause. Because
without an "order by" you will just get a random node. But if you use an
"order by" you need to get all nodes first anyway.

If I get sorted hits from lucene (only on the "//[EMAIL PROTECTED]" part
(perhaps with an order by as well), so without the initial path), I
would want to start with the first one and check the parent, then
the second, etc., until I have a hit that is correct according to its
path. If I have a limit of 10, we would need to get 10 successes.
Obviously, in the worst case scenario, we would still have to check
every hit for its parents, but this would be rather exceptional, I
think.
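
The lazy check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Jackrabbit code: the class LazyAncestorFilter and the method firstMatches are hypothetical names, hits are modelled as plain path strings, and the parent lookup is replaced by a simple path-prefix test.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of a lazy hierarchy check; not part of Jackrabbit. */
final class LazyAncestorFilter {

    /**
     * Walks the hits in their given (already sorted) order and keeps those
     * located below ancestorPath. Stops as soon as limit matches have been
     * collected, so the remaining hits are never inspected.
     *
     * Assumption: each hit is represented by its absolute path, and the
     * ancestor check is a prefix test instead of a real parent lookup.
     */
    static List<String> firstMatches(Iterator<String> sortedHitPaths,
                                     String ancestorPath,
                                     int limit) {
        List<String> matches = new ArrayList<>();
        // Append a slash so that "/stuff" does not match "/stuffing/x".
        String prefix = ancestorPath.endsWith("/") ? ancestorPath : ancestorPath + "/";
        while (matches.size() < limit && sortedHitPaths.hasNext()) {
            String path = sortedHitPaths.next();
            if (path.startsWith(prefix)) {
                matches.add(path);
            }
        }
        return matches;
    }
}
```

With a limit of 2 and the hits "/other/a", "/stuff/x/b", "/stuff/c", "/other/d", "/stuff/e", only the first three hits are inspected; the last two are never touched. In the worst case (no hit lies below the ancestor) every hit is still checked.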

Ok, I see. You would like to check parent-child relations lazily? Well,
this has two drawbacks I think:

1) The total result size will be very inaccurate until you fetched the
whole result set. Even now it might be inaccurate because of
AccessManager checks but doing lazy parent-child relation check will
make it almost unusable.

2) DescendantSelfAxisQueries and ChildAxisQueries are not only used as a
final selector but can also be used inside a query like this:


        stuff//[EMAIL PROTECTED]'text' and @foo/count]

You probably can't calculate @foo/count lazily.

and I have > 1.000.000 hits, and I have to wait, even in the cached
version, a few seconds, but changing "stuff//[EMAIL PROTECTED]" into
"//[EMAIL PROTECTED]" reduces it to a couple of ms, that does not make sense.

I know what you are talking about. That's why I don't use any
hierarchical queries at all. My queries all look like:


        //element(*, nt:specific-node-type)[EMAIL PROTECTED]

So I'm distinguishing my nodes only by node type or sometimes mixins
instead of by paths.

I would really love to optimize Jackrabbit's search to make the two
searches you mentioned above perform equally. You would even expect the
second one to be faster because it already reduces the number of
potentially matching nodes.

But I don't think the "lazy" solution will work. WDOT?

Cheers,
Christoph
