Ard Schrijvers wrote:

Query q = qm.createQuery("stuff//[EMAIL PROTECTED]", Query.XPATH);
if (q instanceof QueryImpl) {
    // limit the result set
    ((QueryImpl) q).setLimit(1);
}

Since my "stuff//[EMAIL PROTECTED]" gives me 1.200.000 hits, I think it
makes perfect sense to users that, even with our patches and a working
cache, retrieving them all would be slow. But if I set the limit
to 1 or 10, I would expect good performance (certainly when you
have not implemented any AccessManager).

But if I set the limit to 1, why would we have to check all 1.200.000
parents to see whether the path is correct?

I'm not quite sure if this is a valid/common use case. I can't imagine doing a query like this without using an "order by" clause. Because without an "order by" you will just get a random node. But if you use an "order by" you need to get all nodes first anyway.

If I get sorted hits from Lucene (only on the "//[EMAIL PROTECTED]" part
(perhaps with an order by as well), so without the initial path), I
would want to start with the first one and check the parent, then
the second, etc., until I have a hit that is correct according to its
path. If I have a limit of 10, we would need to get 10 successes.
Obviously, in the worst case scenario, we would still have to check
every hit for its parents, but this would be rather exceptional, I
think.
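
The lazy check described above could be sketched roughly as follows. This is only an illustration and not Jackrabbit code: the class LazyPathFilter, the method firstMatches, and the use of plain path strings as hits are all made up for the example, and the path test stands in for the real parent lookup.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: not part of Jackrabbit.
public class LazyPathFilter {

    /**
     * Walks the already sorted hits in order and applies the (expensive)
     * path check one hit at a time, stopping once 'limit' hits passed.
     */
    static <T> List<T> firstMatches(Iterator<T> sortedHits,
                                    Predicate<? super T> underPath,
                                    int limit) {
        List<T> result = new ArrayList<>();
        while (result.size() < limit && sortedHits.hasNext()) {
            T hit = sortedHits.next();
            if (underPath.test(hit)) {
                result.add(hit);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data: hits are represented by their paths.
        List<String> hits = List.of(
                "/other/a", "/stuff/b", "/stuff/c/d", "/other/e", "/stuff/f");
        int[] checked = {0}; // counts how many path checks were needed

        List<String> firstTwo = firstMatches(
                hits.iterator(),
                path -> {
                    checked[0]++;
                    return path.startsWith("/stuff/");
                },
                2);

        // prints "[/stuff/b, /stuff/c/d] after 3 checks"
        System.out.println(firstTwo + " after " + checked[0] + " checks");
    }
}
```

With a limit of 2, only three of the five hits are checked here. In the worst case (no hit passes the path check) every hit is still visited, and the total number of matching hits stays unknown until the iterator is exhausted.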

Ok, I see. You would like to check parent-child relations lazily? Well, this has two drawbacks I think:

1) The total result size will be very inaccurate until you have fetched the whole result set. Even now it might be inaccurate because of AccessManager checks, but doing lazy parent-child relation checks will make it almost unusable.

2) DescendantSelfAxisQueries and ChildAxisQueries are not only used as a final selector but can also be used inside a query like this:

        stuff//[EMAIL PROTECTED]'text' and @foo/count]

You probably can't calculate @foo/count lazily.

and I have > 1.000.000 hits, and I have to wait, even in the cached
version, a few seconds, but changing "stuff//[EMAIL PROTECTED]" into
"//[EMAIL PROTECTED]" reduces it to a couple of ms, that does not make sense.

I know what you are talking about. That's why I don't use any hierarchical queries at all. My queries all look like:

        //element(*, nt:specific-node-type)[EMAIL PROTECTED]

So I'm distinguishing my nodes only by node type or sometimes mixins instead of by paths. I would really love to optimize Jackrabbit's search to make the two searches you mentioned above perform equally. You would even expect the second one to be faster because it already reduces the number of potentially matching nodes.
But I don't think the "lazy" solution will work. WDYT?

Cheers,
Christoph
