Ard Schrijvers wrote:
Query q = qm.createQuery("stuff//[EMAIL PROTECTED]", Query.XPATH);
if (q instanceof QueryImpl) {
    // limit the result set
    ((QueryImpl) q).setLimit(1);
}
Since my "stuff//[EMAIL PROTECTED]" gives me 1.200.000 hits, I think it
makes perfect sense to users that, even with our patches and a working
cache, retrieving them all would be slow. But if I set the limit
to 1 or 10, I would expect good performance (certainly when you
have not implemented any AccessManager).
But if I set the limit to 1, why would we have to check the parents of
all 1.200.000 hits to see whether the path is correct?
I'm not quite sure if this is a valid/common use case. I can't imagine
doing a query like this without using an "order by" clause. Because
without an "order by" you will just get a random node. But if you use an
"order by" you need to get all nodes first anyway.
If I get sorted hits from Lucene (only on the "//[EMAIL PROTECTED]" part
(perhaps with an order by as well), so without the initial path), I
would want to start with the first one and check its parent, then
the second, etc., until I have a hit that is correct according to its
path. If I have a limit of 10, we would need to get 10 successes.
Obviously, in the worst case, we would still have to check
every hit for its parents, but this would be rather exceptional, I
think.
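The lazy check described above can be sketched roughly as follows. This is only an illustration in plain Java, not Jackrabbit or Lucene code: the hits are modelled as a list of path strings in index order, and the class name LazyPathFilter, the method firstMatches and the checks counter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not part of Jackrabbit. Hits are plain path strings.
class LazyPathFilter {
    /** Number of path checks done by the last call (for illustration only). */
    static int checks = 0;

    /**
     * Walks the hits in index order, keeps those below the given ancestor
     * path, and stops as soon as 'limit' matches have been collected.
     */
    static List<String> firstMatches(List<String> hitPaths, String ancestor, int limit) {
        checks = 0;
        List<String> result = new ArrayList<String>();
        String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
        for (String path : hitPaths) {
            if (result.size() >= limit) {
                break; // enough successes, the remaining hits are never checked
            }
            checks++;
            if (path.startsWith(prefix)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList(
                "/other/a", "/stuff/b", "/stuff/c/d", "/other/e", "/stuff/f");
        System.out.println(firstMatches(hits, "/stuff", 2)); // prints [/stuff/b, /stuff/c/d]
        System.out.println(checks);                          // prints 3
    }
}
```

With a limit of 2, only the first three of the five hits are checked. In the worst case (no hit lies below the ancestor) every hit is still checked.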
Ok, I see. You would like to check parent-child relations lazily? Well,
this has two drawbacks I think:
1) The total result size will be very inaccurate until you have fetched
the whole result set. Even now it might be inaccurate because of
AccessManager checks, but doing a lazy parent-child relation check will
make it almost unusable.
2) DescendantSelfAxisQueries and ChildAxisQueries are not only used as a
final selector but can also be used inside a query like this:
stuff//[EMAIL PROTECTED]'text' and @foo/count]
You probably can't calculate @foo/count lazily.
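To illustrate the first drawback: with a lazy check, the exact result size is only known once the whole iterator has been consumed. Before that there is just an upper bound, namely the raw hit count. A minimal sketch, assuming hits are plain path strings; LazyHits and its methods are hypothetical names, not Jackrabbit API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: an iterator that checks the path of each hit on demand.
class LazyHits implements Iterator<String> {
    private final Iterator<String> raw;
    private final String prefix;
    private final int rawSize;
    private int confirmed = 0;
    private String next;

    LazyHits(List<String> rawHits, String ancestor) {
        this.raw = rawHits.iterator();
        this.rawSize = rawHits.size();
        this.prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
    }

    /** All that is known up front: the number of unfiltered hits. */
    int upperBound() { return rawSize; }

    /** Matches returned so far; the exact size only after exhaustion. */
    int confirmedSoFar() { return confirmed; }

    public boolean hasNext() {
        // Skip raw hits until one lies below the ancestor path.
        while (next == null && raw.hasNext()) {
            String candidate = raw.next();
            if (candidate.startsWith(prefix)) {
                next = candidate;
            }
        }
        return next != null;
    }

    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        confirmed++;
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        LazyHits it = new LazyHits(Arrays.asList(
                "/other/a", "/stuff/b", "/stuff/c/d", "/other/e", "/stuff/f"), "/stuff");
        System.out.println(it.upperBound());     // prints 5
        it.next();
        System.out.println(it.confirmedSoFar()); // prints 1
        while (it.hasNext()) {
            it.next();
        }
        System.out.println(it.confirmedSoFar()); // prints 3
    }
}
```

Until the loop at the end has run, a caller asking for the size can only be told "at most 5", which is the inaccuracy described in 1).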
and I have > 1.000.000 hits, and I have to wait, even in the cached
version, a few seconds, but changing "stuff//[EMAIL PROTECTED]" into
"//[EMAIL PROTECTED]" reduces it to a couple of ms, that does not make sense.
I know what you are talking about. That's why I don't use any
hierarchical queries at all. My queries all look like:
//element(*, nt:specific-node-type)[EMAIL PROTECTED]
So I'm distinguishing my nodes only by node type or sometimes mixins
instead of by paths.
I would really love to optimize Jackrabbit's search to make the two
searches you mentioned above perform equally. You would even expect the
second one to be faster because it already reduces the number of
potentially matching nodes.
But I don't think the "lazy" solution will work. WDYT?
Cheers,
Christoph