Both of these proposals sound great, particularly the additional caching in
DescendantSelfAxisQuery. I think this would address the scenario for which I
suggested additional indexing earlier in this thread. As I mentioned, in my
query test set DescendantSelfAxisQuery.DescendantSelfAxisScorer.next() takes
the most time, so any speed-up there would be great.

-Dave


On 3/14/07, Christoph Kiehl <[EMAIL PROTECTED]> wrote:

Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>>> I've created a JIRA issue:
>>> http://issues.apache.org/jira/browse/JCR-791
>>
>> Are you working on this issue? Or should I try to implement something?
>
> I just started working on it ;)

Great news ;)

Now that you are working on implementing this cache on a per-index-reader
basis, I have another suggestion for improvement ;)

As I understand it, in DescendantSelfAxisQuery.DescendantSelfAxisScorer the
contextHits are used to filter the subHits result so that it only includes
nodes within the given context. The context is something like /foo/bar//*,
which means all descendants of /foo/bar. Is that right?
In our application the context for most of our queries is the same, so it
would make a lot of sense to cache the contextHits for this context. There is
already a TODO in the constructor of DescendantSelfAxisScorer which probably
aims at this.
I would go even further and not only cache these contextHits, but cache
contextHits per _node_ in the hierarchy, which means there is a BitSet for
/foo/bar/bla[1], /foo/bar/bla[2] and so on. If I need the BitSet for
/foo/bar//* I could just join the BitSets of the descendants. This would allow
reusing the BitSets for different contexts. What do you think about this? The
larger the result set and the less specific the context, the more this should
improve performance.
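A minimal sketch of the per-node caching idea in Java. Everything here is hypothetical and not Jackrabbit's API: the hierarchy is assumed to be available as a map from a node path to its child paths, each node is assumed to map to a single document number in the index reader, and the BitSet intersection in filter() is only a set-based stand-in for the scorer's filtering of subHits against contextHits. Each node caches the BitSet of itself plus all its descendants, so the set for a context like /foo/bar//* is built by OR-ing cached child sets and can be reused by any other context that contains it.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Jackrabbit's API): caches, per node, the BitSet of
 * document numbers for that node and all of its descendants.
 */
public class ContextBitSetCache {

    // Assumed view of the hierarchy: path -> child paths.
    private final Map<String, List<String>> children;
    // Assumed mapping: path -> document number in the index reader.
    private final Map<String, Integer> docNumbers;
    // Cache: path -> BitSet of the node itself plus all its descendants.
    private final Map<String, BitSet> cache = new HashMap<>();

    public ContextBitSetCache(Map<String, List<String>> children,
                              Map<String, Integer> docNumbers) {
        this.children = children;
        this.docNumbers = docNumbers;
    }

    /** Descendant-or-self set for path; the returned BitSet is cached, do not modify it. */
    public BitSet descendantOrSelf(String path) {
        BitSet cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        BitSet bits = new BitSet();
        Integer doc = docNumbers.get(path);
        if (doc != null) {
            bits.set(doc);
        }
        for (String child : children.getOrDefault(path, List.of())) {
            bits.or(descendantOrSelf(child)); // reuse each child's cached set
        }
        cache.put(path, bits);
        return bits;
    }

    /** Context path//* : all descendants, excluding the context node itself. */
    public BitSet descendants(String path) {
        BitSet bits = (BitSet) descendantOrSelf(path).clone(); // keep cache intact
        Integer doc = docNumbers.get(path);
        if (doc != null) {
            bits.clear(doc);
        }
        return bits;
    }

    /** Set-based stand-in for filtering subHits to those inside the context. */
    public static BitSet filter(BitSet subHits, BitSet contextHits) {
        BitSet result = (BitSet) subHits.clone();
        result.and(contextHits);
        return result;
    }
}
```

Because each cached set is built from its children's cached sets, a query on /foo//* issued after one on /foo/bar//* only has to compute sets for the nodes outside /foo/bar. A real implementation would also need to discard cached sets when the hierarchy changes.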

>> It seems like if I rewrite the following query from
>>
>> /foo/*[@foo:bar!='john' and @foo:bar!='doe']
>>
>> to
>>
>> /foo/*[not(@foo:bar='john' or @foo:bar='doe')]
>>
>> I get a better performance. Can you confirm this?
>
> Yes, I can. Basically because any != comparison is translated into: get
> all nodes with the given property, then exclude the ones that match the
> literal. Which is obviously much more expensive than just: get all nodes
> that match a given literal.
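To make the cost difference concrete, here is a toy sketch in Java, not Jackrabbit's implementation: an inverted index for a single property maps each value to the BitSet of documents having it, and properties are assumed to be single-valued. '=' is one posting-list lookup, while '!=' follows the description above: it first gathers every node that has the property by visiting all of its values, then removes the nodes matching the literal.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy sketch (not Jackrabbit's implementation): one property's inverted
 * index, value -> documents, assuming single-valued properties.
 */
public class PropertyComparison {

    private final Map<String, BitSet> postings = new HashMap<>();

    public void add(int doc, String value) {
        postings.computeIfAbsent(value, v -> new BitSet()).set(doc);
    }

    /** '=' : a single lookup of the literal's posting list. */
    public BitSet equalTo(String literal) {
        BitSet hits = postings.get(literal);
        return hits == null ? new BitSet() : (BitSet) hits.clone();
    }

    /** '!=' : collect all nodes having the property, then remove exact matches. */
    public BitSet notEqualTo(String literal) {
        BitSet hits = new BitSet();
        for (BitSet docs : postings.values()) {
            hits.or(docs); // work grows with the number of distinct values
        }
        hits.andNot(equalTo(literal));
        return hits;
    }
}
```

With many distinct values for foo:bar, notEqualTo() touches every posting list, which is consistent with the speed-up observed for the not(... or ...) form.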

Wouldn't it make sense to rewrite all @foo:bar!='john' queries to
not(@foo:bar='john') by default instead of creating a MatchAllQuery?
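A hedged sketch of what the proposed rewrite would compute, again with BitSets standing in for query results (all names hypothetical). It also shows that the two forms are not interchangeable in XPath: @foo:bar!='john' is false for a node that has no foo:bar property, while not(@foo:bar='john') is true for it, so a default rewrite would presumably also have to keep nodes without the property out of the result.

```java
import java.util.BitSet;

/** Hypothetical sketch: '!=' versus the not('=') rewrite, on BitSets. */
public class NotEqualsRewrite {

    /** not(@p = literal): every node in scope except the exact matches. */
    static BitSet notOfEquals(BitSet allNodes, BitSet equalHits) {
        BitSet result = (BitSet) allNodes.clone();
        result.andNot(equalHits);
        return result;
    }

    /** @p != literal (single-valued): nodes having the property, minus exact matches. */
    static BitSet notEquals(BitSet hasProperty, BitSet equalHits) {
        BitSet result = (BitSet) hasProperty.clone();
        result.andNot(equalHits);
        return result;
    }

    public static void main(String[] args) {
        BitSet all = new BitSet();
        all.set(0, 4);      // nodes 0..3
        BitSet hasProp = new BitSet();
        hasProp.set(0, 3);  // nodes 0..2 have foo:bar, node 3 does not
        BitSet eqJohn = new BitSet();
        eqJohn.set(0);      // node 0 has foo:bar='john'
        System.out.println(notEquals(hasProp, eqJohn)); // {1, 2}
        System.out.println(notOfEquals(all, eqJohn));   // {1, 2, 3}
    }
}
```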

Cheers,
Christoph

