Both of these proposals sound great, particularly the additional caching in DescendantSelfAxisQuery. I think this would address the scenario for which I suggested additional indexing earlier in this thread. As I mentioned, in my query test set DescendantSelfAxisQuery.DescendantSelfAxisScorer.next() accounts for most of the time, so any speed-up there would be great.
-Dave

On 3/14/07, Christoph Kiehl <[EMAIL PROTECTED]> wrote:
Marcel Reutegger wrote:
> Christoph Kiehl wrote:
>>> I've created a jira issue: http://issues.apache.org/jira/browse/JCR-791
>>
>> Are you working on this issue? Or should I try to implement something?
>
> I just started working on it ;)

Great news ;) Now that you are working on implementing this cache on a per-index-reader basis, I have another suggestion for improvement ;)

As I understand it, in DescendantSelfAxisQuery.DescendantSelfAxisScorer the contextHits are used to filter the subHits result so that it only includes nodes within the given context. The context is something like /foo/bar//*, which means all descendants of /foo/bar. Is that right?

In our application the context for most of our queries is the same, so it would make a lot of sense to cache the contextHits for this context. There is already a TODO in the constructor of DescendantSelfAxisScorer which probably aims at this. I would go even further and cache contextHits not just for one context but per _node_ in the hierarchy, which means there is a BitSet for /foo/bar/bla[1], one for /foo/bar/bla[2], and so on. If I need the BitSet for /foo/bar//*, I could just join (OR) the BitSets of the descendants. This would allow reusing the BitSets across different contexts. What do you think about this? The gain should be larger the bigger the result set is and the less specific the context is.

>> It seems like if I rewrite the following query from
>>
>> /foo/*[@foo:bar!='john' and @foo:bar!='doe']
>>
>> to
>>
>> /foo/*[not(@foo:bar='john' or @foo:bar='doe')]
>>
>> I get a better performance. Can you confirm this?
>
> Yes, I can. Basically because any != comparison is translated into: get
> all nodes with the given property, then exclude the ones that match the
> literal. Which is obviously much more expensive than just: get all nodes
> that match a given literal.

Wouldn't it make sense to rewrite all @foo:bar!='john' queries to not(@foo:bar='john') by default instead of creating a MatchAllQuery?
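A minimal sketch of the per-node caching idea, assuming each node is identified by its document number in a single index reader and that a parent-to-children map is available. The class `SubtreeBitSetCache` and its methods are hypothetical illustrations, not part of Jackrabbit. Each node's BitSet covers the node and its whole subtree and is computed once; the context bits for `ctx//*` are the OR of the children's subtree BitSets, so a later query with a narrower context reuses entries already built.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of per-node context caching (not Jackrabbit's API).
 * Nodes are identified by their document number in ONE index reader.
 */
public class SubtreeBitSetCache {
    private final Map<Integer, List<Integer>> children; // doc number -> child doc numbers (assumed input)
    private final Map<Integer, BitSet> subtreeCache = new HashMap<>();

    public SubtreeBitSetCache(Map<Integer, List<Integer>> children) {
        this.children = children;
    }

    /** Bits for a node and all of its descendants; computed once per node, then reused. */
    BitSet subtree(int node) {
        BitSet cached = subtreeCache.get(node);
        if (cached != null) {
            return cached;
        }
        BitSet bits = new BitSet();
        bits.set(node);
        for (int child : children.getOrDefault(node, List.of())) {
            bits.or(subtree(child)); // join the children's cached subtree bits
        }
        subtreeCache.put(node, bits);
        return bits;
    }

    /** Context bits for "ctx//*": union of the children's subtrees (ctx itself excluded). */
    BitSet descendants(int ctx) {
        BitSet bits = new BitSet();
        for (int child : children.getOrDefault(ctx, List.of())) {
            bits.or(subtree(child));
        }
        return bits;
    }

    /** Keep only those sub-query hits that lie below the context node. */
    BitSet filter(BitSet subHits, int ctx) {
        BitSet result = (BitSet) subHits.clone();
        result.and(descendants(ctx));
        return result;
    }

    public static void main(String[] args) {
        // 0=/foo/bar, 1=/foo/bar/bla[1], 2=/foo/bar/bla[2], 3 and 4 are their children, 5 is outside /foo/bar
        Map<Integer, List<Integer>> kids = Map.of(0, List.of(1, 2), 1, List.of(3), 2, List.of(4));
        SubtreeBitSetCache cache = new SubtreeBitSetCache(kids);
        BitSet subHits = new BitSet();
        subHits.set(2);
        subHits.set(3);
        subHits.set(5);
        System.out.println(cache.filter(subHits, 0)); // prints {2, 3}
        System.out.println(cache.filter(subHits, 1)); // prints {3}, reusing the cached subtree of node 3
    }
}
```

Because the cache is keyed by document number, it is only valid for one index reader, which fits the per-index-reader caching discussed in this thread.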
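A sketch of the suggested default rewrite, using a small hypothetical predicate tree (`NotEqualRewrite`, `Eq`, `Ne`, `Not`, `And`, `Or` are illustrative names, not Jackrabbit's query classes). It turns each `@p!='x'` into `not(@p='x')` and folds a conjunction of negations into a single negated disjunction (De Morgan), which reproduces the manual rewrite of the /foo/* query. One caution: under XPath semantics `@foo:bar!='john'` only matches nodes that have the property, whereas `not(@foo:bar='john')` also matches nodes without it (and the two differ for multi-valued properties), so a real rewrite would have to keep an existence check or accept that difference.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical predicate AST for illustration only; NOT Jackrabbit's query tree. */
public class NotEqualRewrite {
    interface Pred {}
    record Eq(String prop, String lit) implements Pred {}
    record Ne(String prop, String lit) implements Pred {}
    record Not(Pred inner) implements Pred {}
    record And(List<Pred> terms) implements Pred {}
    record Or(List<Pred> terms) implements Pred {}

    static Pred rewrite(Pred p) {
        if (p instanceof Ne ne) {
            // @p != 'x'  ->  not(@p = 'x'); note: also matches nodes WITHOUT the property
            return new Not(new Eq(ne.prop(), ne.lit()));
        }
        if (p instanceof Not nt) {
            return new Not(rewrite(nt.inner()));
        }
        if (p instanceof Or or) {
            List<Pred> out = new ArrayList<>();
            for (Pred t : or.terms()) out.add(rewrite(t));
            return new Or(out);
        }
        if (p instanceof And an) {
            List<Pred> terms = new ArrayList<>();
            for (Pred t : an.terms()) terms.add(rewrite(t));
            // De Morgan: not(a) and not(b) -> not(a or b), so only one negation is evaluated
            if (!terms.isEmpty() && terms.stream().allMatch(x -> x instanceof Not)) {
                List<Pred> inner = new ArrayList<>();
                for (Pred t : terms) inner.add(((Not) t).inner());
                return new Not(new Or(inner));
            }
            return new And(terms);
        }
        return p; // Eq (and anything else) is left unchanged
    }

    /** Render back to XPath-like predicate syntax. */
    static String render(Pred p) {
        if (p instanceof Eq e) return e.prop() + "='" + e.lit() + "'";
        if (p instanceof Ne ne) return ne.prop() + "!='" + ne.lit() + "'";
        if (p instanceof Not nt) return "not(" + render(nt.inner()) + ")";
        if (p instanceof And an) return join(an.terms(), " and ");
        if (p instanceof Or or) return join(or.terms(), " or ");
        throw new IllegalArgumentException("unknown predicate: " + p);
    }

    private static String join(List<Pred> terms, String sep) {
        List<String> parts = new ArrayList<>();
        for (Pred t : terms) {
            String s = render(t);
            // parenthesize nested and/or so precedence is preserved
            parts.add(t instanceof And || t instanceof Or ? "(" + s + ")" : s);
        }
        return String.join(sep, parts);
    }

    public static void main(String[] args) {
        Pred q = new And(List.of(new Ne("@foo:bar", "john"), new Ne("@foo:bar", "doe")));
        System.out.println(render(rewrite(q))); // prints not(@foo:bar='john' or @foo:bar='doe')
    }
}
```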
Cheers, Christoph