Christoph Kiehl wrote:
As I understand it, in DescendantSelfAxisQuery.DescendantSelfAxisScorer the contextHits are used to filter the subHits result to only include nodes of the given context. The context is something like /foo/bar//*, which means all descendants of /foo/bar. Is that right?

yes, that's correct.
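The filtering step confirmed above can be sketched roughly as follows. This is only an illustration, not the actual DescendantSelfAxisScorer code: the `parentDoc` array (document number of each node's parent, or -1 at the root) is a hypothetical stand-in for however the index resolves a node's parent, and the class and method names are made up for the example.

```java
import java.util.BitSet;

/** Illustrative sketch only; not the Jackrabbit implementation. */
final class DescendantFilterSketch {

    /**
     * Keeps only those sub hits that have at least one ancestor in contextHits.
     * parentDoc[d] is the (hypothetical) document number of d's parent, or -1 for the root.
     */
    static BitSet filterByContext(BitSet subHits, BitSet contextHits, int[] parentDoc) {
        BitSet result = new BitSet(subHits.size());
        for (int doc = subHits.nextSetBit(0); doc >= 0; doc = subHits.nextSetBit(doc + 1)) {
            // Walk up the ancestor chain until a context node or the root is reached.
            for (int p = parentDoc[doc]; p >= 0; p = parentDoc[p]) {
                if (contextHits.get(p)) {
                    result.set(doc);
                    break;
                }
            }
        }
        return result;
    }
}
```

The walk starts at the parent, so a context node is not reported as its own descendant, which matches the /foo/bar//* reading in the question.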

In our application the context for most of our queries is the same, so it would make a lot of sense to cache the contextHits for this context. There is already a todo in the constructor of DescendantSelfAxisScorer which probably aims at this.

no, not exactly. because the size of each BitSet is equal to the overall size of the index, they may become quite large. that's why I once thought it might be useful to reuse BitSet instances, which is not the same as caching the result of a query.
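To make the distinction concrete, here is a minimal sketch of instance reuse, assuming a single-threaded caller and a known upper bound on document numbers (called `maxDoc` here). The class is hypothetical; it only recycles the allocated memory and caches no query result.

```java
import java.util.BitSet;

/** Hypothetical holder that recycles one index-sized BitSet; not thread-safe. */
final class ReusableBitSetSketch {
    private final BitSet bits;

    ReusableBitSetSketch(int maxDoc) {
        // One bit per document in the index.
        this.bits = new BitSet(maxDoc);
    }

    /** Returns the same instance every time, with all bits cleared. */
    BitSet acquire() {
        bits.clear();
        return bits;
    }

    /** Rough payload size of one index-sized BitSet: one 64-bit word per 64 documents. */
    static long approxBytes(int maxDoc) {
        return ((long) maxDoc + 63) / 64 * 8;
    }
}
```

For an index with 10,000,000 documents, `approxBytes` gives 1,250,000 bytes per BitSet, which is why allocating a fresh one for every query adds up.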

I would go even further and not only cache these contextHits, but cache contextHits per _node_ in a hierarchy, which means there is a BitSet for /foo/bar/bla[1], /foo/bar/bla[2] and so on. If I need the BitSet for /foo/bar//* I could just join the BitSets of the descendants. This would allow reusing the BitSets for different contexts. What do you think about this? It should improve performance a lot, the more so the larger the result set is and the less specific your context is.

hmm, I'm not sure how you would implement that. joining the BitSets you mentioned may as well be expensive once you reach a certain number of them.
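As a rough illustration of that cost, joining per-node BitSets with java.util.BitSet amounts to one `or` per cached set. The class below is a hypothetical sketch, not a proposed patch.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of joining per-node hit sets into one context set. */
final class ContextUnionSketch {

    /** Returns a new BitSet containing every bit set in any of the inputs. */
    static BitSet union(List<BitSet> perNodeHits) {
        BitSet joined = new BitSet();
        for (BitSet hits : perNodeHits) {
            // Each call processes up to (highest set bit / 64) words of the argument.
            joined.or(hits);
        }
        return joined;
    }
}
```

With k cached sets over an index of n documents, the join touches on the order of k * n / 64 words in the worst case, so a context with thousands of descendant nodes could cost more than recomputing the context hits directly.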

furthermore, a BitSet for /foo/bar//* is very unstable in the sense that it will change very frequently. with every change under /foo/bar a node gets a new document number and we would have to create a new BitSet. I guess we would need to find a way to efficiently modify an existing BitSet when:

- the index is updated (because of a change)
- index segments are merged (caused by a background thread)
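One way to picture the update problem: if an old-to-new document number mapping were available after an update or merge, a cached BitSet could be translated rather than recomputed. Whether the index can supply such a mapping cheaply is an assumption here; the `newDocOf` array and the class name are purely hypothetical.

```java
import java.util.BitSet;

/** Hypothetical sketch of translating a cached hit set to new document numbers. */
final class DocNumberRemapSketch {

    /**
     * newDocOf[oldDoc] is the new document number, or -1 if the document was deleted.
     * Old document numbers beyond the mapping are dropped.
     */
    static BitSet remap(BitSet oldHits, int[] newDocOf) {
        BitSet remapped = new BitSet();
        for (int doc = oldHits.nextSetBit(0); doc >= 0; doc = oldHits.nextSetBit(doc + 1)) {
            if (doc < newDocOf.length && newDocOf[doc] >= 0) {
                remapped.set(newDocOf[doc]);
            }
        }
        return remapped;
    }
}
```

Even in this simplified form the translation visits every set bit, and nodes newly added under the context would still have to be found and added separately.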

Wouldn't it make sense to rewrite all @foo:bar!='john' queries to not(@foo:bar!='john') by default instead of creating a MatchAllQuery?

do you mean rewrite: @foo:bar!='john' to not(@foo:bar='john') ?

regards
 marcel
