We have something like that in Elasticsearch: it wraps queries so that it can
report cost, matchCost, and the number of calls to
nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in
the query tree.
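The sketch below makes the idea concrete. It is plain Java so that it runs without Lucene, and every type in it is a hypothetical stand-in: `DocIterator`, `ArrayIterator`, `CountingIterator` and `conjunction` are not Lucene's or Elasticsearch's classes (Lucene's real iterator is `DocIdSetIterator`). A counting wrapper sits around each clause of a leapfrog conjunction that lets the lowest-cost clause lead. After the conjunction runs, the counters show how each clause was advanced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CountingConjunctionDemo {

    // Hypothetical stand-in for DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical minimal iterator interface (not Lucene's API).
    interface DocIterator {
        int docID();
        int nextDoc();
        int advance(int target);
        long cost();
    }

    // Iterates a sorted array of doc ids, like a postings list.
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int index = -1;
        private int doc = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        public int docID() { return doc; }

        public int nextDoc() {
            index++;
            doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
            return doc;
        }

        public int advance(int target) {
            // Linear scan keeps the sketch short; real postings can skip.
            while (doc < target) {
                nextDoc();
            }
            return doc;
        }

        public long cost() { return docs.length; }
    }

    // Wraps a clause and counts calls, the way a profiling wrapper would.
    static final class CountingIterator implements DocIterator {
        final String name;
        private final DocIterator in;
        long nextDocCalls;
        long advanceCalls;

        CountingIterator(String name, DocIterator in) { this.name = name; this.in = in; }

        public int docID() { return in.docID(); }
        public int nextDoc() { nextDocCalls++; return in.nextDoc(); }
        public int advance(int target) { advanceCalls++; return in.advance(target); }
        public long cost() { return in.cost(); }
    }

    // Leapfrog conjunction: the cheapest clause leads, the others are advanced to it.
    static List<Integer> conjunction(List<? extends DocIterator> clauses) {
        List<DocIterator> sorted = new ArrayList<>(clauses);
        sorted.sort(Comparator.comparingLong(DocIterator::cost));
        DocIterator lead = sorted.get(0);
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        outer:
        while (doc != NO_MORE_DOCS) {
            for (int i = 1; i < sorted.size(); i++) {
                DocIterator other = sorted.get(i);
                int otherDoc = other.docID() < doc ? other.advance(doc) : other.docID();
                if (otherDoc > doc) {
                    // Mismatch: move the lead forward and re-check all clauses.
                    doc = lead.advance(otherDoc);
                    continue outer;
                }
            }
            hits.add(doc);
            doc = lead.nextDoc();
        }
        return hits;
    }

    public static void main(String[] args) {
        CountingIterator rare = new CountingIterator("rare", new ArrayIterator(3, 50, 97));
        int[] all = new int[100];
        for (int i = 0; i < all.length; i++) all[i] = i;
        CountingIterator common = new CountingIterator("common", new ArrayIterator(all));

        // prints "hits=[3, 50, 97]"
        System.out.println("hits=" + conjunction(Arrays.asList(common, rare)));
        // prints "rare nextDoc=4 advance=0" then "common nextDoc=0 advance=3"
        for (CountingIterator it : Arrays.asList(rare, common)) {
            System.out.println(it.name + " nextDoc=" + it.nextDocCalls + " advance=" + it.advanceCalls);
        }
    }
}
```

Because the cheaper clause leads, `common` is only advanced three times instead of being iterated doc by doc. Per-clause counts like these let you check whether the iteration order matches what you expected.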

It's not perfect, as it needs to disable some optimizations in order to work
properly. For instance, bulk scorers are disabled and conjunctions are not
inlined, which means that clauses may run in a different order. So results
need to be interpreted carefully, since the way the query gets executed when
observed may differ a bit from how it gets executed normally. That said, it
has still been useful in a number of cases. I don't think our
implementation works when IndexSearcher is configured with an executor, but
we could maybe put it in the sandbox module and iterate from there?

For your case, do you think the slowdown could be attributed to deleted docs?
Deleted docs are checked before two-phase confirmation and collectors, but
after disjunctions/conjunctions of postings.
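The ordering described above can be modelled in a few lines of plain Java. This is a simplified, hypothetical model, not Lucene's code. `approximation` plays the role of the doc ids produced by the conjunction of postings, `liveDocs` marks non-deleted docs, and `expensiveMatches` stands in for a costly two-phase `matches()` check. The counters show how much work each stage sees.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

public class DeletedDocsOrderDemo {

    // Hypothetical per-segment counters.
    static final class Result {
        final List<Integer> collected = new ArrayList<>();
        int approximationHits; // docs produced by the postings conjunction/disjunction
        int matchesCalls;      // how often the expensive two-phase check ran
    }

    // Simplified model of the order of checks (not Lucene's implementation).
    static Result run(int[] approximation, BitSet liveDocs, IntPredicate expensiveMatches) {
        Result r = new Result();
        for (int doc : approximation) {
            r.approximationHits++;
            // Deleted docs are filtered after postings iteration...
            if (liveDocs != null && !liveDocs.get(doc)) {
                continue;
            }
            // ...but before the costly two-phase confirmation.
            r.matchesCalls++;
            if (!expensiveMatches.test(doc)) {
                continue;
            }
            // Only live, confirmed docs reach the collector.
            r.collected.add(doc);
        }
        return r;
    }

    public static void main(String[] args) {
        int[] approximation = {1, 2, 3, 4, 5, 6, 7, 8};
        BitSet liveDocs = new BitSet();
        liveDocs.set(0, 10);
        liveDocs.clear(2);
        liveDocs.clear(5);
        liveDocs.clear(6);
        Result r = run(approximation, liveDocs, doc -> doc % 2 == 0);
        // prints "approximation=8 matches()=5 collected=[4, 8]"
        System.out.println("approximation=" + r.approximationHits
                + " matches()=" + r.matchesCalls + " collected=" + r.collected);
    }
}
```

The postings still iterate over deleted docs, even though neither `matches()` nor the collector ever sees them. Comparing the approximation count with the number of `matches()` calls could therefore help check whether deleted docs explain part of the slowdown.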

On Thu, May 6, 2021 at 20:20, Michael Sokolov <[email protected]> wrote:

> Do we have a way to understand how BooleanQuery (and other composite
> queries) are advancing their child queries? For example, a simple
> conjunction of two queries advances the more restrictive (lower
> cost()) query first, enabling the more costly query to skip over more
> documents. But we may not be making the best choice in every case, and
> I would like to know, for some query, how we are doing. For example,
> we could execute in a debugging mode, interposing something that wraps
> or observes the Scorers in some way, gathering statistics about how
> many documents are visited by each Scorer, which can be aggregated for
> later analysis.
>
> This is motivated by a use case we have in which we currently
> post-filter our query results in a custom collector using some filters
> that we know to be expensive (they must be evaluated on every
> document), but we would rather express these post-filters as Queries
> and have them advanced during the main Query execution. However when
> we tried to do that, we saw some slowdowns (in spite of marking these
> Queries as high-cost) and I suspect it is due to the iteration order,
> but I'm not sure how to debug.
>
> Suggestions welcome!
>
> -Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
