We have something like that in Elasticsearch that wraps queries in order to be able to report cost, matchCost and the number of calls to nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in the query tree.
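
To give an idea of what such a wrapper does, here is a minimal, self-contained sketch of the counting idea. It does not use the Lucene or Elasticsearch classes: DocIterator, ArrayIterator, CountingIterator and countMatches are hypothetical names made up for illustration, and DocIterator is only a simplified stand-in for an iterator exposing nextDoc/advance/cost. It only counts nextDoc and advance calls on a two-clause conjunction.

```java
public class CountingIteratorSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical, simplified stand-in for a doc ID iterator. */
    interface DocIterator {
        int nextDoc();
        int advance(int target);
        long cost();
    }

    /** Iterates over a sorted array of doc IDs (illustration only). */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int idx = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        public int nextDoc() {
            idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            // Linear scan forward to the first doc >= target.
            int doc;
            do { doc = nextDoc(); } while (doc < target);
            return doc;
        }

        public long cost() { return docs.length; }
    }

    /** Wraps another iterator and counts how often each method is called. */
    static final class CountingIterator implements DocIterator {
        final String name;
        final DocIterator in;
        int nextDocCalls;
        int advanceCalls;

        CountingIterator(String name, DocIterator in) {
            this.name = name;
            this.in = in;
        }

        public int nextDoc() { nextDocCalls++; return in.nextDoc(); }
        public int advance(int target) { advanceCalls++; return in.advance(target); }
        public long cost() { return in.cost(); }
    }

    /** Leapfrog conjunction: 'lead' drives iteration, 'other' is advanced to follow it. */
    static int countMatches(DocIterator lead, DocIterator other) {
        int matches = 0;
        int doc = lead.nextDoc();
        int otherDoc = -1; // current position of 'other'
        while (doc != NO_MORE_DOCS) {
            if (otherDoc < doc) {
                otherDoc = other.advance(doc);
            }
            if (otherDoc == doc) {
                matches++;
                doc = lead.nextDoc();
            } else {
                doc = lead.advance(otherDoc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        CountingIterator cheap = new CountingIterator("cheap", new ArrayIterator(3, 9, 27));
        CountingIterator costly =
            new CountingIterator("costly", new ArrayIterator(1, 3, 5, 7, 9, 11, 13));
        // Lead with the lower-cost clause, as a conjunction would.
        int matches = countMatches(cheap, costly);
        System.out.println("matches=" + matches);
        for (CountingIterator it : new CountingIterator[] {cheap, costly}) {
            System.out.println(it.name + ": cost=" + it.cost()
                + " nextDoc=" + it.nextDocCalls + " advance=" + it.advanceCalls);
        }
    }
}
```

Swapping the two arguments of countMatches and comparing the counters is a cheap way to see how much the choice of lead clause matters for a given pair of iterators.
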
It's not perfect, as it needs to disable some optimizations in order to work properly. For instance, bulk scorers are disabled and conjunctions are not inlined, which means that clauses may run in a different order. So results need to be interpreted carefully, as the way the query gets executed when observed may differ a bit from how it gets executed normally. That said, it has still been useful in a number of cases. I don't think our implementation works when IndexSearcher is configured with an executor, but we could maybe put it in sandbox and iterate from there?

For your case, do you think the slowdown could be attributed to deleted docs? Deleted docs are checked before two-phase confirmation and collectors, but after disjunctions/conjunctions of postings.

On Thu, May 6, 2021 at 20:20, Michael Sokolov <[email protected]> wrote:

> Do we have a way to understand how BooleanQuery (and other composite
> queries) are advancing their child queries? For example, a simple
> conjunction of two queries advances the more restrictive (lower
> cost()) query first, enabling the more costly query to skip over more
> documents. But we may not be making the best choice in every case, and
> I would like to know, for some query, how we are doing. For example,
> we could execute in a debugging mode, interposing something that wraps
> or observes the Scorers in some way, gathering statistics about how
> many documents are visited by each Scorer, which can be aggregated for
> later analysis.
>
> This is motivated by a use case we have in which we currently
> post-filter our query results in a custom collector using some filters
> that we know to be expensive (they must be evaluated on every
> document), but we would rather express these post-filters as Queries
> and have them advanced during the main Query execution. However when
> we tried to do that, we saw some slowdowns (in spite of marking these
> Queries as high-cost) and I suspect it is due to the iteration order,
> but I'm not sure how to debug.
>
> Suggestions welcome!
>
> -Mike
