Indeed, this code is ASL2 (pre-7.10), but I wouldn't have expected any concerns regardless. Jack volunteered to bring this code to Lucene by removing the Elasticsearch-specific bits.
On Mon, May 10, 2021 at 4:55 PM Michael McCandless <[email protected]> wrote:

> +1 to start from the Elasticsearch implementation for low-level query
> execution tracing, which I think is from (pre-7.10) ASL2 licensed code?
>
> That sounds helpful, even with the Heisenberg caveats.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, May 6, 2021 at 4:24 PM Adrien Grand <[email protected]> wrote:
>
>> We have something like that in Elasticsearch that wraps queries in order
>> to be able to report cost, matchCost and the number of calls to
>> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in
>> the query tree.
>>
>> It's not perfect as it needs to disable some optimizations in order to
>> work properly. For instance bulk scorers are disabled and conjunctions are
>> not inlined, which means that clauses may run in a different order. So
>> results need to be interpreted carefully as the way the query gets executed
>> when observed may differ a bit from how it gets executed normally. That
>> said it has still been useful in a number of cases. I don't think our
>> implementation works when IndexSearcher is configured with an executor but
>> we could maybe put it in sandbox and iterate from there?
>>
>> For your case, do you think it could be attributed to deleted docs?
>> Deleted docs are checked before two-phase confirmation and collectors but
>> after disjunctions/conjunctions of postings.
>>
>> On Thu, May 6, 2021 at 8:20 PM, Michael Sokolov <[email protected]>
>> wrote:
>>
>>> Do we have a way to understand how BooleanQuery (and other composite
>>> queries) are advancing their child queries? For example, a simple
>>> conjunction of two queries advances the more restrictive (lower
>>> cost()) query first, enabling the more costly query to skip over more
>>> documents. But we may not be making the best choice in every case, and
>>> I would like to know, for some query, how we are doing. For example,
>>> we could execute in a debugging mode, interposing something that wraps
>>> or observes the Scorers in some way, gathering statistics about how
>>> many documents are visited by each Scorer, which can be aggregated for
>>> later analysis.
>>>
>>> This is motivated by a use case we have in which we currently
>>> post-filter our query results in a custom collector using some filters
>>> that we know to be expensive (they must be evaluated on every
>>> document), but we would rather express these post-filters as Queries
>>> and have them advanced during the main Query execution. However when
>>> we tried to do that, we saw some slowdowns (in spite of marking these
>>> Queries as high-cost) and I suspect it is due to the iteration order,
>>> but I'm not sure how to debug.
>>>
>>> Suggestions welcome!
>>>
>>> -Mike
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>

-- 
Adrien
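
For readers following the thread, here is a rough illustration of the idea discussed above: wrap each clause's iterator so that calls to nextDoc/advance are counted while a two-clause conjunction leads with the lower-cost clause. This is only a sketch under stated assumptions. It is not the Elasticsearch implementation and it does not use Lucene classes; DocIterator, ArrayIterator, CountingIterator and intersect are hypothetical names that only mimic the general shape of a doc ID iterator.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorerTraceSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a doc ID iterator; NOT a Lucene class. */
    interface DocIterator {
        int docID();
        int nextDoc();
        int advance(int target);
        long cost();
    }

    /** Iterates over a sorted array of doc IDs (illustration only). */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int idx = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        public int docID() {
            if (idx < 0) return -1;
            return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
        }

        public int nextDoc() {
            if (idx < docs.length) idx++;
            return docID();
        }

        /** Moves past the current doc to the first doc >= target. */
        public int advance(int target) {
            if (idx < docs.length) idx++;
            while (idx < docs.length && docs[idx] < target) idx++;
            return docID();
        }

        public long cost() { return docs.length; }
    }

    /** Wrapper that counts how often each iteration method is called. */
    static final class CountingIterator implements DocIterator {
        final String name;
        final DocIterator in;
        long nextDocCalls;
        long advanceCalls;

        CountingIterator(String name, DocIterator in) {
            this.name = name;
            this.in = in;
        }

        public int docID() { return in.docID(); }
        public int nextDoc() { nextDocCalls++; return in.nextDoc(); }
        public int advance(int target) { advanceCalls++; return in.advance(target); }
        public long cost() { return in.cost(); }
    }

    /** Leapfrog intersection of two clauses, led by the lower-cost one. */
    static List<Integer> intersect(DocIterator x, DocIterator y) {
        DocIterator lead = x.cost() <= y.cost() ? x : y;
        DocIterator other = (lead == x) ? y : x;
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        while (doc != NO_MORE_DOCS) {
            int otherDoc = other.docID();
            if (otherDoc < doc) {
                otherDoc = other.advance(doc);
            }
            if (otherDoc == doc) {
                hits.add(doc);
                doc = lead.nextDoc();
            } else {
                // The other clause is ahead: let the lead catch up.
                doc = lead.advance(otherDoc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CountingIterator rare = new CountingIterator("rare", new ArrayIterator(5, 50, 500));
        CountingIterator common = new CountingIterator("common",
                new ArrayIterator(1, 2, 3, 5, 8, 13, 21, 34, 50, 55, 89));
        System.out.println("hits=" + intersect(rare, common));
        for (CountingIterator it : List.of(rare, common)) {
            System.out.println(it.name + ": nextDoc=" + it.nextDocCalls
                    + " advance=" + it.advanceCalls);
        }
    }
}
```

Comparing the per-clause counters for different clause orders is the kind of signal the thread is after. As noted above, the wrapping itself can change how a query executes, so numbers from any such tracing need careful interpretation.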
