FYI this just got checked in: https://issues.apache.org/jira/browse/LUCENE-9965.
I'd be curious to know if it helps with your problem, Mike.

On Wed, May 12, 2021 at 1:54 PM Adrien Grand <[email protected]> wrote:

> Indeed this code is ASL2 pre-7.10, but I wouldn't have expected any
> concerns regardless. Jack volunteered to bring this code to Lucene by
> removing the Elasticsearch-specific bits.
>
> On Mon, May 10, 2021 at 4:55 PM Michael McCandless <
> [email protected]> wrote:
>
>> +1 to start from the Elasticsearch implementation for low-level query
>> execution tracing, which I think is from (pre-7.10) ASL2 licensed code?
>>
>> That sounds helpful, even with the Heisenberg caveats.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, May 6, 2021 at 4:24 PM Adrien Grand <[email protected]> wrote:
>>
>>> We have something like that in Elasticsearch that wraps queries in order
>>> to be able to report cost, matchCost and the number of calls to
>>> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in
>>> the query tree.
>>>
>>> It's not perfect as it needs to disable some optimizations in order to
>>> work properly. For instance, bulk scorers are disabled and conjunctions are
>>> not inlined, which means that clauses may run in a different order. So
>>> results need to be interpreted carefully, as the way the query gets executed
>>> when observed may differ a bit from how it gets executed normally. That
>>> said, it has still been useful in a number of cases. I don't think our
>>> implementation works when IndexSearcher is configured with an executor, but
>>> we could maybe put it in sandbox and iterate from there?
>>>
>>> For your case, do you think it could be attributed to deleted docs?
>>> Deleted docs are checked before two-phase confirmation and collectors but
>>> after disjunctions/conjunctions of postings.
>>>
>>> On Thu, May 6, 2021 at 8:20 PM Michael Sokolov <[email protected]>
>>> wrote:
>>>
>>>> Do we have a way to understand how BooleanQuery (and other composite
>>>> queries) are advancing their child queries? For example, a simple
>>>> conjunction of two queries advances the more restrictive (lower
>>>> cost()) query first, enabling the more costly query to skip over more
>>>> documents. But we may not be making the best choice in every case, and
>>>> I would like to know, for some query, how we are doing. For example,
>>>> we could execute in a debugging mode, interposing something that wraps
>>>> or observes the Scorers in some way, gathering statistics about how
>>>> many documents are visited by each Scorer, which can be aggregated for
>>>> later analysis.
>>>>
>>>> This is motivated by a use case we have in which we currently
>>>> post-filter our query results in a custom collector using some filters
>>>> that we know to be expensive (they must be evaluated on every
>>>> document), but we would rather express these post-filters as Queries
>>>> and have them advanced during the main Query execution. However, when
>>>> we tried to do that, we saw some slowdowns (in spite of marking these
>>>> Queries as high-cost) and I suspect it is due to the iteration order,
>>>> but I'm not sure how to debug.
>>>>
>>>> Suggestions welcome!
>>>>
>>>> -Mike
>>>>
>
> --
> Adrien
>

--
Adrien
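
The idea discussed in this thread (wrap each clause's iterator, count calls to nextDoc/advance, and let the conjunction be led by the lowest-cost clause) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is plain Python, not Lucene or Elasticsearch code, and the names `ListIterator`, `CountingIterator`, and `conjunction` are hypothetical stand-ins invented for illustration.

```python
from bisect import bisect_left

NO_MORE_DOCS = 2**31 - 1  # sentinel for an exhausted iterator (toy model)


class ListIterator:
    """Hypothetical stand-in for a postings iterator over sorted doc IDs."""

    def __init__(self, docs):
        self.docs = sorted(docs)
        self.pos = -1  # not positioned yet

    def cost(self):
        # Upper bound on the number of docs this iterator can match.
        return len(self.docs)

    def doc_id(self):
        if self.pos < 0:
            return -1
        if self.pos >= len(self.docs):
            return NO_MORE_DOCS
        return self.docs[self.pos]

    def next_doc(self):
        self.pos += 1
        return self.doc_id()

    def advance(self, target):
        # Move to the first doc >= target that lies beyond the current position.
        lo = min(self.pos + 1, len(self.docs))
        self.pos = bisect_left(self.docs, target, lo)
        return self.doc_id()


class CountingIterator:
    """Wraps an iterator and counts next_doc/advance calls."""

    def __init__(self, name, inner):
        self.name = name
        self.inner = inner
        self.next_doc_calls = 0
        self.advance_calls = 0

    def cost(self):
        return self.inner.cost()

    def doc_id(self):
        return self.inner.doc_id()

    def next_doc(self):
        self.next_doc_calls += 1
        return self.inner.next_doc()

    def advance(self, target):
        self.advance_calls += 1
        return self.inner.advance(target)


def conjunction(iterators):
    """Leapfrog intersection led by the lowest-cost iterator."""
    lead, *others = sorted(iterators, key=lambda it: it.cost())
    matches = []
    doc = lead.next_doc()
    while doc != NO_MORE_DOCS:
        for other in others:
            other_doc = other.doc_id()
            if other_doc < doc:
                other_doc = other.advance(doc)
            if other_doc > doc:
                # This clause skipped past doc, so the lead must catch up.
                doc = lead.advance(other_doc)
                break
        else:
            # Every clause is positioned on doc: it is a match.
            matches.append(doc)
            doc = lead.next_doc()
    return matches


rare = CountingIterator("rare", ListIterator([3, 50, 97]))
common = CountingIterator("common", ListIterator(range(0, 100, 2)))
matches = conjunction([common, rare])

for it in (rare, common):
    print(it.name, "cost:", it.cost(),
          "next_doc:", it.next_doc_calls, "advance:", it.advance_calls)
print("matches:", matches)
```

In this toy run the low-cost clause leads and the 50-doc clause is only ever advanced, never stepped doc by doc. As noted in the thread, a real wrapper of this kind can change how a query executes, so counts from an instrumented run need careful interpretation.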
