Yes, I'm looking forward to checking it out, and I really appreciate the work to bring it here! I'm going to be out in the woods for a few days, but it's on my list to try this out when I get back.

On Wed, Jun 9, 2021 at 5:41 PM Adrien Grand <[email protected]> wrote:
>
> FYI this just got checked in:
> https://issues.apache.org/jira/browse/LUCENE-9965.
>
> I'd be curious to know if it helps with your problem, Mike.
>
> On Wed, May 12, 2021 at 1:54 PM Adrien Grand <[email protected]> wrote:
>>
>> Indeed this code is ASL2 pre-7.10, but I wouldn't have expected any
>> concerns regardless. Jack volunteered to bring this code to Lucene by
>> removing the Elasticsearch-specific bits.
>>
>> On Mon, May 10, 2021 at 4:55 PM Michael McCandless
>> <[email protected]> wrote:
>>>
>>> +1 to start from the Elasticsearch implementation for low-level query
>>> execution tracing, which I think is from (pre-7.10) ASL2 licensed code?
>>>
>>> That sounds helpful, even with the Heisenberg caveats.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Thu, May 6, 2021 at 4:24 PM Adrien Grand <[email protected]> wrote:
>>>>
>>>> We have something like that in Elasticsearch that wraps queries in order
>>>> to be able to report cost, matchCost and the number of calls to
>>>> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in
>>>> the query tree.
>>>>
>>>> It's not perfect as it needs to disable some optimizations in order to
>>>> work properly. For instance, bulk scorers are disabled and conjunctions are
>>>> not inlined, which means that clauses may run in a different order. So
>>>> results need to be interpreted carefully, as the way the query gets
>>>> executed when observed may differ a bit from how it gets executed
>>>> normally. That said, it has still been useful in a number of cases. I don't
>>>> think our implementation works when IndexSearcher is configured with an
>>>> executor, but we could maybe put it in sandbox and iterate from there?
>>>>
>>>> For your case, do you think it could be attributed to deleted docs?
>>>> Deleted docs are checked before two-phase confirmation and collectors but
>>>> after disjunctions/conjunctions of postings.
>>>>
>>>> On Thu, May 6, 2021 at 8:20 PM Michael Sokolov <[email protected]> wrote:
>>>>>
>>>>> Do we have a way to understand how BooleanQuery (and other composite
>>>>> queries) are advancing their child queries? For example, a simple
>>>>> conjunction of two queries advances the more restrictive (lower
>>>>> cost()) query first, enabling the more costly query to skip over more
>>>>> documents. But we may not be making the best choice in every case, and
>>>>> I would like to know, for some query, how we are doing. For example,
>>>>> we could execute in a debugging mode, interposing something that wraps
>>>>> or observes the Scorers in some way, gathering statistics about how
>>>>> many documents are visited by each Scorer, which can be aggregated for
>>>>> later analysis.
>>>>>
>>>>> This is motivated by a use case we have in which we currently
>>>>> post-filter our query results in a custom collector using some filters
>>>>> that we know to be expensive (they must be evaluated on every
>>>>> document), but we would rather express these post-filters as Queries
>>>>> and have them advanced during the main Query execution. However, when
>>>>> we tried to do that, we saw some slowdowns (in spite of marking these
>>>>> Queries as high-cost) and I suspect it is due to the iteration order,
>>>>> but I'm not sure how to debug.
>>>>>
>>>>> Suggestions welcome!
>>>>>
>>>>> -Mike
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>>
>>
>>
>> --
>> Adrien
>
>
>
> --
> Adrien
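
To make the idea discussed in this thread concrete, here is a minimal sketch of wrapping iterators to count nextDoc/advance calls during a two-clause conjunction. It is a toy model only: DocIterator, ArrayIterator, CountingIterator and intersect are hypothetical names invented for illustration, not Lucene or Elasticsearch classes, and this is not the code checked in under LUCENE-9965. It assumes the conjunction simply leads with the lower-cost clause, as described in the original question.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorerTraceSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy stand-in for a doc-ID iterator (hypothetical, not the Lucene API). */
    interface DocIterator {
        int docID();
        int nextDoc();
        int advance(int target);
        long cost();
    }

    /** Iterates over a sorted array of doc IDs; cost is the array length. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int idx = -1;
        ArrayIterator(int... docs) { this.docs = docs; }
        public int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
        public int nextDoc() { idx++; return docID(); }
        public int advance(int target) {
            // Always moves forward at least once, then skips docs below target.
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
        public long cost() { return docs.length; }
    }

    /** Delegating wrapper that records how often each method is called. */
    static final class CountingIterator implements DocIterator {
        final String name;
        private final DocIterator in;
        int nextDocCalls;
        int advanceCalls;
        CountingIterator(String name, DocIterator in) { this.name = name; this.in = in; }
        public int docID() { return in.docID(); }
        public int nextDoc() { nextDocCalls++; return in.nextDoc(); }
        public int advance(int target) { advanceCalls++; return in.advance(target); }
        public long cost() { return in.cost(); }
    }

    /** Leap-frog conjunction that leads with the lower-cost iterator. */
    static List<Integer> intersect(DocIterator a, DocIterator b) {
        DocIterator lead = a.cost() <= b.cost() ? a : b;
        DocIterator other = lead == a ? b : a;
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        while (doc != NO_MORE_DOCS) {
            int otherDoc = other.docID();
            if (otherDoc < doc) otherDoc = other.advance(doc);
            if (otherDoc == doc) {
                hits.add(doc);
                doc = lead.nextDoc();
            } else if (otherDoc == NO_MORE_DOCS) {
                break; // the other clause is exhausted, no more matches possible
            } else {
                doc = lead.advance(otherDoc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CountingIterator sparse = new CountingIterator("sparse", new ArrayIterator(5, 40, 90));
        CountingIterator dense = new CountingIterator("dense",
                new ArrayIterator(1, 2, 3, 5, 8, 13, 21, 34, 40, 55, 89));
        System.out.println("hits=" + intersect(dense, sparse));
        for (CountingIterator it : List.of(sparse, dense)) {
            System.out.println(it.name + ": nextDoc=" + it.nextDocCalls
                    + " advance=" + it.advanceCalls);
        }
    }
}
```

In this toy run the sparse clause leads and is driven only through nextDoc, while the dense clause is driven only through advance, which is the kind of per-clause statistic the thread asks about. As noted above, observing a real query this way can change how it executes (bulk scorers disabled, conjunctions not inlined), so counts from a wrapped run need careful interpretation.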
