FYI, this just got checked in:
https://issues.apache.org/jira/browse/LUCENE-9965.

I'd be curious to know if it helps with your problem, Mike.

On Wed, May 12, 2021 at 1:54 PM Adrien Grand <[email protected]> wrote:

> Indeed, this code is ASL2 pre-7.10, but I wouldn't have expected any
> concerns regardless. Jack volunteered to bring this code to Lucene by
> removing the Elasticsearch-specific bits.
>
> On Mon, May 10, 2021 at 4:55 PM Michael McCandless <
> [email protected]> wrote:
>
>> +1 to starting from the Elasticsearch implementation for low-level query
>> execution tracing, which I think is (pre-7.10) ASL2-licensed code?
>>
>> That sounds helpful, even with the Heisenberg caveats.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, May 6, 2021 at 4:24 PM Adrien Grand <[email protected]> wrote:
>>
>>> We have something like that in Elasticsearch: it wraps queries so that it
>>> can report cost, matchCost and the number of calls to
>>> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in
>>> the query tree.
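A minimal sketch of the wrapping idea, assuming a simplified iterator interface. The names `DocIterator`, `ArrayIterator` and `CountingIterator` are hypothetical stand-ins, not Lucene's `DocIdSetIterator` or the Elasticsearch profiler classes. The wrapper delegates every call to the wrapped iterator and counts `nextDoc`/`advance` invocations, which is the kind of per-node statistic described above.

```java
public class CountingIteratorDemo {

    /** Hypothetical, simplified stand-in for Lucene's DocIdSetIterator contract. */
    interface DocIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int docID();
        int nextDoc();
        int advance(int target); // assumes target > docID(), as in Lucene
        long cost();
    }

    /** Iterates over a sorted array of doc IDs (a toy "postings list"). */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int index = -1;
        private int doc = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        @Override public int docID() { return doc; }

        @Override public int nextDoc() {
            index = Math.min(index + 1, docs.length);
            doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
            return doc;
        }

        @Override public int advance(int target) {
            // Linear scan keeps the sketch short; real postings can skip ahead.
            do {
                nextDoc();
            } while (doc < target);
            return doc;
        }

        @Override public long cost() { return docs.length; }
    }

    /** Delegates to another iterator and counts how often each method is called. */
    static final class CountingIterator implements DocIterator {
        private final DocIterator in;
        long nextDocCalls;
        long advanceCalls;

        CountingIterator(DocIterator in) { this.in = in; }

        @Override public int docID() { return in.docID(); }
        @Override public int nextDoc() { nextDocCalls++; return in.nextDoc(); }
        @Override public int advance(int target) { advanceCalls++; return in.advance(target); }
        @Override public long cost() { return in.cost(); }
    }

    /** Drives a counting iterator like a consumer might: nextDoc, one skip, then drain. */
    static CountingIterator demo() {
        CountingIterator it = new CountingIterator(new ArrayIterator(2, 5, 9, 14));
        it.nextDoc();   // -> 2
        it.advance(10); // -> 14
        while (it.nextDoc() != DocIterator.NO_MORE_DOCS) {
            // drain any remaining docs
        }
        return it;
    }

    public static void main(String[] args) {
        CountingIterator it = demo();
        System.out.println("nextDoc=" + it.nextDocCalls + " advance=" + it.advanceCalls);
    }
}
```

The wrapper only observes calls made through it. That is why, as noted below, anything that bypasses the per-node iterators (bulk scorers, inlined conjunctions) has to be disabled for the counts to mean something.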
>>>
>>> It's not perfect, as it needs to disable some optimizations in order to
>>> work properly. For instance, bulk scorers are disabled and conjunctions are
>>> not inlined, which means that clauses may run in a different order. So
>>> results need to be interpreted carefully, as the way the query gets executed
>>> when observed may differ a bit from how it gets executed normally. That
>>> said, it has still been useful in a number of cases. I don't think our
>>> implementation works when IndexSearcher is configured with an executor, but
>>> we could maybe put it in the sandbox module and iterate from there?
>>>
>>> In your case, do you think the slowdown could be attributed to deleted docs?
>>> Deleted docs are checked before two-phase confirmation and collectors, but
>>> after disjunctions/conjunctions of postings.
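To illustrate the ordering described above, here is a toy per-segment loop. All names are hypothetical: a `BitSet` stands in for live docs, an `IntPredicate` stands in for the two-phase `matches()` check, and the candidate array stands in for what the postings-level conjunction or disjunction produces. In this ordering, deleted docs still cost postings iteration, but they never reach the expensive confirmation or the collector.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

public class MatchOrderSketch {

    /**
     * Hypothetical, simplified per-segment loop. Not Lucene's actual scoring code;
     * it only mirrors the order: postings -> live docs -> two-phase -> collector.
     */
    static List<Integer> collect(int[] candidates, BitSet liveDocs, IntPredicate twoPhaseMatches) {
        List<Integer> collected = new ArrayList<>();
        for (int doc : candidates) {          // 1. postings already iterated to produce doc
            if (!liveDocs.get(doc)) {         // 2. deleted docs are filtered next
                continue;
            }
            if (!twoPhaseMatches.test(doc)) { // 3. costly confirmation only for live docs
                continue;
            }
            collected.add(doc);               // 4. stands in for LeafCollector.collect(doc)
        }
        return collected;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        live.set(0, 10);
        live.clear(3);
        live.clear(7);                        // docs 3 and 7 are deleted
        int[] confirmCalls = {0};
        IntPredicate expensive = doc -> {
            confirmCalls[0]++;
            return doc % 2 == 0;              // toy "expensive" filter
        };
        List<Integer> hits = collect(new int[] {1, 3, 4, 7, 8}, live, expensive);
        System.out.println(hits + " confirmCalls=" + confirmCalls[0]);
    }
}
```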
>>>
>>> On Thu, May 6, 2021 at 8:20 PM Michael Sokolov <[email protected]>
>>> wrote:
>>>
>>>> Do we have a way to understand how BooleanQuery (and other composite
>>>> queries) are advancing their child queries? For example, a simple
>>>> conjunction of two queries advances the more restrictive (lower
>>>> cost()) query first, enabling the more costly query to skip over more
>>>> documents. But we may not be making the best choice in every case, and
>>>> I would like to know, for some query, how we are doing. For example,
>>>> we could execute in a debugging mode, interposing something that wraps
>>>> or observes the Scorers in some way, gathering statistics about how
>>>> many documents are visited by each Scorer, which can be aggregated for
>>>> later analysis.
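A simplified sketch of the "leapfrog" conjunction described here, assuming each clause is a sorted array of doc IDs. The `Clause` class and its counters are hypothetical and are not Lucene's `ConjunctionDISI`. The lead clause proposes candidates, and the other clauses are only advanced to those candidates. The per-clause counters record how many advance calls and doc IDs each clause sees, and the `cheapestFirst` flag lets you compare the lowest-cost-first order against the opposite order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** A clause backed by a sorted array of doc IDs, with per-clause statistics. */
    static final class Clause {
        final int[] docs;
        int pos = -1;
        int doc = -1;
        long advanceCalls; // how often the conjunction asked this clause to move
        long docsVisited;  // how many doc IDs this clause stepped onto

        Clause(int... docs) { this.docs = docs; }

        long cost() { return docs.length; }

        /** Moves to the first doc >= target; assumes target > doc. */
        int advance(int target) {
            advanceCalls++;
            while (++pos < docs.length) {
                docsVisited++;
                if (docs[pos] >= target) {
                    return doc = docs[pos];
                }
            }
            pos = docs.length; // stay exhausted
            return doc = NO_MORE_DOCS;
        }
    }

    static List<Integer> intersect(List<Clause> clauses, boolean cheapestFirst) {
        Comparator<Clause> byCost = Comparator.comparingLong(Clause::cost);
        List<Clause> order = new ArrayList<>(clauses);
        order.sort(cheapestFirst ? byCost : byCost.reversed());
        Clause lead = order.get(0);
        List<Integer> hits = new ArrayList<>();
        int candidate = lead.advance(0);
        outer:
        while (candidate != NO_MORE_DOCS) {
            for (int i = 1; i < order.size(); i++) {
                Clause other = order.get(i);
                if (other.doc < candidate) {
                    int d = other.advance(candidate);
                    if (d > candidate) {
                        // This clause skipped past the candidate, so the lead catches up.
                        candidate = lead.advance(d);
                        continue outer;
                    }
                }
            }
            hits.add(candidate); // every clause is positioned on candidate
            candidate = lead.advance(candidate + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        for (boolean cheapestFirst : new boolean[] {true, false}) {
            Clause rare = new Clause(3, 40, 77);
            Clause common = new Clause(1, 3, 5, 8, 13, 21, 34, 40, 55, 77, 89);
            List<Integer> hits = intersect(List.of(rare, common), cheapestFirst);
            System.out.printf("cheapestFirst=%b hits=%s rare: advance=%d visited=%d, common: advance=%d visited=%d%n",
                cheapestFirst, hits, rare.advanceCalls, rare.docsVisited,
                common.advanceCalls, common.docsVisited);
        }
    }
}
```

With these toy inputs, both orders return `[3, 40, 77]`. Letting the three-doc clause lead takes 7 advance calls in total (4 on `rare`, 3 on `common`). Letting `common` lead takes 12 (4 on `rare`, 8 on `common`). Per-clause counters like these are one way to see whether the chosen order is actually paying off.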
>>>>
>>>> This is motivated by a use case we have in which we currently
>>>> post-filter our query results in a custom collector using some filters
>>>> that we know to be expensive (they must be evaluated on every
>>>> document), but we would rather express these post-filters as Queries
>>>> and have them advanced during the main Query execution. However, when
>>>> we tried to do that, we saw some slowdowns (in spite of marking these
>>>> Queries as high-cost), and I suspect they are due to the iteration order,
>>>> but I'm not sure how to debug it.
>>>>
>>>> Suggestions welcome!
>>>>
>>>> -Mike
>>>>
>
> --
> Adrien
>


-- 
Adrien
