Thanks Adrien, that is something like what I had in mind. If you are
able to share it, that could be very helpful. As for deleted docs, that
is not something I had considered; it's possibly a problem here. I'd
have to go check, but I think these "filter" Queries were implemented
in the second part of the two-phase iteration.
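
To make the idea of wrapping iterators to count calls concrete, here is
a minimal sketch of it. It is not the Elasticsearch implementation and
it does not use Lucene classes: DocIterator, ArrayIterator,
CountingIterator and intersect are hypothetical stand-ins that only
mimic the nextDoc/advance/cost contract, and the conjunction is a plain
leapfrog led by the lower-cost clause.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorerStats {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a doc ID iterator; not the Lucene class. */
    interface DocIterator {
        int docID();
        int nextDoc();
        int advance(int target);
        long cost();
    }

    /** Iterates over a sorted array of doc IDs; cost is the array length. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int idx = -1;
        ArrayIterator(int... docs) { this.docs = docs; }
        public int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
        public int nextDoc() { idx++; return docID(); }
        public int advance(int target) {
            // Move past the current doc, then skip everything below target.
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
        public long cost() { return docs.length; }
    }

    /** Delegates to another iterator and counts nextDoc/advance calls. */
    static final class CountingIterator implements DocIterator {
        final String name;
        private final DocIterator in;
        int nextDocCalls;
        int advanceCalls;
        CountingIterator(String name, DocIterator in) { this.name = name; this.in = in; }
        public int docID() { return in.docID(); }
        public int nextDoc() { nextDocCalls++; return in.nextDoc(); }
        public int advance(int target) { advanceCalls++; return in.advance(target); }
        public long cost() { return in.cost(); }
        @Override public String toString() {
            return name + ": nextDoc=" + nextDocCalls + ", advance=" + advanceCalls;
        }
    }

    /** Leapfrog intersection led by whichever clause reports the lower cost. */
    static List<Integer> intersect(DocIterator a, DocIterator b) {
        DocIterator lead = a.cost() <= b.cost() ? a : b;
        DocIterator other = lead == a ? b : a;
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        while (doc != NO_MORE_DOCS) {
            int otherDoc = other.docID();
            if (otherDoc < doc) otherDoc = other.advance(doc);
            if (otherDoc == doc) {
                hits.add(doc);
                doc = lead.nextDoc();
            } else if (otherDoc == NO_MORE_DOCS) {
                break;
            } else {
                doc = lead.advance(otherDoc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CountingIterator rare = new CountingIterator("rare", new ArrayIterator(5, 40));
        CountingIterator common = new CountingIterator("common",
                new ArrayIterator(1, 2, 3, 4, 5, 6, 7, 8, 40, 41));
        System.out.println("hits=" + intersect(rare, common));
        System.out.println(rare);
        System.out.println(common);
    }
}
```

Comparing the per-clause counters against each clause's cost() is the
kind of signal that could show whether the lead clause was a good
choice. A real wrapper would also have to cover two-phase matches() and
scoring, which this sketch leaves out.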

On Thu, May 6, 2021 at 4:24 PM Adrien Grand <[email protected]> wrote:
>
> We have something like that in Elasticsearch that wraps queries in order to 
> be able to report cost, matchCost and the number of calls to 
> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in 
> the query tree.
>
> It's not perfect as it needs to disable some optimizations in order to work 
> properly. For instance bulk scorers are disabled and conjunctions are not 
> inlined, which means that clauses may run in a different order. So results 
> need to be interpreted carefully as the way the query gets executed when 
> observed may differ a bit from how it gets executed normally. That said it 
> has still been useful in a number of cases. I don't think our implementation 
> works when IndexSearcher is configured with an executor but we could maybe 
> put it in sandbox and iterate from there?
>
> For your case, do you think it could be attributed to deleted docs? Deleted 
> docs are checked before two-phase confirmation and collectors but after 
> disjunctions/conjunctions of postings.
>
> On Thu, May 6, 2021 at 8:20 PM Michael Sokolov <[email protected]> wrote:
>>
>> Do we have a way to understand how BooleanQuery (and other composite
>> queries) are advancing their child queries? For example, a simple
>> conjunction of two queries advances the more restrictive (lower
>> cost()) query first, enabling the more costly query to skip over more
>> documents. But we may not be making the best choice in every case, and
>> I would like to know, for some query, how we are doing. For example,
>> we could execute in a debugging mode, interposing something that wraps
>> or observes the Scorers in some way, gathering statistics about how
>> many documents are visited by each Scorer, which can be aggregated for
>> later analysis.
>>
>> This is motivated by a use case we have in which we currently
>> post-filter our query results in a custom collector using some filters
>> that we know to be expensive (they must be evaluated on every
>> document), but we would rather express these post-filters as Queries
>> and have them advanced during the main Query execution. However when
>> we tried to do that, we saw some slowdowns (in spite of marking these
>> Queries as high-cost) and I suspect it is due to the iteration order,
>> but I'm not sure how to debug.
>>
>> Suggestions welcome!
>>
>> -Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
