Yes, I'm looking forward to checking it out, and really appreciate the
work to bring it here! I'm going to be out in the woods for a few
days, but it's on my list to try this out when I get back.

On Wed, Jun 9, 2021 at 5:41 PM Adrien Grand <[email protected]> wrote:
>
> FYI this just got checked in:
> https://issues.apache.org/jira/browse/LUCENE-9965.
>
> I'd be curious to know if it helps with your problem, Mike.
>
> On Wed, May 12, 2021 at 1:54 PM Adrien Grand <[email protected]> wrote:
>>
>> Indeed this code is ASL2 pre-7.10, but I wouldn't have expected any 
>> concerns regardless. Jack volunteered to bring this code to Lucene by 
>> removing the Elasticsearch-specific bits.
>>
>> On Mon, May 10, 2021 at 4:55 PM Michael McCandless 
>> <[email protected]> wrote:
>>>
>>> +1 to start from the Elasticsearch implementation for low-level query 
>>> execution tracing, which I think is from (pre-7.10) ASL2 licensed code?
>>>
>>> That sounds helpful, even with the Heisenberg caveats.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Thu, May 6, 2021 at 4:24 PM Adrien Grand <[email protected]> wrote:
>>>>
>>>> We have something like that in Elasticsearch that wraps queries in order 
>>>> to be able to report cost, matchCost and the number of calls to 
>>>> nextDoc/advance/matches/score/advanceShallow/getMaxScore for every node in 
>>>> the query tree.
>>>>
>>>> It's not perfect as it needs to disable some optimizations in order to 
>>>> work properly. For instance bulk scorers are disabled and conjunctions are 
>>>> not inlined, which means that clauses may run in a different order. So 
>>>> results need to be interpreted carefully as the way the query gets 
>>>> executed when observed may differ a bit from how it gets executed 
>>>> normally. That said it has still been useful in a number of cases. I don't 
>>>> think our implementation works when IndexSearcher is configured with an 
>>>> executor but we could maybe put it in sandbox and iterate from there?
>>>>
>>>> For your case, do you think it could be attributed to deleted docs? 
>>>> Deleted docs are checked before two-phase confirmation and collectors but 
>>>> after disjunctions/conjunctions of postings.
>>>>
>>>> On Thu, May 6, 2021 at 8:20 PM Michael Sokolov <[email protected]> wrote:
>>>>>
>>>>> Do we have a way to understand how BooleanQuery (and other composite
>>>>> queries) are advancing their child queries? For example, a simple
>>>>> conjunction of two queries advances the more restrictive (lower
>>>>> cost()) query first, enabling the more costly query to skip over more
>>>>> documents. But we may not be making the best choice in every case, and
>>>>> I would like to know, for some query, how we are doing. For example,
>>>>> we could execute in a debugging mode, interposing something that wraps
>>>>> or observes the Scorers in some way, gathering statistics about how
>>>>> many documents are visited by each Scorer, which can be aggregated for
>>>>> later analysis.
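
The debugging mode described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene classes, and `DocIterator`, `ArrayIterator`, `CountingIterator`, and `intersect` are hypothetical stand-ins rather than the Elasticsearch profiler or the code from LUCENE-9965. A counting wrapper sits in front of each clause, and a two-clause conjunction leapfrogs with the lower-cost clause as the lead.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionTrace {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a doc-id iterator; not a Lucene class. */
    interface DocIterator {
        int docID();
        int nextDoc();
        int advance(int target); // moves forward to the first doc >= target
        long cost();
    }

    /** Iterates over a sorted array of doc ids; cost is the array length. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int idx = -1;
        ArrayIterator(int... docs) { this.docs = docs; }
        public int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
        public int nextDoc() { idx++; return docID(); }
        public int advance(int target) {
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
        public long cost() { return docs.length; }
    }

    /** Delegates every call and counts nextDoc/advance invocations. */
    static final class CountingIterator implements DocIterator {
        private final DocIterator in;
        int nextDocCalls;
        int advanceCalls;
        CountingIterator(DocIterator in) { this.in = in; }
        public int docID() { return in.docID(); }
        public int nextDoc() { nextDocCalls++; return in.nextDoc(); }
        public int advance(int target) { advanceCalls++; return in.advance(target); }
        public long cost() { return in.cost(); }
        String stats() { return "nextDoc=" + nextDocCalls + " advance=" + advanceCalls; }
    }

    /** Leapfrog intersection: the lead proposes candidates, the other confirms or skips ahead. */
    static List<Integer> intersect(DocIterator lead, DocIterator other) {
        List<Integer> hits = new ArrayList<>();
        int doc = lead.nextDoc();
        while (doc != NO_MORE_DOCS) {
            int otherDoc = other.docID();
            if (otherDoc < doc) otherDoc = other.advance(doc);
            if (otherDoc == doc) {
                hits.add(doc);
                doc = lead.nextDoc();
            } else if (otherDoc == NO_MORE_DOCS) {
                break;
            } else {
                doc = lead.advance(otherDoc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] all = new int[1000];
        for (int i = 0; i < all.length; i++) all[i] = i;
        CountingIterator sparse = new CountingIterator(new ArrayIterator(5, 50, 500));
        CountingIterator dense = new CountingIterator(new ArrayIterator(all));
        // Lead with the lower-cost clause, as in the conjunction described above.
        boolean sparseLeads = sparse.cost() <= dense.cost();
        List<Integer> hits = sparseLeads ? intersect(sparse, dense) : intersect(dense, sparse);
        System.out.println("hits=" + hits);
        System.out.println("sparse: " + sparse.stats());
        System.out.println("dense: " + dense.stats());
    }
}
```

Swapping the arguments to `intersect` gives a rough view of how the per-clause call counts shift when the other clause leads. A real Scorer wrapper would also need to account for the disabled optimizations mentioned elsewhere in this thread.
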
>>>>>
>>>>> This is motivated by a use case we have in which we currently
>>>>> post-filter our query results in a custom collector using some filters
>>>>> that we know to be expensive (they must be evaluated on every
>>>>> document), but we would rather express these post-filters as Queries
>>>>> and have them advanced during the main Query execution. However when
>>>>> we tried to do that, we saw some slowdowns (in spite of marking these
>>>>> Queries as high-cost) and I suspect it is due to the iteration order,
>>>>> but I'm not sure how to debug.
>>>>>
>>>>> Suggestions welcome!
>>>>>
>>>>> -Mike
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>>
>>
>>
>> --
>> Adrien
>
>
>
> --
> Adrien
