[jira] [Commented] (LUCENE-6585) Make ConjunctionDISI flatten sub ConjunctionDISI instances

Robert Muir (JIRA) Fri, 19 Jun 2015 09:58:50 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593634#comment-14593634
 ]


Robert Muir commented on LUCENE-6585:
-------------------------------------

My primary concern is if the flattening optimization is safe.

The test is great for testing this DISI itself, but some of our scorers are 
complicated and have things like coord() at play. 

So it would be great to see some simple nested boolean cases added to 
TestBooleanCoord, so we know the correct scores are still produced when DISIs 
are flattened. Maybe we should also add "conjunction-of-phrases" tests to 
*SearchEquivalence to know that phrase/sloppyphrase/spans/etc are ok with the 
change too.

> Make ConjunctionDISI flatten sub ConjunctionDISI instances
> ----------------------------------------------------------
>
>                 Key: LUCENE-6585
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6585
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6585.patch
>
>
> Today ConjunctionDISI wraps some sub (two-phase) iterators. I would like to 
> improve it by flattening sub iterators when they implement ConjunctionDISI. 
> In practice, this would make "+A +(+B +C)" be executed more like "+A +B +C" 
> (only in terms of matching, scoring would not change).
> My motivation for this is that if we don't flatten and are unlucky, we can 
> sometimes hit some worst cases. For instance consider the 3 following 
> postings lists (sorted by increasing cost):
> A: 1, 1001, 2001, 3001, ...
> C: 0, 2, 4, 6, 8, 10, 12, 14, ...
> B: 1, 3, 5, 7, 9, 11, 13, 15, ...
> If we run "+A +B +C", then everything works fine, we use A as a lead, and 
> advance B 1000 by 1000 to find the next match (if any).
> However if we run "+A +(+B +C)", then we would iterate B and C 2 by 2 over 
> the entire doc ID space when trying to find the first match which occurs on 
> or after A:1.
> This is an extreme example which is unlikely to happen in practice, but 
> flattening would also help a bit on some more common cases. For instance 
> imagine that A, B and C have respective costs of 100, 10 and 1000. If you 
> search for "+A +(+B +C)", then we will use the most costly iterator (C) to 
> confirm matches of B (the least costly iterator, used as a lead) while it 
> would have been more efficient to confirm matches of B with A first, since A 
> is less costly than C.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-6585) Make ConjunctionDISI flatten sub ConjunctionDISI instances

Reply via email to