[jira] [Commented] (LUCENE-7132) BooleanQuery scores can be diff for same docs+sim when using coord (disagree with Explanation which doesn't change)

Hoss Man (JIRA) Fri, 03 Jun 2016 11:08:28 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314505#comment-15314505
 ]


Hoss Man commented on LUCENE-7132:
----------------------------------


{noformat}
10:21 <@hoss:#lucene-dev> mikemccand: ping?
...
10:44 <@mikemccand:#lucene-dev> hoss: here
10:45 <@hoss:#lucene-dev> oh yeah ... just writing up a jira response ... i think you generated your patch just using "git diff" so it missed the "new" test files?
10:45 <@hoss:#lucene-dev> i've got a unified patch i'm about to post, so we have both the fix and the tests that reliably demonstrate the problem
10:45 <@mikemccand:#lucene-dev> oh yeah sorry i did!
10:45 <@mikemccand:#lucene-dev> ++ thanks
10:46 <@hoss:#lucene-dev> no worries ... what i really wanted to ping you about was writing a better test
10:46 <@hoss:#lucene-dev> right now that test shouldn't be committed as is -- data from a user i'm certain we don't have rights to
10:46 <@mikemccand:#lucene-dev> ahh yeah that should be fun :)
10:46 <@mikemccand:#lucene-dev> yeah i saw the comment about that ...
10:46 <@hoss:#lucene-dev> i'm wondering if you could give me some pointers on the heuristics that lead to this optimization, so i can try to write a tighter test case that hits it?
10:46 <@hoss:#lucene-dev> (to prevent a regression)
10:46 <@mikemccand:#lucene-dev> ok lemme try
10:47 <@mikemccand:#lucene-dev> right, we need a test
10:47 <@mikemccand:#lucene-dev> you need a 2 clause BQ
10:47 <@mikemccand:#lucene-dev> where a document with docID 0 -- 2047 matches only one term
10:47 <@mikemccand:#lucene-dev> and then another docID > 2047 matches two terms
10:47 <@mikemccand:#lucene-dev> in that case the 2nd document should get the wrong (disagrees w/ explain) score i think
10:48 <@hoss:#lucene-dev> docID 0 .. as in literally docID 0 in the index? ... it was that magical?
10:48 <@mikemccand:#lucene-dev> yes!
10:48 <@mikemccand:#lucene-dev> BS1 scores in windows of 2048 documents
10:48 <@hoss:#lucene-dev> holy fuck that's an edge case
10:48 <@mikemccand:#lucene-dev> the bug is that if 1 window uses an "optimization" because only 1 clause matches ...
10:48 <@mikemccand:#lucene-dev> then that optimization messes up the state
10:48 <@mikemccand:#lucene-dev> and subsequent windows get the wrong scores
10:48 <@mikemccand:#lucene-dev> yeah serious edge case :)
10:48 <@mikemccand:#lucene-dev> i'm glad you pushed on this :)
10:48 <@mikemccand:#lucene-dev> thanks
10:49 <@hoss:#lucene-dev> oh ... so like, doc ID 2049 matching, but no other doc matches until after 2048 * 2 would also hit this bug?
10:49 <@hoss:#lucene-dev> actually ... it sounds like any doc matching as long as it's the only doc in its window, and then another doc in a later window?
10:49 <@mikemccand:#lucene-dev> right!
10:50 <@mikemccand:#lucene-dev> (where that later window's doc had more than 1 clause matching)
10:50 <@hoss:#lucene-dev> ok .. so really, we just need more tests with lots of docs, so that we force matches across the windows ... because 2048 is hardcoded, not something we can randomize to small values via LTC
10:50 <@mikemccand:#lucene-dev> yeah ..
10:51 <@hoss:#lucene-dev> hmmm... why did forceMerge change things then?
10:51 <@hoss:#lucene-dev> with no deletions why did the windows change?
10:51 <@mikemccand:#lucene-dev> hmmm i'm not sure?
10:51 <@mikemccand:#lucene-dev> the forceMerge is crazy: the index already had one segment
10:51 <@mikemccand:#lucene-dev> at least for your first seed
10:52 <@mikemccand:#lucene-dev> yet forceMerge DID run, because CFS wanted to change
10:52 <@mikemccand:#lucene-dev> but this should not have altered the docID order
10:52 <@mikemccand:#lucene-dev> so yeah i can't explain why forceMerge "matters" here
10:53 <@hoss:#lucene-dev> and yet - if it wasn't for the forceMerge, the only indication of the bug would be that the Explanations don't match -- unless we hardcoded scores in a test, which is hard for randomized data
10:53 <@hoss:#lucene-dev> nay, impossible
10:54 <@mikemccand:#lucene-dev> yes
10:57 <@hoss:#lucene-dev> weird.... maybe there's another factor to the optimization we need to consider? ... i'll let you ponder while i try to figure out a test based on what we know :)
10:57 <@mikemccand:#lucene-dev> LOL ok
...
11:04 <@hoss:#lucene-dev> mikemccand: BTW, you mind if i transcribe this conv to jira so i don't lose it?
11:04 <@mikemccand:#lucene-dev> ++ great
{noformat}
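
The trigger condition described in the transcript can be sketched as a small stand-alone Java model. This is only one reading of the conversation, not Lucene code: the class WindowModel and its methods are hypothetical names, and the only number taken from the chat is the 2048-document window size. It assumes a two-clause query whose clauses are given as arrays of matching docIDs, and it flags the case where some window has matches from exactly one clause and a later window contains a doc matched by both clauses.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the scenario from the chat; hypothetical, not part of Lucene. */
final class WindowModel {
    // Window size quoted in the chat ("BS1 scores in windows of 2048 documents").
    static final int WINDOW_SIZE = 2048;

    /** Index of the scoring window that a docID falls into. */
    static int windowOf(int docId) {
        return docId / WINDOW_SIZE;
    }

    /** Windows in which a clause matches at least one document. */
    static Set<Integer> windowsOf(int[] docIds) {
        Set<Integer> windows = new TreeSet<>();
        for (int docId : docIds) {
            windows.add(windowOf(docId));
        }
        return windows;
    }

    /**
     * Assumed condition: the earliest window where exactly one clause has
     * matches is followed by a later window holding a doc matched by both.
     */
    static boolean mightHitBug(int[] clauseA, int[] clauseB) {
        Set<Integer> a = windowsOf(clauseA);
        Set<Integer> b = windowsOf(clauseB);
        TreeSet<Integer> all = new TreeSet<>(a);
        all.addAll(b);

        int firstSingle = -1;
        for (int w : all) {
            if (a.contains(w) != b.contains(w)) {
                firstSingle = w;
                break;
            }
        }
        if (firstSingle < 0) {
            return false; // every window has matches from both clauses
        }

        Set<Integer> docsB = new TreeSet<>();
        for (int d : clauseB) {
            docsB.add(d);
        }
        for (int d : clauseA) {
            if (windowOf(d) > firstSingle && docsB.contains(d)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // doc 5 matches one clause in window 0; doc 3000 matches both in window 1
        System.out.println(mightHitBug(new int[] {5, 3000}, new int[] {3000})); // true
        // both clauses already have matches in window 0
        System.out.println(mightHitBug(new int[] {5, 3000}, new int[] {6, 3000})); // false
    }
}
```

A regression test along these lines would need enough documents to span more than one window, since the window size is hardcoded and cannot be randomized down to small values.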

> BooleanQuery scores can be diff for same docs+sim when using coord (disagree 
> with Explanation which doesn't change)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7132
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7132
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.5
>            Reporter: Ahmet Arslan
>            Assignee: Steve Rowe
>         Attachments: LUCENE-7132.patch, LUCENE-7132.patch, LUCENE-7132.patch, 
> LUCENE-7132.patch, LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, 
> debug.xml
>
>
> Some users 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that explain's 
> score can sometimes differ from the score requested via the fields 
> parameter. Interestingly, explain's scores would produce a different ranking 
> than the original result list. Users do experience this, but it cannot be 
> reproduced deterministically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

