[ https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314505#comment-15314505 ]
Hoss Man commented on LUCENE-7132:
----------------------------------

{noformat}
10:21 <@hoss:#lucene-dev> mikemccand: ping? ...
10:44 <@mikemccand:#lucene-dev> hoss: here
10:45 <@hoss:#lucene-dev> oh yeah ... just writing up a jira response ... i think you generated your patch just using "git diff" so it missed the "new" test files?
10:45 <@hoss:#lucene-dev> i've got a unified patch i'm about to post, so we have both the fix and the tests that reliably demonstrate the problem
10:45 <@mikemccand:#lucene-dev> oh yeah sorry i did!
10:45 <@mikemccand:#lucene-dev> ++ thanks
10:46 <@hoss:#lucene-dev> no worries ... what i really wanted to ping you about was writing a better test
10:46 <@hoss:#lucene-dev> right now that test shouldn't be committed as is -- data from a user i'm certain we don't have rights to
10:46 <@mikemccand:#lucene-dev> ahh yeah that should be fun :)
10:46 <@mikemccand:#lucene-dev> yeah i saw the comment about that ...
10:46 <@hoss:#lucene-dev> i'm wondering if you could give me some pointers on the heuristics that lead to this optimization, so i can try to write a tighter test case that hits it?
10:46 <@hoss:#lucene-dev> (to prevent a regression)
10:46 <@mikemccand:#lucene-dev> ok lemme try
10:47 <@mikemccand:#lucene-dev> right, we need a test
10:47 <@mikemccand:#lucene-dev> you need a 2 clause BQ
10:47 <@mikemccand:#lucene-dev> where a document with docID 0 -- 2047 matches only one term
10:47 <@mikemccand:#lucene-dev> and then another docID > 2047 matches two terms
10:47 <@mikemccand:#lucene-dev> in that case the 2nd document should get the wrong (disagrees w/ explain) score i think
10:48 <@hoss:#lucene-dev> docID 0 .. as in literally docID 0 in the index? ... it was that magical?
10:48 <@mikemccand:#lucene-dev> yes!
10:48 <@mikemccand:#lucene-dev> BS1 scores in windows of 2048 documents
10:48 <@hoss:#lucene-dev> holy fuck that's an edge case
10:48 <@mikemccand:#lucene-dev> the bug is that if 1 window uses an "optimization" because only 1 clause matches ...
10:48 <@mikemccand:#lucene-dev> then that optimization messes up the state
10:48 <@mikemccand:#lucene-dev> and subsequent windows get the wrong scores
10:48 <@mikemccand:#lucene-dev> yeah serious edge case :)
10:48 <@mikemccand:#lucene-dev> i'm glad you pushed on this :)
10:48 <@mikemccand:#lucene-dev> thanks
10:49 <@hoss:#lucene-dev> oh ... so like, doc ID 2049 matching, but no other doc matches until after 2048 * 2 would also hit this bug?
10:49 <@hoss:#lucene-dev> actually ... it sounds like any doc matching as long as it's the only doc in its window, and then another doc in a later window?
10:49 <@mikemccand:#lucene-dev> right!
10:50 <@mikemccand:#lucene-dev> (where that later window's doc had more than 1 clause matching)
10:50 <@hoss:#lucene-dev> ok .. so really, we just need more tests with lots of docs, so that we force matches across the windows ... because 2048 is hardcoded, not something we can randomize to small values via LTC
10:50 <@mikemccand:#lucene-dev> yeah ..
10:51 <@hoss:#lucene-dev> hmmm... why did forceMerge change things then?
10:51 <@hoss:#lucene-dev> with no deletions why did the windows change?
10:51 <@mikemccand:#lucene-dev> hmmm i'm not sure?
10:51 <@mikemccand:#lucene-dev> the forceMerge is crazy: the index already had one segment
10:51 <@mikemccand:#lucene-dev> at least for your first seed
10:52 <@mikemccand:#lucene-dev> yet forceMerge DID run, because CFS wanted to change
10:52 <@mikemccand:#lucene-dev> but this should not have altered the docID order
10:52 <@mikemccand:#lucene-dev> so yeah i can't explain why forceMerge "matters" here
10:53 <@hoss:#lucene-dev> and yet - if it wasn't for the forceMerge, the only indication of the bug would be that the Explanations don't match -- unless we hardcoded scores in a test, which is hard for randomized data
10:53 <@hoss:#lucene-dev> nay, impossible
10:54 <@mikemccand:#lucene-dev> yes
10:57 <@hoss:#lucene-dev> weird.... maybe there's another factor to the optimization we need to consider? ... i'll let you ponder while i try to figure out a test based on what we know :)
10:57 <@mikemccand:#lucene-dev> LOL ok ...
11:04 <@hoss:#lucene-dev> mikemccand: BTW, you mind if i transcribe this conv to jira so i don't lose it?
11:04 <@mikemccand:#lucene-dev> ++ great
{noformat}

> BooleanQuery scores can be diff for same docs+sim when using coord (disagree
> with Explanation which doesn't change)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7132
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7132
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.5
>            Reporter: Ahmet Arslan
>            Assignee: Steve Rowe
>         Attachments: LUCENE-7132.patch, LUCENE-7132.patch, LUCENE-7132.patch,
> LUCENE-7132.patch, LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch,
> debug.xml
>
>
> Some of the folks
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes
> explain's score can be different than the score requested by fields
> parameter. Interestingly, Explain's scores would create a different ranking
> than the original result list.
> This is something users experience, but it
> cannot be reproduced deterministically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
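Mike's recipe reduces to window arithmetic. If BS1 scores in fixed windows of 2048 docs, then docIDs 0-2047 fall in window 0, docIDs 2048-4095 fall in window 1, and so on. The sketch below is a hypothetical helper for planning a test. `WindowLayout` and `hitsSuspectedPattern` are illustrative names, not Lucene or LuceneTestCase APIs. A test author could use it to check whether a generated two-clause match layout actually spans windows the way the chat describes.

It assumes the window size stays hardcoded at 2048. It encodes only the chat's informal trigger condition: a window whose only match is one doc matching one clause, followed by a later window containing a doc that matches both clauses. It is not a verified characterization of BooleanScorer.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical test-planning helper (not part of Lucene). It checks whether a
 * two-clause BooleanQuery match layout has the shape described on IRC.
 */
public class WindowLayout {
    // Window size BS1 scores in, as stated in the chat; assumed to be fixed.
    public static final int WINDOW_SIZE = 2048;

    /** Index of the scoring window a docID falls into. */
    public static int windowOf(int docId) {
        return docId / WINDOW_SIZE;
    }

    /**
     * @param clausesMatched docID -> number of clauses (1 or 2) that doc matches
     * @return true if some window's only match is a single one-clause doc and a
     *         later window contains a doc matching 2 or more clauses
     */
    public static boolean hitsSuspectedPattern(Map<Integer, Integer> clausesMatched) {
        // Per window: [number of matching docs, largest clause count of any doc].
        TreeMap<Integer, int[]> perWindow = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : clausesMatched.entrySet()) {
            int[] stats = perWindow.computeIfAbsent(windowOf(e.getKey()), w -> new int[2]);
            stats[0]++;
            stats[1] = Math.max(stats[1], e.getValue());
        }
        boolean sawSingleClauseWindow = false;
        // TreeMap.values() iterates windows in ascending order.
        for (int[] stats : perWindow.values()) {
            if (sawSingleClauseWindow && stats[1] >= 2) {
                return true;
            }
            if (stats[0] == 1 && stats[1] == 1) {
                sawSingleClauseWindow = true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mike's recipe: doc 0 matches one term, and a doc past 2047 matches both.
        Map<Integer, Integer> recipe = new TreeMap<>();
        recipe.put(0, 1);
        recipe.put(3000, 2);
        System.out.println(hitsSuspectedPattern(recipe)); // true

        // Both docs in window 0: there is no later window, so no trigger.
        Map<Integer, Integer> sameWindow = new TreeMap<>();
        sameWindow.put(0, 1);
        sameWindow.put(2000, 2);
        System.out.println(hitsSuspectedPattern(sameWindow)); // false
    }
}
```

Hoss's variant also returns true under this check. In that variant, doc 2049 is the only match in window 1, and the next match comes after 2048 * 2 = 4096, so it lands in window 2 or later. This is why a randomized test needs enough docs that matches are forced across several windows.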