[ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096344#comment-13096344 ]
Michael Ryan commented on LUCENE-3412:
--------------------------------------

Here's the debugQuery output from when it matched both docs:

{noformat}
<lst name="explain"><str name="2">
1.1890696 = (MATCH) weight(text:"dog dog dog dog"~1 in 1) [DefaultSimilarity], result of:
  1.1890696 = score(doc=1,freq=1.0 = phraseFreq=1.0 ), product of:
    0.99999994 = queryWeight, product of:
      2.3781395 = idf(), sum of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
      0.42049676 = queryNorm
    1.1890697 = fieldWeight in 1, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = phraseFreq=1.0
      2.3781395 = idf(), sum of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
      0.5 = fieldNorm(doc=1)
</str><str name="1">
0.8407992 = (MATCH) weight(text:"dog dog dog dog"~1 in 0) [DefaultSimilarity], result of:
  0.8407992 = score(doc=0,freq=0.5 = phraseFreq=0.5 ), product of:
    0.99999994 = queryWeight, product of:
      2.3781395 = idf(), sum of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
      0.42049676 = queryNorm
    0.8407993 = fieldWeight in 0, product of:
      0.70710677 = tf(freq=0.5), with freq of:
        0.5 = phraseFreq=0.5
      2.3781395 = idf(), sum of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
      0.5 = fieldNorm(doc=0)
</str></lst>
{noformat}

Sometimes when it matches both docs I'll get "no matching term" for the second one:

{noformat}
<lst name="explain"><str name="2">
1.1890696 = (MATCH) weight(text:"dog dog dog dog"~1 in 1) [DefaultSimilarity], result of:
  1.1890696 = score(doc=1,freq=1.0 = phraseFreq=1.0 ), product of:
    0.99999994 = queryWeight, product of:
      2.3781395 = idf(), sum of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
      0.42049676 = queryNorm
    1.1890697 = fieldWeight in 1, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = phraseFreq=1.0
      2.3781395 = idf(), sum of:
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
        0.5945349 = idf(docFreq=2, maxDocs=2)
      0.5 = fieldNorm(doc=1)
</str><str name="1">
0.0 = (NON-MATCH) no matching term
</str></lst>
{noformat}

> SloppyPhraseScorer returns non-deterministic results for queries with many repeats
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-3412
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3412
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.1, 3.2, 3.3, 4.0
>            Reporter: Michael Ryan
>
> Proximity queries with many repeats (four or more, based on my testing)
> return non-deterministic results. I run the same query multiple times with
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog".
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query.
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned.
> The document with id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog
> dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the
> elements in this array, the document may or may not match. I think the
> HashSet may be to blame, but I'm not sure - that at least seems to be where
> the non-determinism is coming from.
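
To illustrate the HashSet suspicion in the description, here is a minimal standalone sketch. It is not Lucene code: the class HashSetOrderSketch, the holder class Pos, and the method iterationOrder are hypothetical names made up for this example. It assumes the elements placed in the set do not override hashCode(), so the set falls back on identity hash codes. In that case a HashSet's iteration order is unspecified and may change each time fresh objects are created, while a LinkedHashSet keeps insertion order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class HashSetOrderSketch {

    /** Hypothetical stand-in for a per-term position holder; NOT Lucene's class. */
    static final class Pos {
        final int offset;

        Pos(int offset) {
            this.offset = offset;
        }
        // equals()/hashCode() are deliberately not overridden (an assumption of
        // this sketch), so hash-based sets use the identity hash code.
    }

    /**
     * Adds n fresh holders (offsets 0..n-1) to the given set and returns the
     * offsets in the order the set iterates over them.
     */
    static List<Integer> iterationOrder(Set<Pos> set, int n) {
        for (int i = 0; i < n; i++) {
            set.add(new Pos(i));
        }
        List<Integer> order = new ArrayList<>();
        for (Pos p : set) {
            order.add(p.offset);
        }
        return order;
    }

    public static void main(String[] args) {
        // Each call builds new objects, loosely mirroring "run the same query
        // again". The HashSet order is unspecified and may differ per call.
        for (int run = 0; run < 5; run++) {
            System.out.println("HashSet run " + run + ": "
                    + iterationOrder(new HashSet<>(), 4));
        }
        // LinkedHashSet iterates in insertion order on every call.
        System.out.println("LinkedHashSet: "
                + iterationOrder(new LinkedHashSet<>(), 4));
    }
}
```

If the order of the "repeats" array really is derived from such a set, then any logic that depends on that order could behave differently from query to query. Whether that is the actual cause here is not confirmed; the sketch only shows that the order is not something to rely on.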