SloppyPhraseScorer returns non-deterministic results for queries with many
repeats
----------------------------------------------------------------------------------
Key: LUCENE-3412
URL: https://issues.apache.org/jira/browse/LUCENE-3412
Project: Lucene - Java
Issue Type: Bug
Components: core/search
Affects Versions: 3.3, 3.2, 3.1, 4.0
Reporter: Michael Ryan
Proximity queries with many repeats (four or more, based on my testing) return
non-deterministic results. I run the same query multiple times with the same
data set and get different results.
So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and the latest 4.0
trunk.
Steps to reproduce (using the Solr example):
1) In solrconfig.xml, set queryResultCache size to 0.
2) Add some documents with text "dog dog dog" and "dog dog dog dog".
http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
3) Do a "dog dog dog dog"~1 query.
http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
4) Repeat step 3 many times.
Expected results: Only the document with id 2 should be returned.
Actual results: The document with id 2 is always returned. The document with id
1 is sometimes returned as well.
Other proximity values show the same bug: "dog dog dog dog"~5, "dog dog
dog dog"~100, etc. all behave the same way.
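Step 4 can be automated. The following is a minimal sketch, not part of the
original report: it fetches the step 3 URL repeatedly and tallies how often each
hit count comes back. The class and method names (RepeatQueryCheck, tally,
parseNumFound, httpGet) are hypothetical, and the sketch assumes Solr's default
XML response format, which carries a numFound attribute on the result element.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene or Solr.
class RepeatQueryCheck {
    // Assumption: the response is Solr's default XML, e.g.
    // <result name="response" numFound="1" start="0">
    private static final Pattern NUM_FOUND = Pattern.compile("numFound=\"(\\d+)\"");

    // Extracts the hit count from one response body.
    static int parseNumFound(String xml) {
        Matcher m = NUM_FOUND.matcher(xml);
        if (!m.find()) {
            throw new IllegalArgumentException("no numFound attribute in response");
        }
        return Integer.parseInt(m.group(1));
    }

    // Calls fetch `runs` times and counts how often each hit count occurs.
    static Map<Integer, Integer> tally(Supplier<String> fetch, int runs) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < runs; i++) {
            counts.merge(parseNumFound(fetch.get()), 1, Integer::sum);
        }
        return counts;
    }

    // Plain HTTP GET that returns the response body as a string.
    static String httpGet(String url) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                StringBuilder body = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
                return body.toString();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java RepeatQueryCheck <query-url> [runs]");
            return;
        }
        int runs = args.length > 1 ? Integer.parseInt(args[1]) : 100;
        // Prints a map of hit count to number of runs that returned it.
        System.out.println(tally(() -> httpGet(args[0]), runs));
    }
}
```

Run it with the step 3 URL as the first argument. If the scorer were
deterministic, the printed map would contain a single hit count; more than one
key means the same query matched a different number of documents across runs.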
So far I've traced it down to the "repeats" array in
SloppyPhraseScorer.initPhrasePositions() - depending on the order of the
elements in this array, the document may or may not match. I think the HashSet
may be to blame, but I'm not sure - that at least seems to be where the
non-determinism is coming from.
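To illustrate why a HashSet could introduce this kind of run-to-run variation,
here is a small standalone sketch. It does not reproduce the Lucene code: Pos is
a hypothetical stand-in class, and the assumption is only that the set elements
do not override hashCode(), so the HashSet buckets them by identity hash code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo; not taken from SloppyPhraseScorer.
class SetOrderDemo {
    // Stand-in element type. It does not override equals()/hashCode(),
    // so HashSet places it according to its identity hash code.
    static final class Pos {
        final int offset;
        Pos(int offset) { this.offset = offset; }
    }

    // Adds n elements with offsets 0..n-1, in that order.
    static void fill(Set<Pos> set, int n) {
        for (int i = 0; i < n; i++) {
            set.add(new Pos(i));
        }
    }

    // Returns the offsets in the order the set iterates over its elements.
    static List<Integer> offsetsInIterationOrder(Set<Pos> set) {
        List<Integer> out = new ArrayList<>();
        for (Pos p : set) {
            out.add(p.offset);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Pos> hashed = new HashSet<>();
        Set<Pos> linked = new LinkedHashSet<>();
        fill(hashed, 4);
        fill(linked, 4);
        // HashSet order depends on identity hash codes and may differ per run.
        System.out.println("HashSet:       " + offsetsInIterationOrder(hashed));
        // LinkedHashSet always iterates in insertion order.
        System.out.println("LinkedHashSet: " + offsetsInIterationOrder(linked));
    }
}
```

Identity hash codes are not guaranteed to be the same from one JVM run to the
next, so the first line may change between runs while the second stays fixed.
Any array built by iterating such a HashSet inherits that ordering. Whether
this is the actual cause here is still unconfirmed.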