[ https://issues.apache.org/jira/browse/SOLR-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jack Krupansky updated SOLR-4277:
---------------------------------

    Component/s: (was: SearchComponents - other)
                 spellchecker

> Spellchecker sometimes falsely reports a spelling error and correction
> ----------------------------------------------------------------------
>
>                 Key: SOLR-4277
>                 URL: https://issues.apache.org/jira/browse/SOLR-4277
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 4.0
>            Reporter: Jack Krupansky
>
> In some cases, the Solr spell checker improperly reports query terms as being
> misspelled.
>
> Using the Solr 4.0 example, I added these small documents:
> {code}
> curl "http://localhost:8983/solr/update?commit=true" -H 'Content-type:application/csv' -d '
> id,name
> spel-1,aardvark abacus ball bill cat cello
> spel-2,abate accord band bell cattle check
> spel-3,adorn border clean clock'
> {code}
> I then issued this request:
> {code}
> curl "http://localhost:8983/solr/spell/?q=check&indent=true"
> {code}
> The spell checker falsely concluded that "check" was misspelled and
> improperly "corrected" it to "clock":
> {code}
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="check">
>       <int name="numFound">1</int>
>       <int name="startOffset">0</int>
>       <int name="endOffset">5</int>
>       <int name="origFreq">1</int>
>       <arr name="suggestion">
>         <lst>
>           <str name="word">clock</str>
>           <int name="freq">1</int>
>         </lst>
>       </arr>
>     </lst>
>     <bool name="correctlySpelled">false</bool>
>     <lst name="collation">
>       <str name="collationQuery">clock</str>
>       <int name="hits">1</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="check">clock</str>
>       </lst>
>     </lst>
>   </lst>
> </lst>
> {code}
>
> And if I query for "clock", it gets "corrected" to "check"!
> {code}
> curl "http://localhost:8983/solr/spell/?q=clock&indent=true"
> {code}
> {code}
> <lst name="suggestions">
>   <lst name="clock">
>     <int name="numFound">1</int>
>     <int name="startOffset">0</int>
>     <int name="endOffset">5</int>
>     <int name="origFreq">1</int>
>     <arr name="suggestion">
>       <lst>
>         <str name="word">check</str>
>         <int name="freq">1</int>
>       </lst>
>     </arr>
>   </lst>
>   <bool name="correctlySpelled">false</bool>
>   <lst name="collation">
>     <str name="collationQuery">check</str>
>     <int name="hits">1</int>
>     <lst name="misspellingsAndCorrections">
>       <str name="clock">check</str>
>     </lst>
>   </lst>
> </lst>
> {code}
>
> Note: This appears to happen only because "clock" is so close to "check".
> The other terms in a query are not "corrected":
> {code}
> curl "http://localhost:8983/solr/spell/?q=cattle+abate+check&indent=true"
> {code}
> {code}
> <lst name="suggestions">
>   <lst name="check">
>     <int name="numFound">1</int>
>     <int name="startOffset">13</int>
>     <int name="endOffset">18</int>
>     <int name="origFreq">1</int>
>     <arr name="suggestion">
>       <lst>
>         <str name="word">clock</str>
>         <int name="freq">1</int>
>       </lst>
>     </arr>
>   </lst>
>   <bool name="correctlySpelled">false</bool>
>   <lst name="collation">
>     <str name="collationQuery">cattle abate clock</str>
>     <int name="hits">2</int>
>     <lst name="misspellingsAndCorrections">
>       <str name="cattle">cattle</str>
>       <str name="abate">abate</str>
>       <str name="check">clock</str>
>     </lst>
>   </lst>
> </lst>
> {code}
>
> However, it inappropriately lists "cattle" and "abate" (mapped to themselves)
> in the "misspellingsAndCorrections" section, even though no suggestions were
> offered for them.
>
> Finally, I can work around this issue by removing the following line from
> solrconfig.xml:
> {code}
> <str name="spellcheck.alternativeTermCount">5</str>
> {code}
> With that line removed, the previous request returns:
> {code}
> <lst name="suggestions">
>   <bool name="correctlySpelled">false</bool>
> </lst>
> {code}
> That makes the original problem go away.
> That said, it does raise the question of why my 100% correct query is still
> tagged as "correctlySpelled" = "false", but that's a separate Jira.
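The report attributes the check/clock swap to the two words being "so close". One way to put a number on "close" is the Levenshtein edit distance. Below is a sketch in POSIX shell and awk. `levenshtein` is a hypothetical helper, not part of Solr. For this pair the distance is 2: "h" becomes "l" and "e" becomes "o". I am assuming that in the 4.0 example the default dictionary is a `DirectSolrSpellChecker` configured with `maxEdits` of 2. If so, each word falls within reach of the other. Check your own solrconfig.xml before relying on that.

```sh
# Hypothetical helper (not part of Solr): print the Levenshtein edit distance
# between two words, i.e. the minimum number of single-character insertions,
# deletions and substitutions needed to turn $1 into $2.
levenshtein() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    la = length(a); lb = length(b)
    # d[i, j] = distance between the first i chars of a and the first j chars of b
    for (j = 0; j <= lb; j++) d[0, j] = j
    for (i = 1; i <= la; i++) {
      d[i, 0] = i
      for (j = 1; j <= lb; j++) {
        cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
        best = d[i - 1, j] + 1                                            # delete a[i]
        if (d[i, j - 1] + 1 < best) best = d[i, j - 1] + 1                # insert b[j]
        if (d[i - 1, j - 1] + cost < best) best = d[i - 1, j - 1] + cost  # substitute
        d[i, j] = best
      }
    }
    print d[la, lb]
  }'
}

levenshtein check clock   # prints 2
```

This is plain Levenshtein. Lucene's own candidate search may count edits slightly differently, so treat the number as an illustration of "close" and not as Solr's exact scoring.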
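In each per-term block of the responses above, `origFreq` is, as I understand it, the frequency of the original query term in the index. Both "check" and "clock" report `origFreq` 1, which matches their appearance in documents spel-2 and spel-3. In other words, they are indexed terms, not misspellings. The sketch below lists such terms from a saved response. It assumes `indent=true` output with one element per line, as shown here. `flag_indexed_terms` is a hypothetical name. It does line-oriented pattern matching, not real XML parsing.

```sh
# Hypothetical helper: read an indent=true XML spellcheck response on stdin and
# print each query term whose block reports origFreq > 0, i.e. a term that
# already occurs in the index but was still given a suggestion.
flag_indexed_terms() {
  awk '
    /<lst name="/ {
      # Remember the name of the most recent named <lst>; inside a per-term
      # block this is the query term itself.
      match($0, /name="[^"]*"/)
      term = substr($0, RSTART + 6, RLENGTH - 7)
    }
    /<int name="origFreq">/ {
      # Extract N from <int name="origFreq">N</int>.
      freq = $0
      sub(/.*<int name="origFreq">/, "", freq)
      sub(/<\/int>.*/, "", freq)
      if (freq + 0 > 0) print term
    }
  '
}
```

Suppose you pipe the `q=check` request above through it, e.g. `curl -s "http://localhost:8983/solr/spell/?q=check&indent=true" | flag_indexed_terms`. Given the response shown in the report, that should print `check`.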
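The workaround described above can be scripted. The sketch below assumes the `spellcheck.alternativeTermCount` default sits on a single line, as it does in the example solrconfig.xml. `disable_alternative_term_count` is a hypothetical helper name. It keeps a `.bak` copy and writes through plain `sed` rather than `sed -i`, because the option syntax of `sed -i` differs between GNU and BSD sed.

```sh
# Hypothetical helper: remove the spellcheck.alternativeTermCount default
# from the solrconfig.xml given as $1, keeping the original as $1.bak.
disable_alternative_term_count() {
  cp "$1" "$1.bak" || return 1
  # Delete every line that sets spellcheck.alternativeTermCount; all other
  # lines are copied through unchanged.
  sed '/<str name="spellcheck\.alternativeTermCount">/d' "$1.bak" > "$1"
}
```

For the 4.0 example, the file is presumably `example/solr/collection1/conf/solrconfig.xml` (an assumed path). Solr only sees the edited file after a restart or a core reload.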