[ https://issues.apache.org/jira/browse/SOLR-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jack Krupansky updated SOLR-4277:
---------------------------------

    Component/s:     (was: SearchComponents - other)
                 spellchecker
    
> Spellchecker sometimes falsely reports a spelling error and correction
> ----------------------------------------------------------------------
>
>                 Key: SOLR-4277
>                 URL: https://issues.apache.org/jira/browse/SOLR-4277
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 4.0
>            Reporter: Jack Krupansky
>
> In some cases, the Solr spell checker improperly reports query terms as being 
> misspelled.
> Using the Solr 4.0 example, I added these small documents:
> {code}
> curl "http://localhost:8983/solr/update?commit=true" -H 'Content-type:application/csv' -d '
> id,name
> spel-1,aardvark abacus ball bill cat cello
> spel-2,abate accord band bell cattle check
> spel-3,adorn border clean clock'
> {code}
> I then issued this request:
> {code}
> curl "http://localhost:8983/solr/spell/?q=check&indent=true"
> {code}
> The spell checker falsely concluded that "check" was misspelled and 
> improperly corrected it to "clock":
> {code}
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="check">
>       <int name="numFound">1</int>
>       <int name="startOffset">0</int>
>       <int name="endOffset">5</int>
>       <int name="origFreq">1</int>
>       <arr name="suggestion">
>         <lst>
>           <str name="word">clock</str>
>           <int name="freq">1</int>
>         </lst>
>       </arr>
>     </lst>
>     <bool name="correctlySpelled">false</bool>
>     <lst name="collation">
>       <str name="collationQuery">clock</str>
>       <int name="hits">1</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="check">clock</str>
>       </lst>
>     </lst>
>   </lst>
> </lst>
> {code}
> And if I query for "clock", it gets corrected to "check"!
> {code}
> curl "http://localhost:8983/solr/spell/?q=clock&indent=true"
> {code}
> {code}
>   <lst name="suggestions">
>     <lst name="clock">
>       <int name="numFound">1</int>
>       <int name="startOffset">0</int>
>       <int name="endOffset">5</int>
>       <int name="origFreq">1</int>
>       <arr name="suggestion">
>         <lst>
>           <str name="word">check</str>
>           <int name="freq">1</int>
>         </lst>
>       </arr>
>     </lst>
>     <bool name="correctlySpelled">false</bool>
>     <lst name="collation">
>       <str name="collationQuery">check</str>
>       <int name="hits">1</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="clock">check</str>
>       </lst>
>     </lst>
>   </lst>
> {code}
> Note: This appears to happen only because "clock" is so close to "check". With 
> other terms I don't see the problem:
> {code}
> curl "http://localhost:8983/solr/spell/?q=cattle+abate+check&indent=true"
> {code}
> {code}
>   <lst name="suggestions">
>     <lst name="check">
>       <int name="numFound">1</int>
>       <int name="startOffset">13</int>
>       <int name="endOffset">18</int>
>       <int name="origFreq">1</int>
>       <arr name="suggestion">
>         <lst>
>           <str name="word">clock</str>
>           <int name="freq">1</int>
>         </lst>
>       </arr>
>     </lst>
>     <bool name="correctlySpelled">false</bool>
>     <lst name="collation">
>       <str name="collationQuery">cattle abate clock</str>
>       <int name="hits">2</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="cattle">cattle</str>
>         <str name="abate">abate</str>
>         <str name="check">clock</str>
>       </lst>
>     </lst>
>   </lst>
> {code}
> However, it inappropriately lists "cattle" and "abate" in the 
> "misspellingsAndCorrections" section even though no suggestions were offered for them.
> Finally, I can work around this issue by removing the following line from 
> solrconfig.xml:
> {code}
>       <str name="spellcheck.alternativeTermCount">5</str>
> {code}
> With that line removed, Solr responds to the previous request with:
> {code}
>   <lst name="suggestions">
>     <bool name="correctlySpelled">false</bool>
>   </lst>
> {code}
> That makes the original problem go away. It does, however, raise the question 
> of why my 100% correct query is still tagged with "correctlySpelled" = 
> "false", but that's a separate Jira.
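> A possibly less invasive variant of the same workaround is to override the value per 
> request instead of editing solrconfig.xml. This is an untested sketch: it assumes the 
> example /spell handler sets spellcheck.alternativeTermCount in its "defaults" section 
> (which request parameters override) rather than in "invariants", and that a value of 0 
> behaves the same as the parameter being absent:
> {code}
> # Sketch: per-request override of the handler default (assumes "defaults", not "invariants")
> curl "http://localhost:8983/solr/spell/?q=cattle+abate+check&spellcheck.alternativeTermCount=0&indent=true"
> {code}
> If those assumptions hold, the spell checker should then only suggest corrections for 
> terms that are not in the index, so "check" should no longer be rewritten to "clock".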

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org