[jira] [Commented] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations

James Dyer (JIRA) Mon, 19 Sep 2011 08:00:34 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107880#comment-13107880
 ]


James Dyer commented on SOLR-2585:
----------------------------------

Robert,

I was thinking of eventually submitting some refactorings as a follow-up 
to this issue.  But if you want, we could do some things first and then come 
back to this.  Here are my initial thoughts, none of which are very 
well-thought-out at this point...

1. Maybe move "FileBasedSpellChecker" to Lucene for consistency (each spell 
checker in Solr would then refer to a spell checker in Lucene).  This would 
also make it available to Lucene users.

2. Perhaps SpellingOptions could somehow be deleted.

3. If the Lucene spell checkers all inherited a common interface and/or 
abstract class, all of the *SolrSpellChecker classes could probably be reduced 
to 1 class (or 1 parent class with just a few overrides here and there).  I 
know you feel we're not ready for this, but for now we could annotate the 
Lucene parent class and/or interface like this: "@lucene.internal - external 
users should use the appropriate subclass directly / @lucene.experimental - 
this [class|interface] may change or be removed in a future version".

4. Clarify the code in SpellCheckComponent.  I wasn't planning to do this now, 
but I do see where you're coming from, especially with the distributed code in 
"finishStage".  I think there is some code duplication between "finishStage" 
(distributed) and "process" (non-dist / 1st stage dist) that can maybe be 
eliminated.  Some good code comments would probably help de-mystify this too, 
as would renaming a method or two for additional clarity.

5. Now that you point out that "instanceof" check in "finishStage", we probably 
should write a test case with DirectSpellChecker in a distributed environment.  
Possibly a revamped (set of) *SolrSpellChecker class(es) could eliminate the 
need for such checks?

6. I think SpellingParams should be for parameters the user can put in their 
query.  I'm not sure "accuracy" can be set that way.  It should probably live 
somewhere else, as it is a SearchComponent config param, not a request 
param.  Maybe there are others like this.
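
To make item 3 a bit more concrete, here is a rough sketch of what a common 
contract plus a single Solr-side wrapper might look like.  All of the names 
below (WordSuggester, FixedListSuggester, GenericSolrSpellCheckerSketch) are 
made up for illustration; none of them exist in Lucene or Solr, and the 
matching rule is a toy stand-in for a real spell checker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common contract for the Lucene-side spell checkers.
// No such interface exists in Lucene; this is an assumption for illustration.
interface WordSuggester {
    List<String> suggest(String word, int count);
}

// Toy stand-in for one Lucene-side implementation.
class FixedListSuggester implements WordSuggester {
    private final List<String> dictionary;

    FixedListSuggester(List<String> dictionary) {
        this.dictionary = dictionary;
    }

    @Override
    public List<String> suggest(String word, int count) {
        List<String> out = new ArrayList<>();
        for (String candidate : dictionary) {
            if (out.size() >= count) {
                break;
            }
            // Toy rule: suggest other words sharing the same first letter.
            boolean sameFirstLetter = !word.isEmpty() && !candidate.isEmpty()
                    && candidate.charAt(0) == word.charAt(0);
            if (sameFirstLetter && !candidate.equals(word)) {
                out.add(candidate);
            }
        }
        return out;
    }
}

// Single Solr-side wrapper that depends only on the interface, so it never
// needs an "instanceof" check on the concrete spell checker type.
class GenericSolrSpellCheckerSketch {
    private final WordSuggester delegate;

    GenericSolrSpellCheckerSketch(WordSuggester delegate) {
        this.delegate = delegate;
    }

    List<String> getSuggestions(String token, int count) {
        return delegate.suggest(token, count);
    }
}
```

With something along these lines, each existing *SolrSpellChecker would only 
differ in which delegate it constructs, which is the reduction item 3 is 
hoping for.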



> Context-Sensitive Spelling Suggestions & Collations
> ---------------------------------------------------
>
>                 Key: SOLR-2585
>                 URL: https://issues.apache.org/jira/browse/SOLR-2585
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch, 
> SOLR-2585.patch, SOLR-2585.patch
>
>
> Solr currently cannot offer what I'm calling here a "context-sensitive" 
> spelling suggestion.  That is, if a user enters one or more words that have 
> docFrequency > 0, but nevertheless are misspelled, then no suggestions are 
> offered.  Currently, Solr will always consider a word "correctly spelled" if 
> it is in the index and/or dictionary, regardless of context.  This issue & 
> patch add support for context-sensitive spelling suggestions. 
> See SpellCheckCollatorTest.testContextSensitiveCollate() for the typical 
> use case for this functionality.  This test exercises both 
> IndexBasedSpellChecker and DirectSolrSpellChecker. 
> Two new Spelling Parameters were added:
>   - spellcheck.alternativeTermCount - The count of suggestions to return for 
> each query term existing in the index and/or dictionary.  Presumably, users 
> will want fewer suggestions for words with docFrequency>0.  Also setting this 
> value turns "on" context-sensitive spell suggestions. 
>   - spellcheck.maxResultsForSuggest - The maximum number of hits the request 
> can return in order to both generate spelling suggestions and set the 
> "correctlySpelled" element to "false".  For example, if this is set to 5 and 
> the user's query returns 5 or fewer results, the spellchecker will report 
> "correctlySpelled=false" and also offer suggestions (and collations if 
> requested).  Setting this greater than zero is useful for creating 
> "did-you-mean" suggestions for queries that return a low number of hits.
> I have also included a test using shards.  See additions to 
> DistributedSpellCheckComponentTest. 
> In Lucene, SpellChecker.java can already support this functionality (by 
> passing a null IndexReader and field-name).  The DirectSpellChecker, however, 
> needs a minor enhancement.  This gives the option to allow DirectSpellChecker 
> to return suggestions for all query terms regardless of frequency.
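
The spellcheck.maxResultsForSuggest rule quoted above boils down to a single 
comparison.  A minimal sketch, assuming the parameter has been set on the 
request; the class and method names are hypothetical and this is not the 
actual SpellCheckComponent code:

```java
// Hypothetical helper illustrating the rule from the issue description.
class MaxResultsForSuggestSketch {
    // Returns true when the request should report correctlySpelled=false
    // and offer suggestions: the hit count is at or below the threshold.
    static boolean shouldSuggest(long hits, int maxResultsForSuggest) {
        return hits <= maxResultsForSuggest;
    }
}
```

With the threshold set to 5, a query returning 5 hits would get suggestions 
while one returning 6 hits would not, matching the example in the description.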

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
