Re: spellcheck with misspelled words in index

2009-07-16 Thread Peter Wolanin
I think you can just tell the spellchecker to only supply more
popular suggestions, which would naturally omit these rare
misspellings:

  <str name="spellcheck.onlyMorePopular">true</str>
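A toy model of what the onlyMorePopular setting changes, in Python purely for illustration. It is not Solr's code. The functions `suggest` and `edit_distance` and the frequencies are hypothetical. The rule that an indexed word counts as correctly spelled unless onlyMorePopular is set is an assumption, based on the behaviour Chris describes in this thread.

```python
# Toy model of spellcheck.onlyMorePopular; NOT Solr's implementation.
# Assumption: a word that already occurs in the source field counts as
# correctly spelled unless onlyMorePopular is set, in which case any
# nearby term that occurs more often is still offered.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest(word: str, doc_freq: dict[str, int],
            only_more_popular: bool = False, max_dist: int = 2) -> list[str]:
    """Suggest nearby terms from a term -> document-frequency map."""
    freq = doc_freq.get(word, 0)
    if freq > 0 and not only_more_popular:
        return []  # word is indexed, so it is treated as correctly spelled
    close = [t for t, f in doc_freq.items()
             if t != word and f > freq and edit_distance(word, t) <= max_dist]
    return sorted(close, key=lambda t: -doc_freq[t])  # most frequent first


if __name__ == "__main__":
    freq = {"cuisine": 25000, "cusine": 2, "cuisines": 40}
    print(suggest("cusine", freq))                            # []
    print(suggest("cusine", freq, only_more_popular=True))    # ['cuisine', 'cuisines']
    print(suggest("cuisines", freq, only_more_popular=True))  # ['cuisine']
```

The last call hints at the trade-off raised in this thread: with onlyMorePopular, even a correctly spelled but rarer word such as "cuisines" gets a suggestion.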

-Peter

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia, Inc.
peter.wola...@acquia.com


spellcheck with misspelled words in index

2009-07-15 Thread Chris Williams
Hi,
I'm having some trouble getting the correct results from the
spellcheck component.  I'd like to use it to suggest correct product
titles on our site; however, some of our products have misspellings in
them that are outside of our control.  For example, there are 2 products
with the misspelled word "cusine" (and 25k with the correct spelling,
"cuisine").  So if someone searches for the word "cusine" on our site,
I would like to show the 2 misspelled products and a suggestion:
Did you mean "cuisine"?

However, I can't seem to ever get any spelling suggestions when I
search for the word "cusine", and correctlySpelled is always true.
Misspelled words that don't appear in the index work fine.

I noticed that setting onlyMorePopular to true will return suggestions
for the misspelled word, but I've found that it doesn't work great for
other words and produces suggestions too often for correctly spelled
words.

I had incorrectly thought that by setting thresholdTokenFrequency
higher on my spelling dictionary, these misspellings would not
appear in my spelling index and I would thus get suggestions for them,
but as I now see, the spellchecker doesn't quite work like that.
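A toy model of why raising thresholdTokenFrequency may not help here, in Python purely for illustration. It is not Solr's code. It rests on two assumptions. The first is that the threshold is a fraction of documents and only controls which terms enter the suggestion dictionary. The second, from my reading of the Lucene SpellChecker of that era, is that the "is this word already correct?" check still consults the source field. The function names and the 100,000-document count are hypothetical.

```python
# Toy model of thresholdTokenFrequency; NOT Solr's implementation.
# Assumptions (unverified here): the threshold is a fraction of documents
# and only filters which terms go into the suggestion dictionary, while
# the "already correct?" test looks at the source field's frequencies.


def build_dictionary(doc_freq: dict[str, int], num_docs: int,
                     threshold: float) -> set[str]:
    """Terms occurring in at least `threshold` of all documents."""
    return {t for t, f in doc_freq.items() if f / num_docs >= threshold}


def is_correctly_spelled(word: str, doc_freq: dict[str, int]) -> bool:
    """Existence check against the source field, not the dictionary."""
    return doc_freq.get(word, 0) > 0


if __name__ == "__main__":
    freq = {"cuisine": 25000, "cusine": 2}
    print(build_dictionary(freq, 100_000, 0.001))  # {'cuisine'}
    print(is_correctly_spelled("cusine", freq))    # True
```

Under these assumptions "cusine" is dropped from the dictionary but is still found in the source field, which would match the correctlySpelled=true result described above.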

Is there any way to somehow get spelling suggestions to work for these
misspellings in my index if they have a low frequency?

Thanks in advance,
Chris


Re: spellcheck with misspelled words in index

2009-07-15 Thread Jay Hill
We had the same thing to deal with recently, and a great solution was posted
to the list. Create a stopwords filter on the field you're using for
spell checking, and then populate a custom stopwords file with known
misspelled words:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Your spell field would look like this:
   <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>

Then add words like "cusine" to misspelled_words.txt, one per line.
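A toy end-to-end model of this stopword approach, in Python purely for illustration. It is not Solr's code. A regex split stands in for StandardTokenizerFactory. The hypothetical MISSPELLED set stands in for misspelled_words.txt, and the product titles are made up. Suggestions use plain edit distance rather than Lucene's own candidate lookup. RemoveDuplicatesTokenFilterFactory is left out because it only drops identical tokens at the same position, and this toy tokenizer never stacks tokens.

```python
# Toy end-to-end model of the stopword approach; NOT Solr's implementation.
# Assumptions: a regex split stands in for StandardTokenizerFactory,
# MISSPELLED stands in for misspelled_words.txt (entries kept lowercase),
# and suggestions use plain edit distance.
import re

MISSPELLED = {"cusine"}  # hypothetical contents of misspelled_words.txt


def analyze(text: str, stopwords: set[str]) -> list[str]:
    """Tokenize, lowercase, then drop stopwords (so case is ignored)."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in stopwords]


def spell_field_doc_freq(titles: list[str], stopwords: set[str]) -> dict[str, int]:
    """Number of titles whose analyzed spell field contains each term."""
    freq: dict[str, int] = {}
    for title in titles:
        for term in set(analyze(title, stopwords)):
            freq[term] = freq.get(term, 0) + 1
    return freq


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(word: str, doc_freq: dict[str, int], max_dist: int = 2) -> list[str]:
    """Suggest nearby terms unless `word` is already in the spell field."""
    if doc_freq.get(word, 0) > 0:
        return []  # present in the spell field: treated as correctly spelled
    close = [t for t in doc_freq if edit_distance(word, t) <= max_dist]
    return sorted(close, key=lambda t: -doc_freq[t])


if __name__ == "__main__":
    titles = ["Italian Cuisine", "French Cuisine Basics", "Thai Cusine"]
    print(suggest("cusine", spell_field_doc_freq(titles, set())))       # []
    print(suggest("cusine", spell_field_doc_freq(titles, MISSPELLED)))  # ['cuisine']
```

Only the spell field is filtered in this model. A separate search field that keeps "cusine" would still match the 2 misspelled products.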

-Jay

