Spellchecking in the Chinese Language

2011-04-12 Thread alexw
Hi,

I have been trying to get spellcheck to work in the Chinese language. So far
I have not had any luck. Can someone shed some light here, as a general
guideline on what needs to happen?

I am using the CJKAnalyzer in the text field type and searching works fine,
but spellchecking does not work. Here is what I have tried and observed:

1. Put CJKAnalyzer in the textSpell field type.
2. Set the characterEncoding param to utf-8 in the spellcheck search
component.
3. Using Luke, I can see the Chinese characters in the spell field in the
main index.
4. After building the spelling index, I don't see Chinese characters in the
spellchecker index, only terms in English.
5. Tried adding the NGramFilterFactory to the CJKAnalyzer with no luck
either.
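
Observations 3 and 4 can also be checked outside Luke by dumping the terms of each index and testing them for Chinese characters. A minimal Python sketch, assuming the terms have already been exported to a list of strings (the function names and the sample terms are made up for illustration, and only the basic CJK Unified Ideographs block is tested):

```python
def contains_cjk(term):
    """True if any character is in the CJK Unified Ideographs block (U+4E00-U+9FFF)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in term)


def split_terms(terms):
    """Split a list of terms into (terms with CJK characters, all other terms)."""
    cjk = [t for t in terms if contains_cjk(t)]
    other = [t for t in terms if not contains_cjk(t)]
    return cjk, other


# Hypothetical term dump; in practice this would come from the index.
sample_terms = ["solr", "搜索", "spell", "引擎"]
cjk_terms, other_terms = split_terms(sample_terms)
print("CJK terms:", cjk_terms)
print("Other terms:", other_terms)
```

An empty first list for the spellchecker index, next to a non-empty one for the main index, would match the symptom described in item 4.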

Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellchecking in the Chinese Language

2011-04-12 Thread Otis Gospodnetic
Hi,

Does spellchecking in Chinese actually make sense? I once asked a native
Chinese speaker about that, and the person told me it didn't really make sense.
Anyhow, with n-grams, I don't think this could technically work even if it made
sense for Chinese, could it?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: Spellchecking in the Chinese Language

2011-04-12 Thread Luke Lu
It doesn't make sense to spellcheck individual single-character words,
but it makes a lot of sense for phrases. Because pinyin input methods
are so widely used, it's very easy to write phrases that sound correct
but are semantically wrong. N-grams should work as long as the process
doesn't mangle the characters.
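
To illustrate the "mangle the characters" caveat, here is a small Python sketch (not Solr or Lucene code; the function names and the sample phrase are assumptions chosen for illustration). It compares n-grams taken over Unicode characters with n-grams taken over the raw UTF-8 bytes of the same phrase:

```python
def char_ngrams(text, n):
    """Overlapping n-grams over Unicode code points (whole characters)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def byte_ngrams(text, n, encoding="utf-8"):
    """Overlapping n-grams over the encoded bytes; these can cut characters apart."""
    data = text.encode(encoding)
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def decodes_cleanly(chunk, encoding="utf-8"):
    """True if the byte chunk is valid text in the given encoding."""
    try:
        chunk.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


phrase = "拼写检查"  # sample phrase, "spell checking"

print(char_ngrams(phrase, 2))  # ['拼写', '写检', '检查']

# Each of these characters takes three bytes in UTF-8, so every
# two-byte window falls inside or across characters.
broken = [g for g in byte_ngrams(phrase, 2) if not decodes_cleanly(g)]
print(len(broken), "of", len(byte_ngrams(phrase, 2)), "byte bigrams are not valid UTF-8")
```

Character-level grams stay readable Chinese, while byte-level grams of the same size are no longer valid text. That is the kind of damage that would keep an n-gram approach from working.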



Re: Spellchecking in the Chinese Language

2011-04-12 Thread alexw
Thanks, Otis and Luke.

Yes, it does make sense to spellcheck phrases in Chinese. It looks like the
default Solr spellcheck component is already doing some kind of n-gramming:
when examining the spellcheck index, I did see gram1, gram2, gram3, gram4...
The problem is that no Chinese terms were indexed into the spellchecker index,
only English terms.
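
For reference, a rough Python sketch of what an n-gram spelling index might store per term. The field names follow the gram1, gram2, ... fields seen in the index. The function itself, the fixed size range of 1 to 4, and the sample terms are assumptions for illustration; this is not the actual Solr or Lucene implementation.

```python
def gram_fields(word, min_n=1, max_n=4):
    """Map field names such as 'gram2' to the word's n-grams of that size."""
    fields = {}
    for n in range(min_n, max_n + 1):
        grams = [word[i:i + n] for i in range(len(word) - n + 1)]
        if grams:  # skip sizes longer than the word itself
            fields[f"gram{n}"] = grams
    return fields


# An English term and a Chinese term go through exactly the same steps.
for term in ("solr", "拼写检查"):
    print(term, gram_fields(term))
```

Nothing in this kind of n-gramming is specific to English, so if Chinese terms reach this step intact they should produce gram fields just like the English ones.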

Regards,

Alex

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2813149.html
Sent from the Solr - User mailing list archive at Nabble.com.