
Spellchecking in the Chinese Language

2011-04-12 Thread alexw

Hi,

I have been trying to get spellcheck to work in the Chinese language. So far
I have not had any luck. Can someone shed some light here, as a general
guideline on what needs to happen?

I am using the CJKAnalyzer in the text field type and searching works fine,
but spellchecking does not work. Here are the things I have tried:

1. Put CJKAnalyzer in the textSpell field type.
2. Set the characterEncoding param to utf-8 in the spellcheck search
component.
3. Using Luke, I can see the Chinese characters in the spell field in the
main index.
4. After building the spelling index, I don't see Chinese characters in the
spellchecker index, only terms in English.
5. Tried adding the NGramFilterFactory to the CJKAnalyzer with no luck
either.
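For reference, the CJKAnalyzer mentioned above indexes runs of Chinese characters as overlapping two-character tokens (bigrams), so the terms in a CJK-analyzed spell field are bigrams rather than dictionary words. The Python sketch below approximates that tokenization to show what ends up in the field. It is a simplification, not Lucene's implementation: the regex, the restriction to the basic CJK Unified Ideographs block (U+4E00..U+9FFF), the lowercasing of Latin runs, and the function name cjk_bigram_tokens are all assumptions made for illustration.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block counts as CJK here;
# the real analyzer covers more ranges and handles punctuation differently.
_RUNS = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def cjk_bigram_tokens(text: str) -> list[str]:
    """Hypothetical approximation of CJK bigram tokenization."""
    tokens = []
    for run in _RUNS.findall(text):
        if not ("\u4e00" <= run[0] <= "\u9fff"):
            # A Latin/digit run becomes one lowercased word.
            tokens.append(run.lower())
        elif len(run) == 1:
            # A lone CJK character is kept as a single-character token.
            tokens.append(run)
        else:
            # Overlapping bigrams: ABCD -> AB, BC, CD.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(cjk_bigram_tokens("拼写检查 Solr"))
```

Here "拼写检查 Solr" yields 拼写, 写检, 检查 and solr: note that 写检 is not a word at all, which is worth keeping in mind when the same field feeds a spellchecker dictionary.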

Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Spellchecking in the Chinese Language

2011-04-12 Thread Otis Gospodnetic

Hi,

Does spellchecking in Chinese actually make sense? I once asked a native
Chinese speaker about that and the person told me it didn't really make sense.
Anyhow, with n-grams, I don't think this could technically work even if it made
sense for Chinese, could it?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message
From: alexw aw...@crossview.com
To: solr-user@lucene.apache.org
Sent: Tue, April 12, 2011 3:07:48 PM
Subject: Spellchecking in the Chinese Lanugage

Re: Spellchecking in the Chinese Language

2011-04-12 Thread Luke Lu

It doesn't make sense to spellcheck words that are a single character,
but it makes a lot of sense for phrases. Because pinyin input methods
are so pervasive, it's very easy to write phrases that sound correct but
are totally wrong semantically. N-grams should work as long as they don't
mangle the characters.
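Luke's point can be made concrete with a small Python sketch. It first contrasts n-grams taken over characters (Unicode code points) with n-grams taken over raw UTF-8 bytes, which split each three-byte Chinese character apart and so "mangle" it. It then uses character bigrams to pick candidate phrases and ranks them by edit distance. That candidate-then-rank scheme is a simplified illustration assumed for this example, not Solr's SpellCheckComponent code; the function names and the sample dictionary are hypothetical.

```python
from typing import Optional


def char_ngrams(text: str, n: int) -> list[str]:
    # Python str slicing works on code points, so every CJK character
    # stays whole inside each n-gram.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def byte_ngrams(text: str, n: int) -> list[bytes]:
    # Slicing the UTF-8 bytes instead cuts 3-byte CJK characters apart.
    data = text.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over code points.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def suggest(query: str, dictionary: list[str], n: int = 2) -> Optional[str]:
    # Step 1: candidates are phrases sharing at least one character n-gram.
    query_grams = set(char_ngrams(query, n))
    candidates = [w for w in dictionary if query_grams & set(char_ngrams(w, n))]
    if not candidates:
        return None
    # Step 2: rank candidates by edit distance (ties keep dictionary order).
    return min(candidates, key=lambda w: levenshtein(query, w))


# Hypothetical sample dictionary of correctly written phrases.
phrases = ["再接再厉", "再见", "接力赛", "鼓励"]
print(suggest("再接再励", phrases))
```

With this sample data, the mistyped idiom 再接再励 (an easy pinyin-IME slip, since 励 and 厉 are both pronounced lì) shares the bigrams 再接 and 接再 with 再接再厉 and is one edit away from it, so that is the suggestion. A single-character query has no bigrams at all and gets no candidates, which fits the point that spellchecking single-character words is not meaningful.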

On Tue, Apr 12, 2011 at 12:47 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

Re: Spellchecking in the Chinese Language

2011-04-12 Thread alexw

Thanks Otis and Luke.

Yes, it does make sense to spellcheck phrases in Chinese. It looks like the
default Solr spellcheck component is already doing some kind of n-gramming:
when examining the spellchecker index, I did see gram1, gram2, gram3, gram4...
The problem is that no Chinese terms were indexed into the spellchecker index,
only English terms.
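The gram1, gram2, gram3, gram4 fields Alex saw are how an n-gram-based spellchecker index stores each dictionary term. The Python sketch below builds those fields for one term. The gram-size rule (gram_range) and the min_len filter are my recollection of Lucene 3.x SpellChecker (its getMin/getMax helpers, and a check in indexDictionary that skips terms shorter than three characters); treat both as assumptions to verify against the Lucene version in use, not as confirmed behavior. If such a filter applies, the two-character bigrams produced by CJKAnalyzer would never reach the spellchecker index, which would match the English-only symptom. The function names are hypothetical.

```python
def gram_range(length: int) -> tuple[int, int]:
    # Gram sizes chosen by term length. ASSUMPTION: mirrors Lucene 3.x
    # SpellChecker.getMin/getMax as I recall them; verify for your version.
    if length > 5:
        return 3, 4
    if length == 5:
        return 2, 3
    return 1, 2


def spell_document(word: str) -> dict[str, list[str]]:
    # Build "word", "gramN", "startN" and "endN" fields for one term.
    # Slicing a str works on code points, so CJK characters stay whole.
    doc = {"word": [word]}
    lo, hi = gram_range(len(word))
    for n in range(lo, hi + 1):
        grams = [word[i:i + n] for i in range(len(word) - n + 1)]
        if grams:
            doc[f"gram{n}"] = grams
            doc[f"start{n}"] = [grams[0]]
            doc[f"end{n}"] = [grams[-1]]
    return doc


def build_spell_index(terms: list[str], min_len: int = 3) -> list[dict[str, list[str]]]:
    # ASSUMPTION: terms shorter than min_len are skipped, as I recall
    # Lucene's indexDictionary doing; 2-character CJK bigrams fall below it.
    return [spell_document(t) for t in terms if len(t) >= min_len]


field_terms = ["拼写", "写检", "检查", "solr", "spellcheck"]
print([d["word"][0] for d in build_spell_index(field_terms)])
```

Under these assumptions only solr and spellcheck are indexed, while a Chinese term does get proper gram fields once it passes the filter (for example, 检查 yields gram1 检, 查 and gram2 检查).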

Regards,

Alex

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2813149.html
