Mathieu Lecarme wrote:
> On Tuesday, 24 July 2007 at 13:01 -0700, Shaw, James wrote:
>> Hi, guys,
>> I found Analyzers for Japanese, Korean and Chinese, but not stemmers;
>> the Snowball stemmers only include European languages. Does stemming
>> not make sense for ideograph-based languages (i.e., no stemming is
>> needed for Japanese, Korean and Chinese)?
> No.

This is not quite correct. Chinese doesn't need any stemming, but Japanese is not completely ideograph-based and could use stemming. I doubt anyone has done this, apart from some commercial software for the Japanese market. I don't know about Korean.

>> Also for spell checking, does the default Lucene SpellChecker work for
>> Japanese, Korean and Chinese? Does edit distance make sense for these
>> languages?
> Japanese uses groups of ideograms, but Levenshtein distance doesn't make
> sense with so few letters, but I'm not a CJK expert.
>
> M.

Edit distance only seems to work for languages written in Latin-based scripts. Spell checking Chinese, Japanese (and Korean?) is more or less pointless, as text in these languages is entered through input methods, which should produce "correct" words.

Best regards,
Max

--
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel    : (+49) 0711 - 45 10 17 578
Fax    : (+49) 0711 - 45 10 17 573
e-mail : [EMAIL PROTECTED]

Registered office: Stuttgart, Amtsgericht Stuttgart, HRB 24106
Managing directors: Joachim Hörnle, Thomas Gentsch, Holger Dietrich
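
Addendum: to make the point about edit distance and short words concrete, here is a minimal sketch in Python. It is a plain textbook Levenshtein implementation, not Lucene's SpellChecker code, and the `similarity` helper and the example words are only illustrative assumptions. It shows that a single substituted character in a two-character word already changes half of the word, while the same single edit in a six-letter Latin-script word leaves most of it intact.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance over Unicode code points."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1 minus the distance normalised by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # One substitution in a six-letter word: similarity stays high (about 0.83).
    print(levenshtein("lucene", "lucine"), similarity("lucene", "lucine"))
    # One substitution in a two-character word: similarity drops to 0.5,
    # and the result is a different, perfectly valid word.
    print(levenshtein("東京", "東北"), similarity("東京", "東北"))
```

With words this short, almost any other word of the same length is only one or two edits away, so a distance threshold cannot really separate typos from unrelated words.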