I have used this analyzer with Japanese and it works fine. In fact, I'm currently indexing English, several Western European languages, Traditional and Simplified Chinese, and Japanese. I throw them all into the same index and have had no problems, other than that my users wanted searches limited by language. I solved that by simply adding a keyword field to the Document that holds the 2-letter language code. When the user submits a search, I automatically add a term for the selected language as an additional constraint.
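A minimal sketch of that language constraint, assuming the query is handed to Lucene as a query-parser string. The field name `lang` and the class and method names are hypothetical, not taken from my actual code. It relies only on standard Lucene query syntax (`+` marks a required clause, `field:term` targets a field). One caveat: a query parser typically runs terms through the analyzer, so a code such as `it` may be dropped as a stop word; adding the term clause programmatically instead of through the parser avoids that.

```java
class LanguageConstraint {
    // Hypothetical name of the keyword field that stores the 2-letter language code.
    static final String LANGUAGE_FIELD = "lang";

    // Wraps the user's query so that both it and the language term are required.
    // The user query is passed through unescaped; a real application would
    // validate or escape it first.
    static String constrain(String userQuery, String languageCode) {
        if (languageCode == null || !languageCode.matches("[a-z]{2}")) {
            throw new IllegalArgumentException("expected a 2-letter language code");
        }
        return "+(" + userQuery + ") +" + LANGUAGE_FIELD + ":" + languageCode;
    }

    public static void main(String[] args) {
        // prints: +(search engine) +lang:ja
        System.out.println(constrain("search engine", "ja"));
    }
}
```

The resulting string would then be parsed with whatever analyzer the index was built with.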
You do need to be sure that the Shift-JIS gets converted to Unicode before you put it in the Document (and pass it to the analyzer). Internally, I believe Lucene wants everything in Unicode (as any good Java program would). Originally, I had problems with Asian languages and eventually determined that my XML parser wasn't translating my Shift-JIS, Big5, etc. to Unicode. Once I fixed that, life was good.

-----Original Message-----
From: Che Dong [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 16, 2004 8:31 AM
To: Lucene Users List
Subject: Re: CJK Analyzer indexing japanese word document

Some Korean friends tell me they use it successfully for Korean, so I think it also works for Japanese. Mostly the problem is locale settings.

Please check the weblucene project for XML indexing samples:
http://sourceforge.net/projects/weblucene/

Che Dong

----- Original Message -----
From: "Chandan Tamrakar" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, March 16, 2004 4:31 PM
Subject: CJK Analyzer indexing japanese word document

> I am using the CJKAnalyzer from the Apache sandbox. I have set the Java
> file.encoding setting to SJIS, and I am able to index and search a
> Japanese HTML page. The index dumps look as I expected. However, when I
> index a Word document containing Japanese characters, it is not indexed
> as expected. Do I need to change anything in the CJKTokenizer and
> CJKAnalyzer classes?
> I have been able to index a Word document with StandardAnalyzer.
>
> Thanks in advance,
> chandan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
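On the encoding point raised in the reply at the top of this thread, here is a small sketch of what "convert to Unicode first" can look like in plain Java. It assumes the raw bytes and the name of their encoding are already known; the class and method names are hypothetical, and it uses only the JDK (`java.nio.charset.Charset`), not any Lucene API. It also assumes the JDK in use ships the `Shift_JIS` charset, which standard full JDK builds do.

```java
import java.nio.charset.Charset;

class UnicodeDecoding {
    // Decodes raw bytes with an explicitly named charset instead of relying
    // on the platform default (the file.encoding system property).
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The two characters U+65E5 U+672C encoded as Shift-JIS bytes.
        byte[] sjis = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B};

        String correct = decode(sjis, "Shift_JIS");
        String wrong = decode(sjis, "ISO-8859-1");

        // Decoding with the right charset yields 2 characters;
        // the wrong charset yields 4 unrelated characters.
        System.out.println(correct.length());
        System.out.println(wrong.length());
    }
}
```

The decoded String is what should go into the Document field. If the text extracted from a Word file or an XML parser is decoded with the wrong charset at this step, the analyzer only ever sees the wrong characters, whichever analyzer is used.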