I have used this analyzer with Japanese and it works fine.  In fact, I'm
currently indexing English, several western European languages,
traditional and simplified Chinese, and Japanese.  I put them all in the
same index and have had no problems, other than that my users wanted
searches limited by language.  I solved that by adding a keyword field
to the Document that holds the 2-letter language code.  I then
automatically add a term for that language as an additional constraint
when the user specifies the search.
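
To illustrate the language constraint, here is a minimal sketch that
expresses it in query-parser syntax.  The field name "lang" and the
helper class are assumptions made up for this example; they are not part
of Lucene, and a term added programmatically to the parsed query would
do the same job.

```java
// Sketch only: the field name "lang" and this helper class are
// hypothetical, chosen for illustration.
final class LanguageFilter {
    private LanguageFilter() {}

    /**
     * Wraps the user's query in a required group and adds a required
     * clause on the language keyword field, e.g. "+(tokyo station) +lang:ja".
     */
    static String restrictToLanguage(String userQuery, String languageCode) {
        // Accept only 2-letter lowercase codes such as "en", "ja", "zh".
        if (languageCode == null || !languageCode.matches("[a-z]{2}")) {
            throw new IllegalArgumentException("expected a 2-letter language code");
        }
        return "+(" + userQuery + ") +lang:" + languageCode;
    }
}
```

The resulting string would then go to the query parser in place of the
raw user query, so both the user's terms and the language term must
match.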

You do need to be sure that the Shift-JIS gets converted to Unicode
before you put it in the Document (and pass it to the analyzer).
Internally, I believe Lucene wants everything in Unicode (as any good
Java program would).  Originally, I had problems with Asian languages and
eventually determined that my XML parser wasn't translating my Shift-JIS,
Big5, etc. to Unicode.  Once I fixed that, life was good.
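
As a rough sketch of that conversion step, the JDK can decode raw bytes
with an explicitly named charset, so the result no longer depends on the
platform default or on file.encoding.  The class and method names here
are hypothetical, and the sketch assumes you already know which encoding
the source bytes use.

```java
import java.nio.charset.Charset;

// Sketch only: LegacyDecoder is a made-up helper, not part of Lucene.
final class LegacyDecoder {
    private LegacyDecoder() {}

    /**
     * Decodes bytes in a legacy encoding (for example "Shift_JIS" or
     * "Big5") into a Java String, which is Unicode internally.
     */
    static String decode(byte[] raw, String charsetName) {
        // Charset.forName throws if the JVM does not support the charset.
        Charset charset = Charset.forName(charsetName);
        return new String(raw, charset);
    }
}
```

The decoded String is what would go into the Document field and on to
the analyzer.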

-----Original Message-----
From: Che Dong [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 16, 2004 8:31 AM
To: Lucene Users List
Subject: Re: CJK Analyzer indexing japanese word document

Some Korean friends tell me they use it successfully for Korean, so I
think it also works for Japanese. Mostly the problem is locale settings.

Please check the weblucene project for XML indexing samples:
http://sourceforge.net/projects/weblucene/ 

Che Dong
----- Original Message -----
From: "Chandan Tamrakar" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, March 16, 2004 4:31 PM
Subject: CJK Analyzer indexing japanese word document


> 
> I am using the CJKAnalyzer from the Apache sandbox. I have set the Java
> file.encoding setting to SJIS, and I am able to index and search
> Japanese HTML pages. The index dumps look as I expected. However, when
> I index a Word document containing Japanese characters, it is not
> indexed as expected. Do I need to change anything in the CJKTokenizer
> and CJKAnalyzer classes?
> I have been able to index a Word document with StandardAnalyzer.
> 
> Thanks in advance,
> chandan
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
