Thanks for your answer. No, those words are not part of the stop word file (I'm using the one that comes with the Japanese analyzer in lucene-kuromoji-3.6.1.jar).
My Japanese contact told me that the first sentence means "I am Japanese" and the second one is a unit of length.

Jerome

From: Swapnil Patil <[email protected]>
To: [email protected]
Date: 01/18/2013 02:33 PM
Subject: Re: Japanese analyzer

Hi,

I just translated these words using Google Translate; they look like Japanese I [

Can you check whether these words are in your stop word file? If these words exist in your stop word file, then you will not get them in the token stream.

-Swapnil

On Fri, Jan 18, 2013 at 6:58 PM, Jerome Lanneluc <[email protected]> wrote:
> [私 日本人
