Re: RE : encoding

Erik Hatcher Thu, 26 Jan 2006 08:03:59 -0800


On Jan 26, 2006, at 7:26 PM, arnaudbuffet wrote:

I cannot find the ISOLatin1AccentFilter class in my Lucene jar, but I found one via Google, attached to this mail. Could you tell me if it is the right one?

This used to be in contrib/analyzers but has been moved into the core (Subversion only for now):

http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/analysis/

I do not see anything in this class which can help me. This program will replace some accented characters, but my problem is:
if I try to index a text file encoded in Western 1252, for example with the Turkish text "düzenlediğimiz kampanyamıza", the Lucene index will contain re-encoded data with k ....

Reading encoded files is your application's responsibility. You need to be sure to read the files in using the proper encoding. Once the text has been read properly into Java, all will be well as far as Lucene indexing the characters.
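
As a rough illustration of reading with an explicit encoding, here is a minimal sketch in plain Java. The class name EncodedFileReader, the readFile helper, and the "windows-1252" default are only assumptions for the example and are not part of Lucene; substitute whatever charset your files were actually saved in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class for illustration; not part of Lucene.
public class EncodedFileReader {

    // Decodes the file's bytes with the named charset instead of the
    // platform default, which is where mis-decoded characters usually come from.
    static String readFile(Path path, String charsetName) throws IOException {
        Charset charset = Charset.forName(charsetName);
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), charset))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: EncodedFileReader <file> [charset]");
            return;
        }
        // Assumed default for the example; pass the real charset as the second argument.
        String charsetName = args.length > 1 ? args[1] : "windows-1252";
        String text = readFile(Paths.get(args[0]), charsetName);
        System.out.println(text);
    }
}
```

The String returned here (or the InputStreamReader itself) is what you would then hand to your indexing code. If the charset name does not match how the file was really written, the characters will already be wrong before Lucene ever sees them.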


        Erik


