Hello, and thanks for your answer. I cannot find the ISOLatin1AccentFilter class in my Lucene jar, but I found one through Google and have attached it to this mail. Could you tell me whether it is the right one?
I do not see anything in this class that can help me. It replaces some accented characters, but my problem is different: if I index a text file encoded in Western 1252, for example one containing the Turkish text "düzenlediğimiz kampanyamıza", the Lucene index ends up containing re-encoded data such as �k�� ....

Thanks & regards,
A.

-----Original Message-----
From: John Haxby [mailto:[EMAIL PROTECTED]
Sent: Thursday, 26 January 2006 03:01
To: java-user@lucene.apache.org
Subject: Re: encoding

arnaudbuffet wrote:

>For text files, data could be in different languages so different
>encoding. If data are in Turkish for example, all special characters and
>accents are not recognized in my lucene index. Is there a way to resolve
>problem? How do I work with the encoding ?

I've been looking at a similar problem recently. There's org.apache.lucene.analysis.ISOLatin1AccentFilter on the svn trunk which may be quite close to what you want.

I have a perl script here that I used to generate a downgrading table for a C program. I can let you have the perl script as is, but if there's enough interest(*) I'll use it to generate, say, CompoundAsciiFilter, since it converts compound characters like á, æ, ﬃ (ffi-ligature, in case it doesn't display) to a, ae and ffi. It's actually built from http://www.unicode.org/Public/4.1.0/ucd/UnicodeData.txt so it winds up having nearly 1200 entries. An earlier version converted all compound characters to their constituent parts, but this version just converts characters that are made up entirely of ASCII and modifiers.

jch

(*) Any interest, actually. Might be enough for me to be interested.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
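
On the encoding question itself, here is a minimal sketch of the usual approach: decode the file's bytes with an explicitly named charset before the text reaches the indexer, instead of relying on the platform default. The class name ExplicitCharsetDemo and the method readAll are hypothetical and not part of Lucene. The choice of windows-1254 is also an assumption: Windows-1252 has no code points for ğ or ı, so Turkish text in a single-byte Windows encoding is more likely windows-1254 (or ISO-8859-9).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of Lucene.
final class ExplicitCharsetDemo {

    // Reads the whole stream as text, decoding with the charset named by
    // the caller instead of the platform default.
    static String readAll(InputStream in, String charsetName) {
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The Turkish sample from the mail, written with escapes so that the
        // encoding of this source file cannot interfere.
        String original = "d\u00fczenledi\u011fimiz kampanyam\u0131za";

        // Stand-in for the file on disk. Assumption: it was saved as windows-1254.
        byte[] fileBytes = original.getBytes(Charset.forName("windows-1254"));

        String right = readAll(new ByteArrayInputStream(fileBytes), "windows-1254");
        String wrong = readAll(new ByteArrayInputStream(fileBytes), "UTF-8");

        // Decoding with the charset the file was written in restores the text.
        System.out.println(right.equals(original));
        // Decoding the same bytes as UTF-8 produces replacement characters (U+FFFD).
        System.out.println(wrong.indexOf('\uFFFD') >= 0);
    }
}
```

With a real file, the ByteArrayInputStream would be replaced by a FileInputStream, and the decoded text (or the Reader itself) handed to the indexing code. The charset of each file still has to be known or guessed per document, since the bytes alone do not say which encoding was used.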