Hello and thanks for your answer.

I cannot find the ISOLatin1AccentFilter class in my Lucene jar, but I found
one via Google (attached to this mail). Could you tell me if it is the right
one?

I do not see anything in this class that can help me. It replaces some
accented characters, but my problem is this:

If I index a text file encoded in Western (Windows-1252), for example one
containing the Turkish text "düzenlediğimiz kampanyamıza", the Lucene index
contains re-encoded data such as �k�� ....
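A minimal Java sketch of where this kind of corruption usually comes from, assuming the file was actually saved by a Turkish Windows application in windows-1254 (an assumption, not something stated above). Two certain facts drive it: windows-1252 has no "ğ" or "ı" at all, and the bytes have to be decoded into Java characters with the matching charset before the text ever reaches Lucene. The class and the `readAll` helper are hypothetical names for illustration; only standard JDK calls are used, and the Turkish text is written with Unicode escapes so the source compiles under any default encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Hypothetical helper: decode a byte stream with an explicitly chosen charset
    // instead of the JVM's platform default (new InputStreamReader(in) with no charset).
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(in, cs))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "düzenlediğimiz kampanyamıza"
        String turkish = "d\u00fczenledi\u011fimiz kampanyam\u0131za";
        Charset cp1254 = Charset.forName("windows-1254"); // Turkish Windows code page
        Charset cp1252 = Charset.forName("windows-1252"); // Western Windows code page

        // Assumption: the file on disk holds the text encoded as windows-1254.
        byte[] fileBytes = turkish.getBytes(cp1254);

        // Matching charset: the original text comes back intact.
        String ok = readAll(new ByteArrayInputStream(fileBytes), cp1254);
        // Wrong charset (UTF-8): invalid byte sequences are replaced by U+FFFD.
        String bad = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8);

        System.out.println(ok.equals(turkish));          // true
        System.out.println(bad.indexOf('\uFFFD') >= 0);  // true
        // windows-1252 has no 'ğ' or 'ı', so it cannot represent this text at all.
        System.out.println(cp1252.newEncoder().canEncode(turkish)); // false
    }
}
```

The design point is simply that the charset is a parameter chosen per file (or per language), rather than whatever default the JVM happens to run with.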

Thanks & regards

A.

-----Original Message-----
From: John Haxby [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 26, 2006 03:01
To: java-user@lucene.apache.org
Subject: Re: encoding

arnaudbuffet wrote:

>For text files, the data can be in different languages and therefore in
>different encodings. If the data are in Turkish, for example, none of the
>special characters and accents are recognized in my Lucene index. Is there
>a way to resolve this problem? How do I work with the encoding?
>  
>
I've been looking at a similar problem recently. There's
org.apache.lucene.analysis.ISOLatin1AccentFilter on the SVN trunk, which
may be quite close to what you want. I have a Perl script here that I
used to generate a downgrading table for a C program. I can let you have
the Perl script as is, but if there's enough interest(*) I'll use it to
generate, say, a CompoundAsciiFilter, which converts compound characters
like á, æ and ﬃ (the ffi ligature, in case it doesn't display) to a, ae
and ffi. The table is built from
http://www.unicode.org/Public/4.1.0/ucd/UnicodeData.txt, so it ends up
with nearly 1200 entries. An earlier version converted all compound
characters to their constituent parts, but this version only converts
characters that are made up entirely of ASCII characters and modifiers.
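For comparison, a rough approximation of this kind of accent folding can be sketched in plain Java with `java.text.Normalizer`. This is a swapped-in technique, not jch's Perl-generated table and not ISOLatin1AccentFilter: NFKD decomposition splits "á" into "a" plus a combining accent and expands the ffi ligature into "ffi", and the combining marks are then deleted. The class and method names are hypothetical. NFKD alone leaves characters that have no Unicode decomposition, such as æ and the Turkish dotless ı, unchanged, so it does not reproduce the whole table described above. It also only helps once the text has been decoded with the correct charset.

```java
import java.text.Normalizer;

public class AsciiFoldingSketch {

    // Hypothetical helper. NFKD splits precomposed letters into base letter plus
    // combining mark (e.g. U+00E1 -> "a" + U+0301) and expands compatibility
    // characters such as the ffi ligature U+FB03 into "ffi"; the regex then
    // deletes every combining mark (Unicode general category M).
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("\u00e1"));                  // a
        System.out.println(fold("\uFB03"));                  // ffi
        System.out.println(fold("d\u00fczenledi\u011fimiz")); // duzenledigimiz
        // No decomposition exists for these, so they pass through unchanged:
        System.out.println(fold("\u00e6").equals("\u00e6"));                            // true
        System.out.println(fold("kampanyam\u0131za").equals("kampanyam\u0131za"));      // true
    }
}
```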

jch

(*) Any interest, actually. Might be enough for me to be interested.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
