: http://www.blardone.org/2008/10/12/lucene-query-accented-character/

that post appears to be specifically about a PHP function to convert UTF-8 
characters to their HTML equivalents ... which doesn't really seem relevant 
to the poster's question ...

: > I'm use FrenchAnalyzer for index 
        ...
: > in my document i have the word "lumiere" and "lumière"
: > 
: > when i search lumière only document match lumière but "lumiere" is not
: > return
: > 
: > and if search "lumiere" the result is lumiere, lumieres ,lumiére,lumiéres
: > but not lumière

1) you should take a look at the Luke tool to help make sense of exactly 
what is getting indexed and how your query is getting parsed -- or just 
write a simple Java program to look at the tokens produced by your 
analyzer.

2) the FrenchAnalyzer doesn't by default do any accent normalization (so 
i'm not sure why your search for lumiere is even matching lumiére) ... but 
you may want to make your own Analyzer wrapping the FrenchAnalyzer that 
also uses the ISOLatin1AccentFilter to deal with this.
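
to illustrate what accent normalization buys you, here is a minimal sketch 
that uses only the JDK (java.text.Normalizer), not Lucene. it is a stand-in 
for the kind of folding an accent filter performs, not the filter's actual 
implementation, and the class name AccentFoldDemo and the helper fold() are 
hypothetical names made up for this example.

```java
import java.text.Normalizer;

public class AccentFoldDemo {

    // Hypothetical helper (not a Lucene API): decompose each character into
    // base letter + combining marks (NFD), then drop the combining marks.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes avoid source-encoding problems:
        // \u00e8 is e-grave, \u00e9 is e-acute.
        String[] terms = { "lumiere", "lumi\u00e8re", "lumi\u00e9re" };
        for (String term : terms) {
            // Print whether each variant folds to the same unaccented term.
            System.out.println(fold(term).equals("lumiere"));
        }
    }
}
```

the point is that once all three variants reduce to the same term, a search 
for any one of them can match the others, but only if the same folding is 
applied both when indexing and when analyzing the query.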

-Hoss
