Re: Question about special characters

Chris Hostetter Thu, 25 May 2006 11:25:45 -0700

I think I'm missing something here.  The whole point of the
ISOLatin1AccentFilter is to replace accented characters with their
unaccented equivalents, and it sounds like that's working just fine.  If you
want the words in the term vector to contain the accents, why don't you
stop using that filter?


If the problem is that you need to be able to match on both the accented
form and the unaccented form, perhaps you should have two fields (one
analyzed with the filter, one without), or modify the
ISOLatin1AccentFilter so it puts both versions of the token in the
TokenStream at the same position (i.e. the second version gets a position
increment of 0)?
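
To make the "both versions at the same position" idea concrete, here is a
rough sketch in plain Java.  It deliberately does not use the Lucene API:
the Tok class and the expand() method are hypothetical stand-ins for a
Token and a TokenFilter, and the folding uses java.text.Normalizer rather
than the ISOLatin1AccentFilter's own mapping (which also handles things
like ligatures), so treat it as an illustration of the shape of the
output, not as a drop-in filter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AccentExpansionSketch {

    /** Hypothetical stand-in for an analyzer token: text plus position increment. */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /** Decompose to NFD, then drop the combining marks (the accents). */
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /**
     * Emit every word as-is with increment 1. If folding changes the word,
     * also emit the folded form with increment 0, so both forms occupy the
     * same position in the stream.
     */
    static List<Tok> expand(List<String> words) {
        List<Tok> out = new ArrayList<Tok>();
        for (String w : words) {
            out.add(new Tok(w, 1));
            String folded = stripAccents(w);
            if (!folded.equals(w)) {
                out.add(new Tok(folded, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "pl\u00e0nt\u00efu\u00e7" is the made-up word from the question,
        // written with escapes so the source file encoding does not matter.
        System.out.println(expand(Arrays.asList("pl\u00e0nt\u00efu\u00e7", "text")));
    }
}
```

With something along these lines, the accented form is still there to
show up in the term vector, and a query for either form matches at the
same position.  The cost is a larger index and term counts that are split
across the two forms, which is why two separate fields may be the cleaner
option if you mainly care about "most used terms".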


: > The problem is special characters like à, ä , ç or ñ latin characters in
: > the text.
: > Now I use iso latin filter, but the problem is when I want to obtain most
: > term used. These term are stored without ` ´ ^ or another "character
: > attribute".
: > For example "plàntïuç" (it isn't a real word) is stored like the term
: > "plantiuc".
: > How can I do to have in term vector the word "plàntïuç".
: >
: > thks for all replies.
: > PD: excuse if this question is solved somewhere, but I don't saw it.



-Hoss


