: Thanks for the reply, but I don't know how to make this change in
: ISOLatin1AccentFilter.
: Can you give me some advice on this?

I've never really looked at the internals of ISOLatin1AccentFilter, but the
basic idea is to subclass it with a new TokenFilter that maintains a
one-token "buffer" of the token stream. On alternate calls to next() you
return either the original token (as is) or the buffered copy of it with the
accents stripped. Since ISOLatin1AccentFilter has a method called
removeAccents, I'm guessing it would look something like this...

   import java.io.IOException;
   import org.apache.lucene.analysis.ISOLatin1AccentFilter;
   import org.apache.lucene.analysis.Token;
   import org.apache.lucene.analysis.TokenStream;

   // assumes ISOLatin1AccentFilter.next() can be overridden (i.e. is not final)
   public class YourTokenFilter extends ISOLatin1AccentFilter {

     // accent-stripped copy of the token returned by the previous call
     private Token bufToken = null;

     public YourTokenFilter(TokenStream input) {
       super(input);
     }

     public Token next() throws IOException {
       if (null != bufToken) {
         Token t = bufToken;
         bufToken = null;
         return t;
       }
       Token t = input.next();
       if (null == t) return null;  // end of stream
       bufToken = new Token(removeAccents(t.termText()),
                            t.startOffset(), t.endOffset(), t.type());
       bufToken.setPositionIncrement(0);  // same position as the original
       return t;
     }
   }

...but I haven't tested that (or ever written a TokenFilter of my own, for
that matter).

: 2006/5/25, Chris Hostetter <[EMAIL PROTECTED]>:
: >
: > I think I'm missing something here. The whole point of the
: > ISOLatin1AccentFilter is to replace accented characters with their
: > unaccented equivalent -- it sounds like that's working just fine. If you
: > want the words in the term vector to contain the accents, why don't you
: > stop using that filter?
: >
: > If the problem is that you need to be able to match on both the accented
: > form and the non-accented form, perhaps you should have two fields, or
: > modify the ISOLatin1AccentFilter so it puts both versions of the token in
: > the TokenStream with the same position?
: >
: > : The problem is special characters like à, ä, ç or ñ (Latin characters)
: > : in the text.
: > : Now I use the ISO Latin filter, but the problem comes when I want to
: > : obtain the most used terms. These terms are stored without ` ´ ^ or any
: > : other "character attribute".
: > : For example "plàntïuç" (it isn't a real word) is stored as the term
: > : "plantiuc".
: > : How can I get the word "plàntïuç" into the term vector?
: > :
: > : Thanks for all replies.
: > : P.S. Sorry if this question has already been answered somewhere; I
: > : didn't see it.
: >
: > -Hoss
: >
: > ---------------------------------------------------------------------
: > To unsubscribe, e-mail: [EMAIL PROTECTED]
: > For additional commands, e-mail: [EMAIL PROTECTED]

-Hoss
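
A self-contained sketch of the one-token buffer idea from the reply above, runnable without Lucene. Everything in it is a stand-in rather than Lucene API: Tok is a hypothetical substitute for Token, stripAccents uses java.text.Normalizer in place of removeAccents (so it will not cover every character the real filter handles), and BothFormsFilter wraps a plain Iterator instead of a TokenStream. One small deviation, also an assumption: the stripped copy is only buffered when it differs from the original, so unaccented words are not emitted twice.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class AccentBufferSketch {

    /** Hypothetical stand-in for a Lucene Token: term text plus position increment. */
    public static final class Tok {
        public final String text;
        public final int positionIncrement;

        public Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Stand-in for removeAccents: decompose characters, then drop combining marks. */
    public static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Emits each token as is, then (if different) a stripped copy at the same position. */
    public static final class BothFormsFilter implements Iterator<Tok> {
        private final Iterator<Tok> input;
        private Tok buffered = null; // stripped copy waiting to be returned

        public BothFormsFilter(Iterator<Tok> input) {
            this.input = input;
        }

        @Override
        public boolean hasNext() {
            return buffered != null || input.hasNext();
        }

        @Override
        public Tok next() {
            if (buffered != null) {
                Tok t = buffered;
                buffered = null;
                return t;
            }
            Tok t = input.next();
            String stripped = stripAccents(t.text);
            if (!stripped.equals(t.text)) {
                // Increment 0 places the stripped form at the same position.
                buffered = new Tok(stripped, 0);
            }
            return t;
        }
    }

    /** Runs the filter over the words and renders each token as "text/increment". */
    public static List<String> run(List<String> words) {
        List<Tok> tokens = new ArrayList<>();
        for (String w : words) {
            tokens.add(new Tok(w, 1));
        }
        List<String> out = new ArrayList<>();
        Iterator<Tok> filter = new BothFormsFilter(tokens.iterator());
        while (filter.hasNext()) {
            Tok t = filter.next();
            out.add(t.text + "/" + t.positionIncrement);
        }
        return out;
    }

    public static void main(String[] args) {
        // "pl\u00e0nt\u00efu\u00e7" is the made-up word from the original question.
        for (String s : run(Arrays.asList("pl\u00e0nt\u00efu\u00e7", "word"))) {
            System.out.println(s);
        }
    }
}
```

With this arrangement the accented token comes out first with increment 1, followed by "plantiuc" with increment 0, so both forms occupy the same position and the accented form is still available to whatever consumes the stream.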