On Nov 6, 2006, at 11:27 AM, hans meiser wrote:
Hi,
Did you take a look at ISOLatin1AccentFilter?
It does nearly what I need, but not perfectly.
    public final Token next() throws java.io.IOException {
        final Token t = input.next();
        if (t == null)
            return null;
        return new Token(removeAccents(t.termText()), t.startOffset(),
                         t.endOffset(), t.type());
    }
Here, too, a new Token is created. My question is: why is the end
offset not corrected for the newly created token? Sometimes the new
token is longer than the original.
Complete code link:
http://developer.spikesource.com/spikewatch.logs/fedora-3-i386/2221/lucene/reports/clover/org/apache/lucene/analysis/ISOLatin1AccentFilter.html
For highlighting purposes, it's best to keep the offsets in the
original text, not adjusted for token mutation.
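
To make that concrete, here is a minimal stand-alone sketch in plain Java, with no Lucene involved. The class OffsetSketch, its highlight() helper, and the sample text are made up for illustration, and removeAccents() here is a cut-down stand-in that handles only two characters, not the real filter's method. It assumes a highlighter that marks the span [startOffset, endOffset) in the original text.

```java
// Hypothetical stand-alone sketch; none of this is Lucene code.
final class OffsetSketch {

    // Simplified stand-in for the filter's removeAccents(): only two mappings.
    static String removeAccents(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u00C6': sb.append("AE"); break; // AE ligature becomes two chars
                case '\u00F8': sb.append('o');  break; // o with stroke becomes plain o
                default:       sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumed highlighter behavior: wrap the span [start, end) of the ORIGINAL text.
    static String highlight(String original, int start, int end) {
        return original.substring(0, start)
             + "<b>" + original.substring(start, end) + "</b>"
             + original.substring(end);
    }

    public static void main(String[] args) {
        String text = "\u00C6r\u00F8 is an island";
        int start = 0;
        int end = 3; // offsets of the first word in the original text

        // The indexed term is one char longer than the source span (4 vs. 3).
        String term = removeAccents(text.substring(start, end));

        // Offsets kept as they were: the bold span covers exactly the first word.
        System.out.println(highlight(text, start, end));

        // End offset "corrected" to the mutated term's length: the span is wrong.
        System.out.println(highlight(text, start, start + term.length()));
    }
}
```

With the original offsets, the bold span covers exactly "Ærø". With the end offset recomputed from the longer term "AEro", the span also swallows the space after the word, and every later token would be shifted in the same way.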
Erik