Re: What's the correct way to do normalisation?

Erik Hatcher Mon, 06 Nov 2006 12:49:29 -0800


On Nov 6, 2006, at 11:27 AM, hans meiser wrote:

Hi,
Did you take a look at ISOLatin1AccentFilter?
  It does nearly what I need, but not perfectly.

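  public final Token next() throws java.io.IOException {
    final Token t = input.next();
    if (t == null)
      return null;  // end of the token stream
    // The new token gets the rewritten term text but keeps the
    // original start/end offsets, even if the term length changed.
    return new Token(removeAccents(t.termText()), t.startOffset(), t.endOffset(), t.type());
  }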
Here, too, a new Token is created. My question is: why is the end offset not corrected for the newly created token? Sometimes the new token is longer than the original one.
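To make the question concrete, here is a minimal, self-contained sketch (no Lucene dependency) of what the quoted next() does: the term text is rewritten, but the start/end offsets are copied from the input token unchanged. SimpleToken, FOLDS, fold() and foldToken() are hypothetical stand-ins, not Lucene's Token class or the real ISOLatin1AccentFilter mapping table. The sharp s folding to "ss" is used as an example of a character that expands, which is how the folded term can end up longer than the span of original text it came from.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of an accent-folding step applied to one token.
 * SimpleToken and FOLDS are hypothetical, not Lucene classes.
 */
public class AccentFoldingDemo {

    /** Hypothetical token: term text plus offsets into the original text. */
    static final class SimpleToken {
        final String term;
        final int start; // index of the first char in the original text
        final int end;   // index one past the last char in the original text

        SimpleToken(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Small illustrative subset of foldings; some expand one char into two. */
    private static final Map<Character, String> FOLDS = new HashMap<>();
    static {
        FOLDS.put('\u00e4', "a");  // a with diaeresis
        FOLDS.put('\u00e9', "e");  // e with acute accent
        FOLDS.put('\u00df', "ss"); // sharp s expands to two chars
        FOLDS.put('\u00e6', "ae"); // ae ligature expands to two chars
    }

    /** Replaces each mapped character; unmapped characters are kept as-is. */
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            String r = FOLDS.get(c);
            if (r != null) {
                sb.append(r);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Like the quoted next(): new term text, original offsets copied over. */
    static SimpleToken foldToken(SimpleToken t) {
        return new SimpleToken(fold(t.term), t.start, t.end);
    }

    public static void main(String[] args) {
        String text = "Stra\u00dfe"; // German word for "street", with sharp s
        SimpleToken in = new SimpleToken(text, 0, text.length());
        SimpleToken out = foldToken(in);
        System.out.println(out.term + " len=" + out.term.length()
                + " offsets=[" + out.start + "," + out.end + ")");
        // prints: Strasse len=7 offsets=[0,6)
    }
}
```

The folded term grew to seven characters, while the offsets still describe the six characters of the word as it appears in the source text.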
  Complete code link:
http://developer.spikesource.com/spikewatch.logs/fedora-3-i386/2221/lucene/reports/clover/org/apache/lucene/analysis/ISOLatin1AccentFilter.html

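For highlighting purposes, it's best to keep the offsets pointing into the original text, not adjusted for token mutation. The highlighter uses those offsets to locate the matched characters in the original text, so stretching or shrinking them to fit the rewritten term would highlight the wrong span.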
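A small sketch of the highlighting argument, assuming a highlighter that simply cuts the original text at [start, end) and wraps that span in tags. highlight() and the <b> markup are hypothetical, not Lucene's Highlighter API. The word at offsets 4 to 10 is indexed as the 7-character term "strasse"; recomputing the end offset from the folded term's length pulls the following space into the highlight.

```java
/**
 * Sketch: offsets must index into the ORIGINAL text, because that is
 * the string the highlighter marks up. highlight() is hypothetical.
 */
public class OffsetHighlightDemo {

    /** Wraps original[start, end) in <b>...</b>. */
    static String highlight(String original, int start, int end) {
        return original.substring(0, start)
                + "<b>" + original.substring(start, end) + "</b>"
                + original.substring(end);
    }

    public static void main(String[] args) {
        String original = "Die Stra\u00dfe ist lang"; // German sentence, word with sharp s
        int start = 4;             // the 6-char word starts here
        int end = 10;              // ...and ends here (exclusive)
        String folded = "strasse"; // indexed term after folding: 7 chars

        // Original offsets: exactly the word is highlighted.
        System.out.println(highlight(original, start, end));

        // "Corrected" end from the folded length: the trailing space is swallowed too.
        System.out.println(highlight(original, start, start + folded.length()));
    }
}
```

Only the original offsets are meaningful here, since the highlighter never sees the folded term text; it only sees the stored original string.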


        Erik

