Mostly because our tokenizers, like StandardTokenizer, will tokenize the same way regardless of normalization form, or whether the text is normalized at all?
But for other tokenizers, such a charfilter should be useful: there is a JIRA issue for it, but it has some unresolved issues: https://issues.apache.org/jira/browse/LUCENE-4072

On Sun, Sep 15, 2013 at 7:05 PM, Benson Margulies <[email protected]> wrote:
> Can anyone shed light as to why this is a token filter and not a char
> filter? I'm wishing for one of these _upstream_ of a tokenizer, so that the
> tokenizer's lookups in its dictionaries are seeing normalized contents.
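To illustrate the point about dictionary lookups: a hypothetical dictionary-driven tokenizer that stores entries in composed (NFC) form will miss decomposed input unless the text is normalized before tokenization, which is exactly what a char filter would provide. Here is a minimal self-contained sketch using only the JDK's java.text.Normalizer (not Lucene APIs); the dictionary and strings are illustrative assumptions, not code from the thread:

```java
import java.text.Normalizer;
import java.util.Set;

public class NormalizeUpstream {
    public static void main(String[] args) {
        // Hypothetical tokenizer dictionary, storing the composed (NFC) form of "café".
        Set<String> dict = Set.of("caf\u00e9");

        // Incoming text uses the decomposed form: 'e' + U+0301 COMBINING ACUTE ACCENT.
        String raw = "cafe\u0301";

        // Without upstream normalization, the dictionary lookup misses.
        System.out.println(dict.contains(raw)); // false

        // Normalizing before the tokenizer sees the text (what a char filter
        // would do) makes the lookup succeed.
        String normalized = Normalizer.normalize(raw, Normalizer.Form.NFC);
        System.out.println(dict.contains(normalized)); // true
    }
}
```

A token filter normalizes too late for this case: the dictionary lookup happens inside the tokenizer, before any downstream filter runs.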
