Mostly because our tokenizers, like StandardTokenizer, will tokenize the same way regardless of normalization form, or whether the text is normalized at all?
But for other tokenizers, such a charfilter should be useful: there is a JIRA issue for it, but it has some unresolved issues: https://issues.apache.org/jira/browse/LUCENE-4072

On Sun, Sep 15, 2013 at 7:05 PM, Benson Margulies <[email protected]> wrote:
> Can anyone shed light as to why this is a token filter and not a char
> filter? I'm wishing for one of these _upstream_ of a tokenizer, so that the
> tokenizer's lookups in its dictionaries are seeing normalized contents.
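To illustrate the point about dictionary lookups: a hypothetical dictionary-driven tokenizer that stores entries in composed (NFC) form will miss decomposed input unless the text is normalized before tokenization, which is exactly what a char filter would provide. Here is a minimal self-contained sketch using only the JDK's java.text.Normalizer (not Lucene APIs); the dictionary and strings are illustrative assumptions, not code from the thread:

```java
import java.text.Normalizer;
import java.util.Set;

public class NormalizeUpstream {
    public static void main(String[] args) {
        // Hypothetical tokenizer dictionary, storing the composed (NFC) form of "café".
        Set<String> dict = Set.of("caf\u00e9");

        // Incoming text uses the decomposed form: 'e' + U+0301 COMBINING ACUTE ACCENT.
        String raw = "cafe\u0301";

        // Without upstream normalization, the dictionary lookup misses.
        System.out.println(dict.contains(raw)); // false

        // Normalizing before the tokenizer sees the text (what a char filter
        // would do) makes the lookup succeed.
        String normalized = Normalizer.normalize(raw, Normalizer.Form.NFC);
        System.out.println(dict.contains(normalized)); // true
    }
}
```

A token filter normalizes too late for this case: the dictionary lookup happens inside the tokenizer, before any downstream filter runs.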
