Re: org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?
That would be great!

On Mon, Sep 16, 2013 at 1:41 PM, Benson Margulies wrote:
> Thanks, I might pitch in.

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?
Can anyone shed light as to why this is a token filter and not a char filter? I'm wishing for one of these _upstream_ of a tokenizer, so that the tokenizer's lookups in its dictionaries are seeing normalized contents.
Re: org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?
Thanks, I might pitch in.

On Mon, Sep 16, 2013 at 12:58 PM, Robert Muir wrote:
> Mostly because our tokenizers like StandardTokenizer will tokenize the
> same way regardless of normalization form or whether it's normalized at
> all?
>
> But for other tokenizers, such a charfilter should be useful: there is
> a JIRA for it, but it has some unresolved issues
>
> https://issues.apache.org/jira/browse/LUCENE-4072
Re: org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?
Mostly because our tokenizers like StandardTokenizer will tokenize the same way regardless of normalization form or whether it's normalized at all?

But for other tokenizers, such a charfilter should be useful: there is a JIRA for it, but it has some unresolved issues

https://issues.apache.org/jira/browse/LUCENE-4072

On Sun, Sep 15, 2013 at 7:05 PM, Benson Margulies wrote:
> Can anyone shed light as to why this is a token filter and not a char
> filter? I'm wishing for one of these _upstream_ of a tokenizer, so that the
> tokenizer's lookups in its dictionaries are seeing normalized contents.
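
For illustration, here is a rough sketch of why normalizing _upstream_ of a dictionary-driven tokenizer matters. It is not Lucene code: it uses the JDK's java.text.Normalizer as a stand-in for ICU's Normalizer2, and the class name, the tiny dictionary, and the whitespace-splitting "tokenizer" are all made up for the example. It assumes the dictionary is stored in composed (NFC) form.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NormalizeBeforeTokenize {
    // Hypothetical dictionary, stored in composed (NFC) form.
    static final Set<String> DICTIONARY = Set.of("caf\u00e9", "na\u00efve");

    // Toy stand-in for a dictionary-driven tokenizer: splits on
    // whitespace and keeps only the words found in the dictionary.
    static List<String> dictionaryTokens(String text) {
        List<String> hits = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (DICTIONARY.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }

    // Plays the role a char filter would play: the whole input is
    // normalized before the tokenizer ever sees it.
    static String normalizeUpstream(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Same words, but written with combining marks (decomposed form).
        String decomposed = "cafe\u0301 nai\u0308ve";

        // Without upstream normalization the lookups miss.
        System.out.println(dictionaryTokens(decomposed).size());

        // With upstream normalization both words are found.
        System.out.println(dictionaryTokens(normalizeUpstream(decomposed)).size());
    }
}
```

A token filter applied after this toy tokenizer would be too late, since the dictionary misses happen during tokenization. A tokenizer that only looks at word boundaries would not care either way, which matches the point about StandardTokenizer above.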