Re: ICUFoldingFilter

Robert Muir Mon, 04 Jun 2018 07:41:51 -0700

This cannot be "tweaked" at runtime; it is implemented as a custom
normalization.

You can modify the sources and build your own ruleset, or use a different
tokenfilter to normalize characters.
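
As a rough illustration of the second option, here is a minimal sketch of a
narrower folding that uses only the JDK's java.text.Normalizer. It is not what
ICUFoldingFilter does and it is not a Lucene TokenFilter; the class name
SimpleFolding and its fold method are hypothetical, and the logic would still
need to be wrapped in a filter of your own. It assumes that decomposing,
dropping nonspacing combining marks, and lowercasing is enough for your data.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or ICU.
final class SimpleFolding {
    // Nonspacing combining marks (general category Mn),
    // e.g. U+0301 COMBINING ACUTE ACCENT.
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{Mn}+");

    private SimpleFolding() {}

    // Decompose (NFKD), drop combining marks, then lowercase.
    // Standalone characters such as '^' (U+005E) are not combining marks,
    // so they pass through unchanged.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        String stripped = COMBINING_MARKS.matcher(decomposed).replaceAll("");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(fold("aaa^bbb"));   // prints "aaa^bbb"
        System.out.println(fold("Caf\u00e9")); // prints "cafe"
    }
}
```

This folds far fewer characters than the UTR#30 ruleset, so treat it as a
starting point for a custom filter rather than a replacement.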

On Mon, Jun 4, 2018 at 9:07 AM, Michael Sokolov <msoko...@gmail.com> wrote:
> Hi, I'm using ICUFoldingFilter and for the most part it does exactly what I
> want. However there are some behaviors I'd like to tweak. For example it
> maps "aaa^bbb" to "aaabbb". I am trying to understand why it does that, and
> whether there is any way to prevent it.
>
> I spent a little time with
> http://www.unicode.org/reports/tr30/tr30-4.html#UnicodeData which I guess
> is the basis for what this filter does (it's referenced in the javadocs),
> but that didn't answer my questions. As an aside, it seems this tech report
> was withdrawn by the Unicode Consortium? Not sure what that means if
> anything, but it seems ominous.
>
> Anyway, I would appreciate pointers to more info, and specifically, whether
> there are any alternatives to the utr30.nrm data file, or any possibility
> to select among the many transformations this filter applies.
>
> Thanks!
>
> Mike S
