[ https://issues.apache.org/jira/browse/LUCENE-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ere Maijala updated LUCENE-8129:
--------------------------------
    Attachment: LUCENE-8129.patch

Thanks, it was indeed bad. I checked that Normalizer2.getInstance calls Norm2AllModes.getInstance which returns a cached instance if available, so I believe you're right about it being immutable. An improved patch is attached.

> Support for defining a Unicode set filter when using ICUFoldingFilter
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-8129
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8129
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Ere Maijala
>            Priority: Minor
>              Labels: ICUFoldingFilterFactory, patch-available, patch-with-test
>         Attachments: LUCENE-8129.patch
>
>
> While ICUNormalizer2FilterFactory supports a filter attribute to define a
> Unicode set filter, ICUFoldingFilterFactory does not support it. A filter
> allows one to e.g. exclude a set of characters from being folded. E.g. for
> Finnish and Swedish the filter could be defined like this:
>   <filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>
> Note: An additional MappingCharFilterFactory or solr.LowerCaseFilterFactory
> would be needed for lowercasing the characters excluded from folding. This is
> similar to what ElasticSearch provides (see
> https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-folding.html).
> I'll add a patch that does this similar to ICUNormalizer2FilterFactory.
> Applies at least to master and branch_7x.
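
For illustration, the configuration described in the issue might look roughly like the analyzer chain below. This is only a sketch: the filter attribute on ICUFoldingFilterFactory is what this issue proposes, so it would take effect only with the patch applied, and the field type name text_fi_folded and the choice of solr.ICUTokenizerFactory are assumptions made for the example, not part of the patch.

```xml
<!-- Hypothetical field type; the name and the tokenizer are illustrative assumptions. -->
<fieldType name="text_fi_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Fold everything except å, ä, ö and their uppercase forms
         (the filter attribute is the feature proposed in LUCENE-8129). -->
    <filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>
    <!-- Characters excluded from folding are not lowercased by it,
         so lowercase them in a separate step, as the note in the issue suggests. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, the expectation is that "Älg" and "älg" index to the same term while "alg" stays distinct, which is the behavior the Finnish and Swedish example is after.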