[jira] [Updated] (LUCENE-8129) Support for defining a Unicode set filter when using ICUFoldingFilter
[ https://issues.apache.org/jira/browse/LUCENE-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ere Maijala updated LUCENE-8129:
--------------------------------
    Lucene Fields: New,Patch Available  (was: New)

> Support for defining a Unicode set filter when using ICUFoldingFilter
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-8129
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8129
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Ere Maijala
>            Priority: Minor
>              Labels: ICUFoldingFilterFactory, patch-available, patch-with-test
>         Attachments: LUCENE-8129.patch, LUCENE-8129.patch
>
>
> While ICUNormalizer2FilterFactory supports a filter attribute to define a
> Unicode set filter, ICUFoldingFilterFactory does not support it. A filter
> allows one to e.g. exclude a set of characters from being folded. E.g. for
> Finnish and Swedish the filter could be defined like this:
>
> Note: An additional MappingCharFilterFactory or solr.LowerCaseFilterFactory
> would be needed for lowercasing the characters excluded from folding. This is
> similar to what ElasticSearch provides (see
> https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-folding.html).
> I'll add a patch that does this similar to ICUNormalizer2FilterFactory.
> Applies at least to master and branch_7x.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
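
As a rough illustration of the kind of configuration the issue description refers to, a Solr field type using the proposed filter attribute might look like the sketch below. The field type name, the tokenizer choice, and the exact Unicode set pattern are assumptions made for illustration and are not taken from the message. The filter attribute on ICUFoldingFilterFactory is what the patch proposes, mirroring ICUNormalizer2FilterFactory, so it is only available where that patch has been applied.

```xml
<!-- Hypothetical field type: the name and tokenizer are illustrative assumptions. -->
<fieldType name="text_fi_sv_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Assumed Unicode set: apply folding to everything except å, ä, ö
         and their uppercase forms, which pass through unchanged. -->
    <filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>
    <!-- Lowercases the characters that were excluded from folding,
         as the note in the issue description suggests. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, characters outside the set would never reach the folding step, which is why the separate lowercasing filter is still needed for them.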
[jira] [Updated] (LUCENE-8129) Support for defining a Unicode set filter when using ICUFoldingFilter
[ https://issues.apache.org/jira/browse/LUCENE-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ere Maijala updated LUCENE-8129:
--------------------------------
    Attachment: LUCENE-8129.patch

Updated patch with {{normalizer}} renamed to {{NORMALIZER}}.
[jira] [Updated] (LUCENE-8129) Support for defining a Unicode set filter when using ICUFoldingFilter
[ https://issues.apache.org/jira/browse/LUCENE-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ere Maijala updated LUCENE-8129:
--------------------------------
    Attachment:     (was: SOLR-11811.patch)

>         Attachments: LUCENE-8129.patch
[jira] [Updated] (LUCENE-8129) Support for defining a Unicode set filter when using ICUFoldingFilter
[ https://issues.apache.org/jira/browse/LUCENE-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ere Maijala updated LUCENE-8129:
--------------------------------
    Attachment: LUCENE-8129.patch

Thanks, it was indeed bad. I checked that Normalizer2.getInstance calls
Norm2AllModes.getInstance which returns a cached instance if available, so I
believe you're right about it being immutable. An improved patch is attached.

>         Attachments: LUCENE-8129.patch