[
https://issues.apache.org/jira/browse/LUCENE-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968832#comment-15968832
]
ASF GitHub Bot commented on LUCENE-7785:
----------------------------------------
Github user dweiss commented on a diff in the pull request:
https://github.com/apache/lucene-solr/pull/187#discussion_r111553388
--- Diff: lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java ---
@@ -107,11 +107,18 @@ public UkrainianMorfologikAnalyzer(CharArraySet stopwords, CharArraySet stemExcl
@Override
protected Reader initReader(String fieldName, Reader reader) {
NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
+ // different apostrophes
builder.add("\u2019", "'");
+ builder.add("\u0218", "'");
builder.add("\u02BC", "'");
+ builder.add("`", "'");
+ builder.add("´", "'");
+ // ignored characters
builder.add("\u0301", "");
- NormalizeCharMap normMap = builder.build();
+ builder.add("\u00AD", "");
+ builder.add("\uFEFF", "");
--- End diff --
A byte order mark shouldn't be replaced with nothing... if you have a byte
order mark in your character input (reader), then your conversion from bytes is
broken somewhere before it reaches the analyzer.
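To illustrate the point, here is a minimal sketch of dropping a UTF-8 byte order mark at the byte-to-character boundary, so that U+FEFF never reaches the analyzer's Reader. It relies on the fact that Java's UTF-8 decoder does not remove a leading BOM on its own. The class BomAwareReader and its methods open and decode are hypothetical names invented for this example; they are not part of Lucene or of the pull request, and the sketch assumes the input is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Lucene): strips a UTF-8 BOM while decoding bytes. */
final class BomAwareReader {

  /** Wraps a UTF-8 byte stream in a Reader, consuming a leading EF BB BF if present. */
  static Reader open(InputStream in) throws IOException {
    PushbackInputStream pushback = new PushbackInputStream(in, 3);
    byte[] head = new byte[3];

    // Read up to three bytes; a stream may return fewer per call, so loop.
    int n = 0;
    while (n < 3) {
      int r = pushback.read(head, n, 3 - n);
      if (r < 0) {
        break; // end of stream before three bytes were available
      }
      n += r;
    }

    boolean hasBom = n == 3
        && (head[0] & 0xFF) == 0xEF
        && (head[1] & 0xFF) == 0xBB
        && (head[2] & 0xFF) == 0xBF;

    // Not a BOM: push the bytes back so the decoder still sees them.
    if (!hasBom && n > 0) {
      pushback.unread(head, 0, n);
    }
    return new InputStreamReader(pushback, StandardCharsets.UTF_8);
  }

  /** Convenience for small inputs: decodes the whole array to a String. */
  static String decode(byte[] bytes) {
    try (Reader reader = open(new ByteArrayInputStream(bytes))) {
      StringBuilder sb = new StringBuilder();
      int c;
      while ((c = reader.read()) != -1) {
        sb.append((char) c);
      }
      return sb.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b'};
    System.out.println(decode(withBom).length());
  }
}
```

With the BOM handled where the bytes are decoded, the char filter in initReader only has to deal with characters that are genuinely part of the text, such as the apostrophe variants and the soft hyphen.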
> Move dictionary for Ukrainian analyzer to external dependency
> -------------------------------------------------------------
>
> Key: LUCENE-7785
> URL: https://issues.apache.org/jira/browse/LUCENE-7785
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Andriy Rysin
>
> Currently the dictionary for the Ukrainian analyzer is a blob in the source tree.
> We should move it out to an external dependency. This allows us:
> * to have fewer binaries in the source
> * to update the dictionary and track updates more easily
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)