[
https://issues.apache.org/jira/browse/LUCENE-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969215#comment-15969215
]
ASF GitHub Bot commented on LUCENE-7785:
----------------------------------------
Github user arysin commented on a diff in the pull request:
https://github.com/apache/lucene-solr/pull/187#discussion_r111595706
--- Diff:
lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java
---
@@ -107,11 +107,18 @@ public UkrainianMorfologikAnalyzer(CharArraySet stopwords, CharArraySet stemExcl
@Override
protected Reader initReader(String fieldName, Reader reader) {
NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
+ // different apostrophes
builder.add("\u2019", "'");
+ builder.add("\u0218", "'");
builder.add("\u02BC", "'");
+ builder.add("`", "'");
+ builder.add("´", "'");
+ // ignored characters
builder.add("\u0301", "");
- NormalizeCharMap normMap = builder.build();
+ builder.add("\u00AD", "");
+ builder.add("\uFEFF", "");
--- End diff --
That came from the notes [the Wikimedia guys
suggested](https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Ukrainian_Morfologik_Analysis#Recommendations_.26_Plan),
but I agree it does not make sense here. I'll remove it.
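
For readers following the diff, here is a minimal plain-JDK sketch of the effect these character mappings have on input text. It is not the analyzer's code: the analyzer builds a Lucene `NormalizeCharMap`, whereas `ApostropheNormalizer` and `normalize` are hypothetical names used only for illustration, and the sketch copies just a subset of the mappings shown in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: applies a few of the
// single-character mappings from the diff with plain String.replace.
class ApostropheNormalizer {
    private static final Map<String, String> MAPPINGS = new LinkedHashMap<>();

    static {
        // Apostrophe variants are mapped to the ASCII apostrophe.
        MAPPINGS.put("\u2019", "'"); // right single quotation mark
        MAPPINGS.put("\u02BC", "'"); // modifier letter apostrophe
        MAPPINGS.put("`", "'");      // grave accent
        MAPPINGS.put("\u00B4", "'"); // acute accent
        // Ignored characters are mapped to the empty string.
        MAPPINGS.put("\u0301", "");  // combining acute accent (stress mark)
        MAPPINGS.put("\u00AD", "");  // soft hyphen
    }

    // Replaces every mapped character in the input. Applying the mappings
    // one after another is safe here because no replacement value contains
    // a character that is itself a mapping key.
    static String normalize(String input) {
        String result = input;
        for (Map.Entry<String, String> entry : MAPPINGS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // "м’яч" written with U+2019 becomes "м'яч" with an ASCII apostrophe.
        System.out.println(normalize("\u043C\u2019\u044F\u0447"));
    }
}
```

Normalizing before tokenization means the dictionary only needs to contain one apostrophe form, which is presumably why the mappings sit in `initReader` rather than in a token filter.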
> Move dictionary for Ukrainian analyzer to external dependency
> -------------------------------------------------------------
>
> Key: LUCENE-7785
> URL: https://issues.apache.org/jira/browse/LUCENE-7785
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Andriy Rysin
> Assignee: Dawid Weiss
>
> Currently the dictionary for Ukrainian analyzer is a blob in the source tree.
> We should move it out to an external dependency. This would:
> * leave fewer binaries in the source tree
> * make it easier to update the dictionary and track updates