[ https://issues.apache.org/jira/browse/LUCENE-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969215#comment-15969215 ]
ASF GitHub Bot commented on LUCENE-7785:
----------------------------------------

Github user arysin commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/187#discussion_r111595706

    --- Diff: lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java ---
    @@ -107,11 +107,18 @@ public UkrainianMorfologikAnalyzer(CharArraySet stopwords, CharArraySet stemExcl
       @Override
       protected Reader initReader(String fieldName, Reader reader) {
         NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    +    // different apostrophes
         builder.add("\u2019", "'");
    +    builder.add("\u0218", "'");
         builder.add("\u02BC", "'");
    +    builder.add("`", "'");
    +    builder.add("´", "'");
    +    // ignored characters
         builder.add("\u0301", "");
    -    NormalizeCharMap normMap = builder.build();
    +    builder.add("\u00AD", "");
    +    builder.add("\uFEFF", "");
    --- End diff --

    That was from the note [Wikimedia guys suggested](https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Ukrainian_Morfologik_Analysis#Recommendations_.26_Plan), but I agree it does not make sense here; I'll remove it.


> Move dictionary for Ukrainian analyzer to external dependency
> -------------------------------------------------------------
>
>                 Key: LUCENE-7785
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7785
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Andriy Rysin
>            Assignee: Dawid Weiss
>
> Currently the dictionary for the Ukrainian analyzer is a blob in the source tree.
> We should move it out to an external dependency. This would:
> * leave fewer binaries in the source tree
> * make it easier to update the dictionary and track updates