[jira] [Commented] (LUCENE-7785) Move dictionary for Ukrainian analyzer to external dependency

ASF GitHub Bot (JIRA) Fri, 14 Apr 2017 02:31:10 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968832#comment-15968832 ]


ASF GitHub Bot commented on LUCENE-7785:
----------------------------------------

Github user dweiss commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/187#discussion_r111553388
  
    --- Diff: lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java ---
    @@ -107,11 +107,18 @@ public UkrainianMorfologikAnalyzer(CharArraySet stopwords, CharArraySet stemExcl
       @Override
       protected Reader initReader(String fieldName, Reader reader) {
         NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    +    // different apostrophes
         builder.add("\u2019", "'");
    +    builder.add("\u0218", "'");
         builder.add("\u02BC", "'");
    +    builder.add("`", "'");
    +    builder.add("´", "'");
    +    // ignored characters
         builder.add("\u0301", "");
    -    NormalizeCharMap normMap = builder.build();
    +    builder.add("\u00AD", "");
    +    builder.add("\uFEFF", "");
    --- End diff --
    
    The byte order mark shouldn't be replaced with nothing here. If you
    have a byte order mark in your character input (the reader), then your
    conversion from bytes already went wrong somewhere earlier.
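
To illustrate the point: a BOM is an artifact of the byte encoding, so one place to handle it is where bytes are decoded into characters, before any Reader is handed to the analyzer. The following is a minimal sketch under that assumption, for UTF-8 input only. The names `BomAwareReaders` and `openUtf8` are hypothetical and are not part of Lucene or of this pull request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not a Lucene class): strips a UTF-8 BOM at decode time.
final class BomAwareReaders {
  // The UTF-8 encoding of U+FEFF.
  private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

  private BomAwareReaders() {}

  /** Decodes the stream as UTF-8, dropping a leading BOM if one is present. */
  static Reader openUtf8(InputStream in) throws IOException {
    PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
    byte[] head = new byte[UTF8_BOM.length];

    // Read up to three bytes; a single read() call may return fewer.
    int n = 0;
    while (n < head.length) {
      int r = pushback.read(head, n, head.length - n);
      if (r < 0) {
        break; // end of stream
      }
      n += r;
    }

    boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
    if (!hasBom && n > 0) {
      // Not a BOM: push the bytes back so the decoder sees them.
      pushback.unread(head, 0, n);
    }
    return new InputStreamReader(pushback, StandardCharsets.UTF_8);
  }
}
```

With a reader opened this way, the character stream would not start with U+FEFF, so the analyzer's `initReader` would not need a mapping for it.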


> Move dictionary for Ukrainian analyzer to external dependency
> -------------------------------------------------------------
>
>                 Key: LUCENE-7785
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7785
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Andriy Rysin
>
> Currently the dictionary for the Ukrainian analyzer is a blob in the source tree. 
> We should move it out to an external dependency. This would:
> * reduce the number of binaries in the source tree
> * make it easier to update the dictionary and track updates



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
