[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348927#comment-15348927
 ] 

Andriy Rysin commented on LUCENE-7287:
--------------------------------------

OK, I was able to run Solr with the Ukrainian analyzer, and I can confirm 
that it generates unique lemmas.
I've created a pull request: https://github.com/apache/lucene-solr/pull/45

I've also added mapping_uk.txt so that we can use the mapping char filter in 
Solr. Once it's merged, we can add this line to the analyzer definition:
        <charFilter class="solr.MappingCharFilterFactory" mapping="org/apache/lucene/analysis/uk/mapping_uk.txt"/>
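
For context, here is a rough sketch of where that line might sit in a Solr 
schema. The field type name "text_uk" and the tokenizer and lowercase filter 
choices are assumptions made for illustration only; they are not taken from 
the pull request. The mapping path will resolve only once mapping_uk.txt has 
been merged and is on the classpath.

```xml
<!-- Hypothetical field type; the name and the surrounding components are illustrative. -->
<fieldType name="text_uk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run on the raw text before the tokenizer sees it. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="org/apache/lucene/analysis/uk/mapping_uk.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The lemmatizing filter from the pull request would follow here; it is omitted because this message does not show its configuration. -->
  </analyzer>
</fieldType>
```

Because the char filter is applied before tokenization, whatever 
substitutions the mapping file defines are in place by the time tokens reach 
the later filters.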

We could potentially change UkrainianMorfologikAnalyzer to use 
MappingCharFilterFactory to read from the same file (so that the mapping does 
not live in both the code and the file), but I'm not sure how appropriate it 
is to use factories in Lucene.

Many thanks to Ahmet, who helped with the Solr integration and found the 
duplicate tokens!

> New lemma-tizer plugin for ukrainian language.
> ----------------------------------------------
>
>                 Key: LUCENE-7287
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7287
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Dmytro Hambal
>            Priority: Minor
>              Labels: analysis, language, plugin
>             Fix For: master (7.0), 6.2
>
>         Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 
> PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between Ukrainian word forms and their lemmas. Some tests and docs 
> come out of the box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
