[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346875#comment-15346875
 ] 

Ahmet Arslan commented on LUCENE-7287:
--------------------------------------

Hi,
Multiple tokens are OK, but multiple identical tokens look weird, no?
Have you checked the screenshot that includes
RemoveDuplicatesTokenFilterFactory (RDTF)?

bq. Shall I create mappings_uk.txt so we can use it in solr?

Let's ask Michael.
Either we ship a separate file, or we just recommend using the mapping char
filter with the recommended mappings.
Maybe we can place the uk_mappings.txt file under
https://github.com/apache/lucene-solr/tree/master/solr/server/solr/configsets/sample_techproducts_configs/conf/lang
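
For illustration, here is a rough sketch of how such a Solr field type might look. It assumes the mappings end up in lang/uk_mappings.txt (only the suggestion above, nothing is decided) and uses a hypothetical field type name, text_uk. The lemmatizer filter is left as a comment because its factory is not named in this thread.

```xml
<!-- Hypothetical field type; the name "text_uk" and the mapping path are assumptions. -->
<fieldType name="text_uk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Apply the recommended character mappings before tokenization. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="lang/uk_mappings.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The lemmatizer filter from the patch would go here (factory not named in this thread). -->
    <!-- Drop tokens that repeat the same text at the same position, e.g. identical lemmas. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With RDTF placed last, identical lemmas emitted at the same position should collapse into one token, while distinct lemmas at that position are kept.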

> New lemma-tizer plugin for ukrainian language.
> ----------------------------------------------
>
>                 Key: LUCENE-7287
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7287
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Dmytro Hambal
>            Priority: Minor
>              Labels: analysis, language, plugin
>             Fix For: master (7.0), 6.2
>
>         Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 
> PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303



