[ 
https://issues.apache.org/jira/browse/SOLR-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929823#action_12929823
 ] 

Tom Burton-West commented on SOLR-2211:
---------------------------------------

Thanks for all your help, Robert. We will be testing this and the ICUTokenizer
tomorrow against a few thousand documents to see how they affect our unique term
counts. I'll post results to the list once I have something interesting to
report.

> Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support
> --------------------------------------------------------------------------
>
>                 Key: SOLR-2211
>                 URL: https://issues.apache.org/jira/browse/SOLR-2211
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 3.1
>            Reporter: Tom Burton-West
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2211.patch
>
>
> The Lucene 3.x StandardTokenizer with UAX#29 support provides benefits for
> non-English tokenizing. Presently it can be invoked by using the
> StandardTokenizerFactory and setting the Version to 3.1. However, it would
> be useful to be able to use the improved Unicode processing without
> necessarily including the IP address and email address processing of
> StandardAnalyzer. A FilterFactory that allowed the use of the
> StandardTokenizer with UAX#29 support on its own would be useful.

-- 
This message is automatically generated by JIRA.
