[jira] Commented: (SOLR-2211) Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support

Robert Muir (JIRA) Mon, 08 Nov 2010 16:30:32 -0800

    [ https://issues.apache.org/jira/browse/SOLR-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929849#action_12929849 ]


Robert Muir commented on SOLR-2211:
-----------------------------------

Great, I look forward to the results.

By the way, on SOLR-2210 I also added the ICU filters. You could consider 
replacing LowerCaseFilterFactory with ICUNormalizer2FilterFactory (just use 
the defaults).
In addition to better lowercasing (e.g. ß -> ss), this would also bring the 
advantages of Unicode normalization described in 
http://unicode.org/reports/tr15/
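
A minimal sketch of what that swap might look like in schema.xml, assuming 
the ICU analysis jars are on Solr's classpath. The field type name 
"text_icu" is hypothetical, and the surrounding analyzer chain is only an 
illustration, not the configuration from this issue. The filter is 
registered as solr.ICUNormalizer2FilterFactory.

```xml
<!-- Hypothetical field type; the name "text_icu" is an assumption. -->
<fieldType name="text_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Takes the place of solr.LowerCaseFilterFactory. No attributes are
         given, so the factory uses its defaults. -->
    <filter class="solr.ICUNormalizer2FilterFactory"/>
  </analyzer>
</fieldType>
```

Because the analyzer changes how terms are written to the index, existing 
documents would need to be reindexed after a change like this.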

Alternatively, if you are already using both LowerCaseFilterFactory and 
ASCIIFoldingFilterFactory, you can replace both with ICUFoldingFilterFactory,
which goes further and also incorporates the character foldings from 
http://www.unicode.org/reports/tr30/tr30-4.html
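
As a rough sketch of that second option, assuming the ICU analysis jars are 
available to Solr, a single filter would stand in for the two it replaces. 
The field type name "text_icu_folded" is hypothetical and the rest of the 
chain is only illustrative.

```xml
<!-- Hypothetical field type; the name "text_icu_folded" is an assumption. -->
<fieldType name="text_icu_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- One filter in place of solr.LowerCaseFilterFactory followed by
         solr.ASCIIFoldingFilterFactory. -->
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Folding is more aggressive than normalization alone, so it is worth 
checking a few representative terms on the analysis admin page before 
reindexing.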


> Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support
> --------------------------------------------------------------------------
>
>                 Key: SOLR-2211
>                 URL: https://issues.apache.org/jira/browse/SOLR-2211
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 3.1
>            Reporter: Tom Burton-West
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2211.patch
>
>
> The Lucene 3.x StandardTokenizer with UAX#29 support provides benefits for 
> non-English tokenizing.  Presently it can be invoked by using the 
> StandardTokenizerFactory and setting the Version to 3.1.  However, it would 
> be useful to be able to use the improved Unicode processing without 
> necessarily including the IP address and email address processing of 
> StandardAnalyzer.   A FilterFactory that allowed the use of the 
> StandardTokenizer with UAX#29 support on its own would be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
