[jira] Commented: (LUCENE-1466) CharFilter - normalize characters before tokenizer

Robert Muir (JIRA) Wed, 10 Jun 2009 20:49:39 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718291#action_12718291
 ]


Robert Muir commented on LUCENE-1466:
-------------------------------------

just as an alternative, i have a different mechanism as part of lucene-1488 
patch I am working on. But maybe its good to have options, as it depends on the 
ICU library.

below is excerpt from javadoc.

A TokenFilter that transforms text with ICU.

ICU provides text-transformation functionality via its Transliteration API.
Although script conversion is its most common use, a transliterator can 
actually perform a more general class of tasks. 
...
Some useful transformations for search are built-in:
* Conversion from Traditional to Simplified Chinese characters
* Conversion from Hiragana to Katakana
* Conversion from Fullwidth to Halfwidth forms.
...
Example usage:
 * stream = new ICUTransformFilter(stream, 
Transliterator.getInstance("Traditional-Simplified"));


> CharFilter - normalize characters before tokenizer
> --------------------------------------------------
>
>                 Key: LUCENE-1466
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1466
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>    Affects Versions: 2.4
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1466.patch, LUCENE-1466.patch
>
>
> This proposes to import CharFilter that has been introduced in Solr 1.4.
> Please see for the details:
> - SOLR-822
> - http://www.nabble.com/Proposal-for-introducing-CharFilter-to20327007.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-1466) CharFilter - normalize characters before tokenizer

Reply via email to