[jira] [Commented] (LUCENE-4201) Add Japanese character filter to normalize iteration marks

Christian Moen (JIRA) Fri, 06 Jul 2012 09:00:51 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408101#comment-13408101
 ]


Christian Moen commented on LUCENE-4201:
----------------------------------------

Sequences of iteration marks are supported.  If an illegal sequence of 
iteration marks is encountered, the implementation emits the source character 
as-is, without considering its script.  For example, with input "?ゝ", 
we get "??" even though "?" isn't hiragana.
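
To illustrate the behaviour described in this comment, here is a minimal 
sketch, not the code in the attached patch.  The class name 
IterationMarkSketch and the method normalize are hypothetical.  It assumes 
that each mark in a run of n consecutive marks repeats the character n 
positions before it, that a mark is emitted unchanged when that position lies 
before the start of the input or at or before the last full stop, and, as a 
simplification, that the voiced marks copy the source character without 
adding a voicing mark.

```java
final class IterationMarkSketch {
    private static final char FULL_STOP = '\u3002';

    // The five iteration marks named in the issue description.
    private static boolean isIterationMark(char c) {
        return c == '\u3005'   // U+3005, kanji iteration mark
            || c == '\u309D'   // U+309D, hiragana iteration mark
            || c == '\u309E'   // U+309E, hiragana voiced iteration mark
            || c == '\u30FD'   // U+30FD, katakana iteration mark
            || c == '\u30FE';  // U+30FE, katakana voiced iteration mark
    }

    // Hypothetical helper: replaces iteration marks with the characters they repeat.
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int segmentStart = 0; // index just after the most recent full stop
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (!isIterationMark(c)) {
                out.append(c);
                if (c == FULL_STOP) {
                    segmentStart = i + 1;
                }
                i++;
                continue;
            }
            // Measure the run of consecutive iteration marks starting at i.
            int runEnd = i;
            while (runEnd < input.length() && isIterationMark(input.charAt(runEnd))) {
                runEnd++;
            }
            int runLength = runEnd - i;
            for (int k = i; k < runEnd; k++) {
                int source = k - runLength;
                // Assumption: the source character is copied regardless of script;
                // a mark with no legal source is emitted unchanged.
                out.append(source >= segmentStart ? input.charAt(source) : input.charAt(k));
            }
            i = runEnd;
        }
        return out.toString();
    }
}
```

Under these assumptions, normalize("時々") yields "時時" and 
normalize("馬鹿々々しい") yields "馬鹿馬鹿しい", while a mark at the very start of 
the input or directly after "。" is left unchanged.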

Note that a full stop punctuation character "。" (U+3002) cannot be iterated 
(see below). Iteration marks themselves are emitted as-is when they are 
illegal, i.e. when they would refer back past the beginning of the character 
stream.

The implementation buffers input until a full stop punctuation character 
(U+3002) or EOF is reached, so that it does not need to keep a copy of the 
entire character stream in memory. Vertical iteration marks, which are even 
rarer than horizontal iteration marks in contemporary Japanese, are 
unsupported.
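
The buffering strategy can be sketched roughly as follows.  This is an 
illustration only, not the code in the attached patch: the class name 
SegmentBufferSketch and its methods are hypothetical, and the sketch assumes a 
segment ends with the full stop itself.  Only one segment is held in the 
buffer at a time while reading; the list returned by segments exists purely so 
the result can be inspected.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SegmentBufferSketch {
    private static final char FULL_STOP = '\u3002';

    // Reads up to and including the next full stop, or up to EOF.
    // Returns null once the reader is exhausted.
    static String readSegment(Reader in) throws IOException {
        StringBuilder segment = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            segment.append((char) c);
            if (c == FULL_STOP) {
                break;
            }
        }
        return segment.length() == 0 ? null : segment.toString();
    }

    // Convenience wrapper for demonstration: splits a string into segments.
    static List<String> segments(String text) {
        List<String> result = new ArrayList<>();
        try (Reader in = new StringReader(text)) {
            String segment;
            while ((segment = readSegment(in)) != null) {
                result.add(segment);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Because an iteration mark never needs to look back past a full stop, each 
segment can be normalized on its own and then discarded.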

                
> Add Japanese character filter to normalize iteration marks
> ----------------------------------------------------------
>
>                 Key: LUCENE-4201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4201
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.0, 5.0
>            Reporter: Christian Moen
>         Attachments: LUCENE-4201.patch
>
>
> For some applications it might be useful to normalize kanji and kana 
> iteration marks such as 々, ゞ, ゝ, ヽ and ヾ to make sure they are treated 
> uniformly.
