[jira] Commented: (LUCENE-1545) Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTRE E

Robert Muir (JIRA) Fri, 12 Jun 2009 06:04:35 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718825#action_12718825 ]


Robert Muir commented on LUCENE-1545:
-------------------------------------

Michael, I don't see a way to do it from the manual.

It's not just the rules, but also the JRE used to compile the rules (and its 
underlying Unicode definitions), so you might need separate 
StandardTokenizerImpls to really control the thing...
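
To make the dependency on the JRE's Unicode data concrete, here is a minimal 
sketch (not part of Lucene; the class and method names are hypothetical) that 
asks the running JRE how it classifies U+0364. It only uses 
java.lang.Character, so the answer reflects whatever Unicode tables ship with 
that JRE, which is the point made above.

```java
public class CombiningMarkCheck {
    // U+0364 COMBINING LATIN SMALL LETTER E, the character from the issue report.
    static final int COMBINING_E = 0x0364;

    /** True if the running JRE classifies the code point as a mark (Mn, Mc or Me). */
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    public static void main(String[] args) {
        // Print the JRE version next to the classification so results from
        // different JREs can be compared side by side.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("isLetter(U+0364) = " + Character.isLetter(COMBINING_E));
        System.out.println("isCombiningMark(U+0364) = " + isCombiningMark(COMBINING_E));
    }
}
```

On a current JRE the character should be reported as a combining mark and not 
as a letter, so a rule built only from letter classes would not cover it.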

> Standard analyzer does not correctly tokenize combining character U+0364 
> COMBINING LATIN SMALL LETTRE E
> -------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1545
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1545
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.4
>         Environment: Linux x86_64, Sun Java 1.6
>            Reporter: Andreas Hauser
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: AnalyzerTest.java
>
>
> Standard analyzer does not correctly tokenize combining character U+0364 
> COMBINING LATIN SMALL LETTRE E.
> The word "moͤchte" is incorrectly tokenized into "mo" "chte"; the combining 
> character is lost.
> Expected result is only one token "moͤchte".
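
The reported behaviour can be imitated with a small stand-alone sketch. This is 
not StandardTokenizer or its grammar; NaiveTokenizerSketch and its keepMarks 
flag are hypothetical and only illustrate the difference between a token rule 
that accepts letters alone and one that also accepts combining marks following 
a letter.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveTokenizerSketch {
    /** True if the code point is a combining mark (Mn, Mc or Me) per the running JRE. */
    static boolean isMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    /**
     * Splits text into runs of letters. With keepMarks set, a combining mark
     * that follows at least one token character stays inside the token.
     */
    static List<String> tokenize(String text, boolean keepMarks) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean tokenChar = Character.isLetter(cp)
                || (keepMarks && current.length() > 0 && isMark(cp));
            if (tokenChar) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // Any other character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String word = "mo\u0364chte";
        System.out.println(tokenize(word, false)); // letters only
        System.out.println(tokenize(word, true));  // letters plus following marks
    }
}
```

With keepMarks set to false the word comes back as the two tokens "mo" and 
"chte", matching the report; with keepMarks set to true it comes back as the 
single token "moͤchte".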

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
