[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779319#action_12779319 ]
Simon Willnauer commented on LUCENE-2069:
-----------------------------------------
bq. Simon, those "wierd" chars are indeed real codepoints that have lowercasing
behavior in Unicode 4.0!
That's what I guessed :D Otherwise it would not work :). I was just
wondering whether there are some more expressive ones out there.
bq. Mark, true, well give me some consensus so when 3.0 is released, we can
start attacking these issues!
+1
> fix LowerCaseFilter for unicode 4.0
> -----------------------------------
>
> Key: LUCENE-2069
> URL: https://issues.apache.org/jira/browse/LUCENE-2069
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2069.patch, LUCENE-2069.patch, LUCENE-2069.patch
>
>
> Lowercase supplementary characters correctly.
> This only fixes the filter; the LowerCaseTokenizer is part of a more complex
> issue (CharTokenizer).
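
For readers following along, here is a minimal sketch of what lowercasing by code point rather than by char could look like. It is not the attached patch: the class name CodePointLowerCase and both method names are hypothetical, and only standard java.lang.Character calls are used. The sketch assumes a lowercased code point occupies the same number of chars as the original, which holds for the mappings I am aware of.

```java
// Hypothetical helper for illustration; not taken from LUCENE-2069.patch.
final class CodePointLowerCase {
    private CodePointLowerCase() {}

    /** Lowercases the first {@code length} chars of {@code buffer} in place. */
    static void toLowerCase(char[] buffer, int length) {
        for (int i = 0; i < length; ) {
            // Reads a full code point, combining a surrogate pair when one is present.
            int codePoint = Character.codePointAt(buffer, i, length);
            // Character.toLowerCase(int) also covers supplementary code points;
            // toChars writes the result back and returns the number of chars written.
            // Assumption: the result needs as many chars as the original code point.
            i += Character.toChars(Character.toLowerCase(codePoint), buffer, i);
        }
    }

    /** Convenience wrapper that works on a copy of the string's chars. */
    static String toLowerCase(String s) {
        char[] buffer = s.toCharArray();
        toLowerCase(buffer, buffer.length);
        return new String(buffer);
    }
}
```

For example, CodePointLowerCase.toLowerCase("\uD801\uDC00") (DESERET CAPITAL LETTER LONG I, U+10400) should yield "\uD801\uDC28" (U+10428), whereas a loop calling Character.toLowerCase(char) on each surrogate separately leaves the pair unchanged.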