[
https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Simon Willnauer updated LUCENE-2069:
------------------------------------
Attachment: LUCENE-2069.patch
I revised the patch and fixed some issues:
- replaced real characters in tests
- extended tests to boundaries
- removed "code duplication" in LowerCaseFilter
The last is the most important issue. I figured that if we implement a
factory that provides the basic codePointAt method based on a version, we can
implement most of the algorithms / methods just by obtaining the
CharacterUtils instance (a new class I introduced) that corresponds to that
version. What this class does is pretty simple: if version >= 3.1 it delegates
to the corresponding Character method, while for earlier versions it converts
a character to a code point without checking for high surrogates. Once we have
done this conversion we can simply use all the Character.*(int) methods as
they are.
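To make the idea concrete, here is a rough sketch of how such a version-based factory might look. This is not the code from the attached patch: the class name CharacterUtilsSketch, the getInstance(int, int) signature, and the toLowerCase helper are hypothetical stand-ins, and the version is passed as plain major/minor integers so the example stays self-contained.

```java
final class CharacterUtilsSketch {

    /** Hypothetical stand-in for the version-dependent helper described above. */
    abstract static class CharacterUtils {

        // version >= 3.1: delegate to Character, which combines surrogate pairs.
        private static final CharacterUtils SUPPLEMENTARY_AWARE = new CharacterUtils() {
            @Override
            int codePointAt(char[] chars, int offset, int limit) {
                return Character.codePointAt(chars, offset, limit);
            }
        };

        // Earlier versions: widen the single char, never looking for a high surrogate.
        private static final CharacterUtils LEGACY = new CharacterUtils() {
            @Override
            int codePointAt(char[] chars, int offset, int limit) {
                return chars[offset];
            }
        };

        /** Assumed factory signature; the real patch would key on a version object. */
        static CharacterUtils getInstance(int major, int minor) {
            boolean atLeast31 = major > 3 || (major == 3 && minor >= 1);
            return atLeast31 ? SUPPLEMENTARY_AWARE : LEGACY;
        }

        abstract int codePointAt(char[] chars, int offset, int limit);
    }

    /**
     * Lowercases buffer[0..length) in place. The loop is the same for both
     * behaviours; only the codePointAt conversion differs. This relies on a
     * lowercased code point needing as many chars as the original.
     */
    static void toLowerCase(CharacterUtils utils, char[] buffer, int length) {
        for (int i = 0; i < length; ) {
            int codePoint = utils.codePointAt(buffer, i, length);
            // Character.toChars writes 1 or 2 chars and returns how many it wrote.
            i += Character.toChars(Character.toLowerCase(codePoint), buffer, i);
        }
    }
}
```

With an instance obtained for 3.1, a surrogate pair such as \uD801\uDC00 (U+10400) is read as one code point and lowercased; with an instance for an earlier version each half is handled on its own and passes through unchanged, which is the old behaviour.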
> fix LowerCaseFilter for unicode 4.0
> -----------------------------------
>
> Key: LUCENE-2069
> URL: https://issues.apache.org/jira/browse/LUCENE-2069
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2069.patch, LUCENE-2069.patch, LUCENE-2069.patch,
> LUCENE-2069.patch
>
>
> Lowercase supplementary characters correctly.
> This only fixes the filter; the LowerCaseTokenizer is part of a more complex
> issue (CharTokenizer).