[jira] Commented: (LUCENE-2102) LowerCaseFilter for Turkish language

Simon Willnauer (JIRA) Wed, 02 Dec 2009 09:46:44 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784928#action_12784928
 ]


Simon Willnauer commented on LUCENE-2102:
-----------------------------------------

bq. OK, I will change this. You are right, I would look at this problem 
differently if we didn't have CharacterUtils, which makes it just so easy to 
support the old and new behavior.
Actually, an unused Version argument is silly. If we have to add it in the 
future because of some change, you WANT to deprecate the ctor to make users 
aware of it; that is what deprecations are made for. I would not argue about 
consistency, as not every TokenFilter has a Version ctor (EdgeNGramTokenFilter, 
for instance, is just the first one that comes to mind). I would remove it 
completely! Use Character.codePointAt() and you are good to go.
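
To illustrate the suggestion, here is a minimal sketch of code-point-aware 
lowercasing built directly on Character.codePointAt(), with no Version 
argument. It is not the attached patch: the class name, the helper name 
toTurkishLowerCase, and the String-based signature are assumptions made for 
illustration, and it only special-cases the plain capital 'I' (it ignores 
sequences such as 'I' followed by COMBINING DOT ABOVE).

```java
final class CodePointLowerCaseSketch {
    private static final int CAPITAL_I = 0x0049; // LATIN CAPITAL LETTER I
    private static final int DOTLESS_I = 0x0131; // LATIN SMALL LETTER DOTLESS I

    /** Hypothetical helper: lowercases text, mapping 'I' to dotless i. */
    static String toTurkishLowerCase(CharSequence text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read a whole code point, joining a surrogate pair if present.
            int cp = Character.codePointAt(text, i);
            i += Character.charCount(cp);
            // Turkish rule for 'I'; every other code point uses the default mapping.
            out.appendCodePoint(cp == CAPITAL_I ? DOTLESS_I : Character.toLowerCase(cp));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTurkishLowerCase("ISIK"));
    }
}
```

Because the loop advances by Character.charCount(cp), supplementary 
characters are lowercased as single units rather than as two unrelated 
surrogate chars.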



> LowerCaseFilter for Turkish language
> ------------------------------------
>
>                 Key: LUCENE-2102
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2102
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Ahmet Arslan
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch, 
> LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch
>
>
> java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the 
> Turkish alphabet the lowercase of 'I' is not 'i'. It is LATIN SMALL LETTER 
> DOTLESS I (U+0131).
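
The behavior described in the issue can be reproduced with the JDK alone. 
The following is a small sketch, not part of the issue or its patches; the 
class and method names are made up for illustration. It contrasts the 
locale-independent Character.toLowerCase() with String.toLowerCase(Locale) 
under the Turkish locale.

```java
import java.util.Locale;

final class TurkishLowerCaseDemo {
    /** Locale-independent mapping, as used by a plain lowercase filter. */
    static char defaultLower(char c) {
        return Character.toLowerCase(c);
    }

    /** Locale-sensitive mapping using the Turkish locale. */
    static String turkishLower(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        // Print code points in hex to avoid console encoding issues.
        System.out.println(Integer.toHexString(defaultLower('I')));                // prints 69
        System.out.println(Integer.toHexString(turkishLower("I").codePointAt(0))); // prints 131
    }
}
```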

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


