[jira] Issue Comment Edited: (LUCENE-2102) LowerCaseFilter for Turkish language

Uwe Schindler (JIRA) Tue, 01 Dec 2009 14:55:46 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784496#action_12784496
 ]


Uwe Schindler edited comment on LUCENE-2102 at 12/1/09 10:54 PM:
-----------------------------------------------------------------

Robert: I understand your problem, but it affects LowerCaseFilter in general and 
is not specific to the Turkish lowercase filter. If you have decomposed 
characters, even LowerCaseFilter would fail for *all* languages (even German, if 
you compose ä out of a and two dots). In Germany really nobody uses such 
decomposed chars. I do not know how this is in Turkey, but the last time I was 
there, they just used the simplest precomposed chars (like Germans); they even 
have the umlauts, which they take from the basic Latin-1 range. For those, this 
filter works and is a quick fix.

But I give up now.
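
To make the precomposed/decomposed distinction in the comment above concrete, here is a minimal sketch using the JDK's java.text.Normalizer. It is not part of any attached patch; the class name ComposedFormsDemo and the helper toPrecomposed are hypothetical names chosen for illustration. It only shows that the two spellings are different char sequences and that NFC normalization maps the decomposed one to a single code point.

```java
import java.text.Normalizer;

class ComposedFormsDemo {
    // Hypothetical helper: compose combining sequences (NFC) so that a
    // per-character filter sees one code point per accented letter.
    static String toPrecomposed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E4"; // a with diaeresis, one code point (Latin-1 range)
        String decomposed = "a\u0308"; // 'a' followed by COMBINING DIAERESIS

        // The two spellings render the same but are not equal as Java strings.
        System.out.println(precomposed.equals(decomposed));                // false
        // After NFC normalization they are identical.
        System.out.println(precomposed.equals(toPrecomposed(decomposed))); // true
        // 'I' followed by COMBINING DOT ABOVE composes to U+0130.
        System.out.println(toPrecomposed("I\u0307").equals("\u0130"));     // true
    }
}
```

Whether such normalization belongs in the filter itself or in an earlier step of the analysis chain is exactly the open question in this thread; the sketch takes no position on it.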

      was (Author: thetaphi):
    Robert: I understand your problem, but it affects LowerCaseFilter in general 
and is not specific to the Turkish lowercase filter. If you have decomposed 
characters, even LowerCaseFilter would fail for *all* languages (even German, if 
you compose ä out of a and two dots). In Germany really nobody uses composed 
chars. I do not know how this is in Turkey, but the last time I was there, they 
just used the simplest composed chars (like Germans); they even have the 
umlauts, which they take from the basic Latin-1 range. For those, this filter 
works and is a quick fix.

But I give up now.
  
> LowerCaseFilter for Turkish language
> ------------------------------------
>
>                 Key: LUCENE-2102
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2102
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Ahmet Arslan
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch
>
>
> java.lang.Character.toLowerCase() converts 'I' to 'i' however in Turkish 
> alphabet lowercase of 'I' is not 'i'. It is LATIN SMALL LETTER DOTLESS I.
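
The behaviour described in the issue can be reproduced with a few lines of plain Java. The following is only an illustrative sketch, not the code from the attached patches; the class name TurkishLowerCaseDemo and the helper turkishLowerCase are hypothetical. The helper handles only the precomposed capital 'I' and falls back to Character.toLowerCase for everything else.

```java
import java.util.Locale;

class TurkishLowerCaseDemo {
    // Hypothetical helper: lowercase a single code point with the Turkish
    // rule for capital 'I' (maps to U+0131, LATIN SMALL LETTER DOTLESS I).
    static int turkishLowerCase(int codePoint) {
        if (codePoint == 'I') {
            return 0x0131;
        }
        return Character.toLowerCase(codePoint);
    }

    public static void main(String[] args) {
        // Locale-independent, per-character lowercasing: 'I' becomes 'i'.
        System.out.println(Character.toLowerCase('I') == 'i');              // true

        // Locale-aware String lowercasing applies the Turkish rule.
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("I".toLowerCase(turkish).equals("\u0131"));      // true
        System.out.println("I".toLowerCase(Locale.ENGLISH).equals("i"));    // true

        // The sketch helper gives the Turkish result without a Locale.
        System.out.println(turkishLowerCase('I') == 0x0131);                // true
        System.out.println(turkishLowerCase('A') == 'a');                   // true
    }
}
```

A per-character rule like this covers only precomposed input; text that spells the dotted capital as 'I' plus a combining dot above would need extra handling, which is the concern raised in the comments on this issue.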

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

