clean up uses of String.toLowerCase in code
-------------------------------------------

                 Key: LUCENE-2411
                 URL: https://issues.apache.org/jira/browse/LUCENE-2411
             Project: Lucene - Java
          Issue Type: Bug
    Affects Versions: 3.1
            Reporter: Robert Muir
             Fix For: 3.1


Uwe recently fixed this in ThaiWordFilter, which reminded me to search our 
code for other uses of String.toLowerCase().

This method has two problems:
* It depends on the "default locale", which is fragile and, I think, should 
  be avoided; it typically just causes problems. If the machine is not 
  configured with the same Locale at both index and query time, the 
  resulting mismatches are hard to debug.
* Lowercasing with locale-sensitive rules is really only suitable for display 
  and presentation. If we want international lowercasing for search, we 
  should be using case folding. This is especially important because people 
  who unknowingly apply this special casing at query time are not going to 
  get results, e.g. if they use a TermRangeQuery from the queryparser and it 
  lowercases terms differently than they were lowercased at index time.
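
To make the first point concrete, here is a minimal sketch of the mismatch. 
The class and method names are hypothetical, and the two locales are assumed 
purely for illustration; passing a Locale explicitly stands in for what the 
no-argument String.toLowerCase() would do on a machine with that default.

```java
import java.util.Locale;

final class DefaultLocalePitfall {
    /**
     * Stands in for the no-argument toLowerCase() on a machine whose
     * default locale is machineLocale (hypothetical helper).
     */
    static String lowerAsIfDefault(String term, Locale machineLocale) {
        return term.toLowerCase(machineLocale);
    }

    public static void main(String[] args) {
        // Assumption: the indexer runs under a Turkish default locale,
        // while the query side runs under an English one.
        Locale indexMachine = Locale.forLanguageTag("tr-TR");
        Locale queryMachine = Locale.ENGLISH;

        String indexed = lowerAsIfDefault("TITLE", indexMachine); // dotless i (U+0131)
        String queried = lowerAsIfDefault("TITLE", queryMachine); // plain "title"

        // The two terms differ, so an exact-term lookup would miss.
        System.out.println(indexed.equals(queried)); // prints "false"
    }
}
```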

In my opinion we should fix all these methods to use Character.toLowerCase 
where possible (especially for speed in TokenStreams), and otherwise 
String.toLowerCase with the ROOT Locale, new Locale(""). This is closer to 
case folding.
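
A rough sketch of both options, not the actual patch: the class and method 
names are made up for illustration, and Locale.ROOT (available since Java 6) 
is used as the constant equivalent of new Locale("").

```java
import java.util.Locale;

final class LocaleIndependentLowercasing {
    /**
     * Lowercases a region of a char buffer in place, one code point at a
     * time, using Character.toLowerCase (which ignores the default locale).
     * Assumption: a code point and its lowercase form occupy the same
     * number of chars, so the buffer never needs to grow or shrink.
     */
    static void lowerInPlace(char[] buffer, int offset, int length) {
        final int end = offset + length;
        for (int i = offset; i < end; ) {
            int codePoint = Character.codePointAt(buffer, i, end);
            // toChars writes the lowercased code point and returns the char count.
            i += Character.toChars(Character.toLowerCase(codePoint), buffer, i);
        }
    }

    /** Fallback for whole Strings: lowercase with the ROOT locale. */
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        char[] term = "TITLE".toCharArray();
        lowerInPlace(term, 0, term.length);
        System.out.println(new String(term));   // prints "title"
        System.out.println(lowerRoot("TITLE")); // prints "title"
    }
}
```

Neither call consults the JVM default locale, so the output is the same on 
every machine.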

If some things really need locale-sensitivity for some extreme reason, I 
think we should just make the Locale a mandatory parameter.
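
As an illustration of what "mandatory" could look like (a hypothetical class, 
not an existing Lucene API), the Locale is a required constructor argument and 
there is no overload that falls back to the default locale:

```java
import java.util.Locale;
import java.util.Objects;

final class LocaleAwareLowercaser {
    private final Locale locale;

    /** The caller must state the locale; null is rejected up front. */
    LocaleAwareLowercaser(Locale locale) {
        this.locale = Objects.requireNonNull(locale, "locale must be given explicitly");
    }

    /** Lowercases using the locale chosen at construction time. */
    String lower(String text) {
        return text.toLowerCase(locale);
    }

    public static void main(String[] args) {
        LocaleAwareLowercaser turkish =
            new LocaleAwareLowercaser(Locale.forLanguageTag("tr"));
        // Under Turkish rules, "I" lowercases to dotless i (U+0131).
        System.out.println(turkish.lower("I").equals("\u0131")); // prints "true"
    }
}
```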


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

