clean up uses of String.toLowerCase in code
-------------------------------------------
Key: LUCENE-2411
URL: https://issues.apache.org/jira/browse/LUCENE-2411
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 3.1
Reporter: Robert Muir
Fix For: 3.1
Uwe recently fixed this in the ThaiWordFilter, which reminded me to search our
code for uses of String.toLowerCase().
The problems with this method are the following:
* it depends on the "default locale", which is flimsy and should be avoided, I
think; it typically just causes problems.
There can be hard-to-debug issues if the machine is not configured with the
same Locale at both index and query time.
* lowercasing with locale-sensitive rules is really only suitable for display
and presentation; if we want international lowercasing for search, we should
be using case folding.
This is especially important because otherwise people who unknowingly use this
special casing at query time are not going to get results, e.g. if they use a
TermRangeQuery from the queryparser and it lowercases terms differently than
they were lowercased at index time.
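A small, hypothetical demonstration of the default-locale problem, assuming an index-time machine with an English default locale and a query-time machine with a Turkish one (Turkish is a well-known case where 'I' lowercases to the dotless 'ı', U+0131). The class and method names are made up for illustration; the no-argument String.toLowerCase() is exercised by temporarily changing the JVM default with Locale.setDefault.

```java
import java.util.Locale;

public class DefaultLocalePitfall {
    // Lowercases with whatever the JVM default locale happens to be,
    // which is exactly what the no-argument String.toLowerCase() does.
    static String lowerWithDefaultLocale(String s) {
        return s.toLowerCase();
    }

    // Runs lowerWithDefaultLocale as if the JVM had been started with the
    // given default locale, then restores the previous default.
    static String lowerUnderDefault(Locale simulatedDefault, String s) {
        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(simulatedDefault);
            return lowerWithDefaultLocale(s);
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        // Hypothetical index-time machine configured for English.
        String indexed = lowerUnderDefault(Locale.ENGLISH, "TITLE");
        // Hypothetical query-time machine configured for Turkish:
        // 'I' becomes the dotless 'ı' (U+0131).
        String queried = lowerUnderDefault(Locale.forLanguageTag("tr"), "TITLE");
        System.out.println(indexed.equals("title"));      // true
        System.out.println(queried.equals("t\u0131tle")); // true
        System.out.println(indexed.equals(queried));      // false: the query term misses
    }
}
```

The same input produces two different terms depending only on machine configuration, which is the kind of mismatch that silently breaks something like a lowercased TermRangeQuery.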
In my opinion we should fix all these methods to use Character.toLowerCase
where possible (especially for speed with TokenStreams), and otherwise
String.toLowerCase with the ROOT Locale, new Locale(""). This is closer to
case folding.
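A minimal sketch of both suggested replacements, assuming a TokenStream exposes its term as a char[] buffer plus a length (the class and method names here are hypothetical, not Lucene API). The buffer variant lowercases code point by code point with Character.toLowerCase, so no Locale is consulted and no intermediate String is created; the String variant passes Locale.ROOT, which is the same locale as new Locale("").

```java
import java.util.Locale;

public class LocaleIndependentLowercase {
    // Lowercases the first `length` chars of a term buffer in place, one code
    // point at a time, using Character.toLowerCase (no Locale involved).
    static void lowerCaseInPlace(char[] buffer, int length) {
        int i = 0;
        while (i < length) {
            int cp = Character.codePointAt(buffer, i, length);
            int lower = Character.toLowerCase(cp);
            // Defensive: keep the original code point if its lowercase form
            // would need a different number of UTF-16 chars.
            if (Character.charCount(lower) != Character.charCount(cp)) {
                lower = cp;
            }
            i += Character.toChars(lower, buffer, i);
        }
    }

    // For whole Strings: lowercase with the root locale instead of the JVM default.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.forLanguageTag("tr")); // the default no longer matters
        char[] term = "TITLE".toCharArray();
        lowerCaseInPlace(term, term.length);
        System.out.println(new String(term));   // title
        System.out.println(lowerRoot("TITLE")); // title
    }
}
```

The two are not exactly equivalent: String.toLowerCase still applies a few context-sensitive rules that do not depend on the locale (for example, a word-final capital sigma becomes 'ς'), while Character.toLowerCase maps each code point in isolation.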
If some things really need locale sensitivity for some extreme reason, I think
we should just make the Locale a mandatory parameter.
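If a locale-sensitive operation is genuinely required, one way to make the Locale mandatory is to leave out any constructor or overload that falls back to Locale.getDefault(). The class below is a hypothetical illustration, not an existing Lucene API.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical helper: callers must state which Locale they mean.
public final class LocaleSensitiveLowercaser {
    private final Locale locale;

    // The Locale is required; there is deliberately no no-arg constructor
    // that silently falls back to Locale.getDefault().
    public LocaleSensitiveLowercaser(Locale locale) {
        this.locale = Objects.requireNonNull(locale, "locale must not be null");
    }

    public String lower(String s) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        LocaleSensitiveLowercaser turkish =
            new LocaleSensitiveLowercaser(Locale.forLanguageTag("tr"));
        // Turkish rules apply only because the caller explicitly asked for them.
        System.out.println(turkish.lower("TITLE").equals("t\u0131tle")); // true
    }
}
```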
--
This message is automatically generated by JIRA.