> What is the meaning of "the Unicode Policeman"?

Robert Muir :-)
Uwe

> Thanks,
> Ahmet
>
> On Thursday, October 22, 2015 2:59 PM, Uwe Schindler <u...@thetaphi.de>
> wrote:
>
> Hi,
>
> > >> Setting aside the fact that Character.toLowerCase is already
> > >> dubious in some locales (e.g. Turkish),
> > >
> > > This is not true. Character.toLowerCase() works locale-independently.
> > > It is only String.toLowerCase() that uses the default locale.
>
> So you mean the opposite: you wanted it to be locale-dependent. That's
> already possible. LowerCaseFilter is documented to use only the default
> Unicode lowercasing, with no locale-specific rules. If you have a Turkish
> Lucene field, you need to do locale-specific analysis anyway (e.g. use
> TurkishAnalyzer), and that analyzer uses TurkishLowerCaseFilter. Indexing
> both variants as synonyms needs more work, but that is out of the scope of
> this mail thread.
>
> > Yet if you have a field like "title" and the user and system are
> > Turkish, the user would expect their locale to apply, yet
> > LowerCaseFilter will not handle that. So whereas it is "safe" for
> > English hard-coded strings, it isn't safe for all fields you might index
> > in general.
>
> That's exactly how it is documented!
>
> > Dawid's response shows, though, that at least for the time being,
> > there is nothing to worry about. Hopefully Unicode will never add a
> > code point which lowercases to one with fewer code units (or, I guess,
> > change one of the lower code points so that it lowercases to more than
> > one code unit...)
>
> There was already a discussion about that in JIRA when LowerCaseFilter was
> rewritten to support supplementary characters outside the BMP. I have to
> look up the issue, but I am quite sure that the Unicode Policeman did a lot
> of research and found a statement in the Unicode spec that upper- and
> lowercase letters are always in the same block. I will try to look this up.
>
> Uwe
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
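The two technical points in this thread can be checked with a small standalone Java program: `Character.toLowerCase()` ignores the locale while `String.toLowerCase()` does not, and lowercasing must not change the number of UTF-16 code units if a term buffer is rewritten in place. This is a sketch, not Lucene code. The class `LowerCaseDemo` and its methods are hypothetical names, and only `java.lang`/`java.util` APIs are used. The last method enumerates every code point and lists those whose `Character.toLowerCase()` result needs a different `Character.charCount()`; its result depends on the Unicode version of the JDK you run it on, so no expected count is given here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LowerCaseDemo {

    // String.toLowerCase(Locale) applies locale-sensitive, context-sensitive
    // and 1:M mappings; with a Turkish locale, 'I' becomes dotless 'ı' (U+0131).
    // Note: String.toLowerCase() without an argument uses Locale.getDefault().
    static String lowerWithLocale(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Character.toLowerCase(int) applies only the simple one-to-one Unicode
    // mapping and never consults a Locale, so 'I' always becomes 'i'.
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
        return sb.toString();
    }

    // Lists every code point whose simple lowercase mapping needs a different
    // number of UTF-16 code units (1 for BMP, 2 for supplementary characters).
    // Lowercasing a char[] in place without resizing is only safe if this is empty.
    static List<Integer> lengthChangingCodePoints() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            int lower = Character.toLowerCase(cp);
            if (Character.charCount(cp) != Character.charCount(lower)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println(lowerWithLocale("TITLE", turkish));     // tıtle
        System.out.println(lowerWithLocale("TITLE", Locale.ROOT)); // title
        System.out.println(lowerPerCodePoint("TITLE"));            // title
        System.out.println("Length-changing code points: "
                + lengthChangingCodePoints().size());
    }
}
```

Passing `Locale.ROOT` explicitly is the usual way to get locale-independent behaviour from `String`. For a Turkish field, the thread's recommendation is to use Lucene's `TurkishAnalyzer` rather than relying on the JVM's default locale.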