> So, without some kind of case translation dictionary that can be
> trusted on the particular strings we want to test, can we assume
> that's it's not actually a solvable problem? (because, like divide by
> zero, the question isn't valid to start with)
Here's the dictionary:
http://unicode.org/Public/UNIDATA/SpecialCasing.txt

The Unicode Standard defines four case properties: Lowercase_Mapping, Titlecase_Mapping, Uppercase_Mapping and Case_Folding. SpecialCasing.txt covers the first three; the folding data lives in the companion file CaseFolding.txt. The Unicode Consortium defines a default caseless matching algorithm, and it's the Case_Folding form we want for an equivalence test. The very first entry in SpecialCasing.txt is the infamous LATIN SMALL LETTER SHARP S, which has a case folding form of "ss".

You then take normalization into account, where characters are transformed into their decomposed forms. So e-acute becomes the separate "e" and the acute combining mark. The default canonical caseless match algorithm is:

NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))

(I'm summarising section 3.13 of the Unicode Standard 5.2.)

Note that SpecialCasing.txt defines rules for different languages, including things like what happens when one letter follows another letter.

Sorting (collation) is more complicated. Would you believe that in some languages people expect Z to come after A? Rules for collation are defined in the separate Unicode Technical Standard #10.

I would never want to implement these algorithms myself, so it's handy that Java does it for us in the java.text classes (which I'm guessing are derived from the icu-project).

I'm not an expert by any means. You can learn a lot just by browsing the first couple of chapters of the (surprisingly readable) Unicode Standard:
http://www.unicode.org/standard/standard.html
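For what it's worth, here is a rough sketch of the match formula above using only the JDK. The class and method names (CaselessMatch, approxCaseFold, and so on) are made up for illustration. As far as I know the JDK has no direct toCasefold call, so the sketch approximates it with an upper-then-lower round trip under Locale.ROOT; that is an assumption and not the full Case_Folding data, so treat it as a demonstration, not a conforming implementation. The Collator call at the end is the java.text route to a similar answer.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

final class CaselessMatch {

    // ASSUMPTION: stand-in for toCasefold. Upper-casing then lower-casing
    // under the root locale handles sharp s ("\u00DF" -> "SS" -> "ss"),
    // but it is not guaranteed to agree with CaseFolding.txt everywhere.
    static String approxCaseFold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    // Canonical decomposition (NFD) via java.text.Normalizer.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y))), with the
    // approximate fold above substituted for toCasefold.
    static boolean canonicalCaselessMatch(String x, String y) {
        return nfd(approxCaseFold(nfd(x))).equals(nfd(approxCaseFold(nfd(y))));
    }

    // Collator alternative: at PRIMARY strength, case differences are
    // ignored. Note that accent differences are ignored too, so this is
    // a looser test than the formula above.
    static boolean collatorEquals(String x, String y) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(x, y) == 0;
    }

    public static void main(String[] args) {
        // Sharp s against its "SS" spelling.
        System.out.println(canonicalCaselessMatch("Stra\u00DFe", "STRASSE"));
        // Precomposed e-acute against capital E plus combining acute.
        System.out.println(canonicalCaselessMatch("\u00E9", "E\u0301"));
        System.out.println(collatorEquals("abc", "ABC"));
    }
}
```

If pulling in ICU4J is an option, it would probably be a better place to look for a real case-folding call than this round trip.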