I'm not sure I agree that it would make sense, in general, for "WEISS" and "weiß" to be considered equal when ignoring case. I don't write any German, though, so I might be wrong about that. Thankfully, I didn't have to write my own case-insensitive string comparison code, because the JDK already provides String.equalsIgnoreCase :)
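For reference, the difference between the two approaches can be checked at the REPL. This is only a minimal sketch: fold-case is a hypothetical helper name (it is not code from this thread), and passing Locale/ROOT is an added assumption so that the result does not depend on the JVM's default locale.

```clojure
(import '(java.util Locale))

(defn fold-case
  "Hypothetical helper: upper-cases, then lower-cases, with Locale/ROOT
  so the result does not depend on the JVM's default locale."
  [^String s]
  (-> s
      (.toUpperCase Locale/ROOT)
      (.toLowerCase Locale/ROOT)))

;; "wei\u00df" is "weiß"; the escape sidesteps source-file encoding issues.
(def weiss-lower "wei\u00df")
(def weiss-upper "WEISS")

;; String.equalsIgnoreCase compares char by char, so strings of
;; different lengths (5 vs. 4 chars here) never match.
(.equalsIgnoreCase weiss-upper weiss-lower)
;; => false

;; String.toUpperCase applies the full case mapping, expanding ß to "SS",
;; so both strings fold to "weiss".
(= (fold-case weiss-upper) (fold-case weiss-lower))
;; => true
```

Whether the second result is the desired one depends on the use case; equalsIgnoreCase remains the stricter comparison.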
On Sunday, June 21, 2015 at 4:41:49 PM UTC+2, Fluid Dynamics wrote:
>
> The troubling thing isn't the use of Normalizer to remove accents, but the
> use of .toUpper, .toLower, and .equalsIgnoreCase instead of Normalizer,
> which may run into problems. For example you probably want "weiß" and
> "WEISS" to compare equal when ignoring case. For a case-insensitive
> comparison I tend to compare the outputs of this for two strings:
>
> (defn normalize
>   "Given a string, normalizes it so that it may be used as a key in a hashmap
>   and compare equal to all strings representing the same word/spelling.
>   There are edge cases that .toLowerCase or .toUpperCase would not handle,
>   so the actual procedure uses java.text.Normalizer as well as both of the
>   above."
>   ; => (= (normalize "ß") (normalize "sS"))
>   ; true
>   ; => (= (normalize "é") (normalize "é"))
>   ; true
>   ; ; Note that the latter are two different és, if this file encoding preserved
>   ; ; the difference. One uses a combining diacritic and one is integral.
>   [^String s]
>   (-> s
>     (java.text.Normalizer/normalize (java.text.Normalizer$Form/NFKC))
>     (.toUpperCase)
>     (.toLowerCase)))
>
> Of course for some uses you want to compare the results of stripping
> accents entirely, such as user text search (so a user input of "desole"
> will match "désolé", making it possible for people with en-US keyboards and
> operating systems to find it without jumping through hoops; of course this
> is most important with name searches, so e.g. one might search for Hervé
> Jean-Pierre Villechaize with "herve jean pierre villechaize" and not fail
> to discover his role in The Man with the Golden Gun).
>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en