I'm not sure I agree that, in general, "WEISS" and "weiß" should be 
considered equal when ignoring case.  I don't write any German, though, 
so I might be wrong about that.  Thankfully I didn't have to write my own 
case-insensitive string comparison code, because Oracle already provides 
String.equalsIgnoreCase :)
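
For what it's worth, here is a small REPL sketch of how the two approaches 
differ on that example. The normalize below is my own condensed variant of 
the function quoted further down, not the original: I have pinned 
java.util.Locale/ROOT on the case conversions (my assumption, so the result 
does not depend on the JVM's default locale).

```clojure
(import 'java.text.Normalizer 'java.text.Normalizer$Form 'java.util.Locale)

;; Condensed variant of the quoted normalize. Locale/ROOT is an addition
;; of mine; the quoted version uses the JVM's default locale.
(defn normalize [^String s]
  (-> s
      (Normalizer/normalize Normalizer$Form/NFKC)
      (.toUpperCase Locale/ROOT)
      (.toLowerCase Locale/ROOT)))

;; equalsIgnoreCase compares character by character, so two strings of
;; different lengths never match.
(.equalsIgnoreCase "WEISS" "weiß")          ; false

;; Upper-casing expands "ß" to "SS", so both sides end up as "weiss".
(= (normalize "WEISS") (normalize "weiß"))  ; true
```

So which one is "right" really depends on whether you want "ß" and "ss" 
treated as the same spelling.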

On Sunday, June 21, 2015 at 4:41:49 PM UTC+2, Fluid Dynamics wrote:
>
> The troubling thing isn't the use of Normalizer to remove accents, but the 
> use of .toUpper, .toLower, and .equalsIgnoreCase instead of Normalizer, 
> which may run into problems. For example you probably want "weiß" and 
> "WEISS" to compare equal when ignoring case. For a case-insensitive 
> comparison I tend to compare the outputs of this for two strings:
>
> (defn normalize
>   "Given a string, normalizes it so that it may be used as a key in a hashmap
>    and compare equal to all strings representing the same word/spelling.
>    There are edge cases that .toLowerCase or .toUpperCase would not handle,
>    so the actual procedure uses java.text.Normalizer as well as both of the
>    above."
> ; => (= (normalize "ß") (normalize  "sS"))
> ; true
> ; => (= (normalize  "é") (normalize  "é"))
> ; true
> ; ; Note that the latter are two different és, if this file encoding preserved
> ; ; the difference. One uses a combining diacritic and one is integral.
>   [^String s]
>   (-> s
>     (java.text.Normalizer/normalize (java.text.Normalizer$Form/NFKC))
>     (.toUpperCase)
>     (.toLowerCase)))
>
> Of course, for some uses you want to compare the results of stripping 
> accents entirely. User text search is one: a user input of "desole" 
> should match "désolé", so that people with en-US keyboards and operating 
> systems can find it without jumping through hoops. This matters most for 
> name searches: one might search for Hervé Jean-Pierre Villechaize with 
> "herve jean pierre villechaize" and should not fail to discover his role 
> in The Man with the Golden Gun.
>
>
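
The accent-stripping case described above could be sketched roughly like 
this. The name search-key is hypothetical, and collapsing punctuation to 
spaces is my own assumption so that "Jean-Pierre" matches "jean pierre". 
The idea is to decompose with NFD, drop the combining marks, then lower-case.

```clojure
(import 'java.text.Normalizer 'java.text.Normalizer$Form 'java.util.Locale)
(require '[clojure.string :as str])

;; Hypothetical helper: builds a key for accent-insensitive search.
(defn search-key [^String s]
  (-> s
      ;; NFD splits "é" into "e" plus a combining acute accent.
      (Normalizer/normalize Normalizer$Form/NFD)
      ;; Drop all combining marks (Unicode category M).
      (str/replace #"\p{M}+" "")
      ;; Assumption: treat runs of punctuation/whitespace as one space.
      (str/replace #"[\p{P}\s]+" " ")
      (.toLowerCase Locale/ROOT)))

(= (search-key "desole") (search-key "désolé"))
(= (search-key "herve jean pierre villechaize")
   (search-key "Hervé Jean-Pierre Villechaize"))
```

Both comparisons should come out equal. Note that this only removes 
combining marks; letters like "ß" or "ø" that have no decomposition are 
left as they are.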

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en