[The Java Posse] Re: Dick, that's not how you compare strings!

Reinier Zwitserloot Sun, 08 Aug 2010 15:23:54 -0700

So close.

Java's own String.CASE_INSENSITIVE_ORDER uses this tactic, and as far
as case-insensitive tactics go, this really isn't such a bad one.
However, they completely bollocks it up by doing this character by
character for some completely unfathomable reason. This is dumb, and
explains why STRASSE and straße aren't equal.
Character.toUpperCase('\u00DF') can't very well return "SS", because a
single char can't hold two characters, so it returns '\u00DF'
unchanged. Only String.toUpperCase(), which works on the whole string,
can expand it to "SS".
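
To make the per-character problem concrete, here is a minimal sketch that
contrasts String.CASE_INSENSITIVE_ORDER with upper-casing the whole strings
first. The class and method names are made up for illustration. It assumes a
stock JDK, where Character.toUpperCase('\u00DF') hands the character back
unchanged.

```java
import java.util.Locale;

// Hypothetical demo class; the names are illustrative only.
class EszettComparison {

    // Compares the way String.CASE_INSENSITIVE_ORDER does: one char at a time.
    static boolean perCharEquals(String a, String b) {
        return String.CASE_INSENSITIVE_ORDER.compare(a, b) == 0;
    }

    // Upper-cases the whole strings first, so one char may expand to several.
    static boolean wholeStringUpperEquals(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String upper = "STRASSE";
        String lower = "stra\u00DFe"; // "strasse" spelled with an eszett

        System.out.println(perCharEquals(upper, lower));          // false
        System.out.println(wholeStringUpperEquals(upper, lower)); // true

        // A single char cannot become "SS", so it comes back as-is.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF'); // true
    }
}
```

Note that the whole-string version also treats "grossman" and the eszett
spelling as equal, which is exactly the false positive discussed next.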

Nevertheless, as someone else has pointed out to me, both großman and
grossman are somewhat common German surnames and shouldn't be
considered equal, so, in many ways, yes, 'case insensitive' as a
concept doesn't really make sense beyond English.

Doing a canonical comparison to answer the question "Are these
strings most likely intended to be equal, considering that they are
both written in language X?" is completely valid though, and that's
exactly what java.text.Collator is for. I don't think this is mission
impossible. It's just crazy complicated.
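
A rough sketch of what that looks like with java.text.Collator, assuming
Locale.GERMAN as "language X". The class and helper names are invented for
the example. The strength setting decides which differences count: at
PRIMARY strength case is ignored, at TERTIARY it is not. How the eszett
compares against "ss" depends on the collation rules your JRE ships, so the
sketch just prints that result instead of promising one.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; the names are illustrative only.
class GermanCollation {

    // Builds a German collator with the given strength
    // (Collator.PRIMARY, SECONDARY, TERTIARY or IDENTICAL).
    static Collator germanCollator(int strength) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(strength);
        return collator;
    }

    // True when the collator considers the two strings equivalent.
    static boolean sameUnder(Collator collator, String a, String b) {
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        Collator primary = germanCollator(Collator.PRIMARY);
        Collator tertiary = germanCollator(Collator.TERTIARY);

        // Case is a tertiary difference: ignored at PRIMARY, kept at TERTIARY.
        System.out.println(sameUnder(primary, "STRASSE", "strasse"));
        System.out.println(sameUnder(tertiary, "STRASSE", "strasse"));

        // Eszett versus "ss": outcome depends on the JRE's German rules.
        System.out.println(sameUnder(primary, "gro\u00DFman", "grossman"));
        System.out.println(sameUnder(tertiary, "gro\u00DFman", "grossman"));
    }
}
```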

Many props to A McDowell for teaching us all about the case folding
rules of Unicode. I learned something new.

On Aug 8, 9:34 am, Christian Catchpole <christ...@catchpole.net>
wrote:
> So, without some kind of case translation dictionary that can be
> trusted on the particular strings we want to test, can we assume
> that it's not actually a solvable problem? (because, like divide by
> zero, the question isn't valid to start with)
>
> Could you maybe get better results by (if upperCompare ||
> lowerCompare)?
>
> Was I serious for a second there?
>
> GERBILS!
>
> That's better.

