"Jonathan M Davis" <jmdavisp...@gmx.com> wrote in message
news:mailman.529.1294620116.4748.digitalmar...@puremagic.com...
> On Sunday 09 January 2011 13:52:53 Jim wrote:
>> I'm a firm believer of alternative B: Store the string with its
>> original case, unless it's particularly important to do otherwise.
>>
>> The cost of case-insensitive comparison is REALLY small. Anytime you
>> are to compare two strings ask yourself whether case-sensitive or
>> case-insensitive is what you need. Have no inclination to prefer one
>> type of comparison to the other. Problem solved. Bloat avoided.
>>
>> Creating specific types of strings that carry with them data on how
>> they are to be interpreted is over-engineering, solving a problem that
>> doesn't exist.

The problem certainly cropped up for me. See below.

> I don't know that it's over-engineering. I expect that there _are_
> cases where it makes perfect sense. However, in the general case, I do
> think that it's overkill. std.string.icmp() deals with most cases where
> you need case-insensitive comparison, but what if you really do need it
> everywhere as in Nick's case? Or what about cases like associative
> arrays, which you can't give a comparison function to (it has to be
> built into the type)? I don't think that the cost of the comparison
> here is really the issue. If that's all you need, then there's icmp().
> It's when you need the same comparison _everywhere_ that it matters.
>

Right. FWIW, this is the scenario that originally inspired it:

I was working on some code that processes a grammar definition
(specifically, the BNF-style language that GOLD uses). The grammar
definition language includes various pre-defined character sets, and
allows user-defined character sets. These character sets are referred to
by name (such as "AlphaNumeric", "Whitespace", or "Cyrillic
Supplementary"). But those character set names are defined by the
language as being case-insensitive. (Come to think of it, all the names
of everything are case-insensitive: tokens, char sets, meta-data, etc.)

Due to the usage patterns in my program, it made sense to store the
character sets as an associative array where the keys were the names of
the character sets and the values were the data describing what
characters were included in the set. And there were plenty of other AAs
for other things that were all indexed by case-insensitive names.

Obviously, I needed to ensure that *all* comparisons involving these
names were done insensitively (to do otherwise would be a bug). And there
were also times when I needed to display one of the character set names
(error messages, for instance), and it would be awkward not to show the
original capitalization.
So I had to follow the convention of always creating lower-case versions
to insert into and look up from the AAs, and also maintain the original
names (and be very careful about all of it). This quickly became an awful
mess. But as soon as I wrote and started using the "Insensitive" type,
the whole thing was simplified enormously.

While writing and dealing with all that code I realized something: While
programmers are usually heavily conditioned to think of case-sensitivity
as an attribute of the comparison, it's very frequent that the deciding
factor in which comparison to use is *not* the comparison itself but
*what* gets compared. And in those cases, you have to use the awful
strategy of "relying on convention" to make sure you get it right in
*every* place that particular data gets compared.

It's analogous to how Asm has separate operators for signed-integer,
unsigned-integer and floating-point math: Many times a specific memory
location is *supposed* to be treated as either signed, unsigned or float
in *all* operations it participates in. Handling this with separate
operators that behave differently is notoriously tedious and error-prone.
That's why non-asm languages, even ones as low-level as C, employ a type
system which allows the programmer to *force* a variable to always, and
automatically, be used with the proper version of the given operator.
Heck, it all goes back to the whole original point of a type-system.

>
> Now, I do wonder if perhaps this idea should be generalized to any type
> and/or a given binary predicate to test for equality rather than making
> it specific to strings and case-insensitive comparison. The issue here
> (in the general sense) is that you want to wrap a type so that it will
> use a specialized comparison function everywhere, and that seems like
> it should be highly generalizable, though doing it right may require
> alias this, which _is_ rather buggy at the moment.
> Still, it would seem to me to be worthwhile to consider how it could
> and/or should be generalized.
>

That's a very good thought. Have to say I'm not really sure offhand how
I'd do that, though.
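
For illustration only, here is a minimal sketch of the kind of wrapper
described above: a struct that keeps the original spelling for display
but compares and hashes without regard to case, so it can be used
directly as an AA key. This is NOT the actual "Insensitive" type from my
code; the field name `original`, the hash function, and the sample
values are all made up for the example, and it only folds ASCII case
(an assumption that happens to suit names like "AlphaNumeric").

```d
import std.ascii : toLower;
import std.stdio : writeln;

/// Hypothetical sketch: a string that always compares case-insensitively.
/// Assumption: ASCII-only case folding, not full Unicode.
struct Insensitive
{
    string original; // spelling as written, kept for display

    // Equality ignores ASCII case.
    bool opEquals(const Insensitive rhs) const @safe pure nothrow
    {
        if (original.length != rhs.original.length)
            return false;
        foreach (i, char c; original)
            if (toLower(c) != toLower(rhs.original[i]))
                return false;
        return true;
    }

    // The hash must agree with opEquals, so hash the lower-cased chars.
    size_t toHash() const @safe pure nothrow
    {
        size_t h = 0;
        foreach (char c; original)
            h = h * 31 + toLower(c);
        return h;
    }
}

void main()
{
    // Illustrative data only; the descriptions are placeholders.
    string[Insensitive] charSets;
    charSets[Insensitive("AlphaNumeric")] = "letters and digits";

    // Lookup succeeds regardless of the case used by the caller.
    auto hit = Insensitive("ALPHANUMERIC") in charSets;
    assert(hit !is null);
    assert(*hit == "letters and digits");

    // The key still carries the original capitalization for messages.
    foreach (name, description; charSets)
        writeln(name.original, ": ", description);
}
```

The point of the sketch is that the comparison rule lives in the type, so
no call site has to remember to lower-case anything. Generalizing it to an
arbitrary predicate would presumably mean parameterizing both the equality
test and a matching hash, which is the part I'm unsure about.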