On Thursday, August 19, 2010 07:13:25 Kagamin wrote:
> Jonathan Davis Wrote:
> > bytes and shorts are legitimate values on their own, so it wouldn't
> > make sense to give the type to foreach as long.
>
> Having wider integer always has sense.
>
> > byte or short on its own just fine.
>
> Yes, but odds are that it's a bug. You can easily hit an overflow.

No, it doesn't hurt to have the iteration type larger than the actual type, but you're not going to have overflow. The value is in the array already. Sure, you could have had overflow putting it in, but when you're taking it out, you know that it fits because it was already in there. You could have overflow issues with math or whatnot inside the body of your loop if you're assigning to the foreach variable, but that has nothing to do with what you're getting out of the loop. With string and wstring, you're almost certainly getting a type that is inappropriate to process by itself.

> > So, it's almost a guarantee that the correct type for iterating over a
> > string or wstring is dchar, not char or wchar. String types are just
> > weird that way due to how multibyte unicode encodings work.
>
> If you don't like narrow strings, don't use them. Use dstring. You are free
> to write what you want. It's fine with me to use narrow strings.

Much as I'd love to avoid a lot of these issues, dstrings take up too much memory if you're going to be doing a lot of string processing. I'm aware of the issues and can program around them. The problem is that the default behavior is the abnormal (and therefore almost certainly buggy) behavior. Generally D tries to make the normal behavior the behavior that is less likely to cause bugs. Obviously, it doesn't always succeed, and this case is one of them. Very few people are actually going to want to deal with code units. They want characters. The result is that it becomes very easy to make mistakes with strings if you ever try to manipulate them character-by-character.

> > So, since it makes so little sense to iterate over chars or wchars by
> > default, it would make sense to make the default dchar.
>
> It's an iteration over array items. This makes perfect sense.

It makes perfect sense for general arrays.
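
To make the difference concrete, here is a minimal sketch of what foreach does with a narrow string when the type is left out versus when dchar is given explicitly. The function names countDefault and countDecoded are made up for illustration.

```d
import std.stdio : writeln;

// Counts iterations when foreach infers the loop variable type.
// For a string, that type is char, i.e. one UTF-8 code unit per step.
size_t countDefault(string s)
{
    size_t n = 0;
    foreach (c; s)
        ++n;
    return n;
}

// Counts iterations when the loop variable is explicitly dchar.
// The compiler then decodes the UTF-8 and yields whole code points.
size_t countDecoded(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // U+00E9 (e with acute accent) takes two code units in UTF-8.
    string s = "h\u00E9llo";
    writeln(countDefault(s));  // 6 code units
    writeln(countDecoded(s));  // 5 code points
}
```
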
It makes perfect sense if you don't really care about the contents of the array for your algorithm (that is, whether they're code units or characters or just bytes in memory doesn't matter for what you're doing). However, if you're actually processing characters, it makes no sense at all. This mess with foreach and strings is one of the big reasons why foreach tends to be avoided in std.algorithm.

The reality of the matter is that what the container conceptually contains (characters) and what it actually contains (code units) aren't the same. That causes problems all over the place. Some reasonable workarounds have been found (for instance, strings are special-cased so that they're not random access ranges), but you have to special-case strings all over the place. The only way to avoid it completely is to just use dstring everywhere, but that doesn't necessarily scale well, and given the fact that the string module deals almost exclusively with string rather than wstring or dstring, it really doesn't make sense to use dstrings in the general case. Not to mention, the Linux I/O stuff uses UTF-8, and the Windows I/O stuff uses UTF-16, so dstring is less efficient for dealing with I/O.

Even just making it an error - or at least a warning - to not give the type for foreach when iterating over UTF-8 and UTF-16 string types would help a lot in fixing string-related coding errors (so, programmers can choose char, wchar, or dchar, but they can't forget to put in the type and get shot in the foot because what they almost certainly wanted was dchar). However, there's a lot of generic code which runs into trouble because of this as well. The result is that you generally have to avoid foreach in generic code.

Perhaps what we need is some way to distinguish between the exact element type of an array and the conceptual element type.
So, for most arrays, they'd both be whatever the element type of the array is, but for strings the exact element type would be char, wchar, or dchar while the conceptual type would be dchar. That way, algorithms that don't care what the actual contents mean can use the exact element type, and the algorithms that actually care about processing the contents can use the conceptual element type.

- Jonathan M Davis
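
For what it's worth, here is a rough sketch of how that split could look in generic code. It assumes a Phobos whose std.range provides the ElementEncodingType and ElementType traits, which draw roughly this line (stored type versus decoded type). The helpers countExact and countConceptual are hypothetical names used only for illustration, not anything in the library.

```d
import std.range : ElementEncodingType, ElementType, isRandomAccessRange;

// "Exact" element type: what the array physically stores.
static assert(is(ElementEncodingType!string == immutable(char)));
static assert(is(ElementEncodingType!wstring == immutable(wchar)));

// "Conceptual" element type: what a string is treated as containing.
static assert(is(ElementType!string == dchar));
static assert(is(ElementType!wstring == dchar));

// For ordinary arrays the two coincide.
static assert(is(ElementEncodingType!(int[]) == int));
static assert(is(ElementType!(int[]) == int));

// The special-casing mentioned earlier: narrow strings are not random access.
static assert(!isRandomAccessRange!string);

// Hypothetical helper: iterates using the exact (stored) element type.
size_t countExact(T)(T[] arr)
{
    size_t n;
    foreach (ElementEncodingType!(T[]) e; arr)
        ++n;
    return n;
}

// Hypothetical helper: iterates using the conceptual element type,
// so narrow strings are decoded to dchar by foreach.
size_t countConceptual(T)(T[] arr)
{
    size_t n;
    foreach (ElementType!(T[]) e; arr)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo";          // U+00E9 is two UTF-8 code units
    assert(countExact(s) == 6);       // code units
    assert(countConceptual(s) == 5);  // code points

    int[] a = [1, 2, 3];
    assert(countExact(a) == 3 && countConceptual(a) == 3);
}
```

An algorithm that only shuffles memory around would use the first form, and one that actually interprets the contents would use the second, without either having to special-case strings by hand.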