On 8/19/10, Kagamin <s...@here.lot> wrote:
> Jonathan M Davis Wrote:
>
>> Considering that in all likelihood 99+% of the cases where someone is iterating
>> over char, they really want dchar
>
> And when someone is iterating over byte[] or short[], they want long, right?
> Yeah, why not?
>

The problem is that chars are not characters. They are UTF-8 code units. If all you're using is ASCII, you can get away with treating them like one-byte characters, but that doesn't work if you have any characters which aren't in ASCII. dchars _are_ characters: each one holds a complete code point. The correct way to iterate over a string or wstring if you want to treat the elements as characters is to give the type as dchar.

    foreach(dchar c; mystring)
    {
        //...
    }

If you use char or wchar, you're going to iterate over code units, which is completely different, and that is not generally the correct thing to do. If someone does that in their code, odds are that it's a bug.

bytes and shorts are legitimate values on their own, so it wouldn't make sense to give the type to foreach as long. You can deal with each byte or short on its own just fine. You can't safely do that with code units unless, for some reason, you actually want to operate on code units (which is unlikely), or you don't care about the contents of the string for whatever you're doing (since some algorithms don't care about the contents of the arrays/ranges that they're dealing with).

So, it's almost a guarantee that the correct type for iterating over a string or wstring is dchar, not char or wchar. String types are just weird that way due to how multibyte Unicode encodings work. Since it makes so little sense to iterate over chars or wchars by default, it would make sense to make the default dchar.

- Jonathan M Davis
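
To make the difference between code units and characters concrete, here is a minimal sketch. The helper names countCodeUnits and countCodePoints and the sample string are made up for this illustration; the only language feature relied on is that foreach over a string decodes UTF-8 when the loop variable is declared as dchar.

```d
import std.stdio;

// Hypothetical helper: counts loop iterations when the element type is char,
// i.e. one iteration per UTF-8 code unit.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: counts loop iterations when the element type is dchar,
// i.e. foreach decodes the UTF-8 and yields one code point per iteration.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // "\u00E9" is e-acute, which takes two code units in UTF-8.
    string s = "h\u00E9llo";
    writeln(countCodeUnits(s));  // prints 6
    writeln(countCodePoints(s)); // prints 5
}
```

With a pure ASCII string both counts agree, which is why code that iterates over char often looks correct until it meets a non-ASCII character.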