Don Wrote:
> We seem to be approaching the point where char[], wchar[] and dchar[]
> are all arrays of dchar, but with different levels of compression.
> It makes me wonder if the char, wchar types actually make any sense.
> If char[] is actually a UTF string, then char[] ~ char should be
> permitted ONLY if char can be implicitly converted to dchar. Otherwise,
> you're performing cast(char[])(cast(ubyte[])s ~ cast(ubyte)c) which will
> not necessarily result in a valid unicode string.
Well, if you're working with a LOT of text, you may be mmapping gigabytes of UTF-8 text. Yes, this does happen. You'd better be able to handle it in a sane manner, i.e. without reallocating memory just to read the data in. So there is a definite need for casting to an array of char and dealing with the inevitable stray non-Unicode bytes in that mess. Real-world text processing can be a messy affair. It probably requires walking such an array and returning slices cast to char[] only after they've been validated.
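
To make that last idea concrete, here is a rough sketch of what I mean. It assumes a current Phobos where std.utf.decode throws UTFException on a malformed sequence; validSlices is a name I made up for illustration, not anything in the library. It reinterprets the raw bytes in place (no copy of the text) and hands back only the runs that decoded cleanly, skipping one stray byte at a time.

```d
import std.utf : decode, UTFException;

/// Hypothetical helper: walk raw bytes (e.g. an mmapped region) and return
/// the runs of valid UTF-8 as slices of the original memory.
const(char)[][] validSlices(const(ubyte)[] raw)
{
    const(char)[][] result;
    auto s = cast(const(char)[]) raw;   // reinterpret only, no copy of the data
    size_t start = 0;                   // start of the current valid run
    size_t i = 0;                       // current position

    while (i < s.length)
    {
        size_t next = i;
        try
        {
            decode(s, next);            // advances next past one code point
            i = next;
        }
        catch (UTFException e)
        {
            if (i > start)
                result ~= s[start .. i]; // close the validated run
            ++i;                         // skip one stray byte
            start = i;
        }
    }
    if (i > start)
        result ~= s[start .. i];
    return result;
}
```

Only the small array of slices is allocated; the text itself stays where it was mapped. How to treat the stray bytes (skip, report, replace) is a policy choice, and skipping is just the simplest one to show here.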