Re: Making all strings UTF ranges has some risk of WTF

Jerry Quinn Thu, 04 Feb 2010 14:05:15 -0800

Don Wrote:
> We seem to be approaching the point where char[], wchar[] and dchar[] 
> are all arrays of dchar, but with different levels of compression.
> It makes me wonder if the char, wchar types actually make any sense.
> If char[] is actually a UTF string, then char[] ~ char should be 
> permitted ONLY if char can be implicitly converted to dchar. Otherwise, 
> you're performing cast(char[])(cast(ubyte[])s ~ cast(ubyte)c) which will 
> not necessarily result in a valid unicode string.


Well, if you're working with a LOT of text, you may be mmapping gigabytes of 
UTF-8 text.  Yes, this does happen.  You had better be able to handle it in a 
sane manner, i.e. without reallocating memory to read the data in.  So there 
is a definite need for casting to an array of char, and for dealing with the 
inevitable stray non-Unicode char in that mess.

Real-world text processing can be a messy affair.  It probably requires walking 
such an array and returning slices cast to char[] only after they've been 
validated.
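
To make that concrete, here is a rough sketch of the kind of walk I have in 
mind.  It is only an illustration, not a proposal for Phobos: the function 
name validSlices is made up, and skipping exactly one byte on each decoding 
error is just one possible recovery policy.  It assumes the raw bytes are 
already in memory as a ubyte array (for instance a slice of a std.mmfile 
MmFile mapping) and uses std.utf.decode to check one code point at a time.  
The returned slices all point into the original buffer, so nothing is copied.

```d
import std.utf : decode, UTFException;

/// Hypothetical helper: splits raw bytes into the maximal runs that are
/// valid UTF-8.  Each returned slice aliases `raw`; no data is copied.
const(char)[][] validSlices(const(ubyte)[] raw)
{
    const(char)[][] result;

    // Reinterpret the bytes in place.  Nothing is known to be valid yet,
    // so `s` is only handed out in pieces after those pieces are checked.
    auto s = cast(const(char)[]) raw;

    size_t start = 0; // beginning of the current valid run
    size_t i = 0;     // current position in the buffer

    while (i < s.length)
    {
        size_t next = i;
        try
        {
            decode(s, next); // throws UTFException on an invalid sequence
            i = next;        // one whole code point accepted
        }
        catch (UTFException e)
        {
            // Close the run that ended just before the bad byte.
            if (i > start)
                result ~= s[start .. i];

            // Assumed policy: drop a single offending byte and retry.
            ++i;
            start = i;
        }
    }

    // Emit the trailing run, if any.
    if (s.length > start)
        result ~= s[start .. $];

    return result;
}
```

Every slice that comes back can be treated as a real UTF-8 string, while the 
stray bytes between the slices are simply never exposed as char data.  A real 
implementation might prefer to report the bad offsets, or to return the 
slices lazily as a range instead of building an array.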
