On Sunday, 24 August 2014 at 18:19:45 UTC, Andrew Godfrey wrote:
> The OP and the question of auto-decoding share the same root problem: Even though D does a lot better with UTF than other languages I've used, it still confuses characters with code points somewhat. "Element type is some character" is an example from OP. So clarify for me: If a programmer makes an array of either 'char' or 'wchar', does that always, unambiguously, mean a UTF8 or UTF16 code point?

It has to, because it is required by the specification. But ...

> E.g., if interoperating with C code, they will never make the mistake of using these types for a non-string byte/word array?

... of course this cannot be guaranteed. In fact, AFAIK even druntime currently just assumes that program arguments and environment variables on Unix are UTF-8 encoded. That holds in most cases, but it is not guaranteed. The functions taking filenames are potentially problematic too: on Unix, a filename is just an opaque array of bytes (any byte except '/' and NUL is allowed in a name component), yet those functions take `string` parameters, i.e. they assume UTF-8. A user who needs to open a file whose name is not valid UTF-8 would therefore be forced to put a non-UTF-8 byte sequence into a `string`, violating the very guarantee the type is supposed to carry.
