On Monday, 25 August 2014 at 02:40:20 UTC, Vladimir Panteleev wrote:
On Monday, 25 August 2014 at 01:31:35 UTC, H. S. Teoh via Digitalmars-d wrote:
In D, an array of char, wchar, or dchar always means a Unicode encoding. Non-Unicode encodings should be represented as ubyte[] (resp. ushort[]
or ulong[], if such exist) instead.

This doesn't get you far in practice if you want to actually operate on the text.

Well, all of the non-string-specific stuff (like find) will work just fine, but since all of the string-specific functions assume UTF-8, UTF-16, or UTF-32, you'll have to convert it. We can't really do otherwise, because you have to know what encoding you're dealing with to operate on it as a string, and that means that you need to either call specific functions which expect the encoding that you're using, or you need types specific to those encodings (in which case, you wouldn't use ubyte[] and the like directly).
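
To make that concrete, here is a rough sketch of what I mean, assuming the non-Unicode data is ISO-8859-1 (Latin-1). latin1ToUtf8 is a hypothetical helper written for this post, not something in Phobos. It relies on the fact that every Latin-1 byte value equals its Unicode code point, so this approach would not carry over unchanged to other encodings.

```d
import std.algorithm.searching : find;

/// Hypothetical helper (not part of Phobos): decodes ISO-8859-1 bytes
/// into a UTF-8 string. Assumes the input really is Latin-1.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
    {
        // Each Latin-1 byte value is also its Unicode code point.
        // Appending a dchar to a string encodes it as UTF-8.
        result ~= cast(dchar) b;
    }
    return result;
}

void main()
{
    // "café" in Latin-1: the é is the single byte 0xE9.
    ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9];

    // Generic algorithms such as find work directly on the raw bytes.
    assert(latin1.find(0xE9).length == 1);

    // String-specific functions need Unicode, so convert first.
    string utf8 = latin1ToUtf8(latin1);
    assert(utf8 == "café");
    assert(utf8.length == 5); // é takes two UTF-8 code units
}
```

The point is just that the ubyte[] stays ubyte[] until you explicitly say what encoding it is, and only then does it become a string.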

We do need better support for other encodings, but I don't think that it really costs us anything to treat char as UTF-8, wchar as UTF-16, and dchar as UTF-32 and require that other encodings use different representations.

- Jonathan M Davis
