On Sunday, 13 October 2013 at 16:31:58 UTC, nickles wrote:
>> However, it could also yield the first code unit of the umlaut diacritic, depending on how the string is represented.

> This is not true for UTF-8, which is not subject to "endianism".

This is not about endianness. It's "\u00E4" vs "a\u0308". The first is the single code point 'ä', the second is two code points, 'a' plus umlaut dots.
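To make the distinction concrete, here is a small sketch (names are my own; it assumes a reasonably recent Phobos with std.uni.normalize):

```d
import std.uni : normalize, NFC;

void main()
{
    string precomposed = "\u00E4";  // single code point U+00E4 ('ä')
    string decomposed  = "a\u0308"; // 'a' + U+0308 (combining diaeresis)

    // Different code unit sequences, so bitwise comparison fails:
    assert(precomposed != decomposed);
    assert(precomposed.length == 2); // two UTF-8 code units
    assert(decomposed.length == 3);  // three UTF-8 code units

    // After NFC normalization both collapse to the same form:
    assert(precomposed == normalize!NFC(decomposed));
}
```

Neither representation is "wrong"; they are just different code point sequences for the same user-perceived character, which is why comparisons should normalize first.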

> [...]
> Well, that's a point; on the other hand, D is constantly creating and throwing away new strings, so this isn't much of an argument. The current solution puts the programmer in charge of dealing with UTF-x, whereas a more consistent implementation would put the burden on the implementors of the libraries/core, i.e. the ones who usually have a better understanding of Unicode than the average programmer.

> Also, implementing such semantics would not per se abandon byte-wise access, would it?
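It would not. In current D the two views already coexist: `.length` and indexing on a `string` are byte-wise (code units), while range operations auto-decode to code points, and `.representation` gives the raw bytes explicitly. A quick sketch:

```d
import std.range : walkLength;
import std.string : representation;

void main()
{
    string s = "häuser";

    // Code-unit (byte) view: .length and indexing stay byte-wise.
    assert(s.length == 7);                 // 'ä' occupies two UTF-8 code units
    immutable(ubyte)[] bytes = s.representation;
    assert(bytes[1] == 0xC3);              // first byte of the two-byte 'ä'

    // Code-point view: ranges decode on the fly.
    assert(s.walkLength == 6);             // six code points
}
```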

> So, how do you guys handle UTF-8 strings in D? What are your solutions to the problems described? Does it all come down to converting "string"s and "wstring"s to "dstring"s, manipulating them, and converting them back to "string"s? Btw, what would this mean in terms of speed?
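The round trip is one option, and its cost is predictable: one decoding pass, one encoding pass, and roughly four bytes per character while you hold the `dstring`. A hedged sketch of that approach:

```d
import std.conv : to;

void main()
{
    string s = "grün";

    dstring d = s.to!dstring;  // one pass: decode UTF-8 to UTF-32
    assert(d.length == 4);     // now indexable by code point
    assert(d[2] == 'ü');       // O(1) random access

    string back = d.to!string; // one pass: re-encode to UTF-8
    assert(back == s);
}
```

For many tasks, though, the conversion is unnecessary, since Phobos range algorithms decode `string`s to code points lazily as they iterate.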

> There is no irony in my questions. I'm really looking for solutions...

I think std.uni and std.utf are supposed to supply everything Unicode-related.
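For instance, std.uni can iterate by grapheme, which handles exactly the combining-character case from above (a small sketch, assuming std.uni.byGrapheme is available in your Phobos version):

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "a\u0308"; // 'a' + combining diaeresis

    assert(s.walkLength == 2);            // two code points
    assert(s.byGrapheme.walkLength == 1); // one user-perceived character
}
```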
