Re: Unicode problems?

Chris Nicholson-Sauls Mon, 16 Feb 2009 06:40:09 -0800

Daniel Keep wrote:


Trass3r wrote:

Wikipedia states that D still has some Unicode problems:
"Operations on Unicode strings are unintuitive (compiler accepts Unicode
source code, standard library and foreach constructs operate on UTF-8,
but string slicing and length property operate on bytes rather than
characters)."

Is this information correct?


They're not bugs, if that's what you mean.  It's just a side-effect of
how Unicode works.

http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD

Long story short: they operate on bytes because operating on actual code
points can't be done efficiently [1].

  -- Daniel

[1] Given that strings are implemented as arrays with a given,
non-changing width and that you're not using UTF-32 which no one does
because it's too big and that we don't add some fancy caching stuff to
char[] arrays specifically, blah blah blah.

I use UTF-32, at least occasionally. In cases where I specifically
expect/encourage multilingual support/use, it can simplify matters
greatly, where those otherwise inefficient operations become common.
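
To make the trade-off concrete, here is a minimal sketch, assuming a D2 compiler and a UTF-8 source file. The helper name countCodePoints and the sample string are made up for illustration and are not taken from the linked wiki page. It shows .length and slicing working on UTF-8 code units, foreach decoding code points, and a UTF-32 dstring allowing direct indexing by code point.

```d
// Hypothetical helper: count code points by decoding.
// foreach with a dchar loop variable decodes UTF-8 on the fly,
// so this walks the whole string (linear in its byte length).
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string  utf8  = "añb";   // 'ñ' (U+00F1) occupies two UTF-8 code units
    dstring utf32 = "añb"d;  // one 32-bit code unit per code point

    assert(utf8.length == 4);            // .length counts code units (bytes)
    assert(countCodePoints(utf8) == 3);  // decoding finds three code points
    assert(utf8[1 .. 3] == "ñ");         // slice bounds are byte offsets

    assert(utf32.length == 3);           // code units equal code points here
    assert(utf32[1] == 'ñ');             // constant-time index by code point
}
```

Note that even with UTF-32 the index is a code point, not necessarily a user-perceived character: combining sequences still span several code points.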


-- Chris Nicholson-Sauls
