Justin Johansson wrote:
Andrei Alexandrescu Wrote:
Lars T. Kyllingstad wrote:
Nick Sabalausky wrote:
"Chris Nicholson-Sauls" <ibisbase...@gmail.com> wrote in message
news:hcctuf$140...@digitalmars.com...
Granted LTR is common enough to be expectable and acceptable. To be
perfectly honest, I don't believe I have *ever* even used
wchar/wstring. Char/string gosh yes; dchar/dstring quite a bit as
well, where I need the simplicity; but I've yet to feel much need for
the "weirdo" middle child of UTF.
Given that just about anything outside of D (at least as far as I've
seen) that attempts to use unicode does so with UTF-16 (or just uses
UCS-2 and pretends that's UTF-16...), wchar and wstring are great for
dealing with that. For instance, my Goldie engine for GOLD currently
uses wchar in a number of places because GOLD's .cfg format stores
text in...well, presumably UTF-16 (I haven't tested to see if it's
really UCS-2). But yea, as long as you're not dealing with anything
that's already in UTF-16 or that expects it, then it does seem to be
somewhat questionable.
I think this says it all:
http://en.wikipedia.org/wiki/Utf-16#Use_in_major_operating_systems_and_environments
-Lars :)
Yep, there was a frenzy when UCS-2 came about: everybody thought two
bytes would be enough for everyone. So UCS-2 was widely adopted - who
wouldn't love to have constant character width? Then the UTF-16
surrogate business came about, and the only logical step they could take
was to migrate to UTF-16, which is upward compatible with UCS-2. I
personally think UTF-8 is a better overall design, though.
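For the curious, the surrogate mechanism works like this (a sketch in Python rather than D, since the mechanics are language-independent): code points above U+FFFF don't fit in one 16-bit unit, so UTF-16 carves out the U+D800-U+DFFF "dead zone" and splits the remaining 20 bits across two units.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000                # 20 bits remain after the offset
    high = 0xD800 | (v >> 10)      # lead surrogate: top 10 bits
    low = 0xDC00 | (v & 0x3FF)     # trail surrogate: bottom 10 bits
    return high, low

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP:
hi, lo = utf16_surrogates(0x1D11E)
print(hex(hi), hex(lo))  # 0xd834 0xdd1e
```

This is exactly why UCS-2 code could keep running unmodified: BMP characters encode identically in both, and only the new supplementary characters take two units.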
Andrei
"I personally think UTF-8 is a better overall design though."
Unicode Technical Note #12 by the Unicode Consortium apparently disagrees,
recommending UTF-16 for processing.
http://unicode.org/notes/tn12/
The major claim in the TN is that Unicode is optimized for UTF-16. The rest of
the argument reads like a VHS-versus-Betamax argument (everyone is already
using it, i.e. UTF-16).
So who's right? My personal view is that, whilst they are the *Unicode
Consortium*, I have great difficulty accepting UTF-16 as the one-and-holy
encoding.
FWIW, there was a subthread during a discussion about the ordained features of
programming languages on LtU a while back.
http://lambda-the-ultimate.org/node/3166#comment-46233
What Are The Resolved Debates in General Purpose Language Design?
It's a long discussion, so it's easier to search the page for UTF or Unicode
if you're interested.
cheers
Justin Johansson
Thanks for the pointers. One of the reasons I like the design of UTF-8
is its generality: it's a variable-length code for code points of up to
31 bits. In contrast, UTF-16 relies on specific dead zones inside the
assigned space. The authors of the unicode.org article do make a few
good points, such as there not being any invalid UTF-16 symbol. But then
that can actually be seen as a strength of UTF-8: binary files that
happen to be valid UTF-8 are statistically so scarce that UTF-8 offers a
very solid method of checking whether a file is UTF-8 or something else.
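That detection argument can be sketched in a few lines (Python here for brevity; the thread is about D, but the property is the encoding's, not the language's). Because most byte sequences violate UTF-8's tight lead-byte/continuation-byte structure, a strict decode is a strong heuristic for "is this UTF-8 text?":

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True iff data is well-formed UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

print(looks_like_utf8("naïve".encode("utf-8")))      # True
print(looks_like_utf8(bytes([0xFF, 0xFE, 0x00])))    # False: 0xFF never occurs in UTF-8
print(looks_like_utf8(b"\x80abc"))                   # False: stray continuation byte
```

No analogous check exists for UTF-16, where every 16-bit unit outside an unpaired surrogate is some assigned-or-reserved code point.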
Andrei