Re: The Case For Autodecode

Steven Schveighoffer via Digitalmars-d Fri, 03 Jun 2016 14:16:31 -0700

On 6/3/16 4:39 PM, ag0aep6g wrote:

On 06/03/2016 10:18 PM, Steven Schveighoffer wrote:

But you can get a standalone code unit that is part of a coded sequence
quite easily


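void foo(string s)
{
    auto x = s[0];  // x is a char: one UTF-8 code unit, not a code point
    dchar d = x;    // implicit char -> dchar conversion, no decoding
}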


I don't think we're disagreeing on anything.

I'm calling UTF-8 code units below 0x80 "standalone" code units. They're
never part of multibyte sequences. Your _dchar_convert returns them
unscathed.
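
To make the "standalone" terminology concrete, here is a minimal sketch. The helper name isStandalone is hypothetical (it is not part of Phobos or of any proposal in this thread); it only encodes the "below 0x80" rule described above.

```d
// Hypothetical helper: true for UTF-8 code units that encode a whole
// code point on their own (plain ASCII), i.e. values below 0x80.
bool isStandalone(char c) pure nothrow @nogc @safe
{
    return c < 0x80;
}

unittest
{
    string s = "\u00E9";       // U+00E9, stored as the two code units 0xC3 0xA9
    assert(s.length == 2);

    char x = s[0];             // lead byte of a multibyte sequence
    dchar d = x;               // compiles: implicit char -> dchar, no decoding
    assert(d == 0xC3);         // the lead byte's value, not U+00E9
    assert(!isStandalone(x));

    assert(isStandalone("e"[0]));  // ASCII code units stand alone
}
```

Compiled with -unittest, this shows that the implicit conversion silently turns a lead byte into an unrelated code point, which is the case the discussion is about.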

Ah, I thought you meant standalone as in it was assigned to a standalone char variable vs. part of an array or range. My mistake.


Re-reading your original message, I see that should have been clear to me...

So we need the most efficient logic that does this:

if(c & 0x80)
     return wchar(0xd800 + c);
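
Spelled out as a complete function, the fragment above might look like the sketch below. The name toWcharPreserving is made up for illustration, and returning ASCII unchanged is an assumption about the intended behavior for code units below 0x80.

```d
// Hypothetical helper, not an existing Phobos or druntime function.
// Non-ASCII code units are mapped to 0xD880 .. 0xD8FF so that the
// offending byte can still be recovered from the low 8 bits.
wchar toWcharPreserving(char c) pure nothrow @nogc @safe
{
    if (c & 0x80)
        return cast(wchar)(0xD800 + c);
    return c;   // ASCII: implicit char -> wchar keeps the value
}

unittest
{
    assert(toWcharPreserving('a') == 'a');

    immutable w = toWcharPreserving(cast(char) 0xC3);
    assert(w == 0xD8C3);
    assert((w & 0xFF) == 0xC3);   // the original byte is preserved
}
```

The alternative under discussion, returning a constant, would replace the first return with a fixed value and lose the original byte.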


Is this going to be faster than returning a constant invalid wchar?

No, but I like the idea of preserving the erroneous character you tried to convert.

But is there an invalid wchar? I looked through the Wikipedia article on UTF-16, and it didn't seem to say there was one.

If we use U+FFFD, that signifies a coding problem but is still a valid code point. However, doing a wchar in the D800 - D8FF range without being followed by a code unit in the DC00 - DFFF range is an invalid sequence. D throws if it encounters such a thing.
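
A small sketch of that difference, using std.utf.validate as one place where the exception shows up (other decoding paths are assumed to behave the same way, but are not exercised here):

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

unittest
{
    // A lone code unit from the D800 - D8FF range: invalid UTF-16.
    const(wchar)[] lone = [cast(wchar) 0xD8C3];
    assertThrown!UTFException(validate(lone));

    // The same code unit followed by one from DC00 - DFFF forms a
    // valid surrogate pair, so validation passes.
    const(wchar)[] pair = [cast(wchar) 0xD8C3, cast(wchar) 0xDC00];
    validate(pair);

    // U+FFFD is an ordinary, valid code point.
    const(wchar)[] replacement = "\uFFFD"w;
    validate(replacement);
}
```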


-Steve
