On 6/3/16 4:39 PM, ag0aep6g wrote:
On 06/03/2016 10:18 PM, Steven Schveighoffer wrote:
But you can get a standalone code unit that is part of a coded sequence
quite easily

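void foo(string s)
{
    auto x = s[0]; // x is a char, possibly one code unit of a multibyte sequence
    dchar d = x;   // the implicit char -> dchar conversion compiles without complaint
}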

I don't think we're disagreeing on anything.

I'm calling UTF-8 code units below 0x80 "standalone" code units. They're
never part of multibyte sequences. Your _dchar_convert returns them
unscathed.

Ah, I thought you meant standalone as in it was assigned to a standalone char variable vs. part of an array or range. My mistake.

Re-reading your original message, I see that should have been clear to me...

So we need the most efficient logic that does this:

if(c & 0x80)
     return wchar(0xd800 + c);

Is this going to be faster than returning a constant invalid wchar?

No, but I like the idea of preserving the erroneous character you tried to convert.
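
For illustration only, the check above could be fleshed out roughly like this. The name toWcharPreserving is made up for this sketch (it is not the _dchar_convert helper mentioned earlier and not anything in Phobos), and it assumes an explicit cast is acceptable for the conversion:

```d
// Hypothetical helper, named here for illustration only.
wchar toWcharPreserving(char c) pure nothrow @nogc @safe
{
    // A set high bit means the code unit belongs to a multibyte sequence.
    // Park it in the surrogate range so the original byte stays
    // recoverable as (result - 0xD800).
    if (c & 0x80)
        return cast(wchar)(0xD800 + c);

    // Code units below 0x80 are plain ASCII and pass through unchanged.
    return c;
}
```

With this sketch, an input byte of 0x80 comes out as 0xD880 and 0xFF as 0xD8FF, so the erroneous byte is still visible in the result.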

But is there an invalid wchar? I looked through the Wikipedia article on UTF-16, and it didn't seem to say there was one.

If we use U+FFFD, that signifies a coding problem but is still a valid code point. However, doing a wchar in the D800 - D8FF range without being followed by a code unit in the DC00 - DFFF range is an invalid sequence. D throws if it encounters such a thing.
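
A quick way to see the difference, as a sketch: the wrapper below (isValidUtf16 is a made-up name) leans on std.utf.validate, which throws a UTFException for malformed input. It assumes Phobos validation is the behavior meant by "D throws" here.

```d
import std.utf : UTFException, validate;

// Hypothetical helper: true if `s` is well-formed UTF-16 according to std.utf.
bool isValidUtf16(const(wchar)[] s)
{
    try
    {
        validate(s); // throws UTFException on an invalid sequence
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}
```

Under that assumption, a string holding only U+FFFD passes, a lone 0xD8C3 is rejected, and 0xD8C3 followed by 0xDC00 passes again because the pair forms a proper surrogate sequence.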

-Steve
