On 6/3/16 4:39 PM, ag0aep6g wrote:
On 06/03/2016 10:18 PM, Steven Schveighoffer wrote:
But you can get a standalone code unit that is part of a coded sequence
quite easily
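void foo(string s)
{
    auto x = s[0]; // char: may be one code unit of a multibyte sequence
    dchar d = x;   // implicit widening; the code unit is not decoded
}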
I don't think we're disagreeing on anything.
I'm calling UTF-8 code units below 0x80 "standalone" code units. They're
never part of multibyte sequences. Your _dchar_convert returns them
unscathed.
Ah, I thought you meant standalone as in it was assigned to a standalone
char variable vs. part of an array or range. My mistake.
Re-reading your original message, I see that should have been clear to me...
So we need the most efficient logic that does this:
if (c & 0x80)
    return wchar(0xd800 + c);
Is this going to be faster than returning a constant invalid wchar?
No, but I like the idea of preserving the erroneous character you tried
to convert.
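
A minimal sketch of the mapping described above, for illustration only. The function names toWcharPreserving and recoverByte are hypothetical; only the 0xd800 + c branch comes from the discussion, and passing ASCII code units through unchanged is an assumption based on the "standalone" remark earlier in the thread.

```d
/// Maps a UTF-8 code unit to a wchar (hypothetical helper name).
/// Code units >= 0x80 land in 0xD880 .. 0xD8FF, i.e. an unpaired high
/// surrogate that still carries the offending byte in its low 8 bits.
wchar toWcharPreserving(char c) pure nothrow @nogc @safe
{
    if (c & 0x80)
        return cast(wchar)(0xD800 + c);
    return c; // ASCII code unit: same numeric value as a wchar
}

/// Recovers the original byte from a wchar produced above (hypothetical).
ubyte recoverByte(wchar w) pure nothrow @nogc @safe
{
    return cast(ubyte)(w - 0xD800);
}

unittest
{
    assert(toWcharPreserving('a') == 'a');
    assert(toWcharPreserving(cast(char) 0xC3) == 0xD8C3);
    assert(recoverByte(toWcharPreserving(cast(char) 0xC3)) == 0xC3);
}
```

The extra add compared with returning a constant is the cost being weighed here; what it buys is that the bad byte can be read back out of the result.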
But is there an invalid wchar? I looked through the Wikipedia article on
UTF-16, and it didn't seem to say there was one.
If we use U+FFFD, that signifies a coding problem but is still a valid
code point. However, a wchar in the D800 - D8FF range that is not
followed by a code unit in the DC00 - DFFF range is an invalid
sequence. D throws if it encounters such a thing.
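
To make the difference concrete, here is a small sketch using std.utf.validate from Phobos, which throws a UTFException on malformed input. The helper isWellFormed is a hypothetical wrapper written for this example, not something from the thread.

```d
import std.utf : UTFException, validate;

/// Returns true if `s` is well-formed UTF-16 (hypothetical helper).
bool isWellFormed(const(wchar)[] s)
{
    try
    {
        validate(s); // throws UTFException on an invalid sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    wchar[] replacement = [0xFFFD];     // U+FFFD: a valid code point
    wchar[] pair = [0xD83D, 0xDE00];    // proper surrogate pair (U+1F600)
    wchar[] lone = [0xD8C3];            // high surrogate at end of input
    wchar[] broken = [0xD8C3, 0x0061];  // high surrogate followed by 'a'

    assert(isWellFormed(replacement));
    assert(isWellFormed(pair));
    assert(!isWellFormed(lone));
    assert(!isWellFormed(broken));
}
```

So U+FFFD passes validation like any other character, while the D8xx value is rejected unless a DC00 - DFFF code unit happens to follow it.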
-Steve