Re: The Case For Autodecode

ag0aep6g via Digitalmars-d Fri, 03 Jun 2016 13:42:06 -0700

On 06/03/2016 10:18 PM, Steven Schveighoffer wrote:

But you can get a standalone code unit that is part of a coded sequence
quite easily


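void foo(string s)
{
    auto x = s[0]; // first UTF-8 code unit, possibly part of a multibyte sequence
    dchar d = x;   // implicit char -> dchar conversion, no decoding happens
}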
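To make the effect concrete, here is a small sketch. The function name firstUnitAsDchar is made up for illustration, and the input is just one example of a non-ASCII string:

```d
import std.stdio : writefln;

// Hypothetical helper mirroring the snippet above: take the first
// UTF-8 code unit and widen it to dchar without decoding.
dchar firstUnitAsDchar(string s)
{
    char x = s[0]; // a single code unit
    dchar d = x;   // implicit conversion, value is kept as-is
    return d;
}

void main()
{
    // U+00E9 is encoded in UTF-8 as the two code units 0xC3 0xA9.
    string s = "\u00E9";
    // The lead byte 0xC3 turns into the unrelated code point U+00C3.
    writefln("U+%04X", cast(uint) firstUnitAsDchar(s));
}
```

The lead byte of the sequence comes out as a different, perfectly valid-looking code point, which is the kind of silent conversion under discussion.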


I don't think we're disagreeing on anything.

I'm calling UTF-8 code units below 0x80 "standalone" code units. They're never part of multibyte sequences. Your _dchar_convert returns them unscathed.

Higher code units are always part of multibyte sequences (or invalid already). Your function returns invalid code points for them.

_dchar_convert does exactly what I meant, except that I had in mind returning the replacement character for non-standalone code units. But I see that that may not be feasible, and it's probably not necessary.
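For illustration, the replacement-character variant I had in mind could look roughly like this. It is only a sketch: standaloneOrReplacement is a hypothetical name, and this is not the _dchar_convert from earlier in the thread.

```d
// Hypothetical helper: pass standalone code units through, map
// everything else to U+FFFD REPLACEMENT CHARACTER.
dchar standaloneOrReplacement(char c) pure nothrow @nogc @safe
{
    // Code units below 0x80 are complete code points on their own.
    if (c < 0x80)
        return c;
    // Lead bytes, continuation bytes, and bytes that never occur in
    // valid UTF-8 cannot be converted without their neighbours.
    return '\uFFFD';
}

void main()
{
    assert(standaloneOrReplacement('a') == 'a');
    assert(standaloneOrReplacement(0xC3) == 0xFFFD);
}
```

The cost is one comparison and a branch per code unit, and the original byte value is lost for the non-standalone case.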


[...]

So we need most efficient logic that does this:

if(c & 0x80)
     return wchar(0xd800 + c);


Is this going to be faster than returning a constant invalid wchar?

else
     return wchar(c);

More expensive, but more correct!

wchar to dchar conversion is pretty sound, as the surrogate pairs are
invalid code points for dchar.

-Steve
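Putting the two halves of that if/else together, a sketch of the char to wchar conversion might read as follows. The name toInvalidWchar is made up here, and the explicit cast is my choice rather than something from the thread. std.utf.isValidDchar is used only to check the claim about surrogates.

```d
import std.utf : isValidDchar;

// Hypothetical helper: ASCII code units pass through, everything
// else is shifted into the surrogate range.
wchar toInvalidWchar(char c) pure nothrow @nogc @safe
{
    if (c & 0x80)
        return cast(wchar)(0xD800 + c); // 0xD880 .. 0xD8FF, all high surrogates
    return c;
}

void main()
{
    char lead = 0xC3; // first code unit of a two-byte UTF-8 sequence
    wchar w = toInvalidWchar(lead);
    assert(w == 0xD8C3);

    // Widening to dchar keeps the value, and surrogates are not
    // valid code points, so the result stays detectably invalid.
    dchar d = w;
    assert(!isValidDchar(d));

    assert(toInvalidWchar('a') == 'a');
}
```

Compared with a single constant invalid wchar, this costs an addition, but each non-standalone code unit maps to a distinct value, so the original byte remains recoverable.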
