Hi David.

L. David Baron:
> This algorithm seems incorrect in two ways:
> 
>  * It gets the ranges for high and low surrogates backwards.  (High
>    surrogates are U+D800 - U+DBFF, low surrogates are U+DC00 -
>    U+DFFF, and in UTF-16 a surrogate pair is a high surrogate
>    followed by a low surrogate.)  So swapping the ranges in the
>    headings should make the algorithm correct, modulo the next
>    point.  But you should definitely double-check this. :-)

Ouch, you’re right.

>  * It incorrectly handles unpaired high surrogates by eating the
>    next character.  Instead, I would prefer that the unpaired high
>    surrogate is replaced by U+FFFD, and the following character is
>    interpreted normally.  (That's what Gecko does, anyway.
>    Furthermore, I think it makes sense to match the handling of
>    unpaired low surrogates.)

I meant to do that initially; I don't know what went wrong.  It should
be fixed now:

  http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode
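
For reference, here is a rough Python sketch of the behaviour discussed
above: high surrogates are U+D800 - U+DBFF, low surrogates are U+DC00 -
U+DFFF, and any unpaired surrogate (high or low) becomes U+FFFD without
consuming the code unit that follows it.  This is only an illustration,
not the spec text; the function and constant names (obtain_unicode,
REPLACEMENT, and so on) are made up for this example, and it assumes the
input is a list of 16-bit integers.

```python
# Illustrative sketch only; names here are hypothetical, not from the spec.
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_high_surrogate(unit):
    # High (leading) surrogates: U+D800 - U+DBFF
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    # Low (trailing) surrogates: U+DC00 - U+DFFF
    return 0xDC00 <= unit <= 0xDFFF


def obtain_unicode(code_units):
    """Convert a list of 16-bit code units to a list of code points."""
    result = []
    i = 0
    n = len(code_units)
    while i < n:
        unit = code_units[i]
        if is_high_surrogate(unit):
            if i + 1 < n and is_low_surrogate(code_units[i + 1]):
                # Well-formed pair: combine into one supplementary code point.
                low = code_units[i + 1]
                result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            # Unpaired high surrogate: replace it, but do NOT eat the
            # next code unit; it is processed normally on the next pass.
            result.append(REPLACEMENT)
        elif is_low_surrogate(unit):
            # Unpaired low surrogate: same treatment.
            result.append(REPLACEMENT)
        else:
            result.append(unit)
        i += 1
    return result
```

With this, an unpaired high surrogate followed by "A" (0xD800, 0x0041)
yields U+FFFD followed by U+0041, rather than swallowing the "A".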

Thanks,

Cameron

-- 
Cameron McCormack ≝ http://mcc.id.au/
