Hi David.

L. David Baron:
> This algorithm seems incorrect in two ways:
>
>  * It gets the ranges for high and low surrogates backwards.  (High
>    surrogates are U+D800 - U+DBFF, low surrogates are U+DC00 -
>    U+DFFF, and in UTF-16 a surrogate pair is a high surrogate
>    followed by a low surrogate.  So swapping the ranges in the
>    headings should make the algorithm correct, modulo the next
>    point.  But you should definitely double-check this. :-)

Ouch, you’re right.

>  * It incorrectly handles unpaired high surrogates by eating the
>    next character.  Instead, I would prefer that the unpaired high
>    surrogate is replaced by U+FFFD, and the following character is
>    interpreted normally.  (That's what Gecko does, anyway.
>    Furthermore, I think it makes sense to match the handling of
>    unpaired low surrogates.)

I meant to do that initially, dunno what went wrong.  Should be fixed
now.

http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode

Thanks,

Cameron

-- 
Cameron McCormack ≝ http://mcc.id.au/
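
For reference, here is a minimal sketch of the behaviour discussed in this thread: a high surrogate (U+D800 - U+DBFF) followed by a low surrogate (U+DC00 - U+DFFF) combines into one code point, and any unpaired surrogate, high or low, becomes U+FFFD without consuming the code unit that follows it. This is not the spec text. The function name `obtain_unicode`, the constant `REPLACEMENT`, and the representation of the input as a list of 16-bit integers are all assumptions made for illustration.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def obtain_unicode(code_units):
    """Convert a sequence of 16-bit code units to a list of code points.

    Hypothetical helper for illustration only; unpaired surrogates are
    replaced by U+FFFD and the following code unit is left untouched.
    """
    result = []
    i = 0
    n = len(code_units)
    while i < n:
        c = code_units[i]
        if 0xD800 <= c <= 0xDBFF:
            # High surrogate: only valid when a low surrogate follows.
            if i + 1 < n and 0xDC00 <= code_units[i + 1] <= 0xDFFF:
                d = code_units[i + 1]
                result.append(0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00))
                i += 2
                continue
            # Unpaired high surrogate: replace it, do not eat the next unit.
            result.append(REPLACEMENT)
        elif 0xDC00 <= c <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            result.append(REPLACEMENT)
        else:
            # Ordinary BMP code unit maps to itself.
            result.append(c)
        i += 1
    return result
```

With this sketch, `obtain_unicode([0xD800, 0x41])` keeps the `0x41` that follows the unpaired high surrogate, which is the case the second point in the quoted message is about.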