Eric Blake <ebl...@redhat.com> writes:

> On 08/17/2018 10:05 AM, Markus Armbruster wrote:
>> The JSON parser treats each half of a surrogate pair as unpaired
>> surrogate. Fix it to recognize surrogate pairs.
>>
>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>> Reviewed-by: Eric Blake <ebl...@redhat.com>
>
> I might have dropped the R-b, to ensure the changes since v1 get
> re-reviewed.

I intended to, but screwed up. My apologies.

>> ---
>>  qobject/json-parser.c | 60 ++++++++++++++++++++++++++++---------------
>>  tests/check-qjson.c   |  3 +--
>>  2 files changed, 40 insertions(+), 23 deletions(-)
>>
>
>> @@ -157,22 +169,28 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>>                      qstring_append_chr(str, '\t');
>>                      break;
>>                  case 'u':
>> -                    cp = 0;
>> -                    for (i = 0; i < 4; i++) {
>> -                        if (!qemu_isxdigit(*ptr)) {
>> -                            parse_error(ctxt, token,
>> -                                        "invalid hex escape sequence in string");
>> -                            goto out;
>> +                    cp = cvt4hex(ptr);
>> +                    ptr += 4;
>> +
>> +                    /* handle surrogate pairs */
>> +                    if (cp >= 0xD800 && cp <= 0xDBFF
>> +                        && ptr[0] == '\\' && ptr[1] == 'u') {
>> +                        /* leading surrogate followed by \u */
>> +                        cp = 0x10000 + ((cp & 0x3FF) << 10);
>> +                        trailing = cvt4hex(ptr + 2);
>> +                        if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
>> +                            /* followed by trailing surrogate */
>> +                            cp |= trailing & 0x3FF;
>> +                            ptr += 6;
>> +                        } else {
>> +                            cp = -1; /* invalid */
>>                          }
>> -                        cp <<= 4;
>> -                        cp |= hex2decimal(*ptr);
>> -                        ptr++;
>>                      }
>>                      if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>>                          parse_error(ctxt, token,
>> -                                    "\\u%.4s is not a valid Unicode character",
>> -                                    ptr - 3);
>> +                                    "%.*s is not a valid Unicode character",
>> +                                    (int)(ptr - beg), beg);
>
> The error reporting here has indeed been improved over v1.
>
> Reviewed-by: Eric Blake <ebl...@redhat.com>

Thanks!
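
For anyone following the arithmetic in the hunk above, here is a minimal standalone sketch of the same surrogate pair combination. It is not the code from the patch: hex4() is a hypothetical stand-in for cvt4hex(), whose body is not shown in the quoted hunk, and decode_u_escape() is a name made up for this illustration. The patch leaves final validation to mod_utf8_encode(); the sketch instead rejects a leftover lone surrogate itself, which is a simplification on my part.

```c
#include <ctype.h>

/*
 * Hypothetical stand-in for the patch's cvt4hex(): convert exactly four
 * hex digits to a UTF-16 code unit, or return -1 if any of the four
 * characters is not a hex digit.  Stops at the first bad character, so
 * it never reads past the terminating NUL.
 */
static int hex4(const char *s)
{
    int cp = 0;

    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)s[i];

        if (!isxdigit(c)) {
            return -1;
        }
        cp <<= 4;
        cp |= isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
    }
    return cp;
}

/*
 * Decode one \uXXXX escape, or one \uXXXX\uXXXX surrogate pair, starting
 * at the backslash.  Returns the code point, or -1 if the escape is
 * invalid.  *len receives the number of bytes examined as part of the
 * escape (0 if the first four digits are already malformed).
 */
static int decode_u_escape(const char *s, int *len)
{
    const char *ptr;
    int cp, trailing;

    *len = 0;
    if (s[0] != '\\' || s[1] != 'u') {
        return -1;
    }
    ptr = s + 2;
    cp = hex4(ptr);
    if (cp < 0) {
        return -1;
    }
    ptr += 4;

    /* Same shape as the hunk: leading surrogate followed by \u */
    if (cp >= 0xD800 && cp <= 0xDBFF && ptr[0] == '\\' && ptr[1] == 'u') {
        cp = 0x10000 + ((cp & 0x3FF) << 10);
        trailing = hex4(ptr + 2);
        if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
            /* followed by trailing surrogate: fill in the low 10 bits */
            cp |= trailing & 0x3FF;
            ptr += 6;
        } else {
            cp = -1;    /* invalid */
        }
    }

    /* Assumption of this sketch: reject any surrogate left unpaired */
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        cp = -1;
    }
    *len = (int)(ptr - s);
    return cp;
}
```

With this sketch, decode_u_escape("\\uD834\\uDD1E", &len) returns 0x1D11E with len set to 12, while "\\uD834\\u0041" returns -1 with len set to 6, so only the leading escape would be reported as invalid.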