Eric Blake <ebl...@redhat.com> writes:

> On 08/13/2018 01:11 AM, Markus Armbruster wrote:
>
>>>>> Technically, Unicode ends at U+10FFFF (21 bits). Anything beyond that
>>>>> is not valid Unicode, even if it IS a valid interpretation of UTF-8
>>>>> encoding.
>>>>
>>>> Correct. Testing how we handle such sequences makes sense all the same.
>>>>
>>>>>> {
>>>>>> - "\"\xF7\xBF\xBF\xBF\"",
>>>>>> + "\xF7\xBF\xBF\xBF",
>>>>>> NULL, /* bug: rejected */
>>>
>>> So, maybe all the more we need to do is remove the comment (as we WANT
>>> to reject these)?
>>
>> Is PATCH 20 doing what you suggest?
>
> Yes, I think you get there in the end, it was more a question of churn
> in the meantime.
Modest churn, I think. PATCH 09 adds some ten "bug:" comments that go away in "[PATCH 21/56] json: Reject invalid UTF-8 sequences" (some might go a bit later; I didn't check).

I put my announcement of intent "[PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected" right before its implementation in PATCH 21. Having PATCH 20 in place before PATCH 09 would avoid the "bug:" comment churn, but it would also separate the announcement of intent from its implementation. That seems doubtful to me.

>>>>>
>>>>> The conversion of the initializer looks sane (well, mechanical). Ergo:
>>>>>
>>>>> Reviewed-by: Eric Blake <ebl...@redhat.com>
>>>>
>>>> Thanks!
>>>
>>> Of course, playing games with the pre-existing comments on
>>> out-of-range behavior is probably better for a separate patch, and you
>>> do have some churn on these tests in later patches. I'll leave it up
>>> to you what to do (or leave put).
>>
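The point quoted above, that \xF7\xBF\xBF\xBF follows the 4-byte UTF-8 bit pattern yet lies beyond U+10FFFF, can be made concrete with a small C sketch. utf8_decode_one() and utf8_raw_4byte() are hypothetical helpers written only for this illustration; they are not QEMU's JSON lexer or parser code. The validity rules assumed are those of RFC 3629: no overlong forms, no surrogates, nothing above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not QEMU code): decode one
 * UTF-8 sequence at s with n bytes available.  Returns the number of
 * bytes consumed and stores the code point in *cp, or returns -1 if the
 * bytes are not valid UTF-8 per RFC 3629.
 */
static int utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 2, 3 or 4 bytes. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t c = 0;
    int len = 0;
    int i;

    if (n == 0) {
        return -1;
    }
    if (s[0] < 0x80) {
        *cp = s[0];                 /* plain ASCII */
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2;
        c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3;
        c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        /* Lead bytes 0xF0..0xF7: the bit pattern carries 21 bits, so it
         * can express values up to 0x1FFFFF, well past U+10FFFF. */
        len = 4;
        c = s[0] & 0x07;
    } else {
        return -1;                  /* stray continuation byte or 0xF8..0xFF */
    }
    if (n < (size_t)len) {
        return -1;                  /* truncated sequence */
    }
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;              /* not a continuation byte */
        }
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len]) {
        return -1;                  /* overlong encoding */
    }
    if (c >= 0xD800 && c <= 0xDFFF) {
        return -1;                  /* UTF-16 surrogate */
    }
    if (c > 0x10FFFF) {
        return -1;                  /* beyond the end of Unicode */
    }
    *cp = c;
    return len;
}

/* Hypothetical helper: the raw 4-byte bit arithmetic, no range checks. */
static uint32_t utf8_raw_4byte(const unsigned char s[4])
{
    return ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
           | ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
}
```

Fed \xF7\xBF\xBF\xBF, utf8_raw_4byte() yields 0x1FFFFF (all 21 bits set), which is why utf8_decode_one() rejects it, while \xF4\x8F\xBF\xBF decodes to U+10FFFF, the last valid code point. Rejecting such sequences, rather than flagging the rejection with a "bug:" comment, is the behavior the thread settles on.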