Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings

Laszlo Ersek Fri, 22 Mar 2013 08:42:05 -0700

On 03/22/13 15:37, Markus Armbruster wrote:
> Laszlo Ersek <ler...@redhat.com> writes:
> 
>> On 03/14/13 18:49, Markus Armbruster wrote:
>>> These are all broken, too.
>>
>> What are "these"? And how are they broken? And how does the patch fix them?
> 
> "These" refers to the subject: noncharacters other than U+FFFE, U+FFFF.
> 
> I agree that I should better explain how they're broken, and what the
> patch does to fix them.  Will fix on respin.
> 
>>>
>>> A few test cases use noncharacters U+FFFF and U+10FFFF.  Risks testing
>>> noncharacters some more instead of what they're supposed to test.  Use
>>> U+FFFD and U+10FFFD instead.
>>>
>>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>>> ---
>>>  tests/check-qjson.c | 85
>>> +++++++++++++++++++++++++++++++++++++++++++++--------
>>>  1 file changed, 72 insertions(+), 13 deletions(-)
>>
>> I'm confused about the commit message. There are three paragraphs in it
>> (the title, the first paragraph, and the 2nd paragraph). This patch
>> modifies different tests:
>>
>>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>>> index 852124a..efec1b2 100644
>>> --- a/tests/check-qjson.c
>>> +++ b/tests/check-qjson.c
>>> @@ -158,7 +158,7 @@ static void utf8_string(void)
>>>       * consider using overlong encoding \xC0\x80 for U+0000 ("modified
>>>       * UTF-8").
>>>       *
>>> -     * Test cases are scraped from Markus Kuhn's UTF-8 decoder
>>> +     * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
>>>       * capability and stress test at
>>>       * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>>>       */
>>> @@ -256,11 +256,11 @@ static void utf8_string(void)
>>>              "\xDF\xBF",
>>>              "\"\\u07FF\"",
>>>          },
>>> -        /* 2.2.3  3 bytes U+FFFF */
>>> +        /* 2.2.3  3 bytes U+FFFD */
>>>          {
>>> -            "\"\xEF\xBF\xBF\"",
>>> -            "\xEF\xBF\xBF",
>>> -            "\"\\uFFFF\"",
>>> +            "\"\xEF\xBF\xBD\"",
>>> +            "\xEF\xBF\xBD",
>>> +            "\"\\uFFFD\"",
>>>          },
>>
>> This is under "2.2  Last possible sequence of a certain length". I guess
> 
> Which is in turn under "2  Boundary condition test cases".
> 
>> this is where you say "last possible sequence of a certain length,
>> encoding a character (= non-noncharacter)". OK, p#2.
> 
> Yes.
> 
> The test's purpose is testing the upper bound of 3-byte sequences is
> decoded correctly.
> 
> The upper bound is U+FFFF.  Since that's a noncharacter, the parser
> should reject it (or maybe replace), the formatter should replace it.
> Trouble is it could be misdecoded and then rejected / replaced.
> 
> Besides, U+FFFF already gets tested along with the other noncharacters
> under "5.3  Other illegal code positions".
> 
> Next in line is U+FFFE, also a noncharacter, also under 5.3.
> 
> Next in line is U+FFFD, which I picked.
> 
> But that gets tested under "2.3  Other boundary conditions"!  I guess I
> either drop it there, or make this one U+FFFC.
> 
> I think testing U+FFFC here makes sense, because U+FFFD could be
> misdecoded, then replaced by U+FFFD.
> 
> What do you think?


I think that we're extending Markus Kuhn's test suite, basically taking
random shots at where one specific parser's/formatter's weak spots might
be :)

That said, with intelligent fuzzing out of scope / capacity, U+FFFC
could be a good pick.

I also think I'm a quite a useless person to ask for thoughts in this
area :)

Thanks,
Laszlo

Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings

Reply via email to