Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
Nico, Thanks for the comprehensive response. I've excerpted a few things for further comment - I'm fine with the proposed actions for anything not mentioned here. [A] JSON text parse failures If parsing of such an octet string as a JSON text fails, and the octet string is followed by an RS octet, the parser SHOULD nonetheless skip ahead to that RS octet and continue parsing the remainder of the sequence from there. That's a weird way of saying that, wherever the JSON text parse fails, the sequence parser should then seek forward until the next RS-valued byte. But it works for me; I'll take it. Your alternative wording whenever the JSON text parse fails, ... is fine. [D] Truncation A missing terminating LF is not a problem for strings, arrays, or objects. I seem to recall that we did discuss this. We could require that such texts fail to parse, but perhaps the more important thing is to require common parser behavior as to such truncations. You ABNF proposal is certainly more strict than the one in the I-D. I'm neutral as to whether this form or the one in the I-D (with the ws issue fixed) is better. The stricter form is clearly easier to talk about, therefore preferable, but it will mean discarding texts where only that terminating LF is truncated. I concur with both of the above paragraphs - my preference is to detect incomplete JSON texts at the sequence level via the missing LF rather than special-casing numbers and relying on failed JSON parses for everything else. In general, earlier detection of errors increases the options for dealing with them. Once the incomplete text is detected, a JSON parse could be attempted, with the JSON parser knowing that the text is incomplete (e.g., text may fail to parse, a number at the end of the text must not be produced as an incremental parse result). I'll defer to the WG on whether/how to expand the decoder ABNF to better detect incomplete texts and how to deal with any incomplete texts that are detected. As for RFC 20 ... Nits/editorial comments: idnits didn't like the reference to RFC 20 for ASCII: ** Downref: Normative reference to an Unknown state RFC: RFC 20 Is this resolved by now? I can always reference only Unicode. Keep the RFC 20 reference - I have no problem with it. Moreover, as a result of all the hubbub around this nit, the IESG has issued a Last Call to reclassify RFC 20 as an Internet Standard ... so that this never arises again ... http://www.ietf.org/mail-archive/web/ietf-announce/current/msg13546.html Thanks, --David -Original Message- From: Nico Williams [mailto:n...@cryptonector.com] Sent: Monday, December 08, 2014 11:20 PM To: Black, David Cc: General Area Review Team (gen-art@ietf.org); ops-...@ietf.org; i...@ietf.org; j...@ietf.org Subject: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09 On Fri, Dec 05, 2014 at 02:51:04PM +, Black, David wrote: This is a combined Gen-ART and OPS-Dir review. Boilerplate for both follows ... Thanks you. Minor issues: [A] Section 2.1: If parsing of such an octet string as a JSON text fails, the parser SHOULD nonetheless continue parsing the remainder of the sequence. That's not quite right - there are two levels of parsing, JSON sequence parsing and JSON text parsing of each text in the sequence, both of which might be implemented in a single-pass parser. For such an implementation, the above sentence could be (mis-)read to imply that the JSON text parse should resume from the point at which it failed, which would be silly (although I've seen heroic PL/1 parsers do exactly that). Instead, the parse needs to skip ahead to the next RS, ignoring the rest of the JSON text that failed to parse. I suggest: Good point. If parsing of such an octet string as a JSON text fails, and the octet string is followed by an RS octet, the parser SHOULD nonetheless skip ahead to that RS octet and continue parsing the remainder of the sequence from there. That's a weird way of saying that, wherever the JSON text parse fails, the sequence parser should then seek forward until the next RS-valued byte. But it works for me; I'll take it. [B] Section 2.3: Is incremental parsing of a JSON text within a sequence allowed, or is the parser required to not produce any results until the parse of the entire text is successful? I'd expect that incremental parsing is ok (so results may be produced from a text that ultimately fails to parse), and I think that's worth stating. It's permitted, of course, with the caveat that all incremental parsers have: the parse can ultimately fail. I'll add this note: Incremental JSON text parsers may be used, though of course failure to parse a given text may result after first producing some incremental parse results. to section 2.3. [C] Section 2.4: Parsers MUST
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
On Tue, Dec 09, 2014 at 04:41:12PM +, Black, David wrote: [A] JSON text parse failures [...] Your alternative wording whenever the JSON text parse fails, ... is fine. OK. [D] Truncation A missing terminating LF is not a problem for strings, arrays, or objects. I seem to recall that we did discuss this. We could require that such texts fail to parse, but perhaps the more important thing is to require common parser behavior as to such truncations. You ABNF proposal is certainly more strict than the one in the I-D. I'm neutral as to whether this form or the one in the I-D (with the ws issue fixed) is better. The stricter form is clearly easier to talk about, therefore preferable, but it will mean discarding texts where only that terminating LF is truncated. I concur with both of the above paragraphs - my preference is to detect incomplete JSON texts at the sequence level via the missing LF rather than special-casing numbers and relying on failed JSON parses for everything else. In general, earlier detection of errors increases the options for dealing with them. And, of course, a streaming/incremental parsers might well output all there is to output when only the last LF is missing but the top-level value was properly delimited anyways. So it's kinda difficult to get a fool-proof requirement that the trailing LF must be present. Your review comments included adding this note about incremental parsing. There's a conflict here between the two comments that had not been apparent to me last night. I now think that fixing the ws problem is the best way forward. Once the incomplete text is detected, a JSON parse could be attempted, with the JSON parser knowing that the text is incomplete (e.g., text may fail to parse, a number at the end of the text must not be produced as an incremental parse result). That's so for non-incremental parsers. (Or when buffering the complete text instead of handling incrementally, even though one has an incremental parser.) Consider one implementation I'm familiar with. Its JSON text parser is incremental (but not streaming), so it produces outputs with no need for extra whitespace when the input text is a string, array, or object, but for top-level numbers, booleans, and null, it needs to either read one more byte or reach EOF before it will output them. So I think we really do need to say something about top-level numbers (and true, false, and null), namely: that they must be delimited by whitespace, that 'RS1234RS' is not a valid sequence element because the number may have been truncated. (Ditto 'RStrueRS', since the intended text could have been 'trueish', which is invalid of course, but still.) As for RFC 20 ... Is this resolved by now? I can always reference only Unicode. Keep the RFC 20 reference - I have no problem with it. Moreover, as a result of all the hubbub around this nit, the IESG has issued a Last Call to reclassify RFC 20 as an Internet Standard ... so that this never arises again ... Yes, I noticed. I expect the IETF LC will pass for that. Thanks, Nico -- ___ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
On Tue, Dec 09, 2014 at 06:49:35PM +, Black, David wrote: So I think we really do need to say something about top-level numbers (and true, false, and null), namely: that they must be delimited by whitespace, that 'RS1234RS' is not a valid sequence element because the number may have been truncated. (Ditto 'RStrueRS', since the intended text could have been 'trueish', which is invalid of course, but still.) That would be more robust, as then all JSON texts in a sequence have delimiters and absence of the closing delimiter clearly indicates truncation. OK. New section 2.4 text: While objects, arrays, and strings are self-delimited in JSON texts, numbers, and the values 'true', 'false', and 'null' are not. Only whitespace can delimit the latter four kinds of values. Parsers MUST check that any JSON texts that are a top-level number, or which might be 'true', 'false', or 'null' include JSON whitespace (at least one byte matching the ws ABNF rule from RFC7159) after that value, otherwise the JSON-text may have been truncated. Note that the LF following each JSON text matches the ws ABNF rule. Parsers MUST drop JSON-text sequence elements consisting of non-self-delimited top-level values that may have been truncated (that are not delimited by whitespace). Parsers can report such texts as warnings (including, optionally, the parsed text and/or the original octet string). For example, 'RS123RS' might have been intended to carry the top-level number 123.4, but must have been truncated. Similarly, 'RStrueRS' might have been intended to carry the invalid text 'trueish'. 'RStruefalesRS' is not two top-level values, 'true', and 'false'; it is simply not a valid JSON text. This is the only place where the ws rule comes up, so merely saying at least one byte matching it should suffice. I'm also adding this following the above, based on your comment about incremental parsers: Implementations may produce a value when parsing 'RSfooRS' because their JSON text parser might be able to consume bytes incrementally, and since the JSON text in this case is a self-delimiting top-level value, the parser can produce the result without consuming an additional byte. Such implementations should skip to the next RS byte, possibly reporting any intervening non-whitespace bytes. (yes, I think this should be a 'should', not a 'SHOULD'). Nico -- ___ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
On Wed, Dec 10, 2014 at 10:52:22AM +1000, Matthew Kerwin wrote: On 10 December 2014 at 10:49, Nico Williams n...@cryptonector.com wrote: (yes, I think this should be a 'should', not a 'SHOULD'). Not an 'ought to'? I'll take that. ___ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
This looks good, and the explanation of what's going on (numbers, 'true', 'false' and 'null' lack delimiters) is a useful addition. One small nit: 'RStruefalesRS' is not two top-level values, 'true', and 'false'; it is simply not a valid JSON text. truefales - truefalse Thanks, --David On Tue, Dec 09, 2014 at 06:49:35PM +, Black, David wrote: So I think we really do need to say something about top-level numbers (and true, false, and null), namely: that they must be delimited by whitespace, that 'RS1234RS' is not a valid sequence element because the number may have been truncated. (Ditto 'RStrueRS', since the intended text could have been 'trueish', which is invalid of course, but still.) That would be more robust, as then all JSON texts in a sequence have delimiters and absence of the closing delimiter clearly indicates truncation. OK. New section 2.4 text: While objects, arrays, and strings are self-delimited in JSON texts, numbers, and the values 'true', 'false', and 'null' are not. Only whitespace can delimit the latter four kinds of values. Parsers MUST check that any JSON texts that are a top-level number, or which might be 'true', 'false', or 'null' include JSON whitespace (at least one byte matching the ws ABNF rule from RFC7159) after that value, otherwise the JSON-text may have been truncated. Note that the LF following each JSON text matches the ws ABNF rule. Parsers MUST drop JSON-text sequence elements consisting of non-self-delimited top-level values that may have been truncated (that are not delimited by whitespace). Parsers can report such texts as warnings (including, optionally, the parsed text and/or the original octet string). For example, 'RS123RS' might have been intended to carry the top-level number 123.4, but must have been truncated. Similarly, 'RStrueRS' might have been intended to carry the invalid text 'trueish'. 'RStruefalesRS' is not two top-level values, 'true', and 'false'; it is simply not a valid JSON text. This is the only place where the ws rule comes up, so merely saying at least one byte matching it should suffice. I'm also adding this following the above, based on your comment about incremental parsers: Implementations may produce a value when parsing 'RSfooRS' because their JSON text parser might be able to consume bytes incrementally, and since the JSON text in this case is a self-delimiting top-level value, the parser can produce the result without consuming an additional byte. Such implementations should skip to the next RS byte, possibly reporting any intervening non-whitespace bytes. (yes, I think this should be a 'should', not a 'SHOULD'). Nico -- ___ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
On 10 December 2014 at 10:49, Nico Williams n...@cryptonector.com wrote: (yes, I think this should be a 'should', not a 'SHOULD'). Not an 'ought to'? -- Matthew Kerwin http://matthew.kerwin.net.au/ ___ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
On Fri, Dec 05, 2014 at 02:51:04PM +, Black, David wrote: This is a combined Gen-ART and OPS-Dir review. Boilerplate for both follows ... Thanks you. Minor issues: [A] Section 2.1: If parsing of such an octet string as a JSON text fails, the parser SHOULD nonetheless continue parsing the remainder of the sequence. That's not quite right - there are two levels of parsing, JSON sequence parsing and JSON text parsing of each text in the sequence, both of which might be implemented in a single-pass parser. For such an implementation, the above sentence could be (mis-)read to imply that the JSON text parse should resume from the point at which it failed, which would be silly (although I've seen heroic PL/1 parsers do exactly that). Instead, the parse needs to skip ahead to the next RS, ignoring the rest of the JSON text that failed to parse. I suggest: Good point. If parsing of such an octet string as a JSON text fails, and the octet string is followed by an RS octet, the parser SHOULD nonetheless skip ahead to that RS octet and continue parsing the remainder of the sequence from there. That's a weird way of saying that, wherever the JSON text parse fails, the sequence parser should then seek forward until the next RS-valued byte. But it works for me; I'll take it. [B] Section 2.3: Is incremental parsing of a JSON text within a sequence allowed, or is the parser required to not produce any results until the parse of the entire text is successful? I'd expect that incremental parsing is ok (so results may be produced from a text that ultimately fails to parse), and I think that's worth stating. It's permitted, of course, with the caveat that all incremental parsers have: the parse can ultimately fail. I'll add this note: Incremental JSON text parsers may be used, though of course failure to parse a given text may result after first producing some incremental parse results. to section 2.3. [C] Section 2.4: Parsers MUST check that any JSON texts that are a top-level number include JSON whitespace (ws ABNF rule from [RFC7159]) after the number, otherwise the JSON-text may have been truncated. That reference to the ws rule doesn't get the job done because that rule allows a match to no characters - it's of the form ws = *( ... ) where ... is the list of whitespace characters. What's needed here is a rule of the form vws = 1*( ...) to force there to be at least one whitespace character, but see the next issue for a better way to deal with this topic by pulling the appended LF into the sequence parse instead of the text parse. I'd wanted to not have to list the characters that are considered whitespace. I agree that the ws rule is not appropriate though. [D] I wonder whether the possibility of incomplete texts ought to be encoded into the parsing rules to directly catch JSON texts that must be incomplete because the last character is not LF, e.g.: A missing terminating LF is not a problem for strings, arrays, or objects. I seem to recall that we did discuss this. We could require that such texts fail to parse, but perhaps the more important thing is to require common parser behavior as to such truncations. You ABNF proposal is certainly more strict than the one in the I-D. I'm neutral as to whether this form or the one in the I-D (with the ws issue fixed) is better. The stricter form is clearly easier to talk about, therefore preferable, but it will mean discarding texts where only that terminating LF is truncated. One problem we get into with your ABNF is that RFC5234 does not discuss greediness (this came up on the list). But this can be noted (see below). One nice thing about the stricter rules is that we need not discuss top-level numbers (or booleans, or null) with normative text, just note them. JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON)) RS = %x1E; record separator (RS), see RFC20 possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded ; JSON text (see RFC7159) truncated-JSON = *(not-RS) not-LFRS); truncated, don't attempt ; to parse as JSON text empty-JSON = LF ; only the LF appended by the encoder, nothing to parse not-RS = %x00-1D / %x1F-FF; any octet other than RS not-LFRS = %x00-09/ %x1B-1D / %x1F-FF; any octet other than RS or LF Note that this won't detect all incomplete JSON texts, because LF is allowed within a JSON text (and this should be stated). It will if ABNF matching is greedy and possible-JSON is defined as possible-JSON = 1*( 1*(not-RS) LF); ... Advice as to which form to take? [E] Section 3 - Security Considerations Incomplete and malformed JSON texts can be used to attack JSON parsers - that should be pointed out, as I don't see that in RFC 7159's security considerations and incomplete texts are a relevant
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
On 12/5/14 4:38 PM, Barry Leiba wrote: Hi, David. One note on your review: idnits didn't like the reference to RFC 20 for ASCII: ** Downref: Normative reference to an Unknown state RFC: RFC 20 RFC 5234 (ABNF) uses this, which looks like a better reference: [US-ASCII] American National Standards Institute, Coded Character Set -- 7-bit American Standard Code for Information Interchange, ANSI X3.4, 1986. Except that (1) many IETF documents do use RFC 20 and (2) the RFC 20 reference is not for ASCII: it's for RS, the Record Separator character, which is explained in RFC 20, Section 5.2. And as per BCP97, RFC 3967, section 3, this downref was specifically called out in the Last Call for this document. pr -- Pete Resnickhttp://www.qualcomm.com/~presnick/ Qualcomm Technologies, Inc. - +1 (858)651-4478 ___ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
Hi, David. One note on your review: idnits didn't like the reference to RFC 20 for ASCII: ** Downref: Normative reference to an Unknown state RFC: RFC 20 RFC 5234 (ABNF) uses this, which looks like a better reference: [US-ASCII] American National Standards Institute, Coded Character Set -- 7-bit American Standard Code for Information Interchange, ANSI X3.4, 1986. Except that (1) many IETF documents do use RFC 20 and (2) the RFC 20 reference is not for ASCII: it's for RS, the Record Separator character, which is explained in RFC 20, Section 5.2. Barry ___ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art
Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
Ok, it is a nit, as is much of what idnits produces ;-). I have no problem ignoring idnits on this one ... Thanks, --David -Original Message- From: barryleiba.mailing.li...@gmail.com [mailto:barryleiba.mailing.li...@gmail.com] On Behalf Of Barry Leiba Sent: Friday, December 05, 2014 5:39 PM To: Black, David Cc: n...@cryptonector.com; General Area Review Team (gen-art@ietf.org); ops- d...@ietf.org; i...@ietf.org; j...@ietf.org Subject: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09 Hi, David. One note on your review: idnits didn't like the reference to RFC 20 for ASCII: ** Downref: Normative reference to an Unknown state RFC: RFC 20 RFC 5234 (ABNF) uses this, which looks like a better reference: [US-ASCII] American National Standards Institute, Coded Character Set -- 7-bit American Standard Code for Information Interchange, ANSI X3.4, 1986. Except that (1) many IETF documents do use RFC 20 and (2) the RFC 20 reference is not for ASCII: it's for RS, the Record Separator character, which is explained in RFC 20, Section 5.2. Barry ___ Gen-art mailing list Gen-art@ietf.org https://www.ietf.org/mailman/listinfo/gen-art