Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-09 Thread Black, David
Nico,

Thanks for the comprehensive response.

I've excerpted a few things for further comment - I'm fine with the
proposed actions for anything not mentioned here.

[A] JSON text parse failures

 If parsing of such an octet string as a JSON text fails, and the
 octet string is followed by an RS octet, the parser
 SHOULD nonetheless skip ahead to that RS octet and continue parsing
 the remainder of the sequence from there.
 
 That's a weird way of saying that, wherever the JSON text parse fails,
 the sequence parser should then seek forward until the next RS-valued
 byte.  But it works for me; I'll take it.

Your alternative wording whenever the JSON text parse fails, ... is fine.

[D] Truncation

 A missing terminating LF is not a problem for strings, arrays, or
 objects.  I seem to recall that we did discuss this.  We could require
 that such texts fail to parse, but perhaps the more important thing is
 to require common parser behavior as to such truncations.
 
 You ABNF proposal is certainly more strict than the one in the I-D.  I'm
 neutral as to whether this form or the one in the I-D (with the ws issue
 fixed) is better.  The stricter form is clearly easier to talk about,
 therefore preferable, but it will mean discarding texts where only that
 terminating LF is truncated.

I concur with both of the above paragraphs - my preference is to detect
incomplete JSON texts at the sequence level via the missing LF rather than
special-casing numbers and relying on failed JSON parses for everything else.
In general, earlier detection of errors increases the options for dealing
with them.

Once the incomplete text is detected, a JSON parse could be attempted,
with the JSON parser knowing that the text is incomplete (e.g., text
may fail to parse, a number at the end of the text must not be produced
as an incremental parse result).

I'll defer to the WG on whether/how to expand the decoder ABNF to better
detect incomplete texts and how to deal with any incomplete texts that are
detected.

As for RFC 20 ...

  Nits/editorial comments:
 
  idnits didn't like the reference to RFC 20 for ASCII:
 
** Downref: Normative reference to an Unknown state RFC: RFC   20
 
 Is this resolved by now?  I can always reference only Unicode.

Keep the RFC 20 reference - I have no problem with it.  Moreover, as a
result of all the hubbub around this nit, the IESG has issued a Last Call
to reclassify RFC 20 as an Internet Standard ... so that this never
arises again ...

http://www.ietf.org/mail-archive/web/ietf-announce/current/msg13546.html

Thanks,
--David

 -Original Message-
 From: Nico Williams [mailto:n...@cryptonector.com]
 Sent: Monday, December 08, 2014 11:20 PM
 To: Black, David
 Cc: General Area Review Team (gen-art@ietf.org); ops-...@ietf.org;
 i...@ietf.org; j...@ietf.org
 Subject: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
 
 On Fri, Dec 05, 2014 at 02:51:04PM +, Black, David wrote:
  This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows
 ...
 
 Thanks you.
 
  Minor issues:
 
  [A] Section 2.1:
 
 If parsing of such an octet string as a JSON text fails, the parser
 SHOULD nonetheless continue parsing the remainder of the sequence.
 
  That's not quite right - there are two levels of parsing, JSON
  sequence parsing and JSON text parsing of each text in the sequence,
  both of which might be implemented in a single-pass parser.  For such an
  implementation, the above sentence could be (mis-)read to imply that the
  JSON text parse should resume from the point at which it failed, which
  would be silly (although I've seen heroic PL/1 parsers do exactly that).
  Instead, the parse needs to skip ahead to the next RS, ignoring the rest
  of the JSON text that failed to parse.  I suggest:
 
 Good point.
 
 If parsing of such an octet string as a JSON text fails, and the
 octet string is followed by an RS octet, the parser
 SHOULD nonetheless skip ahead to that RS octet and continue parsing
 the remainder of the sequence from there.
 
 That's a weird way of saying that, wherever the JSON text parse fails,
 the sequence parser should then seek forward until the next RS-valued
 byte.  But it works for me; I'll take it.
 
  [B] Section 2.3:
 
  Is incremental parsing of a JSON text within a sequence allowed, or
  is the parser required to not produce any results until the parse of
  the entire text is successful?  I'd expect that incremental parsing
  is ok (so results may be produced from a text that ultimately fails
  to parse), and I think that's worth stating.
 
 It's permitted, of course, with the caveat that all incremental parsers
 have: the parse can ultimately fail.  I'll add this note:
 
Incremental JSON text parsers may be used, though of course failure
to parse a given text may result after first producing some
incremental parse results.
 
 to section 2.3.
 
  [C] Section 2.4:
 
 Parsers MUST

Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-09 Thread Nico Williams
On Tue, Dec 09, 2014 at 04:41:12PM +, Black, David wrote:
 [A] JSON text parse failures
  [...]
 
 Your alternative wording whenever the JSON text parse fails, ... is fine.

OK.

 [D] Truncation
 
  A missing terminating LF is not a problem for strings, arrays, or
  objects.  I seem to recall that we did discuss this.  We could require
  that such texts fail to parse, but perhaps the more important thing is
  to require common parser behavior as to such truncations.
  
  You ABNF proposal is certainly more strict than the one in the I-D.  I'm
  neutral as to whether this form or the one in the I-D (with the ws issue
  fixed) is better.  The stricter form is clearly easier to talk about,
  therefore preferable, but it will mean discarding texts where only that
  terminating LF is truncated.
 
 I concur with both of the above paragraphs - my preference is to detect
 incomplete JSON texts at the sequence level via the missing LF rather than
 special-casing numbers and relying on failed JSON parses for everything else.
 In general, earlier detection of errors increases the options for dealing
 with them.

And, of course, a streaming/incremental parsers might well output all
there is to output when only the last LF is missing but the top-level
value was properly delimited anyways.  So it's kinda difficult to get a
fool-proof requirement that the trailing LF must be present.

Your review comments included adding this note about incremental
parsing.  There's a conflict here between the two comments that had not
been apparent to me last night.  I now think that fixing the ws problem
is the best way forward.

 Once the incomplete text is detected, a JSON parse could be attempted,
 with the JSON parser knowing that the text is incomplete (e.g., text
 may fail to parse, a number at the end of the text must not be produced
 as an incremental parse result).

That's so for non-incremental parsers.  (Or when buffering the complete
text instead of handling incrementally, even though one has an
incremental parser.)

Consider one implementation I'm familiar with.  Its JSON text parser is
incremental (but not streaming), so it produces outputs with no need for
extra whitespace when the input text is a string, array, or object, but
for top-level numbers, booleans, and null, it needs to either read one
more byte or reach EOF before it will output them.

So I think we really do need to say something about top-level numbers
(and true, false, and null), namely: that they must be delimited by
whitespace, that 'RS1234RS' is not a valid sequence element because
the number may have been truncated.  (Ditto 'RStrueRS', since the
intended text could have been 'trueish', which is invalid of course, but
still.)

 As for RFC 20 ...
 
  Is this resolved by now?  I can always reference only Unicode.
 
 Keep the RFC 20 reference - I have no problem with it.  Moreover, as a
 result of all the hubbub around this nit, the IESG has issued a Last Call
 to reclassify RFC 20 as an Internet Standard ... so that this never
 arises again ...

Yes, I noticed.  I expect the IETF LC will pass for that.

Thanks,

Nico
-- 

___
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art


Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-09 Thread Nico Williams
On Tue, Dec 09, 2014 at 06:49:35PM +, Black, David wrote:
  So I think we really do need to say something about top-level numbers
  (and true, false, and null), namely: that they must be delimited by
  whitespace, that 'RS1234RS' is not a valid sequence element because
  the number may have been truncated.  (Ditto 'RStrueRS', since the
  intended text could have been 'trueish', which is invalid of course, but
  still.)
 
 That would be more robust, as then all JSON texts in a sequence have
 delimiters and absence of the closing delimiter clearly indicates
 truncation.

OK.

New section 2.4 text:

   While objects, arrays, and strings are self-delimited in JSON texts,
   numbers, and the values 'true', 'false', and 'null' are not.  Only
   whitespace can delimit the latter four kinds of values.

   Parsers MUST check that any JSON texts that are a top-level number,
   or which might be 'true', 'false', or 'null' include JSON whitespace
   (at least one byte matching the ws ABNF rule from RFC7159) after
   that value, otherwise the JSON-text may have been truncated.  Note
   that the LF following each JSON text matches the ws ABNF rule.

   Parsers MUST drop JSON-text sequence elements consisting of
   non-self-delimited top-level values that may have been truncated
   (that are not delimited by whitespace).  Parsers can report such
   texts as warnings (including, optionally, the parsed text and/or the
   original octet string).

   For example, 'RS123RS' might have been intended to carry the
   top-level number 123.4, but must have been truncated.  Similarly,
   'RStrueRS' might have been intended to carry the invalid text
   'trueish'.  'RStruefalesRS' is not two top-level values, 'true',
   and 'false'; it is simply not a valid JSON text.

This is the only place where the ws rule comes up, so merely saying at
least one byte matching it should suffice.

I'm also adding this following the above, based on your comment about
incremental parsers:

   Implementations may produce a value when parsing 'RSfooRS'
   because their JSON text parser might be able to consume bytes
   incrementally, and since the JSON text in this case is a
   self-delimiting top-level value, the parser can produce the result
   without consuming an additional byte.  Such implementations should
   skip to the next RS byte, possibly reporting any intervening
   non-whitespace bytes.

(yes, I think this should be a 'should', not a 'SHOULD').

Nico
-- 

___
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art


Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-09 Thread Nico Williams
On Wed, Dec 10, 2014 at 10:52:22AM +1000, Matthew Kerwin wrote:
 On 10 December 2014 at 10:49, Nico Williams n...@cryptonector.com wrote:
  (yes, I think this should be a 'should', not a 'SHOULD').
 
 Not an 'ought to'?

I'll take that.

___
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art


Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-09 Thread Black, David
This looks good, and the explanation of what's going on (numbers,
'true', 'false' and 'null' lack delimiters) is a useful addition.

One small nit:

'RStruefalesRS' is not two top-level values, 'true',
and 'false'; it is simply not a valid JSON text.

truefales - truefalse

Thanks,
--David
 
 On Tue, Dec 09, 2014 at 06:49:35PM +, Black, David wrote:
   So I think we really do need to say something about top-level numbers
   (and true, false, and null), namely: that they must be delimited by
   whitespace, that 'RS1234RS' is not a valid sequence element because
   the number may have been truncated.  (Ditto 'RStrueRS', since the
   intended text could have been 'trueish', which is invalid of course, but
   still.)
 
  That would be more robust, as then all JSON texts in a sequence have
  delimiters and absence of the closing delimiter clearly indicates
  truncation.
 
 OK.
 
 New section 2.4 text:
 
While objects, arrays, and strings are self-delimited in JSON texts,
numbers, and the values 'true', 'false', and 'null' are not.  Only
whitespace can delimit the latter four kinds of values.
 
Parsers MUST check that any JSON texts that are a top-level number,
or which might be 'true', 'false', or 'null' include JSON whitespace
(at least one byte matching the ws ABNF rule from RFC7159) after
that value, otherwise the JSON-text may have been truncated.  Note
that the LF following each JSON text matches the ws ABNF rule.
 
Parsers MUST drop JSON-text sequence elements consisting of
non-self-delimited top-level values that may have been truncated
(that are not delimited by whitespace).  Parsers can report such
texts as warnings (including, optionally, the parsed text and/or the
original octet string).
 
For example, 'RS123RS' might have been intended to carry the
top-level number 123.4, but must have been truncated.  Similarly,
'RStrueRS' might have been intended to carry the invalid text
'trueish'.  'RStruefalesRS' is not two top-level values, 'true',
and 'false'; it is simply not a valid JSON text.
 
 This is the only place where the ws rule comes up, so merely saying at
 least one byte matching it should suffice.
 
 I'm also adding this following the above, based on your comment about
 incremental parsers:
 
Implementations may produce a value when parsing 'RSfooRS'
because their JSON text parser might be able to consume bytes
incrementally, and since the JSON text in this case is a
self-delimiting top-level value, the parser can produce the result
without consuming an additional byte.  Such implementations should
skip to the next RS byte, possibly reporting any intervening
non-whitespace bytes.
 
 (yes, I think this should be a 'should', not a 'SHOULD').
 
 Nico
 --

___
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art


Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-09 Thread Matthew Kerwin
On 10 December 2014 at 10:49, Nico Williams n...@cryptonector.com wrote:


 (yes, I think this should be a 'should', not a 'SHOULD').


​Not an 'ought to'?​


-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/
___
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art


Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-08 Thread Nico Williams
On Fri, Dec 05, 2014 at 02:51:04PM +, Black, David wrote:
 This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows 
 ...

Thanks you.

 Minor issues:
 
 [A] Section 2.1:
 
If parsing of such an octet string as a JSON text fails, the parser
SHOULD nonetheless continue parsing the remainder of the sequence.
 
 That's not quite right - there are two levels of parsing, JSON
 sequence parsing and JSON text parsing of each text in the sequence,
 both of which might be implemented in a single-pass parser.  For such an
 implementation, the above sentence could be (mis-)read to imply that the
 JSON text parse should resume from the point at which it failed, which
 would be silly (although I've seen heroic PL/1 parsers do exactly that).
 Instead, the parse needs to skip ahead to the next RS, ignoring the rest
 of the JSON text that failed to parse.  I suggest:

Good point.

If parsing of such an octet string as a JSON text fails, and the
octet string is followed by an RS octet, the parser
SHOULD nonetheless skip ahead to that RS octet and continue parsing
the remainder of the sequence from there.

That's a weird way of saying that, wherever the JSON text parse fails,
the sequence parser should then seek forward until the next RS-valued
byte.  But it works for me; I'll take it.

 [B] Section 2.3:
 
 Is incremental parsing of a JSON text within a sequence allowed, or
 is the parser required to not produce any results until the parse of
 the entire text is successful?  I'd expect that incremental parsing
 is ok (so results may be produced from a text that ultimately fails
 to parse), and I think that's worth stating.

It's permitted, of course, with the caveat that all incremental parsers
have: the parse can ultimately fail.  I'll add this note:

   Incremental JSON text parsers may be used, though of course failure
   to parse a given text may result after first producing some
   incremental parse results.

to section 2.3.

 [C] Section 2.4:
 
Parsers MUST check that any JSON texts that are a top-level number
include JSON whitespace (ws ABNF rule from [RFC7159]) after the
number, otherwise the JSON-text may have been truncated.
 
 That reference to the ws rule doesn't get the job done because that
 rule allows a match to no characters - it's of the form ws = *( ... )
 where ... is the list of whitespace characters.  What's needed here is
 a rule of the form vws = 1*( ...) to force there to be at least one
 whitespace character, but see the next issue for a better way to deal
 with this topic by pulling the appended LF into the sequence parse
 instead of the text parse.

I'd wanted to not have to list the characters that are considered
whitespace.  I agree that the ws rule is not appropriate though.

 [D] I wonder whether the possibility of incomplete texts ought to be
 encoded into the parsing rules to directly catch JSON texts that must
 be incomplete because the last character is not LF, e.g.:

A missing terminating LF is not a problem for strings, arrays, or
objects.  I seem to recall that we did discuss this.  We could require
that such texts fail to parse, but perhaps the more important thing is
to require common parser behavior as to such truncations.

You ABNF proposal is certainly more strict than the one in the I-D.  I'm
neutral as to whether this form or the one in the I-D (with the ws issue
fixed) is better.  The stricter form is clearly easier to talk about,
therefore preferable, but it will mean discarding texts where only that
terminating LF is truncated.

One problem we get into with your ABNF is that RFC5234 does not discuss
greediness (this came up on the list).  But this can be noted (see
below).

One nice thing about the stricter rules is that we need not discuss
top-level numbers (or booleans, or null) with normative text, just note
them.

  JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON))
  RS = %x1E; record separator (RS), see RFC20
  possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded
; JSON text (see RFC7159)
  truncated-JSON = *(not-RS) not-LFRS); truncated, don't attempt
   ; to parse as JSON text
  empty-JSON = LF ; only the LF appended by the encoder, nothing to parse
 
  not-RS = %x00-1D / %x1F-FF; any octet other than RS
  not-LFRS = %x00-09/ %x1B-1D / %x1F-FF; any octet other than RS or LF
 
 Note that this won't detect all incomplete JSON texts, because LF is allowed
 within a JSON text (and this should be stated).

It will if ABNF matching is greedy and possible-JSON is defined as

   possible-JSON = 1*( 1*(not-RS) LF); ...

Advice as to which form to take?

 [E] Section 3 - Security Considerations
 
 Incomplete and malformed JSON texts can be used to attack JSON parsers -
 that should be pointed out, as I don't see that in RFC 7159's security
 considerations and incomplete texts are a relevant 

Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-07 Thread Pete Resnick

On 12/5/14 4:38 PM, Barry Leiba wrote:

Hi, David.  One note on your review:

   

idnits didn't like the reference to RFC 20 for ASCII:

   ** Downref: Normative reference to an Unknown state RFC: RFC   20

RFC 5234 (ABNF) uses this, which looks like a better reference:

[US-ASCII]  American National Standards Institute, Coded Character
Set -- 7-bit American Standard Code for Information
Interchange, ANSI X3.4, 1986.
 

Except that (1) many IETF documents do use RFC 20 and (2) the RFC 20
reference is not for ASCII: it's for RS, the Record Separator
character, which is explained in RFC 20, Section 5.2.
   


And as per BCP97, RFC 3967, section 3, this downref was specifically 
called out in the Last Call for this document.


pr

--
Pete Resnickhttp://www.qualcomm.com/~presnick/
Qualcomm Technologies, Inc. - +1 (858)651-4478

___
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art


Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-05 Thread Barry Leiba
Hi, David.  One note on your review:

 idnits didn't like the reference to RFC 20 for ASCII:

   ** Downref: Normative reference to an Unknown state RFC: RFC   20

 RFC 5234 (ABNF) uses this, which looks like a better reference:

[US-ASCII]  American National Standards Institute, Coded Character
Set -- 7-bit American Standard Code for Information
Interchange, ANSI X3.4, 1986.

Except that (1) many IETF documents do use RFC 20 and (2) the RFC 20
reference is not for ASCII: it's for RS, the Record Separator
character, which is explained in RFC 20, Section 5.2.

Barry

___
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art


Re: [Gen-art] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-05 Thread Black, David
Ok, it is a nit, as is much of what idnits produces ;-).
I have no problem ignoring idnits on this one ...

Thanks,
--David

 -Original Message-
 From: barryleiba.mailing.li...@gmail.com
 [mailto:barryleiba.mailing.li...@gmail.com] On Behalf Of Barry Leiba
 Sent: Friday, December 05, 2014 5:39 PM
 To: Black, David
 Cc: n...@cryptonector.com; General Area Review Team (gen-art@ietf.org); ops-
 d...@ietf.org; i...@ietf.org; j...@ietf.org
 Subject: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
 
 Hi, David.  One note on your review:
 
  idnits didn't like the reference to RFC 20 for ASCII:
 
** Downref: Normative reference to an Unknown state RFC: RFC   20
 
  RFC 5234 (ABNF) uses this, which looks like a better reference:
 
 [US-ASCII]  American National Standards Institute, Coded Character
 Set -- 7-bit American Standard Code for Information
 Interchange, ANSI X3.4, 1986.
 
 Except that (1) many IETF documents do use RFC 20 and (2) the RFC 20
 reference is not for ASCII: it's for RS, the Record Separator
 character, which is explained in RFC 20, Section 5.2.
 
 Barry
___
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art