Hello all,

Thank you for the excellent discussion on this and sorry for my silence.

As Lucas has said multiple times, I don't think we should take the current
text too much as direct gospel for this consensus call.
To me, it is indeed much more a question of agreement on going for "a" JSON
option, as opposed to CBOR/protobuf/pcapng/etc. and not "the" specific JSON
serialization currently specified.

Many of the current JSON oddities stem from:
     a) limitations of JSON implementations in browsers (e.g., no
true 64-bit number support) and
     b) the use of TypeScript to specify the data format (a decision taken
at a time when I was unaware of things like CDDL).
I am certainly not averse to revisiting these things; indeed, I am
actively proposing that we do so.


Concretely to address some of Paul's original points:
Section 6.1.1: large numbers as strings or numbers
    This is because browser JavaScript engines typically only handle
integers up to 2^53-1 rather than the full 64 bits. I hope that with
BigInt support this will eventually change; for qlog it is probably
sufficient to just use numbers and note this as a caveat
    ("if you're targeting browser-based tooling, large numbers might be
truncated. This shouldn't happen in 99.9% of cases, but if you're doing
weird stuff, beware").

6.1.2.1 multiple ways of truncating byte strings
   I am unaware of a canonical JSON structure that would allow doing this.
If there is such a thing, kindly enlighten me.
   If the problem is mainly with the fact that I currently define multiple
options (e.g., with/without raw_length etc.), that's certainly something
I'm willing to revise.

6.1.4 allowing trailing commas
   I'm a bit confused by this one, since the qlog text explicitly states
that trailing commas aren't allowed. The intent is to work around this by
requiring -tools- to accept empty objects, to make output easier.
   I'm not sure if this type of practical tooling guidance is
necessary/wanted in an actual RFC, but this and other similar guidance has
been quite useful in the early days.

6.2 NDJSON
   As Lucas has indicated, this imo isn't part of the current consensus
call, but the decision of what to do about a streaming format will be
influenced by the choice of main serialization format of course.
   If JSON were chosen, the current idea is probably to switch to JSON
text sequences (https://datatracker.ietf.org/doc/html/rfc7464, see also
https://github.com/quicwg/qlog/issues/172).


With regards to having a canonical serialization format (JSON) or a
canonical data definition (as Lucas said, probably CDDL-based), I
personally have a strong preference for the latter.
As indicated by Lucas, Spencer and Roberto, I fully expect other
serialization formats than JSON to emerge down the line if qlog continues
to grow and be used.
As such, the canonical data definition should help make such alternatives
possible/easier down the line.
However, I also see great benefit in having a single, "default",
standardized serialization format, even if only for the purpose of having
a concrete example (e.g., like how we have New Reno as the congestion
controller for QUIC),
though I think current usage shows it is already useful for several other
purposes too (e.g., conversion from/to, tooling development).
As the "default" serialization that we standardize (first), I feel JSON is
our best option for the reasons outlined during the presentation and in the
related Github issues.

To make all that clearer/easier, it might indeed be interesting to split
the serialization format from the main schema document, but to me that's
something that can be easily done down the line, since it's mostly
editorial work.
For now, I think we'd benefit from keeping things together in the main
schema, especially for new implementers/people updating their qlog support
to new versions down the line.


I think that's mainly echoing what Lucas has been saying, but in a slightly
different way, hoping to make things even clearer.
Please let me know if you still have issues with this proposed direction!

With best regards,
Robin


On Thu, 5 Aug 2021 at 21:39, Lucas Pardue <lucaspardue.2...@gmail.com>
wrote:

> Hi Paul,
>
>
> On Thu, Aug 5, 2021 at 5:41 PM Paul Hoffman <paul.hoff...@icann.org>
> wrote:
>
>>
>>
>> We are having a disconnect here that is central to the question in this
>> consensus call. The original call said:
>>
>> > The feeling in the room was to keep the JSON serialization format.
>> Noting that implementations can use their own intermediate formats and
>> transform to and from JSON as needed, and that future documents could
>> specify other interop formats if there is sufficient interest.
>> >
>> > The proposed resolution for this matter is to keep the JSON
>> serialization format as the canonical interoperable format. This email
>> seeks to establish consensus for this. If you have comments, for or
>> against, please respond on the issue. The call will run until EoD August 9
>> 2021.
>> >
>>
>>
>> The call asked whether the (now "a") JSON serialization format will be
>> the canonical interoperable format. That is quite different than "including
>> at least one serialization format". As a concrete example:
>>
>> - The JSON serialization says that numbers such as packet_number may be
>> represented as a number or a string
>>
>> - The data format says that the numbers such as packet_number are uint64
>>
>
>> If the JSON serialization format is the canonical interoperable format,
>> and I'm writing a CBOR emitter, I would be allowed to write packet_number
>> as a number or string because both could convert to JSON. However, if the
>> data format is the canonical interoperable format, I would only be able to
>> write it as a number.
>>
>> Thus, it is critical for interoperability between formats to specify if
>> the data definition is canonical, or if the JSON serialization is
>> canonical. If the latter, there really is no need for the data definitions
>> at all; if this choice is made, interoperability would be more likely if
>> the data definitions were removed.
>>
>> I hope this helps clarify why the WG needs to choose one or the other.
>>
>
> Thanks for the additional commentary. I fear my use of canonical hasn't
> helped clarity much (blame my co-chair for allowing me to use big words
> :-)).
>
> The adopted qlog documents included a TypeScript-based data model (schema)
> and a JSON serialization. There have been some discussions about changing
> either of these. As you mentioned upthread, the two are associated. In an
> attempt to keep discussions focused, we've somewhat avoided mentioning the
> schema. The aim of finding consensus on the serialization format was so
> that we could use that as input into deciding on schema definition. Picking
> JSON for serialization would, hypothetically speaking for now, allow us to
> rewrite the schema in CDDL. Using an IETF format for schema definition
> would assuage some of the concerns that have been expressed about
> Typescript or other options. And I anticipate a schema rewrite would
> require the WG to tackle some of the thornier issues in good time. Speaking
> personally, I think that nailing down JSON as the working format du jour
> will allow us to tackle data definition aspects that have been hanging over
> the spec for a while.
>
> qlog has an objective to support alternative serialization formats and
> robust transformations between them. If I understand your comments
> correctly, there's an argument to say we might be approaching things
> backwards and that the group should pick a data definition language first,
> and then the serialization stuff might just come out in the wash. It's a
> bit chicken and egg. Given we have interoperability today between QUIC
> implementations generating qlogs in JSON and tools consuming JSON, for many
> people the decision about data definition is of less immediate practical
> importance. But that's not to downplay the long term importance of creating
> a robust data definition.
>
> Speaking as a co-chair, getting clear consensus about the topic of schema
> and serialization is important at this stage of the document's lifecycle. I
> want to make sure we're asking the right questions now, and we understand
> the potential impacts of things we might be trying to defer. So thanks for
> picking at this. Does the intent to revisit (soon) the qlog schema and
> (possibly) extract the serialization address some of your concerns? Is
> there a clearer way we could articulate the question to ensure the WG
> understands what it is agreeing to?
>
> Cheers,
> Lucas
>
>

-- 

dr. Robin Marx
Postdoc researcher - Web protocols
Expertise centre for Digital Media

Cellphone: +32(0)497 72 86 94

www.uhasselt.be
Universiteit Hasselt - Campus Diepenbeek
Agoralaan Gebouw D - B-3590 Diepenbeek
Kantoor EDM-2.05
