Looking at this it seems like the main change is require empty lists
instead of null values?  I think this might potentially be too strict for
existing degenerate cases (e.g. empty files, I also don't remember if we
said null type requires a buffer).

Most of the others like MessageHeader make sense to me.

On Mon, Jan 20, 2020 at 2:32 PM Wes McKinney <wesmck...@gmail.com> wrote:

> To help with the discussion, here is a patch with 9 "definitely
> required" fields made required, and the associated generated C++
> changes
>
> https://github.com/apache/arrow/compare/master...wesm:flatbuffers-required
>
> (I am not 100% sure about Field.children always being non-null, if
> there were some doubt we could let it be null)
>
> (I would guess that the semantics in Java and elsewhere is the same,
> but someone should confirm)
>
> On Mon, Jan 20, 2020 at 12:59 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > On Mon, Jan 20, 2020 at 12:20 PM Jacques Nadeau <jacq...@apache.org>
> wrote:
> > >
> > > >
> > > > I think what we have determined is that the changes that are being
> > > > discussed in this thread would not render any existing serialized
> > > > Flatbuffers unreadable, unless they are malformed / unable to be
> > > > read with the current libraries.
> > > >
> > >
> > > I think we need to separate two different things:
> > >
> > > Point 1: If all data is populated as we expect, changing from optional
> to
> > > required is a noop.
> > > Point 2: All current Arrow code fails to work in all cases where a
> field is
> > > not populated as expected.
> >
> > I looked at the before/after when adding "(required)" to a field and
> > it appears the only change on the read path is the generated verifier
> > (which you have to explicitly invoke, and you can skip verification)
> >
> > https://gist.github.com/wesm/f1a9e7492b0daee07ccef0566c3900a2
> >
> > This is distinct from Protobuf (I think?) because protobuf verifies
> > the presence of required fields when parsing the protobuf. I assume
> > it's the same in other languages but we'll have to check to be sure
> >
> > This means that if you _fail to invoke the verifier_, you can still
> > follow a null pointer, but applications that use the verifier will
> > stop there and not have to implement their own null checks.
> >
> > >
> > > I think one needs to prove both points in order for this change to be a
> > > compatible change. I agree that point 1 is proven. I don't think point
> 2
> > > has been proven. In fact, I'm not sure how one could prove it(*). The
> bar
> > > for changing the format in a backwards incompatible way (assuming we
> can't
> > > prove point 2) should be high given how long the specification has been
> > > out. It doesn't feel like the benefits here outweigh the cost of
> changing
> > > in an incompatible way (especially given the subjective nature of
> optional
> > > vs. required).
> > >
> > > It's probably less of a concern for
> > > > an in-house protocol than for an open standard like Arrow where there
> > > > may be multiple third-party implementations around at some point.
> > > >
> > >
> > > This is subjective, just like the general argument around whether
> required
> > > or optional should be used in protobuf. My point in sharing was to (1)
> > > point out that the initial implementation choices weren't done without
> > > reason and (2) that we should avoid arguing that either direction is
> more
> > > technically sound (which seemed to be the direction the argument was
> > > taking).
> > >
> > > (*)  One could do an exhaustive analysis of every codepath. This would
> work
> > > for libraries in the Arrow project. However, the flatbuf definition is
> part
> > > of the external specification meaning that other codepaths likely exist
> > > that we could not evaluate.
>

Reply via email to