Looking at this it seems like the main change is require empty lists instead of null values? I think this might potentially be too strict for existing degenerate cases (e.g. empty files, I also don't remember if we said null type requires a buffer).
Most of the others like MessageHeader make sense to me. On Mon, Jan 20, 2020 at 2:32 PM Wes McKinney <wesmck...@gmail.com> wrote: > To help with the discussion, here is a patch with 9 "definitely > required" fields made required, and the associated generated C++ > changes > > https://github.com/apache/arrow/compare/master...wesm:flatbuffers-required > > (I am not 100% sure about Field.children always being non-null, if > there were some doubt we could let it be null) > > (I would guess that the semantics in Java and elsewhere is the same, > but someone should confirm) > > On Mon, Jan 20, 2020 at 12:59 PM Wes McKinney <wesmck...@gmail.com> wrote: > > > > On Mon, Jan 20, 2020 at 12:20 PM Jacques Nadeau <jacq...@apache.org> > wrote: > > > > > > > > > > > I think what we have determined is that the changes that are being > > > > discussed in this thread would not render any existing serialized > > > > Flatbuffers unreadable, unless they are malformed / unable to be > > > > read with the current libraries. > > > > > > > > > > I think we need to separate two different things: > > > > > > Point 1: If all data is populated as we expect, changing from optional > to > > > required is a noop. > > > Point 2: All current Arrow code fails to work in all cases where a > field is > > > not populated as expected. > > > > I looked at the before/after when adding "(required)" to a field and > > it appears the only change on the read path is the generated verifier > > (which you have to explicitly invoke, and you can skip verification) > > > > https://gist.github.com/wesm/f1a9e7492b0daee07ccef0566c3900a2 > > > > This is distinct from Protobuf (I think?) because protobuf verifies > > the presence of required fields when parsing the protobuf. I assume > > it's the same in other languages but we'll have to check to be sure > > > > This means that if you _fail to invoke the verifier_, you can still > > follow a null pointer, but applications that use the verifier will > > stop there and not have to implement their own null checks. > > > > > > > > I think one needs to prove both points in order for this change to be a > > > compatible change. I agree that point 1 is proven. I don't think point > 2 > > > has been proven. In fact, I'm not sure how one could prove it(*). The > bar > > > for changing the format in a backwards incompatible way (assuming we > can't > > > prove point 2) should be high given how long the specification has been > > > out. It doesn't feel like the benefits here outweigh the cost of > changing > > > in an incompatible way (especially given the subjective nature of > optional > > > vs. required). > > > > > > It's probably less of a concern for > > > > an in-house protocol than for an open standard like Arrow where there > > > > may be multiple third-party implementations around at some point. > > > > > > > > > > This is subjective, just like the general argument around whether > required > > > or optional should be used in protobuf. My point in sharing was to (1) > > > point out that the initial implementation choices weren't done without > > > reason and (2) that we should avoid arguing that either direction is > more > > > technically sound (which seemed to be the direction the argument was > > > taking). > > > > > > (*) One could do an exhaustive analysis of every codepath. This would > work > > > for libraries in the Arrow project. However, the flatbuf definition is > part > > > of the external specification meaning that other codepaths likely exist > > > that we could not evaluate. >