On Wed, Aug 11, 2021, 19:05 Weston Pace <weston.p...@gmail.com> wrote:
> >> The benefit is that IR components don't interact much with `flatbuffers`
> >> or `flatc` directly.
> >>
> [...]
> >>
> >> One counter-proposal might be to just put the compute IR IDL in a
> >> separate repo, but that isn't tenable because the compute IR needs
> >> arrow's type information contained in `Schema.fbs`.
>
> > This argument seems predicated on the hypothesis that the compute IR will
> > use Flatbuffers. Is it set in stone?
>
> +1 for the original proposal (mirror repo for specs). I don't think
> we have to figure out the IR format. It makes sense for all language
> independent specs to be in a single place regardless of format. If IR
> picked JSON I would still argue the JSON schemas for IR belong in the
> same repository as the Arrow columnar format flatbuffers files. It
> makes it clear what is spec and what is implementation / toolkit.
> Especially since a mirror repo should be pretty low maintenance.

That's a good point. I hadn't considered that point of view, but I think
you're right that specs, regardless of wire format, should remain together.

> On Wed, Aug 11, 2021 at 11:34 AM Antoine Pitrou <anto...@python.org> wrote:
> >
> > On 11/08/2021 at 23:06, Phillip Cloud wrote:
> > > On Wed, Aug 11, 2021 at 4:22 PM Antoine Pitrou <anto...@python.org> wrote:
> > >
> > >> On 11/08/2021 at 22:16, Phillip Cloud wrote:
> > >>>
> > >>> Yeah, that is a drawback here, though I don't see needing to run
> > >>> flatc as a major downside given the upside
> > >>> of not having to write additional code to move between formats.
> > >>
> > >> That's only an advantage if you already know how to read the Arrow IPC
> > >> format (and, yes, in this case you already run `flatc`). Some projects
> > >> probably don't care about Arrow IPC (Dask, for example).
> > >
> > > I don't think it's about the IPC though, at least for the compute IR
> > > use case.
> > > Am I missing something there?
> >
> > If you're not handling the Arrow IPC format, then you probably don't
> > have an encoder/decoder for Schema.fbs, so the "upside of not having to
> > write additional code to move between formats" doesn't exist (unless I'm
> > misunderstanding your point?).
> >
> > > I do think a downside of not using something like JSON or msgpack is
> > > that schema validation must be implemented by both the producer and the
> > > consumer.
> > > That means we'd have a number of other consequential decisions to make:
> > >
> > > * Do we provide the validation library?
> > > * If not, do all the languages arrow supports have high quality
> > >   libraries for validating schemas?
> > > * If so, then we have to implement/maintain/release/bugfix that.
> >
> > This is true. However, Flatbuffers doesn't validate much on its own,
> > either, because its IDL is not expressive enough. For example,
> > `Schema.fbs` allows you to declare an INT8 field with children, a LIST
> > field without any children, a non-nullable NULL field...
> >
> > (also, there's JSON Schema: https://json-schema.org/)
> >
> > Regards
> >
> > Antoine.
>
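
To make the validation point above concrete, here is a minimal sketch of the
kind of semantic check that would have to live outside the IDL. The `Field`
class and `validate` function are hypothetical stand-ins for illustration
only, not part of Arrow or Flatbuffers; the three rules are just the three
cases mentioned above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a field described by Schema.fbs."""
    name: str
    type_name: str  # e.g. "INT8", "LIST", "NULL"
    nullable: bool = True
    children: List["Field"] = field(default_factory=list)


def validate(f: Field) -> List[str]:
    """Collect problems that the schema language alone does not rule out."""
    errors = []
    if f.type_name == "INT8" and f.children:
        errors.append(f"{f.name}: INT8 field with children")
    if f.type_name == "LIST" and not f.children:
        errors.append(f"{f.name}: LIST field without any children")
    if f.type_name == "NULL" and not f.nullable:
        errors.append(f"{f.name}: non-nullable NULL field")
    # Nested fields need the same checks, so recurse into children.
    for child in f.children:
        errors.extend(validate(child))
    return errors
```

Whatever the wire format ends up being, both producer and consumer would need
something along these lines, which is why the "do we provide the validation
library?" question applies to Flatbuffers as much as to JSON or msgpack.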