>
> Right, I had wanted to focus the discussion on Flight as I think schema
> evolution or multiplexing streams (more so the latter) is a property of the
> transport and not the stream format itself. If we are leaning towards just
> schema evolution then maybe it makes sense to discuss it for the IPC stream
> format and leverage that in Flight. I'd be interested in what others think.

I tend to agree, I think stream multiplexing is likely a transport level
issue.  IMO I think schema evolution should be consistent with the IPC
stream format  and flight.


> Nate: it may be worth starting a separate discussion about more general
> metadata in the IPC message. I'm not aware of why key-value metadata was
> chosen/if opaque bytes were considered in the past.


I think  this was an unfortunate design of the key value metadata in
Schema.fbs, but I don't think I was around when this decision was made.

Side Question: Why isn't the IPC stream format a series of the flight
> protobufs?

In addition to what David said, protobufs can't be read directly from a
memory-mapped file (they need decoding).  This was one of the design
considerations of using flatbuffers and IPC Stream/File format.

I was thinking Micah's comment is more that whatever we do, it should be
> clearly specified and edge cases should be considered, especially if we
> might want to 'backport' this into the stream format later.


Yes, for dictionaries we just need to be careful to define semantics and
ensure implementations are validating them with regards to dictionaries.
There likely isn't any need to change current implementations though.

On Mon, Jun 28, 2021 at 1:25 PM David Li <lidav...@apache.org> wrote:

> Right, I had wanted to focus the discussion on Flight as I think schema
> evolution or multiplexing streams (more so the latter) is a property of the
> transport and not the stream format itself. If we are leaning towards just
> schema evolution then maybe it makes sense to discuss it for the IPC stream
> format and leverage that in Flight. I'd be interested in what others think.
>
> Especially if we are looking at multiplexing streams - I would wonder if
> that's actually better served by making it easier to implement using the
> Flight implementation as it stands (by managing concurrent RPC calls and/or
> performing the union-of-structs encoding trick for you), instead of having
> to change the protocol.
>
> Nate: it may be worth starting a separate discussion about more general
> metadata in the IPC message. I'm not aware of why key-value metadata was
> chosen/if opaque bytes were considered in the past. Off the top of my head
> if it's for on-disk storage and fully application-defined it may make sense
> to store as a separate file alongside the Arrow file (indexed by record
> batch index) where you can take advantage of whatever format is most
> suitable.
>
> -David
>
> On Sun, Jun 27, 2021, at 07:50, Gosh Arzumanyan wrote:
> > Hi guys,
> >
> > 1. Regarding IPC vs Flight: in fact my initial suggestion was to add this
> > feature starting from the IPC(I moved initial write up steps to the
> bottom
> > of the doc). Afterwards David suggested focusing on Flight and that's how
> > we ended up with the protobufs change in the proposal. This being said I
> do
> > think that the place where this should be impemented is a good question
> on
> > its own. Maybe it makes sense to have this kind of a feature in IPC and
> > somehow use it in Flight, maybe not.
> > 2. The point about dictionaries deserves a dedicated section in the
> > proposal. Nate and David brought it up and shared some insights. I'll try
> > to aggregate them and we can continue the discussion form there.
> >
> > Cheers,
> > Gosh
> >
> > On Sat., 26 Jun. 2021, 17:26 Nate Bauernfeind, <
> natebauernfe...@deephaven.io>
> > wrote:
> >
> > > >
> > > > > > makes it more difficult to bring schema evolution back into the
> > > > > > IPC Stream format (i.e. it would live only in flight)
> > > > >
> > > > > Gosh's proposal extends the flatbuffer structures not the
> protobufs.
> > > Can
> > > > > you help me understand how difficult it would be to bring the
> > > `schema_id`
> > > > > approach to the IPC stream format?
> > > >
> > > > I thought we were talking solely about the Flight Protobuf
> definitions -
> > > > not the Flatbuffers (and the Google doc at least only talks about the
> > > > Protobufs).
> > > >
> > >
> > > I somehow missed that schema_id is being added to protobuf in the
> document.
> > > It feels to me that the schema_id is a property that would ideally only
> > > apply to the RecordBatch. I better understand Micah's dictionary
> concerns,
> > > now, too.
> > >
> > > > Side Question: Why isn't the IPC stream format a series of the flight
> > > > > protobufs? It's a real shame that there is no standard way to
> > > > > capture/replay a stream with app_metadata. (Obviously ignoring the
> > > > > annoyances around protobuf wrapping flatbuffers.)
> > > >
> > > > The IPC format was defined long before Flight, and Flight's
> app_metadata
> > > > was added after Flight's initial definition. Note an IPC message does
> > > have
> > > > a provision for key-value metadata, though I think APIs for that are
> not
> > > > fully exposed. (See ARROW-6940:
> > > > https://issues.apache.org/jira/browse/ARROW-6940 and despite my
> comments
> > > > there perhaps we need to unify or at least consider how Flight's
> > > > app_metadata relates to the IPC message custom_metadata. Also
> perhaps see
> > > > ARROW-1059.)
> > > >
> > >
> > > KeyValue unfortunately is string to string. In flatbuffer strings are
> only
> > > UTF-8 or 7-bit ASCII. The app_metadata on the other hand is opaque
> bytes.
> > > The latter is a bit more useful.
> > >
> > > --
> > >
> >
>

Reply via email to