> > It sounds like we may want to discuss some potential evolutions of the > Arrow binary protocol (for example: new Message types).
This sounds like a good path forward. On Fri, Jul 9, 2021 at 8:26 AM Wes McKinney <[email protected]> wrote: > It sounds like we may want to discuss some potential evolutions of the > Arrow binary protocol (for example: new Message types). Certainly a > can of worms but rather than trying to bolt some new functionality > onto the existing structures, it might be better to support the new > use cases through some new structures which will be more clear cut > from a forward compatibility standpoint. > > On Wed, Jul 7, 2021 at 8:31 PM David Li <[email protected]> wrote: > > > > To summarize so far, it sounds like schema evolution is neither > sufficient nor necessary for either Gosh or Nate's use-cases here? It could > be useful for FlightSQL but even there I don't think it's a requirement. > > > > For Nate - it almost sounds like what you need is some way to slice up a > record batch and send columns individually, which isn't really a concept in > IPC (and hence Flight). Or rather, record batch is almost the wrong > abstraction for your use case (when you're sending per-column deltas), even > though you could model it as a record batch with 'empty' columns or as a > stream with constantly shifting schema (neither of which are perfect > encodings). > > > > -David > > > > On Wed, Jul 7, 2021, at 13:24, Nate Bauernfeind wrote: > > > > Flatbuffers does not support modifying structs > > > > in any forwards or backwards compatible way > > > > (only tables support evolution). > > > > > > Bah. I did not realize that. > > > > > > To reiterate the feature that would be ideal: > > > I realize the specific feature I am missing is the ability to encode > that a > > > field (i.e. its FieldNode and accompanying Buffers in the RecordBatch) > is > > > empty/has-no-data in O(0) cost (yes; for free). > > > > > > Since RecordBatch is a table, might there be interest in adding a field > > > that is a bitset (formatted as a byte vector) to indicate the subset of > > > root FieldNodes that are included in the RecordBatch's field-node-list > > > (noting that the buffer list most be appropriately filtered, too)? If > the > > > bitset is omitted/empty we can be forwards-compatible by assuming every > > > field is included (seems fair; because why send a record batch without > a > > > payload?). This does bring down the cost from 16 bytes (field node) + > 32 > > > bytes (2x buffer node minimum; validity buffer node + payload buffer > node) > > > to 1 bit per field node -- that's a compression ratio of >= 384:1 -- > > > although it's not free, it's a lot closer. > > > > > > Are there any other alternatives the Arrow community might consider? > > > > > > > > > > This sounds a lot like what DenseUnion provides though? > > > > > > The use case is as follows: We send multiple record batches that > aggregate > > > / accumulate into a single logical update. The first set of record > batches > > > contain payload for rows that were added since the last logical update > > > (this is a set of updates to accommodate that not every implementation > > > supports 32-bit lengths). A field node for every column and every > added row > > > will be sent. For this half of the logical update the RecordBatch > length > > > matches the root FieldNode lengths. The second set of record batches > > > contain payload for only rows, and fields, that were actually > modified. If > > > only one field changed, we send only that payload for that row. There > is > > > additional metadata that allows the client to understand which > existing row > > > is to be replaced by any given row for a given Field/Column. In this > > > context, each root field node has a different mapping to minimize total > > > payload size. > > > > > > We might be able to use DenseUnion but it certainly feels like folding > the > > > entire data-source into a dense union makes the design less useful and > less > > > accessible. I will spend some time sleeping on your suggestion, but > I'm not > > > immediately excited about it. At this moment, I suspect I will > continue to > > > lie and state that the RecordBatch's length is the max length across > all > > > root field node lengths (and be content that it's not ideal from a > > > copy/allocation perspective). > > > - > > > > > > On Wed, Jul 7, 2021 at 10:57 AM Micah Kornfield <[email protected] > > > > > wrote: > > > > > > > > > > > > > Might there be interest in adding a "field_id" to the FieldNode > (which is > > > > > encoded on the RecordBatch flatbuffer)? I see a simple > forward-compatible > > > > > upgrade (by either keying off of 0, or explicitly set the field > default > > > > to > > > > > -1) which would allow the sender to "skip" fields that have 1) > FieldNode > > > > > length of zero, and 2) all Buffer's associated at that level (and > further > > > > > nested) are also equally empty (i.e. Buffer length is zero). > > > > > > > > > > > > FieldNode is a struct in Message.fbs. Flatbuffers does not support > > > > modifying structs in any forwards or backwards compatible way (only > tables > > > > support evolution). I think there was originally more metadata in > > > > FieldNode and it was stripped down due to size concerns. > > > > > > > > I understand this concept slightly interferes with RecordBatch's > `length` > > > > > field, and that many implementations use that length to resize the > > > > > root-level FieldNodes. The use-case I have in mind has different > logical > > > > > lengths per field node; current implementations require sending a > > > > > RecordBatch length of the max length across all root level field > nodes. I > > > > > believe this requires a copy of data whenever a field node is too > short; > > > > I > > > > > don't know if there is a decent solution to this slight > inefficiency. I > > > > am > > > > > bringing it up because if "skipping a field node when it is empty" > is a > > > > > feature, then we may not want to allocate space for those nodes > given > > > > that > > > > > the record batch length will likely be greater than zero. > > > > > > > > > > > > Having conflicting RecordBatch and top-level field nodes is > something that > > > > I believe we have pushed back on in the past. This sounds a lot > like > > > > what DenseUnion provides though? > > > > > > > > On Wed, Jul 7, 2021 at 8:32 AM Nate Bauernfeind < > > > > [email protected]> wrote: > > > > > > > > > Deephaven and I are very supportive of "upgrading" the value half > of the > > > > kv > > > > > pair to a byte vector. What is the best way to find out if there is > > > > > sufficient interest? > > > > > > > > > > > > > > > I've been stewing on the ideas here around schema evolution, and I > > > > realize > > > > > the specific feature I am missing is the ability to encode that a > field > > > > > (i.e. its FieldNode and accompanying Buffers in the RecordBatch) is > > > > > empty/has-no-data in O(0) cost (yes; for free). > > > > > > > > > > Might there be interest in adding a "field_id" to the FieldNode > (which is > > > > > encoded on the RecordBatch flatbuffer)? I see a simple > forward-compatible > > > > > upgrade (by either keying off of 0, or explicitly set the field > default > > > > to > > > > > -1) which would allow the sender to "skip" fields that have 1) > FieldNode > > > > > length of zero, and 2) all Buffer's associated at that level (and > further > > > > > nested) are also equally empty (i.e. Buffer length is zero). > > > > > > > > > > I understand this concept slightly interferes with RecordBatch's > `length` > > > > > field, and that many implementations use that length to resize the > > > > > root-level FieldNodes. The use-case I have in mind has different > logical > > > > > lengths per field node; current implementations require sending a > > > > > RecordBatch length of the max length across all root level field > nodes. I > > > > > believe this requires a copy of data whenever a field node is too > short; > > > > I > > > > > don't know if there is a decent solution to this slight > inefficiency. I > > > > am > > > > > bringing it up because if "skipping a field node when it is empty" > is a > > > > > feature, then we may not want to allocate space for those nodes > given > > > > that > > > > > the record batch length will likely be greater than zero. > > > > > > > > > > On Wed, Jul 7, 2021 at 8:12 AM Wes McKinney <[email protected]> > wrote: > > > > > > > > > > > On Wed, Jul 7, 2021 at 2:53 PM David Li <[email protected]> > wrote: > > > > > > > > > > > > > > From the Flatbuffers internals doc[1] it appears they are the > same: > > > > > > "Strings are simply a vector of bytes, and are always > null-terminated." > > > > > > > > > > > > I see. I took a look at flatbuffers.h, and it appears that > changing > > > > > > this field from string to [byte] would be backward-compatible and > > > > > > forward-compatible except with code that expects a null > terminator. > > > > > > This is something we could discuss separately if there were > enough > > > > > > interest. > > > > > > > > > > > > > [1]: > https://google.github.io/flatbuffers/flatbuffers_internals.html > > > > > > > > > > > > > > -David > > > > > > > > > > > > > > On Wed, Jul 7, 2021, at 05:08, Wes McKinney wrote: > > > > > > > > On Tue, Jul 6, 2021 at 6:33 PM Micah Kornfield < > > > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Right, I had wanted to focus the discussion on Flight as > I > > > > think > > > > > > schema > > > > > > > > > > evolution or multiplexing streams (more so the latter) > is a > > > > > > property of the > > > > > > > > > > transport and not the stream format itself. If we are > leaning > > > > > > towards just > > > > > > > > > > schema evolution then maybe it makes sense to discuss it > for > > > > the > > > > > > IPC stream > > > > > > > > > > format and leverage that in Flight. I'd be interested in > what > > > > > > others think. > > > > > > > > > > > > > > > > > > I tend to agree, I think stream multiplexing is likely a > > > > transport > > > > > > level > > > > > > > > > issue. IMO I think schema evolution should be consistent > with > > > > the > > > > > > IPC > > > > > > > > > stream format and flight. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Nate: it may be worth starting a separate discussion > about more > > > > > > general > > > > > > > > > > metadata in the IPC message. I'm not aware of why > key-value > > > > > > metadata was > > > > > > > > > > chosen/if opaque bytes were considered in the past. > > > > > > > > > > > > > > > > > > > > > > > > > > > I think this was an unfortunate design of the key value > metadata > > > > > in > > > > > > > > > Schema.fbs, but I don't think I was around when this > decision was > > > > > > made. > > > > > > > > > > > > > > > > I agree that it's unfortunate that we did not use [ byte ] > instead > > > > of > > > > > > > > string for the value in the KeyValue metadata — I think this > was > > > > more > > > > > > > > of an oversight than a deliberate choice (e.g. it was not our > > > > intent > > > > > > > > to require binary data to be base64-encoded — this is > something > > > > that > > > > > > > > we have to do when encoding binary data in Thrift KeyValue > metadata > > > > > > > > for Parquet, for example). Is the binary representation of > [byte] > > > > > > > > different from string? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Side Question: Why isn't the IPC stream format a series of > the > > > > > flight > > > > > > > > > > protobufs? > > > > > > > > > > > > > > > > > > In addition to what David said, protobufs can't be read > directly > > > > > > from a > > > > > > > > > memory-mapped file (they need decoding). This was one of > the > > > > > design > > > > > > > > > considerations of using flatbuffers and IPC Stream/File > format. > > > > > > > > > > > > > > > > > > I was thinking Micah's comment is more that whatever we > do, it > > > > > > should be > > > > > > > > > > clearly specified and edge cases should be considered, > > > > especially > > > > > > if we > > > > > > > > > > might want to 'backport' this into the stream format > later. > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, for dictionaries we just need to be careful to define > > > > > semantics > > > > > > and > > > > > > > > > ensure implementations are validating them with regards to > > > > > > dictionaries. > > > > > > > > > There likely isn't any need to change current > implementations > > > > > though. > > > > > > > > > > > > > > > > > > On Mon, Jun 28, 2021 at 1:25 PM David Li < > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Right, I had wanted to focus the discussion on Flight as > I > > > > think > > > > > > schema > > > > > > > > > > evolution or multiplexing streams (more so the latter) > is a > > > > > > property of the > > > > > > > > > > transport and not the stream format itself. If we are > leaning > > > > > > towards just > > > > > > > > > > schema evolution then maybe it makes sense to discuss it > for > > > > the > > > > > > IPC stream > > > > > > > > > > format and leverage that in Flight. I'd be interested in > what > > > > > > others think. > > > > > > > > > > > > > > > > > > > > Especially if we are looking at multiplexing streams - I > would > > > > > > wonder if > > > > > > > > > > that's actually better served by making it easier to > implement > > > > > > using the > > > > > > > > > > Flight implementation as it stands (by managing > concurrent RPC > > > > > > calls and/or > > > > > > > > > > performing the union-of-structs encoding trick for you), > > > > instead > > > > > > of having > > > > > > > > > > to change the protocol. > > > > > > > > > > > > > > > > > > > > Nate: it may be worth starting a separate discussion > about more > > > > > > general > > > > > > > > > > metadata in the IPC message. I'm not aware of why > key-value > > > > > > metadata was > > > > > > > > > > chosen/if opaque bytes were considered in the past. Off > the top > > > > > of > > > > > > my head > > > > > > > > > > if it's for on-disk storage and fully > application-defined it > > > > may > > > > > > make sense > > > > > > > > > > to store as a separate file alongside the Arrow file > (indexed > > > > by > > > > > > record > > > > > > > > > > batch index) where you can take advantage of whatever > format is > > > > > > most > > > > > > > > > > suitable. > > > > > > > > > > > > > > > > > > > > -David > > > > > > > > > > > > > > > > > > > > On Sun, Jun 27, 2021, at 07:50, Gosh Arzumanyan wrote: > > > > > > > > > > > Hi guys, > > > > > > > > > > > > > > > > > > > > > > 1. Regarding IPC vs Flight: in fact my initial > suggestion was > > > > > to > > > > > > add this > > > > > > > > > > > feature starting from the IPC(I moved initial write up > steps > > > > to > > > > > > the > > > > > > > > > > bottom > > > > > > > > > > > of the doc). Afterwards David suggested focusing on > Flight > > > > and > > > > > > that's how > > > > > > > > > > > we ended up with the protobufs change in the proposal. > This > > > > > > being said I > > > > > > > > > > do > > > > > > > > > > > think that the place where this should be impemented > is a > > > > good > > > > > > question > > > > > > > > > > on > > > > > > > > > > > its own. Maybe it makes sense to have this kind of a > feature > > > > in > > > > > > IPC and > > > > > > > > > > > somehow use it in Flight, maybe not. > > > > > > > > > > > 2. The point about dictionaries deserves a dedicated > section > > > > in > > > > > > the > > > > > > > > > > > proposal. Nate and David brought it up and shared some > > > > > insights. > > > > > > I'll try > > > > > > > > > > > to aggregate them and we can continue the discussion > form > > > > > there. > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > Gosh > > > > > > > > > > > > > > > > > > > > > > On Sat., 26 Jun. 2021, 17:26 Nate Bauernfeind, < > > > > > > > > > > [email protected]> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > makes it more difficult to bring schema > evolution > > > > back > > > > > > into the > > > > > > > > > > > > > > > IPC Stream format (i.e. it would live only in > flight) > > > > > > > > > > > > > > > > > > > > > > > > > > > > Gosh's proposal extends the flatbuffer > structures not > > > > the > > > > > > > > > > protobufs. > > > > > > > > > > > > Can > > > > > > > > > > > > > > you help me understand how difficult it would be > to > > > > bring > > > > > > the > > > > > > > > > > > > `schema_id` > > > > > > > > > > > > > > approach to the IPC stream format? > > > > > > > > > > > > > > > > > > > > > > > > > > I thought we were talking solely about the Flight > > > > Protobuf > > > > > > > > > > definitions - > > > > > > > > > > > > > not the Flatbuffers (and the Google doc at least > only > > > > talks > > > > > > about the > > > > > > > > > > > > > Protobufs). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I somehow missed that schema_id is being added to > protobuf > > > > in > > > > > > the > > > > > > > > > > document. > > > > > > > > > > > > It feels to me that the schema_id is a property that > would > > > > > > ideally only > > > > > > > > > > > > apply to the RecordBatch. I better understand Micah's > > > > > > dictionary > > > > > > > > > > concerns, > > > > > > > > > > > > now, too. > > > > > > > > > > > > > > > > > > > > > > > > > Side Question: Why isn't the IPC stream format a > series > > > > of > > > > > > the flight > > > > > > > > > > > > > > protobufs? It's a real shame that there is no > standard > > > > > way > > > > > > to > > > > > > > > > > > > > > capture/replay a stream with app_metadata. > (Obviously > > > > > > ignoring the > > > > > > > > > > > > > > annoyances around protobuf wrapping flatbuffers.) > > > > > > > > > > > > > > > > > > > > > > > > > > The IPC format was defined long before Flight, and > > > > Flight's > > > > > > > > > > app_metadata > > > > > > > > > > > > > was added after Flight's initial definition. Note > an IPC > > > > > > message does > > > > > > > > > > > > have > > > > > > > > > > > > > a provision for key-value metadata, though I think > APIs > > > > for > > > > > > that are > > > > > > > > > > not > > > > > > > > > > > > > fully exposed. (See ARROW-6940: > > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-6940 > and > > > > > > despite my > > > > > > > > > > comments > > > > > > > > > > > > > there perhaps we need to unify or at least > consider how > > > > > > Flight's > > > > > > > > > > > > > app_metadata relates to the IPC message > custom_metadata. > > > > > Also > > > > > > > > > > perhaps see > > > > > > > > > > > > > ARROW-1059.) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > KeyValue unfortunately is string to string. In > flatbuffer > > > > > > strings are > > > > > > > > > > only > > > > > > > > > > > > UTF-8 or 7-bit ASCII. The app_metadata on the other > hand is > > > > > > opaque > > > > > > > > > > bytes. > > > > > > > > > > > > The latter is a bit more useful. > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > -- > > > >
