>
> It sounds like we may want to discuss some potential evolutions of the
> Arrow binary protocol (for example: new Message types).


This sounds like a good path forward.

On Fri, Jul 9, 2021 at 8:26 AM Wes McKinney <wesmck...@gmail.com> wrote:

> It sounds like we may want to discuss some potential evolutions of the
> Arrow binary protocol (for example: new Message types). Certainly a
> can of worms but rather than trying to bolt some new functionality
> onto the existing structures, it might be better to support the new
> use cases through some new structures which will be more clear cut
> from a forward compatibility standpoint.
>
> On Wed, Jul 7, 2021 at 8:31 PM David Li <lidav...@apache.org> wrote:
> >
> > To summarize so far, it sounds like schema evolution is neither
> sufficient nor necessary for either Gosh or Nate's use-cases here? It could
> be useful for FlightSQL but even there I don't think it's a requirement.
> >
> > For Nate - it almost sounds like what you need is some way to slice up a
> record batch and send columns individually, which isn't really a concept in
> IPC (and hence Flight). Or rather, record batch is almost the wrong
> abstraction for your use case (when you're sending per-column deltas), even
> though you could model it as a record batch with 'empty' columns or as a
> stream with constantly shifting schema (neither of which are perfect
> encodings).
> >
> > -David
> >
> > On Wed, Jul 7, 2021, at 13:24, Nate Bauernfeind wrote:
> > > > Flatbuffers does not support modifying structs
> > > > in any forwards or backwards compatible way
> > > > (only tables support evolution).
> > >
> > > Bah. I did not realize that.
> > >
> > > To reiterate the feature that would be ideal:
> > > I realize the specific feature I am missing is the ability to encode
> that a
> > > field (i.e. its FieldNode and accompanying Buffers in the RecordBatch)
> is
> > > empty/has-no-data in O(0) cost (yes; for free).
> > >
> > > Since RecordBatch is a table, might there be interest in adding a field
> > > that is a bitset (formatted as a byte vector) to indicate the subset of
> > > root FieldNodes that are included in the RecordBatch's field-node-list
> > > (noting that the buffer list most be appropriately filtered, too)? If
> the
> > > bitset is omitted/empty we can be forwards-compatible by assuming every
> > > field is included (seems fair; because why send a record batch without
> a
> > > payload?). This does bring down the cost from 16 bytes (field node) +
> 32
> > > bytes (2x buffer node minimum; validity buffer node + payload buffer
> node)
> > > to 1 bit per field node -- that's a compression ratio of >= 384:1 --
> > > although it's not free, it's a lot closer.
> > >
> > > Are there any other alternatives the Arrow community might consider?
> > >
> > >
> > > > This sounds a lot like what DenseUnion provides though?
> > >
> > > The use case is as follows: We send multiple record batches that
> aggregate
> > > / accumulate into a single logical update. The first set of record
> batches
> > > contain payload for rows that were added since the last logical update
> > > (this is a set of updates to accommodate that not every implementation
> > > supports 32-bit lengths). A field node for every column and every
> added row
> > > will be sent. For this half of the logical update the RecordBatch
> length
> > > matches the root FieldNode lengths. The second set of record batches
> > > contain payload for only rows, and fields, that were actually
> modified. If
> > > only one field changed, we send only that payload for that row. There
> is
> > > additional metadata that allows the client to understand which
> existing row
> > > is to be replaced by any given row for a given Field/Column. In this
> > > context, each root field node has a different mapping to minimize total
> > > payload size.
> > >
> > > We might be able to use DenseUnion but it certainly feels like folding
> the
> > > entire data-source into a dense union makes the design less useful and
> less
> > > accessible. I will spend some time sleeping on your suggestion, but
> I'm not
> > > immediately excited about it. At this moment, I suspect I will
> continue to
> > > lie and state that the RecordBatch's length is the max length across
> all
> > > root field node lengths (and be content that it's not ideal from a
> > > copy/allocation perspective).
> > > -
> > >
> > > On Wed, Jul 7, 2021 at 10:57 AM Micah Kornfield <emkornfi...@gmail.com
> >
> > > wrote:
> > >
> > > > >
> > > > > Might there be interest in adding a "field_id" to the FieldNode
> (which is
> > > > > encoded on the RecordBatch flatbuffer)? I see a simple
> forward-compatible
> > > > > upgrade (by either keying off of 0, or explicitly set the field
> default
> > > > to
> > > > > -1) which would allow the sender to "skip" fields that have 1)
> FieldNode
> > > > > length of zero, and 2) all Buffer's associated at that level (and
> further
> > > > > nested) are also equally empty (i.e. Buffer length is zero).
> > > >
> > > >
> > > > FieldNode is a struct in Message.fbs.   Flatbuffers does not support
> > > > modifying structs in any forwards or backwards compatible way (only
> tables
> > > > support evolution).  I think there was originally more metadata in
> > > > FieldNode and it was stripped down due to size concerns.
> > > >
> > > > I understand this concept slightly interferes with RecordBatch's
> `length`
> > > > > field, and that many implementations use that length to resize the
> > > > > root-level FieldNodes. The use-case I have in mind has different
> logical
> > > > > lengths per field node; current implementations require sending a
> > > > > RecordBatch length of the max length across all root level field
> nodes. I
> > > > > believe this requires a copy of data whenever a field node is too
> short;
> > > > I
> > > > > don't know if there is a decent solution to this slight
> inefficiency. I
> > > > am
> > > > > bringing it up because if "skipping a field node when it is empty"
> is a
> > > > > feature, then we may not want to allocate space for those nodes
> given
> > > > that
> > > > > the record batch length will likely be greater than zero.
> > > >
> > > >
> > > > Having conflicting RecordBatch and top-level field nodes is
> something that
> > > > I believe we have pushed back on in the past.   This sounds a lot
> like
> > > > what  DenseUnion provides though?
> > > >
> > > > On Wed, Jul 7, 2021 at 8:32 AM Nate Bauernfeind <
> > > > natebauernfe...@deephaven.io> wrote:
> > > >
> > > > > Deephaven and I are very supportive of "upgrading" the value half
> of the
> > > > kv
> > > > > pair to a byte vector. What is the best way to find out if there is
> > > > > sufficient interest?
> > > > >
> > > > >
> > > > > I've been stewing on the ideas here around schema evolution, and I
> > > > realize
> > > > > the specific feature I am missing is the ability to encode that a
> field
> > > > > (i.e. its FieldNode and accompanying Buffers in the RecordBatch) is
> > > > > empty/has-no-data in O(0) cost (yes; for free).
> > > > >
> > > > > Might there be interest in adding a "field_id" to the FieldNode
> (which is
> > > > > encoded on the RecordBatch flatbuffer)? I see a simple
> forward-compatible
> > > > > upgrade (by either keying off of 0, or explicitly set the field
> default
> > > > to
> > > > > -1) which would allow the sender to "skip" fields that have 1)
> FieldNode
> > > > > length of zero, and 2) all Buffer's associated at that level (and
> further
> > > > > nested) are also equally empty (i.e. Buffer length is zero).
> > > > >
> > > > > I understand this concept slightly interferes with RecordBatch's
> `length`
> > > > > field, and that many implementations use that length to resize the
> > > > > root-level FieldNodes. The use-case I have in mind has different
> logical
> > > > > lengths per field node; current implementations require sending a
> > > > > RecordBatch length of the max length across all root level field
> nodes. I
> > > > > believe this requires a copy of data whenever a field node is too
> short;
> > > > I
> > > > > don't know if there is a decent solution to this slight
> inefficiency. I
> > > > am
> > > > > bringing it up because if "skipping a field node when it is empty"
> is a
> > > > > feature, then we may not want to allocate space for those nodes
> given
> > > > that
> > > > > the record batch length will likely be greater than zero.
> > > > >
> > > > > On Wed, Jul 7, 2021 at 8:12 AM Wes McKinney <wesmck...@gmail.com>
> wrote:
> > > > >
> > > > > > On Wed, Jul 7, 2021 at 2:53 PM David Li <apa...@lidavidm.me>
> wrote:
> > > > > > >
> > > > > > > From the Flatbuffers internals doc[1] it appears they are the
> same:
> > > > > > "Strings are simply a vector of bytes, and are always
> null-terminated."
> > > > > >
> > > > > > I see. I took a look at flatbuffers.h, and it appears that
> changing
> > > > > > this field from string to [byte] would be backward-compatible and
> > > > > > forward-compatible except with code that expects a null
> terminator.
> > > > > > This is something we could discuss separately if there were
> enough
> > > > > > interest.
> > > > > >
> > > > > > > [1]:
> https://google.github.io/flatbuffers/flatbuffers_internals.html
> > > > > > >
> > > > > > > -David
> > > > > > >
> > > > > > > On Wed, Jul 7, 2021, at 05:08, Wes McKinney wrote:
> > > > > > > > On Tue, Jul 6, 2021 at 6:33 PM Micah Kornfield <
> > > > > emkornfi...@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Right, I had wanted to focus the discussion on Flight as
> I
> > > > think
> > > > > > schema
> > > > > > > > > > evolution or multiplexing streams (more so the latter)
> is a
> > > > > > property of the
> > > > > > > > > > transport and not the stream format itself. If we are
> leaning
> > > > > > towards just
> > > > > > > > > > schema evolution then maybe it makes sense to discuss it
> for
> > > > the
> > > > > > IPC stream
> > > > > > > > > > format and leverage that in Flight. I'd be interested in
> what
> > > > > > others think.
> > > > > > > > >
> > > > > > > > > I tend to agree, I think stream multiplexing is likely a
> > > > transport
> > > > > > level
> > > > > > > > > issue.  IMO I think schema evolution should be consistent
> with
> > > > the
> > > > > > IPC
> > > > > > > > > stream format  and flight.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Nate: it may be worth starting a separate discussion
> about more
> > > > > > general
> > > > > > > > > > metadata in the IPC message. I'm not aware of why
> key-value
> > > > > > metadata was
> > > > > > > > > > chosen/if opaque bytes were considered in the past.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think  this was an unfortunate design of the key value
> metadata
> > > > > in
> > > > > > > > > Schema.fbs, but I don't think I was around when this
> decision was
> > > > > > made.
> > > > > > > >
> > > > > > > > I agree that it's unfortunate that we did not use [ byte ]
> instead
> > > > of
> > > > > > > > string for the value in the KeyValue metadata — I think this
> was
> > > > more
> > > > > > > > of an oversight than a deliberate choice (e.g. it was not our
> > > > intent
> > > > > > > > to require binary data to be base64-encoded — this is
> something
> > > > that
> > > > > > > > we have to do when encoding binary data in Thrift KeyValue
> metadata
> > > > > > > > for Parquet, for example). Is the binary representation of
> [byte]
> > > > > > > > different from string?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Side Question: Why isn't the IPC stream format a series of
> the
> > > > > flight
> > > > > > > > > > protobufs?
> > > > > > > > >
> > > > > > > > > In addition to what David said, protobufs can't be read
> directly
> > > > > > from a
> > > > > > > > > memory-mapped file (they need decoding).  This was one of
> the
> > > > > design
> > > > > > > > > considerations of using flatbuffers and IPC Stream/File
> format.
> > > > > > > > >
> > > > > > > > > I was thinking Micah's comment is more that whatever we
> do, it
> > > > > > should be
> > > > > > > > > > clearly specified and edge cases should be considered,
> > > > especially
> > > > > > if we
> > > > > > > > > > might want to 'backport' this into the stream format
> later.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, for dictionaries we just need to be careful to define
> > > > > semantics
> > > > > > and
> > > > > > > > > ensure implementations are validating them with regards to
> > > > > > dictionaries.
> > > > > > > > > There likely isn't any need to change current
> implementations
> > > > > though.
> > > > > > > > >
> > > > > > > > > On Mon, Jun 28, 2021 at 1:25 PM David Li <
> lidav...@apache.org>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Right, I had wanted to focus the discussion on Flight as
> I
> > > > think
> > > > > > schema
> > > > > > > > > > evolution or multiplexing streams (more so the latter)
> is a
> > > > > > property of the
> > > > > > > > > > transport and not the stream format itself. If we are
> leaning
> > > > > > towards just
> > > > > > > > > > schema evolution then maybe it makes sense to discuss it
> for
> > > > the
> > > > > > IPC stream
> > > > > > > > > > format and leverage that in Flight. I'd be interested in
> what
> > > > > > others think.
> > > > > > > > > >
> > > > > > > > > > Especially if we are looking at multiplexing streams - I
> would
> > > > > > wonder if
> > > > > > > > > > that's actually better served by making it easier to
> implement
> > > > > > using the
> > > > > > > > > > Flight implementation as it stands (by managing
> concurrent RPC
> > > > > > calls and/or
> > > > > > > > > > performing the union-of-structs encoding trick for you),
> > > > instead
> > > > > > of having
> > > > > > > > > > to change the protocol.
> > > > > > > > > >
> > > > > > > > > > Nate: it may be worth starting a separate discussion
> about more
> > > > > > general
> > > > > > > > > > metadata in the IPC message. I'm not aware of why
> key-value
> > > > > > metadata was
> > > > > > > > > > chosen/if opaque bytes were considered in the past. Off
> the top
> > > > > of
> > > > > > my head
> > > > > > > > > > if it's for on-disk storage and fully
> application-defined it
> > > > may
> > > > > > make sense
> > > > > > > > > > to store as a separate file alongside the Arrow file
> (indexed
> > > > by
> > > > > > record
> > > > > > > > > > batch index) where you can take advantage of whatever
> format is
> > > > > > most
> > > > > > > > > > suitable.
> > > > > > > > > >
> > > > > > > > > > -David
> > > > > > > > > >
> > > > > > > > > > On Sun, Jun 27, 2021, at 07:50, Gosh Arzumanyan wrote:
> > > > > > > > > > > Hi guys,
> > > > > > > > > > >
> > > > > > > > > > > 1. Regarding IPC vs Flight: in fact my initial
> suggestion was
> > > > > to
> > > > > > add this
> > > > > > > > > > > feature starting from the IPC(I moved initial write up
> steps
> > > > to
> > > > > > the
> > > > > > > > > > bottom
> > > > > > > > > > > of the doc). Afterwards David suggested focusing on
> Flight
> > > > and
> > > > > > that's how
> > > > > > > > > > > we ended up with the protobufs change in the proposal.
> This
> > > > > > being said I
> > > > > > > > > > do
> > > > > > > > > > > think that the place where this should be impemented
> is a
> > > > good
> > > > > > question
> > > > > > > > > > on
> > > > > > > > > > > its own. Maybe it makes sense to have this kind of a
> feature
> > > > in
> > > > > > IPC and
> > > > > > > > > > > somehow use it in Flight, maybe not.
> > > > > > > > > > > 2. The point about dictionaries deserves a dedicated
> section
> > > > in
> > > > > > the
> > > > > > > > > > > proposal. Nate and David brought it up and shared some
> > > > > insights.
> > > > > > I'll try
> > > > > > > > > > > to aggregate them and we can continue the discussion
> form
> > > > > there.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Gosh
> > > > > > > > > > >
> > > > > > > > > > > On Sat., 26 Jun. 2021, 17:26 Nate Bauernfeind, <
> > > > > > > > > > natebauernfe...@deephaven.io>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > makes it more difficult to bring schema
> evolution
> > > > back
> > > > > > into the
> > > > > > > > > > > > > > > IPC Stream format (i.e. it would live only in
> flight)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Gosh's proposal extends the flatbuffer
> structures not
> > > > the
> > > > > > > > > > protobufs.
> > > > > > > > > > > > Can
> > > > > > > > > > > > > > you help me understand how difficult it would be
> to
> > > > bring
> > > > > > the
> > > > > > > > > > > > `schema_id`
> > > > > > > > > > > > > > approach to the IPC stream format?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I thought we were talking solely about the Flight
> > > > Protobuf
> > > > > > > > > > definitions -
> > > > > > > > > > > > > not the Flatbuffers (and the Google doc at least
> only
> > > > talks
> > > > > > about the
> > > > > > > > > > > > > Protobufs).
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I somehow missed that schema_id is being added to
> protobuf
> > > > in
> > > > > > the
> > > > > > > > > > document.
> > > > > > > > > > > > It feels to me that the schema_id is a property that
> would
> > > > > > ideally only
> > > > > > > > > > > > apply to the RecordBatch. I better understand Micah's
> > > > > > dictionary
> > > > > > > > > > concerns,
> > > > > > > > > > > > now, too.
> > > > > > > > > > > >
> > > > > > > > > > > > > Side Question: Why isn't the IPC stream format a
> series
> > > > of
> > > > > > the flight
> > > > > > > > > > > > > > protobufs? It's a real shame that there is no
> standard
> > > > > way
> > > > > > to
> > > > > > > > > > > > > > capture/replay a stream with app_metadata.
> (Obviously
> > > > > > ignoring the
> > > > > > > > > > > > > > annoyances around protobuf wrapping flatbuffers.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > The IPC format was defined long before Flight, and
> > > > Flight's
> > > > > > > > > > app_metadata
> > > > > > > > > > > > > was added after Flight's initial definition. Note
> an IPC
> > > > > > message does
> > > > > > > > > > > > have
> > > > > > > > > > > > > a provision for key-value metadata, though I think
> APIs
> > > > for
> > > > > > that are
> > > > > > > > > > not
> > > > > > > > > > > > > fully exposed. (See ARROW-6940:
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-6940
> and
> > > > > > despite my
> > > > > > > > > > comments
> > > > > > > > > > > > > there perhaps we need to unify or at least
> consider how
> > > > > > Flight's
> > > > > > > > > > > > > app_metadata relates to the IPC message
> custom_metadata.
> > > > > Also
> > > > > > > > > > perhaps see
> > > > > > > > > > > > > ARROW-1059.)
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > KeyValue unfortunately is string to string. In
> flatbuffer
> > > > > > strings are
> > > > > > > > > > only
> > > > > > > > > > > > UTF-8 or 7-bit ASCII. The app_metadata on the other
> hand is
> > > > > > opaque
> > > > > > > > > > bytes.
> > > > > > > > > > > > The latter is a bit more useful.
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > >
> > >
> > >
> > > --
> > >
>

Reply via email to