Wes suggested that there may be enough new ideas that it makes sense to
evolve past the existing structures rather than bolt on new functionality.
I would like to learn what requirements exist should new structures be
adopted and, if applicable, turn this into a full POC proposal.

These are the features that I feel are missing from the existing design:
- the ability to signal that the columns are not consistent in length (e.g.,
setting RecordBatch.length to -1) and to give the Arrow/Flight user the true
FieldNode lengths
- the ability to skip top-level field nodes that have length 0 at a small
cost (e.g., by marking them in a bitset)
- the ability to embed a binary payload in the Message flatbuffer wrapper
(instead of only a String payload)
- the ability to use more than one schema concurrently (the most likely API
would resemble how one identifies a dictionary; ideally, dictionaries could
be shared across field nodes in a schema or across schemas in the same
flight)
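To make the bitset idea concrete, here is a minimal Python sketch, assuming a
hypothetical message layout that carries one presence bit per top-level field
node, followed by FieldNode lengths only for the fields whose bit is set. The
names encode_presence, present_indices, and expand_lengths are illustrative
only; they are not part of the Arrow format or any Arrow library.

```python
from typing import List, Sequence


def encode_presence(lengths: Sequence[int]) -> bytes:
    """Pack one bit per top-level field node: 1 if length > 0, else 0.

    Bits are numbered least-significant first within each byte, the same
    bit order Arrow uses for validity bitmaps.
    """
    bitset = bytearray((len(lengths) + 7) // 8)
    for i, n in enumerate(lengths):
        if n > 0:
            bitset[i // 8] |= 1 << (i % 8)
    return bytes(bitset)


def present_indices(bitset: bytes, num_fields: int) -> List[int]:
    """Return the indices of the field nodes whose presence bit is set."""
    return [i for i in range(num_fields) if (bitset[i // 8] >> (i % 8)) & 1]


def expand_lengths(bitset: bytes, sent: Sequence[int], num_fields: int) -> List[int]:
    """Rebuild per-field lengths; fields without a bit set get length 0."""
    full = [0] * num_fields
    for idx, n in zip(present_indices(bitset, num_fields), sent):
        full[idx] = n
    return full


# Hypothetical update touching 3 of 9 columns, each with its own length.
lengths = [0, 3, 0, 0, 5, 0, 0, 0, 1]
bitset = encode_presence(lengths)                # 2 bytes for 9 fields
sent = [n for n in lengths if n > 0]             # only [3, 5, 1] go on the wire
assert expand_lengths(bitset, sent, len(lengths)) == lengths
```

Under this assumed layout, each skipped field costs one bit instead of a full
FieldNode (length and null_count, 16 bytes), and the per-field lengths stay
independent of one another, which is what an inconsistent-length RecordBatch
would need.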

What other features or improvements could or should be considered? Are
there any strong objections to the ideas above? (Remember that one of my
goals is to be able to send a RecordBatch containing only the modified rows,
restricted to the field nodes that have changed, including those with only
inner-node changes. Thus the columns are a subset of the full schema, and
the length of each node is independent of the others.)

On Fri, Jul 9, 2021 at 9:26 AM Wes McKinney <wesmck...@gmail.com> wrote:
> It sounds like we may want to discuss some potential evolutions of the
> Arrow binary protocol (for example: new Message types). Certainly a
> can of worms but rather than trying to bolt some new functionality
> onto the existing structures, it might be better to support the new
> use cases through some new structures which will be more clear cut
> from a forward compatibility standpoint.

Nate
