Hey everyone,

As more libraries adopt the Arrow C Data Interface, one question keeps
coming up: how to distinguish tables (record batches) from columns and
from scalars.

Right now, a Record Batch is sent through the C interface as a struct
column whose children are the individual columns of the batch and a Scalar
would be sent through as just an array of length 1. Applications would have
to create their own contextual way of indicating whether the Array being
passed should be interpreted as just a single array/column or should be
treated as a full table/record batch.

Rather than introducing new members or otherwise complicating the structs,
I wanted to gauge how people felt about introducing new flags for the
ArrowSchema object.

Right now, we only have 3 defined flags:

ARROW_FLAG_DICTIONARY_ORDERED
ARROW_FLAG_NULLABLE
ARROW_FLAG_MAP_KEYS_SORTED
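
For reference, here's a minimal sketch of how those three flags sit in the
struct today. The flag values and the ArrowSchema layout are taken from the
C Data Interface spec; the schema_has_flag helper is just a name I made up
for illustration and isn't part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined by the Arrow C Data Interface. */
#define ARROW_FLAG_DICTIONARY_ORDERED 1
#define ARROW_FLAG_NULLABLE 2
#define ARROW_FLAG_MAP_KEYS_SORTED 4

/* ArrowSchema as laid out in the C Data Interface spec. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

/* Hypothetical helper: true if the given flag bit is set on the schema. */
static bool schema_has_flag(const struct ArrowSchema* schema, int64_t flag) {
  assert(schema != NULL);
  return (schema->flags & flag) != 0;
}
```

Each flag occupies its own bit, so flags can be combined with a bitwise OR
and tested independently of one another.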

The flags member of the struct is an int64, so we have another 61 bits to
play with! If no one has any strong objections, I wanted to propose adding
at least 2 new flags:

ARROW_FLAG_RECORD_BATCH
ARROW_FLAG_SINGLE_COLUMN

If neither flag is set, then whether the corresponding data is a table or
a single column remains contextual, exactly as it is today. If
ARROW_FLAG_RECORD_BATCH is set, then the corresponding data MUST be a
struct array and should be interpreted as a record batch by any consumer;
a consumer should raise an error if the data is not a struct array. If
ARROW_FLAG_SINGLE_COLUMN is set, then the corresponding ArrowArray should
be interpreted and used as a single array/column regardless of its type.
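
To make the intended semantics concrete, here's a rough sketch of what a
consumer-side check could look like. Everything in it is an assumption for
discussion: the bit values (8 and 16) are placeholders I picked as the next
free bits, the enum and the interpret_top_level function are made-up names,
and treating "both flags set" as an error is my own guess since the
proposal above doesn't cover that case. The only thing taken from the spec
is that a struct type has the format string "+s".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Proposed flags. The values are placeholders, not an agreed assignment. */
#define ARROW_FLAG_RECORD_BATCH 8
#define ARROW_FLAG_SINGLE_COLUMN 16

/* Hypothetical result type for how a consumer should treat the data. */
enum arrow_interpretation {
  INTERP_CONTEXTUAL,    /* neither flag set: caller decides, as today */
  INTERP_RECORD_BATCH,  /* struct array whose children are the columns */
  INTERP_SINGLE_COLUMN, /* one array/column, whatever its type */
  INTERP_ERROR          /* flags contradict the data or each other */
};

/* Hypothetical helper: decide how to treat a top-level ArrowSchema, given
 * its format string and flags members. */
static enum arrow_interpretation interpret_top_level(const char* format,
                                                     int64_t flags) {
  assert(format != NULL);
  bool batch = (flags & ARROW_FLAG_RECORD_BATCH) != 0;
  bool column = (flags & ARROW_FLAG_SINGLE_COLUMN) != 0;

  if (batch && column) {
    /* Assumption: the two flags are mutually exclusive. */
    return INTERP_ERROR;
  }
  if (batch) {
    /* A record batch MUST be a struct array ("+s" in the C Data Interface). */
    return strcmp(format, "+s") == 0 ? INTERP_RECORD_BATCH : INTERP_ERROR;
  }
  if (column) {
    return INTERP_SINGLE_COLUMN;
  }
  return INTERP_CONTEXTUAL;
}
```

With this, a struct schema carrying ARROW_FLAG_SINGLE_COLUMN is still
handled as a plain struct column, while an int32 schema ("i") carrying
ARROW_FLAG_RECORD_BATCH is rejected.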

This provides a standardized way for producers of Arrow data to indicate in
the schema to consumers how the data they produced should be used (as a
table or column) rather than forcing everyone to come up with their own
contextualized way of handling things (extra arguments, differently named
functions for RecordBatch / Array, etc.).

If there are no objections to this, I'll take a pass at implementing these
flags in C++ and Go, put up a PR, and start a vote thread. I just wanted to
see what others on the mailing list thought before putting effort into
this.

Thanks everyone! Take care!

--Matt
