Hello Wes,

where do you intend the Field object to live, then? Would it be part of the Table's schema?
Uwe

On Mon, Jul 8, 2019, at 11:18 PM, Wes McKinney wrote:
> hi folks,
>
> For some time now I have been uncertain about the utility provided by
> the arrow::Column C++ class. Fundamentally, it is a container for two
> things:
>
> * An arrow::Field object (name and data type)
> * An arrow::ChunkedArray object for the data
>
> It was added to the C++ library in ARROW-23 in March 2016 as the basis
> for the arrow::Table class which represents a collection of
> ChunkedArray objects coming usually from multiple RecordBatches.
> Sometimes a Table will have mostly columns with a single chunk while
> some columns will have many chunks.
>
> I'm concerned about continuing to maintain the Column class as it's
> spilling complexity into computational libraries and bindings alike.
>
> The Python Column class for example mostly forwards method calls to
> the underlying ChunkedArray
>
> https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L355
>
> If the developer wants to construct a Table or insert a new "column",
> Column objects must generally be constructed, leading to boilerplate
> without clear benefit.
>
> Since we're discussing building a more significant higher-level
> DataFrame interface per past mailing list discussions, my preference
> would be to consider removing the Column class to make the user- and
> developer-facing data structures simpler. I hate to propose breaking
> API changes, so it may not be practical at this point, but I wanted to
> at least bring up the issue to see if others have opinions after
> working with the library for a few years.
>
> Thanks
> Wes
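To make the question about where the Field would live more concrete, here is a minimal sketch of the two layouts. It uses hypothetical toy Python classes, not the real pyarrow or arrow:: API. The first layout is the current one, in which a Column pairs a Field with a ChunkedArray. The second assumes the Table's schema (a list of Fields) owns the names and types, while the table itself keeps only ChunkedArrays, matched to fields by position. The names `Table.add_column`, `Table.field` and `Table.column` are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified stand-ins for the types discussed in this thread.
# These are NOT the real pyarrow / arrow:: classes.


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # e.g. "int64"; real Arrow uses a DataType object here


@dataclass
class ChunkedArray:
    chunks: List[list]  # e.g. one chunk per contributing RecordBatch

    def __len__(self) -> int:
        return sum(len(c) for c in self.chunks)


@dataclass
class Column:
    # Current layout: a Column is just a Field paired with a ChunkedArray.
    field: Field
    data: ChunkedArray


@dataclass
class Table:
    # Alternative layout (assumption): the schema holds the Fields, and
    # columns[i] is matched to schema[i] purely by position.
    schema: List[Field]
    columns: List[ChunkedArray]

    def __post_init__(self) -> None:
        if len(self.schema) != len(self.columns):
            raise ValueError("schema and columns must have the same length")
        if len({len(c) for c in self.columns}) > 1:
            raise ValueError("all columns must have the same length")

    def field(self, i: int) -> Field:
        return self.schema[i]

    def column(self, i: int) -> ChunkedArray:
        return self.columns[i]

    def add_column(self, i: int, f: Field, data: ChunkedArray) -> "Table":
        # No Column wrapper is needed: the Field goes into the schema and
        # the ChunkedArray into the column list. Returns a new Table.
        return Table(self.schema[:i] + [f] + self.schema[i:],
                     self.columns[:i] + [data] + self.columns[i:])
```

For example, `Table([Field("a", "int64")], [ChunkedArray([[1, 2], [3]])]).add_column(1, Field("b", "string"), ChunkedArray([["x", "y", "z"]]))` yields a two-column table whose schema names are `a` and `b`. In this sketch the Field is recoverable from the schema by index, so a Column-like pairing can still be produced on demand without being stored.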