Hello Wes,

where do you intend the Field object to live then? Would it be part of the
schema of the Table object?

Uwe

On Mon, Jul 8, 2019, at 11:18 PM, Wes McKinney wrote:
> hi folks,
> 
> For some time now I have been uncertain about the utility provided by
> the arrow::Column C++ class. Fundamentally, it is a container for two
> things:
> 
> * An arrow::Field object (name and data type)
> * An arrow::ChunkedArray object for the data
> 
> It was added to the C++ library in ARROW-23 in March 2016 as the basis
> for the arrow::Table class which represents a collection of
> ChunkedArray objects coming usually from multiple RecordBatches.
> Sometimes a Table will have mostly columns with a single chunk while
> some columns will have many chunks.
> 
> I'm concerned about continuing to maintain the Column class as it's
> spilling complexity into computational libraries and bindings alike.
> 
> The Python Column class, for example, mostly forwards method calls to
> the underlying ChunkedArray:
> 
> https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L355
> 
> If the developer wants to construct a Table or insert a new "column",
> Column objects must generally be constructed, leading to boilerplate
> without clear benefit.
> 
> Since we're discussing building a more significant higher-level
> DataFrame interface per past mailing list discussions, my preference
> would be to consider removing the Column class to make the user- and
> developer-facing data structures simpler. I hate to propose breaking
> API changes, so it may not be practical at this point, but I wanted to
> at least bring up the issue to see if others have opinions after
> working with the library for a few years.
> 
> Thanks
> Wes
>
