Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

Uwe L. Korn Tue, 09 Jul 2019 00:55:05 -0700

Hello Wes,

where do you intend the Field object to live then? Would it become part of the 
schema of the Table object?
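
To make the question concrete, here is a rough sketch of what such a layout
could look like. It uses simplified stand-in types, not the real arrow::
classes: every name below (Field, ChunkedArray, Schema, Table,
MakeExampleTable) is hypothetical and only illustrates the idea that the
Table's schema owns the fields while the columns are bare chunked arrays
matched to fields by position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the real arrow:: classes.
struct Field {
  std::string name;
  std::string type;  // e.g. "int64"; a real type system is out of scope here
};

struct ChunkedArray {
  // Each inner vector is one chunk, e.g. the data from one record batch.
  std::vector<std::vector<std::int64_t>> chunks;

  // Total number of values across all chunks.
  std::size_t length() const {
    std::size_t n = 0;
    for (const auto& chunk : chunks) n += chunk.size();
    return n;
  }
};

struct Schema {
  std::vector<Field> fields;
};

// Without a Column wrapper, the Table pairs schema.fields[i] with columns[i].
class Table {
 public:
  Table(Schema schema, std::vector<ChunkedArray> columns)
      : schema_(std::move(schema)), columns_(std::move(columns)) {
    // Assumed invariant of this sketch: exactly one field per column.
    assert(schema_.fields.size() == columns_.size());
  }

  const Field& field(std::size_t i) const { return schema_.fields.at(i); }
  const ChunkedArray& column(std::size_t i) const { return columns_.at(i); }
  std::size_t num_columns() const { return columns_.size(); }

 private:
  Schema schema_;
  std::vector<ChunkedArray> columns_;
};

// Builds a two-column table: one single-chunk column, one multi-chunk column.
inline Table MakeExampleTable() {
  Schema schema;
  schema.fields.push_back(Field{"id", "int64"});
  schema.fields.push_back(Field{"value", "int64"});

  ChunkedArray id;
  id.chunks.push_back({1, 2, 3, 4});

  ChunkedArray value;
  value.chunks.push_back({10, 20});
  value.chunks.push_back({30});
  value.chunks.push_back({40});

  std::vector<ChunkedArray> columns;
  columns.push_back(std::move(id));
  columns.push_back(std::move(value));
  return Table(std::move(schema), std::move(columns));
}
```

In this sketch a caller would ask the table for field(i) when it needs the
name or type and for column(i) when it needs the data, so no separate object
has to carry both.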


Uwe

On Mon, Jul 8, 2019, at 11:18 PM, Wes McKinney wrote:
> hi folks,
> 
> For some time now I have been uncertain about the utility provided by
> the arrow::Column C++ class. Fundamentally, it is a container for two
> things:
> 
> * An arrow::Field object (name and data type)
> * An arrow::ChunkedArray object for the data
> 
> It was added to the C++ library in ARROW-23 in March 2016 as the basis
> for the arrow::Table class, which represents a collection of
> ChunkedArray objects, usually coming from multiple RecordBatches.
> Sometimes a Table will have mostly columns with a single chunk while
> some columns will have many chunks.
> 
> I'm concerned about continuing to maintain the Column class as it's
> spilling complexity into computational libraries and bindings alike.
> 
> The Python Column class, for example, mostly forwards method calls to
> the underlying ChunkedArray:
> 
> https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L355
> 
> If the developer wants to construct a Table or insert a new "column",
> Column objects must generally be constructed, leading to boilerplate
> without clear benefit.
> 
> Since we're discussing building a more significant higher-level
> DataFrame interface per past mailing list discussions, my preference
> would be to consider removing the Column class to make the user- and
> developer-facing data structures simpler. I hate to propose breaking
> API changes, so it may not be practical at this point, but I wanted to
> at least bring up the issue to see if others have opinions after
> working with the library for a few years.
> 
> Thanks
> Wes
>
