There may be something built in by now, but I wrote this a few years ago and 
it may be helpful [1].

The function, `ExtendTable`, takes a base_table and appends the columns of 
ext_table to it as a "column bind". FieldVec is an arrow::FieldVector, which is 
a std::vector<std::shared_ptr<arrow::Field>> [2]. Similarly, ChunkedArrVec is 
an arrow::ChunkedArrayVector, which is a 
std::vector<std::shared_ptr<arrow::ChunkedArray>> [3]. My relevant header is 
datatypes.hpp [4].

This is a zero-copy approach in the sense that I'm copying the shared_ptrs, but 
not the data they point to. It requires extending both the schema and the 
vector of columns (which I think should be self-explanatory in the code).

Otherwise, I'm quite sure that adding a column [5] is itself a zero-copy 
operation (and your sample code doesn't seem to add more than one column).

# ------------------------------

# Aldrin

On Tuesday, February 27th, 2024 at 09:16, Blair Azzopardi wrote:

> Hi

> I'm curious if there's a way of creating a zero copy union of two tables. 
> Currently, I'm augmenting an existing table by adding new columns (with say 
> moving averages - see snippet below). I do a pointer swap at the end and 
> release the memory of the old table (reset).

> I wonder if it's more efficient if I created a new table with the new columns 
> and then created some kind of "zero-copy table union" of the new table with 
> the old table. Does that exist?

> That said, perhaps the AddColumn method does re-use the existing table memory 
> location when it creates a "new Table".

> arrow::Status AddMovingAverage(shared_ptr<arrow::Table>& table,
>                                const std::string& colNameIn, int n,
>                                const std::string& colNameOut) {
>   auto vals = table->GetColumnByName(colNameIn);

>   // calculate moving average vector
>   vector<double> ma....

>   // convert vector to arrow array
>   shared_ptr<arrow::Array> ma_arr;
>   arrow::DoubleBuilder dbl_builder = arrow::DoubleBuilder();

>   ARROW_RETURN_NOT_OK(dbl_builder.AppendValues(ma.begin(), ma.end()));
>   ARROW_ASSIGN_OR_RAISE(ma_arr, dbl_builder.Finish());
>   // LOG(INFO) << ma_arr->ToString() << std::endl;

>   // add new column to table (need to convert to chunked array first)
>   auto f0 = arrow::field(colNameOut, arrow::float64());
>   auto ma_chunked_arr = std::make_shared<arrow::ChunkedArray>(ma_arr);

>   // Can this be done more efficiently without copying the original table
>   // to a new memory location?
>   ARROW_ASSIGN_OR_RAISE(auto new_table,
>                         table->AddColumn(0, f0, ma_chunked_arr));

>   // swap pointer to new table and clean up
>   table.swap(new_table);
>   new_table.reset();

>   return arrow::Status::OK();
> }
