That's really helpful, and your ExtendTable function helps me understand the internals a little better. You're right that my sample code only adds one column, although I also call the function multiple times on the same table pointer.
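
To check my understanding of the "column bind" idea from the quoted reply, here is a minimal sketch in plain C++ with no Arrow dependency. ToyTable, Column and BindColumns are hypothetical names invented for this illustration and are not part of Arrow; names stands in for the schema's FieldVector and columns for the ChunkedArrayVector. It only shows why copying shared_ptr handles leaves the underlying data untouched.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-ins, NOT Arrow types: a column is a shared, read-only buffer.
using Column = std::shared_ptr<const std::vector<double>>;

struct ToyTable {
  std::vector<std::string> names;  // plays the role of the schema's fields
  std::vector<Column> columns;     // plays the role of the column vector
};

// Column bind: returns a new table holding base's columns followed by ext's.
// Only the names and the shared_ptr handles are copied; the buffers the
// handles point to are shared between the input tables and the result.
ToyTable BindColumns(const ToyTable& base, const ToyTable& ext) {
  if (!base.columns.empty() && !ext.columns.empty() &&
      base.columns.front()->size() != ext.columns.front()->size()) {
    throw std::invalid_argument("row counts differ");
  }
  ToyTable out = base;  // copies names and handles, not the column data
  out.names.insert(out.names.end(), ext.names.begin(), ext.names.end());
  out.columns.insert(out.columns.end(), ext.columns.begin(),
                     ext.columns.end());
  return out;
}
```

Binding a one-column table to another one-column table yields a two-column table whose column pointers are identical to the inputs' pointers, which is the sense of "zero-copy" used in the reply quoted below.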

On Wed, 28 Feb 2024 at 06:35, Aldrin <octalene....@pm.me> wrote:

> There may be something now, but I wrote this a few years ago and it may be
> helpful [1].
>
> The function, ExtendTable, takes a base_table and adds the columns from
> ext_table to it as a "column bind". FieldVec is an arrow::FieldVector which
> is a std::vector<std::shared_ptr<arrow::Field>> [2]. Similarly,
> ChunkedArrVec is an arrow::ChunkedArrayVector which is a
> std::vector<std::shared_ptr<arrow::ChunkedArray>> [3]. My relevant header
> is datatypes.hpp [4].
>
> This is a zero-copy approach in the sense that I'm copying shared_ptr, but
> not the data itself. This requires extending both the schema and the vector
> of columns (which I think should be self explanatory in the code).
>
> Otherwise, I'm quite sure adding columns [5] is a zero-copy function (your
> sample code doesn't seem to add more than one column).
>
> [1]:
> https://gitlab.com/skyhookdm/skytether-singlecell/-/blob/develop/src/cpp/processing/dataops.cpp#L305-L337
> [2]:
> https://github.com/apache/arrow/blob/main/cpp/src/arrow/type_fwd.h#L68
> [3]:
> https://github.com/apache/arrow/blob/main/cpp/src/arrow/type_fwd.h#L88
> [4]:
> https://gitlab.com/skyhookdm/skytether-singlecell/-/blob/develop/src/cpp/headers/datatypes.hpp
> [5]:
> https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow5Table9AddColumnEiNSt10shared_ptrI5FieldEENSt10shared_ptrI12ChunkedArrayEE
>
>
> # ------------------------------
> # Aldrin
>
> https://github.com/drin/
> https://gitlab.com/octalene
> https://keybase.io/octalene
>
> On Tuesday, February 27th, 2024 at 09:16, Blair Azzopardi <
> blai...@gmail.com> wrote:
>
> Hi
>
> I'm curious if there's a way of creating a zero copy union of two tables.
> Currently, I'm augmenting an existing table by adding new columns (with say
> moving averages - see snippet below). I do a pointer swap at the end and
> release the memory of the old table (reset).
>
> I wonder if it's more efficient if I created a new table with the new
> columns and then created some kind of "zero-copy table union" of the new
> table with the old table. Does that exist?
>
> That said, perhaps the AddColumn method does re-use the existing table
> memory location when it creates a "new Table".
>
> arrow::Status AddMovingAverage(shared_ptr<arrow::Table>& table,
>                                const std::string& colNameIn, int n,
>                                const std::string& colNameOut) {
>   auto vals = table->GetColumnByName(colNameIn);
>
>   // calculate moving average vector
>   vector<double> ma
>   ....
>
>   // convert vector to arrow array
>   shared_ptr<arrow::Array> ma_arr;
>   arrow::DoubleBuilder dbl_builder = arrow::DoubleBuilder();
>
>   ARROW_RETURN_NOT_OK(dbl_builder.AppendValues(ma.begin(), ma.end()));
>   ARROW_ASSIGN_OR_RAISE(ma_arr, dbl_builder.Finish());
>   // LOG(INFO) << ma_arr->ToString() << std::endl;
>
>   // add new column to table (need to convert to chunked array first)
>   auto f0 = arrow::field(colNameOut, arrow::float64());
>   auto ma_chunked_arr = std::make_shared<arrow::ChunkedArray>(ma_arr);
>
>   // Can this be done more efficiently without copying the original table to
>   // a new memory location?
>   ARROW_ASSIGN_OR_RAISE(auto new_table,
>                         table->AddColumn(0, f0, ma_chunked_arr));
>
>   // swap pointer to new table and clean up
>   table.swap(new_table);
>   new_table.reset();
>
>   return arrow::Status::OK();
> }