[ https://issues.apache.org/jira/browse/ARROW-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Li updated ARROW-16503:
-----------------------------
    Labels: good-second-issue  (was: )

> [C++] Can't concatenate extension arrays
> ----------------------------------------
>
>                 Key: ARROW-16503
>                 URL: https://issues.apache.org/jira/browse/ARROW-16503
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Dewey Dunnington
>            Priority: Major
>              Labels: good-second-issue
>
> It looks like Arrays with an extension type can't be concatenated. From the R bindings:
> {code:R}
> library(arrow, warn.conflicts = FALSE)
> arr <- vctrs_extension_array(1:10)
> concat_arrays(arr, arr)
> #> Error: NotImplemented: concatenation of integer(0)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/array/concatenate.cc:195  VisitTypeInline(*out_->type, this)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/array/concatenate.cc:590  ConcatenateImpl(data, pool).Concatenate(&out_data)
> {code}
> This shows up more practically when using the query engine:
> {code:R}
> library(arrow, warn.conflicts = FALSE)
> table <- arrow_table(
>   group = rep(c("a", "b"), 5),
>   col1 = 1:10,
>   col2 = vctrs_extension_array(1:10)
> )
> tf <- tempfile()
> table |> dplyr::group_by(group) |> write_dataset(tf)
> open_dataset(tf) |>
>   dplyr::arrange(col1) |>
>   dplyr::collect()
> #> Error in `dplyr::collect()`:
> #> ! NotImplemented: concatenation of extension<arrow.r.vctrs>
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/array/concatenate.cc:195  VisitTypeInline(*out_->type, this)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/array/concatenate.cc:590  ConcatenateImpl(data, pool).Concatenate(&out_data)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/kernels/vector_selection.cc:2025  Concatenate(values.chunks(), ctx->memory_pool())
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/kernels/vector_selection.cc:2084  TakeCA(*table.column(j), indices, options, ctx)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/sink_node.cc:527  impl_->DoFinish()
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:467  iterator_.Next()
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/record_batch.cc:337  ReadNext(&batch)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/record_batch.cc:351  ToRecordBatches()
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)