westonpace commented on code in PR #13150: URL: https://github.com/apache/arrow/pull/13150#discussion_r882163835
########## cpp/src/arrow/compute/api_aggregate.h: ########## @@ -480,6 +480,12 @@ struct ARROW_EXPORT Aggregate { /// options for the aggregation function const FunctionOptions* options; Review Comment: We should convert this to `std::unique_ptr<FunctionOptions>` but I don't mind if we leave that for a follow-up PR. ########## r/R/query-engine.R: ########## @@ -121,11 +119,13 @@ ExecPlan <- R6Class("ExecPlan", x }) } + target_names <- names(.data$aggregations) + for (i in seq_len(length(target_names))) { + .data$aggregations[[i]][["name"]] <- .data$aggregations[[i]][["target"]] <- target_names[i] Review Comment: Hmm...do we have any test case doing something like... ``` mtcars %>% group_by(cyl) %>% summarise( disp_mean = mean(disp), hp_mean = mean(hp) ) ``` I would expect `target` to be `disp` and `name` to be `disp_mean`. It's always possible we are renaming the columns somewhere else in the generated plan as well in which case `name` is meaningless here. ########## cpp/src/arrow/compute/exec/test_util.cc: ########## @@ -372,22 +374,12 @@ static inline void PrintToImpl(const std::string& factory_name, for (const auto& agg : o->aggregates) { *os << agg.function << "<"; if (agg.options) PrintTo(*agg.options, os); + *os << agg.target.ToString() << "<"; Review Comment: I'm not sure this is correct. Ideally we would have something like... ``` aggregates=[{function=hash_sum<>,target=a,name=sum(a)}] ``` I think the way it is now it would print: ``` aggregates={hash_sum<a<sum_a<>,}, ``` ########## cpp/src/arrow/compute/api_aggregate.h: ########## @@ -480,6 +480,12 @@ struct ARROW_EXPORT Aggregate { /// options for the aggregation function const FunctionOptions* options; + + // fields to which aggregations will be applied + FieldRef target; + + // output field name for aggregations Review Comment: ```suggestion // output field name ``` ########## cpp/src/arrow/compute/kernels/hash_aggregate_test.cc: ########## @@ -975,12 +964,12 @@ TEST(GroupBy, SumMeanProductDecimal) { }, {table->GetColumnByName("key")}, { - {"hash_sum", nullptr}, - {"hash_sum", nullptr}, - {"hash_mean", nullptr}, - {"hash_mean", nullptr}, - {"hash_product", nullptr}, - {"hash_product", nullptr}, + {"hash_sum", nullptr, "agg_0", "hash_sum"}, + {"hash_sum", nullptr, "agg_1", "hash_sum"}, + {"hash_mean", nullptr, "agg_2", "hash_mean"}, + {"hash_mean", nullptr, "agg_3", "hash_mean"}, + {"hash_product", nullptr, "agg_4", "hash_product"}, + {"hash_product", nullptr, "agg_5", "hash_product"}, Review Comment: I don't think we should have to make these changes and they do not make the test easier to read. Could we change the definition of `GroupByTest` so that instead of taking in `const std::vector<internal::Aggregate>& aggregates` it takes in `const std::vector<TestAggregate>& aggregates` where we define: ``` struct TestAggregate { std::string function; const FunctionOptions* options; }; ``` The old style makes sense for these unit tests since we don't really care about what we are naming the columns. ########## r/src/compute-exec.cpp: ########## @@ -226,31 +226,27 @@ std::shared_ptr<compute::ExecNode> ExecNode_Project( // [[arrow::export]] std::shared_ptr<compute::ExecNode> ExecNode_Aggregate( const std::shared_ptr<compute::ExecNode>& input, cpp11::list options, - std::vector<std::string> target_names, std::vector<std::string> out_field_names, std::vector<std::string> key_names) { std::vector<arrow::compute::internal::Aggregate> aggregates; std::vector<std::shared_ptr<arrow::compute::FunctionOptions>> keep_alives; - for (cpp11::list name_opts : options) { - auto name = cpp11::as_cpp<std::string>(name_opts[0]); - auto opts = make_compute_options(name, name_opts[1]); + auto function = cpp11::as_cpp<std::string>(name_opts["fun"]); + auto opts = make_compute_options(function, name_opts["options"]); Review Comment: It's a good thing this is probably always `nullptr` as this would be unsafe otherwise :). Though, as I said above, this is a concern for another PR. ########## cpp/src/arrow/compute/api_aggregate.h: ########## @@ -480,6 +480,12 @@ struct ARROW_EXPORT Aggregate { /// options for the aggregation function const FunctionOptions* options; + + // fields to which aggregations will be applied Review Comment: ```suggestion // field to which aggregations will be applied ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org