ianmcook commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r582349495
##########
File path: r/R/dplyr.R
##########
@@ -423,26 +493,114 @@ ungroup.arrow_dplyr_query <- function(x, ...) {
 }
 ungroup.Dataset <- ungroup.ArrowTabular <- force
 
-mutate.arrow_dplyr_query <- function(.data, ...) {
+mutate.arrow_dplyr_query <- function(.data,
+                                     ...,
+                                     .keep = c("all", "used", "unused", "none"),
+                                     .before = NULL,
+                                     .after = NULL) {
+  call <- match.call()
+  exprs <- quos(...)
+  if (length(exprs) == 0) {
+    # Nothing to do
+    return(.data)
+  }
+
   .data <- arrow_dplyr_query(.data)
   if (query_on_dataset(.data)) {
     not_implemented_for_dataset("mutate()")
   }
-  # TODO: see if we can defer evaluating the expressions and not collect here.
-  # It's different from filters (as currently implemented) because the basic
-  # vector transformation functions aren't yet implemented in Arrow C++.
-  dplyr::mutate(dplyr::collect(.data), ...)
+
+  .keep <- match.arg(.keep)
+  .before <- enquo(.before)
+  .after <- enquo(.after)
+  # Restrict the cases we support for now
+  if (!quo_is_null(.before) || !quo_is_null(.after)) {
+    # TODO(ARROW-11701)
+    return(abandon_ship(call, .data, '.before and .after arguments are not supported in Arrow'))
+  } else if (length(group_vars(.data)) > 0) {
+    # mutate() on a grouped dataset does calculations within groups
+    # This doesn't matter on scalar ops (arithmetic etc.) but it does
+    # for things with aggregations (e.g. subtracting the mean)
+    return(abandon_ship(call, .data, 'mutate() on grouped data not supported in Arrow'))
+  }
+
+  # Check for unnamed expressions and fix if any
+  unnamed <- !nzchar(names(exprs))
+  # Deparse and take the first element in case they're long expressions
+  names(exprs)[unnamed] <- map_chr(exprs[unnamed], ~deparse(.)[1])

Review comment:
Wait, since this renaming only applies to columns that did not have names to begin with, we _should_ dedupe them. I think my code in the comment above (https://github.com/apache/arrow/pull/9521#discussion_r582298264) should do it right.
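For illustration only, here is a minimal base R sketch of one possible reading of "dedupe": make the auto-generated names unique while leaving user-supplied names alone. This is not the code from the linked comment, which is not reproduced here. The helper name `name_unnamed_exprs` is hypothetical, and `alist()` stands in for `quos()` so the sketch runs without rlang.

```r
# Hypothetical helper (an assumption for illustration, not part of the PR).
# Names every unnamed expression after its deparsed text and makes those
# generated names unique among themselves.
name_unnamed_exprs <- function(exprs) {
  nms <- names(exprs)
  if (is.null(nms)) {
    # A list with no names at all reports NULL rather than empty strings
    nms <- rep("", length(exprs))
  }
  unnamed <- !nzchar(nms)
  # Deparse and keep only the first element in case the expression is long
  auto <- vapply(
    exprs[unnamed],
    function(e) deparse(e)[1],
    character(1),
    USE.NAMES = FALSE
  )
  # make.unique() appends ".1", ".2", ... to repeated values
  nms[unnamed] <- make.unique(auto)
  names(exprs) <- nms
  exprs
}

# Two identical unnamed expressions next to one named expression
exprs <- alist(total = x + y, x * 2, x * 2)
names(name_unnamed_exprs(exprs))
# "total" "x * 2" "x * 2.1"
```

Note that this sketch only dedupes the generated names against each other. It does not check them against names the user supplied, and it assumes that "dedupe" means renaming rather than dropping the earlier duplicate.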