[ https://issues.apache.org/jira/browse/ARROW-13618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson resolved ARROW-13618.
-------------------------------------
    Resolution: Done

All linked tasks have been completed :tada:

> [R] Use Arrow engine for summarize() by default
> -------------------------------------------------
>
>                 Key: ARROW-13618
>                 URL: https://issues.apache.org/jira/browse/ARROW-13618
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Assignee: Ian Cook
>            Priority: Critical
>              Labels: query-engine
>             Fix For: 6.0.0
>
>
> ARROW-13344 enabled the dplyr verb {{summarise()}} to use the Arrow engine
> but kept this off by default, controlled by the {{arrow.debug}} option.
> Before this can be turned on by default, we should ensure that the following
> are all implemented:
> * a sufficient set of hash aggregate kernels and R aggregate function
> mappings to them, covering the vast majority of all aggregate functions that
> dplyr users call in {{summarise()}} (add any additional required ones to
> ARROW-13339)
> * support for a sufficient set of data types in aggregates
> * support for a sufficient set of data types in grouping columns
> * handling of {{NA}} and {{NaN}} values in aggregates and the {{na.rm}}
> option consistent with base R and dplyr (ARROW-13497 and possibly other
> issues)
> * handling of {{NA}} and {{NaN}} values in grouping columns consistent with
> dplyr
> * handling empty or bad input to {{summarise()}} (ARROW-13543)
> * many new tests to confirm equivalent results from a variety of
> {{group_by() %>% summarise()}} queries on data frames and on Arrow data
> * resolution of various related bugs

--
This message was sent by Atlassian Jira
(v8.3.4#803005)