nealrichardson opened a new pull request #9510: URL: https://github.com/apache/arrow/pull/9510
https://github.com/tidyverse/dplyr/issues/5763 concisely exposes multiple issues:

* We don't expect the `.drop` argument to `group_by` (fixed here)
* You can apparently provide expressions to `...` in `group_by`, which effectively do `mutate()` to add the columns and then group by them (detected here with a useful error; implementation of the feature deferred to ARROW-11658)
* Our input validation in `[.ArrowTabular` and the Table/RecordBatch `SelectColumns` methods was incomplete, and where it was present it was not very helpful (fixed here)

Other issues observed here and deferred:

* Table has a proper `SelectColumns` method in the C++ library, but the RecordBatch one is in the R library and should be pushed down to C++ (ARROW-11660)
* The `.drop` argument is not actually implemented here: it is only caught if specified, and it errors if the value given is other than the default. We should keep it around like we do the `group_vars` themselves (ARROW-11658), and we'll eventually need to implement it in the C++ query engine too.
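For reference, a minimal sketch of the two `group_by` behaviors described above, using plain dplyr on an in-memory tibble rather than an Arrow Table. The data and column names (`df`, `x_even`, `f`, `g`) are made up for illustration and are not taken from the linked issue.

```r
library(dplyr)

# Hypothetical example data, not from the dplyr issue
df <- tibble(x = c(1, 2, 3, 4), y = c("a", "b", "a", "b"))

# An expression passed to `...` in group_by() creates the column and groups by it
by_expr <- df %>% group_by(x_even = x %% 2 == 0)

# The explicit equivalent: mutate() first, then group by the new column
by_mutate <- df %>%
  mutate(x_even = x %% 2 == 0) %>%
  group_by(x_even)

same_groups <- identical(group_vars(by_expr), group_vars(by_mutate))

# `.drop` only matters for factor columns with unused levels
f <- tibble(g = factor(c("a", "a"), levels = c("a", "b")), v = 1:2)
n_default <- n_groups(group_by(f, g))                 # default drops the empty level
n_keep    <- n_groups(group_by(f, g, .drop = FALSE))  # keeps the empty level as a group
```

With this PR, the arrow dplyr methods are expected to raise an error for the first pattern and for a non-default `.drop`, rather than reproducing the dplyr results shown here.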
