nealrichardson opened a new pull request #9510:
URL: https://github.com/apache/arrow/pull/9510


   https://github.com/tidyverse/dplyr/issues/5763 concisely exposes multiple 
issues:
   
   * We didn't handle the `.drop` argument to `group_by` (fixed here)
   * You can apparently pass expressions to `...` in `group_by`, which 
effectively calls `mutate()` to add the columns and then groups by them 
(detected here with a useful error; implementing the feature is deferred to 
ARROW-11658)
   * Our input validation in `[.ArrowTabular` and the Table/RecordBatch 
`SelectColumns` methods was incomplete and, where present, not very helpful 
(fixed here)
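
The second point above can be illustrated with a language-neutral sketch. The Python below is not the arrow or dplyr implementation: `mutate` and `group_by` here are hypothetical helpers over a list of dict rows, written only to show why grouping by a named expression amounts to adding a column and then grouping by its name.

```python
from collections import defaultdict

def mutate(rows, **exprs):
    """Return new rows with one added column per named expression (hypothetical helper)."""
    return [{**row, **{name: fn(row) for name, fn in exprs.items()}} for row in rows]

def group_by(rows, *names, **exprs):
    """Group rows by column names and/or named expressions (hypothetical helper)."""
    # Named expressions are materialized as columns first,
    # then treated like any other grouping key.
    if exprs:
        rows = mutate(rows, **exprs)
    keys = list(names) + list(exprs)
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)

rows = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}]

# Grouping directly by an expression...
direct = group_by(rows, big=lambda r: r["x"] > 2)
# ...gives the same groups as adding the column and then grouping by its name.
two_step = group_by(mutate(rows, big=lambda r: r["x"] > 2), "big")
```

Under these assumptions `direct` and `two_step` hold the same groups, which is the equivalence the deferred feature would need to preserve.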
   
   Other issues observed here and deferred:
   
   * Table has a proper SelectColumns method in the C++ library but the 
RecordBatch one is in the R library and should be pushed down to C++ 
(ARROW-11660)
   * The `.drop` argument is not actually implemented here: it is only caught 
if specified, and a value other than the default raises an error. We should 
keep it around as we do the group_vars themselves (ARROW-11658), and we'll 
eventually need to implement it in the C++ query engine too.
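
The "caught but not implemented" handling of `.drop` described above follows a common pattern, sketched here in Python. This is a hypothetical illustration, not the code in this PR: the `group_by` helper, its return shape, and the choice of `True` as the default value are all assumptions made for the example.

```python
def group_by(data, *group_vars, drop=True):
    """Record grouping columns; accept `drop` but support only its default (hypothetical helper)."""
    # Assumption: True stands in for the default. Any other value is rejected
    # up front with an explicit error rather than being silently ignored.
    if drop is not True:
        raise NotImplementedError(
            f"drop={drop!r} is not supported; only the default (True) is accepted"
        )
    return {"data": data, "group_vars": list(group_vars)}

grouped = group_by({"x": [1, 2]}, "x")

try:
    group_by({"x": [1, 2]}, "x", drop=False)
    rejected = False
except NotImplementedError:
    rejected = True
```

Accepting the argument and failing loudly on unsupported values keeps the call signature compatible while making the missing feature visible to the caller.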



