[GitHub] [arrow] jorisvandenbossche commented on pull request #11624: ARROW-14608: [Python] Provide access to hash_aggregate functions through a Table.group_by method

GitBox Mon, 15 Nov 2021 08:36:54 -0800


jorisvandenbossche commented on pull request #11624:
URL: https://github.com/apache/arrow/pull/11624#issuecomment-969091664



   I am a bit hesitant to add such a two-step interface to pyarrow. It is indeed 
how other packages do it, but the ones that @ianmcook mentions 
(ibis, pandas, dplyr) all have slightly different APIs for specifying 
this, and pyarrow would then add yet another slightly different interface. 
   
   (But I also agree that groupby is not a great name for a method on the 
table, for this reason.)
   
   ---
   
   Playing a bit with this branch, some other observations:
   
   - I find it unexpected that the resulting table always has a "key" column 
instead of reusing the original name of the column that was specified as the key.
   - Is it possible to group by multiple columns? Not with the current bindings 
in this PR, but I suppose this is already possible in C++ / R?
   - I think users will very quickly request the ability to specify the 
resulting column names (to avoid things like "column_count_distinct").
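
To make the three observations above concrete, here is a rough sketch in plain Python of the behaviour I would expect. This is not pyarrow code and not the API in this PR: `group_aggregate`, its parameters, and the dict-of-lists "table" are all hypothetical, chosen only to illustrate the idea. It keeps the original key column names, accepts multiple keys, and lets the caller name the output columns.

```python
from collections import defaultdict


def group_aggregate(table, keys, aggregations):
    """Hypothetical helper, for illustration only (not a pyarrow API).

    table:        dict mapping column name -> list of values (equal lengths)
    keys:         list of column names to group by
    aggregations: dict mapping output column name -> (input column, function)
    """
    # Map each distinct key tuple to the row indices belonging to that group.
    # Insertion order is preserved, so groups appear in first-seen order.
    groups = defaultdict(list)
    for i, key in enumerate(zip(*(table[k] for k in keys))):
        groups[key].append(i)

    # Key columns keep their original names; aggregate columns use the
    # names chosen by the caller.
    result = {k: [] for k in keys}
    result.update({name: [] for name in aggregations})

    for key, rows in groups.items():
        for k, value in zip(keys, key):
            result[k].append(value)
        for name, (column, func) in aggregations.items():
            result[name].append(func([table[column][i] for i in rows]))
    return result


table = {
    "city": ["Ghent", "Ghent", "Paris", "Paris"],
    "year": [2020, 2020, 2020, 2021],
    "sales": [10, 5, 7, 7],
}

out = group_aggregate(
    table,
    keys=["city", "year"],
    aggregations={
        "total_sales": ("sales", sum),
        "n_distinct": ("sales", lambda xs: len(set(xs))),
    },
)
print(out)
```

The result has columns "city", "year", "total_sales" and "n_distinct", rather than a generic "key" column and auto-generated aggregate names.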

