jorisvandenbossche commented on pull request #11624: URL: https://github.com/apache/arrow/pull/11624#issuecomment-969091664
I am a bit hesitant to add such a two-step interface to pyarrow. It is indeed how other packages do it, but the ones that @ianmcook mentions (ibis, pandas, dplyr) all have slightly different APIs for specifying this, and pyarrow would then add yet another slightly different interface. (But I also agree that, for this reason, groupby is not a great name for a method on the table.)

---

Playing a bit with this branch, some other observations:

- I find it unexpected that the resulting table always has a "key" column instead of reusing the original name of the column that was specified as the key.
- Is it possible to group by multiple columns? Not with the current bindings in this PR, but I suppose this is already possible in C++ / R?
- I think users will very quickly request the ability to specify the resulting column name (to avoid names like "column_count_distinct").
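
To make the three observations above concrete, here is a rough pure-Python sketch of the behaviour they ask for: key columns keep their original names, several keys can be given, and an output name can optionally be specified. This is not pyarrow's API or the code in this PR; `group_by`, `GroupBy`, `aggregate`, the `(column, function, output_name)` tuple shape, and the sample data are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical illustration only: none of these names come from pyarrow.
AGGREGATIONS = {
    "sum": sum,
    "count": len,
    "count_distinct": lambda values: len(set(values)),
}


class GroupBy:
    """Second step of a two-step interface: holds the keys, waits for aggregations."""

    def __init__(self, columns, keys):
        self.columns = columns  # dict of column name -> list of values
        # Accept a single key name or a list of key names.
        self.keys = [keys] if isinstance(keys, str) else list(keys)

    def aggregate(self, specs):
        """specs: list of (column, function) or (column, function, output_name)."""
        n_rows = len(next(iter(self.columns.values())))

        # Map each distinct key tuple to its row indices (first-seen order).
        groups = defaultdict(list)
        for i in range(n_rows):
            groups[tuple(self.columns[k][i] for k in self.keys)].append(i)

        # Key columns keep the names the user grouped by, not a generic "key".
        result = {k: [] for k in self.keys}
        for key_values in groups:
            for k, v in zip(self.keys, key_values):
                result[k].append(v)

        for spec in specs:
            column, func = spec[0], spec[1]
            # Use the explicit output name if given, else "<column>_<function>".
            name = spec[2] if len(spec) == 3 else f"{column}_{func}"
            result[name] = [
                AGGREGATIONS[func]([self.columns[column][i] for i in rows])
                for rows in groups.values()
            ]
        return result


def group_by(columns, keys):
    """First step: choose the key column(s)."""
    return GroupBy(columns, keys)


data = {
    "year": [2020, 2020, 2021, 2021],
    "city": ["A", "A", "A", "B"],
    "sales": [1, 2, 3, 4],
}
out = group_by(data, ["year", "city"]).aggregate(
    [("sales", "sum", "total_sales"), ("sales", "count_distinct")]
)
print(out)
```

In this sketch the result has the columns "year", "city", "total_sales", and "sales_count_distinct"; only the last one falls back to a generated name because no output name was given for it.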
