[ https://issues.apache.org/jira/browse/SPARK-19428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852946#comment-15852946 ]
Luke Miner edited comment on SPARK-19428 at 2/5/17 3:45 AM:
------------------------------------------------------------
I did not know of the existence of the {{first}} function for {{GroupedData}}. It would be nice to include it in the {{GroupedData}} portion of the API docs. It doesn't seem to address the need to sort the {{GroupedData}} first, though. [~srowen] I'm not clear on how you could find the nth most recent timestamps by group, which is needed to perform the final join. The method I've used in the past is to loop through each id, filter by that id, sort the filtered dataframe on timestamp, limit to n rows, and then append the id-specific dataframes back together. But this is extremely slow.

> Ability to select first row of groupby
> --------------------------------------
>
>                 Key: SPARK-19428
>                 URL: https://issues.apache.org/jira/browse/SPARK-19428
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Luke Miner
>            Priority: Minor
>
> It would be nice to be able to select the first row from {{GroupedData}}.
> Pandas has something like this:
> {{df.groupby('group').first()}}
> It's especially handy if you can order the group as well.
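For reference, the usual way to avoid the per-id loop in Spark is a window function: number the rows within each group with {{row_number()}} over {{Window.partitionBy("id").orderBy(col("ts").desc())}} and keep rows where the number is at most n. Below is a minimal plain-Python sketch of that per-group top-n logic (hypothetical {{id}}/{{ts}} columns, no Spark dependency), just to illustrate what the window computes:

```python
from itertools import groupby
from operator import itemgetter

def top_n_per_group(rows, n):
    """Keep the n most recent rows per id, mimicking
    row_number() over (partition by id order by ts desc) <= n."""
    # Sort by id, then by timestamp descending within each id
    # (this plays the role of partitionBy(id).orderBy(ts desc)).
    ordered = sorted(rows, key=lambda r: (r["id"], -r["ts"]))
    out = []
    for _, grp in groupby(ordered, key=itemgetter("id")):
        out.extend(list(grp)[:n])  # first n rows of each partition
    return out

rows = [
    {"id": "a", "ts": 3}, {"id": "a", "ts": 1}, {"id": "a", "ts": 2},
    {"id": "b", "ts": 5}, {"id": "b", "ts": 4},
]
print(top_n_per_group(rows, 2))
# -> [{'id': 'a', 'ts': 3}, {'id': 'a', 'ts': 2},
#     {'id': 'b', 'ts': 5}, {'id': 'b', 'ts': 4}]
```

In PySpark the equivalent is a single pass rather than one filter/sort/limit per id, e.g. {{df.withColumn("rn", row_number().over(w)).filter("rn <= 2")}} with {{w}} as the window above; setting n = 1 gives the "first row of each ordered group" the issue asks for.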