https://issues.apache.org/jira/browse/SPARK-5954 tracks this issue, and Shuo is working on it. We will first implement topByKey for RDDs, and then we could add it to DataFrames. -Xiangrui
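For readers following the thread, here is a minimal sketch of the idea behind a top-N-per-key operation, written in plain Python with no Spark dependency. It is not the SPARK-5954 implementation. The function names (`add_row`, `merge_heaps`, `top_rows_per_group`) and the sample rows are hypothetical, and rows are assumed to be tuples so that ties on the score can still be ordered. Each group keeps a bounded min-heap of at most N rows, so a group's full contents never need to be held at once.

```python
import heapq
from collections import defaultdict


def _push(heap, entry, n):
    """Keep at most n entries in a min-heap; the smallest score sits at heap[0]."""
    if len(heap) < n:
        heapq.heappush(heap, entry)
    elif entry > heap[0]:
        # The new entry beats the current smallest, so it takes its place.
        heapq.heapreplace(heap, entry)
    return heap


def add_row(heap, row, n, score):
    """Fold a single row into a group's bounded heap."""
    return _push(heap, (score(row), row), n)


def merge_heaps(left, right, n):
    """Combine two bounded heaps built for the same group."""
    for entry in right:
        _push(left, entry, n)
    return left


def top_rows_per_group(rows, n, key, score):
    """Return {group: [rows with the n highest scores, best first]}."""
    heaps = defaultdict(list)
    for row in rows:
        add_row(heaps[key(row)], row, n, score)
    return {k: [row for _, row in sorted(h, reverse=True)]
            for k, h in heaps.items()}


# Hypothetical rows of (F1, F2): group by F1, keep the 2 highest F2 per group.
rows = [("a", 3), ("a", 9), ("a", 5), ("b", 1), ("b", 7)]
result = top_rows_per_group(rows, 2, key=lambda r: r[0], score=lambda r: r[1])
```

`add_row` and `merge_heaps` have the shape of the per-partition and merge functions that `RDD.aggregateByKey` accepts, so the same logic could plausibly be used on an RDD of (F1, row) pairs until a built-in operation is available.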
On Mon, Mar 9, 2015 at 9:43 PM, Moss <rhoud...@gmail.com> wrote:
> I do have a schemaRDD where I want to group by a given field F1, but want
> the result to be not a single row per group but multiple rows per group
> where only the rows that have the N top F2 field values are kept.
> The issue is that the groupBy operation is an aggregation of multiple rows
> to a single one.
> Any suggestion or hint will be appreciated.
>
> Best,
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Top-rows-per-group-tp21983.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org