Re: Top rows per group

2015-03-16 Thread Xiangrui Meng
https://issues.apache.org/jira/browse/SPARK-5954 tracks this issue and
Shuo is working on it. We will first implement topByKey for RDDs and
then add it to DataFrames. -Xiangrui
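The idea behind a per-key top-N such as the topByKey mentioned above can be illustrated without Spark. This is a minimal, hypothetical sketch in plain Python, not the SPARK-5954 implementation: lists of (key, value) pairs stand in for RDD partitions, and `seq_op`, `comb_op` and `top_by_key` are illustrative names. Each partition keeps a bounded min-heap of the N largest values per key, and the partial heaps are then merged key by key.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, float]


def seq_op(heap: List[float], value: float, n: int) -> List[float]:
    """Fold one value into a min-heap that keeps only the n largest values."""
    if len(heap) < n:
        heapq.heappush(heap, value)
    elif value > heap[0]:
        # Evict the smallest retained value in favour of the larger one.
        heapq.heapreplace(heap, value)
    return heap


def comb_op(a: List[float], b: List[float], n: int) -> List[float]:
    """Merge partial heap b into partial heap a (e.g. from two partitions)."""
    for value in b:
        seq_op(a, value, n)
    return a


def top_by_key(partitions: Iterable[Iterable[Pair]], n: int) -> Dict[Hashable, List[float]]:
    """Return the n largest values per key, largest first (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Phase 1: each partition builds its own bounded heap per key.
    partials = []
    for part in partitions:
        acc: Dict[Hashable, List[float]] = {}
        for key, value in part:
            seq_op(acc.setdefault(key, []), value, n)
        partials.append(acc)
    # Phase 2: merge the per-partition heaps key by key.
    merged: Dict[Hashable, List[float]] = {}
    for acc in partials:
        for key, heap in acc.items():
            comb_op(merged.setdefault(key, []), heap, n)
    return {key: sorted(heap, reverse=True) for key, heap in merged.items()}


partitions = [
    [("a", 3), ("b", 1), ("a", 7)],
    [("a", 5), ("b", 4), ("a", 1)],
]
print(top_by_key(partitions, 2))  # {'a': [7, 5], 'b': [4, 1]}
```

The bounded heap keeps memory at O(N) per key instead of materializing every value of a group, which is the main advantage over a groupBy followed by sorting. On an RDD, the same sequence and combine functions could plausibly be adapted for `aggregateByKey`, though that wiring is an assumption here.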

On Mon, Mar 9, 2015 at 9:43 PM, Moss rhoud...@gmail.com wrote:
 I have a SchemaRDD that I want to group by a given field F1, but I want
 the result to be multiple rows per group rather than a single row per group,
 keeping only the rows with the top N values of field F2.
 The issue is that the groupBy operation aggregates multiple rows
 into a single one.
 Any suggestion or hint will be appreciated.

 Best,



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Top-rows-per-group-tp21983.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Top rows per group

2015-03-09 Thread Moss
I have a SchemaRDD that I want to group by a given field F1, but I want
the result to be multiple rows per group rather than a single row per group,
keeping only the rows with the top N values of field F2.
The issue is that the groupBy operation aggregates multiple rows
into a single one.
Any suggestion or hint will be appreciated.
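To keep whole rows rather than collapsing each group, one approach is to rank the rows within each F1 group by F2 in descending order and keep those ranked N or better. The sketch below shows that logic in plain Python under stated assumptions: rows are modeled as dicts, and `top_rows_per_group` and the `id` field are hypothetical. Later Spark releases added DataFrame window functions, where this pattern is commonly written as `row_number()` over a window partitioned by F1 and ordered by F2 descending, followed by a filter on the row number.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def top_rows_per_group(rows: List[Row], n: int,
                       group_field: str = "F1",
                       order_field: str = "F2") -> List[Row]:
    """Keep up to n rows per group_field value: those with the largest order_field."""
    # Sort by order_field descending, then stable-sort by group_field, so rows
    # end up grouped by F1 and ordered by F2 descending inside each group.
    ordered = sorted(rows, key=itemgetter(order_field), reverse=True)
    ordered.sort(key=itemgetter(group_field))
    kept: List[Row] = []
    for _, group in groupby(ordered, key=itemgetter(group_field)):
        # enumerate acts like row_number(): 1, 2, 3, ... within each group.
        for rank, row in enumerate(group, start=1):
            if rank > n:
                break
            kept.append(row)
    return kept


rows = [
    {"F1": "x", "F2": 10, "id": 1},
    {"F1": "y", "F2": 3, "id": 2},
    {"F1": "x", "F2": 30, "id": 3},
    {"F1": "x", "F2": 20, "id": 4},
    {"F1": "y", "F2": 8, "id": 5},
]
print([row["id"] for row in top_rows_per_group(rows, 2)])  # [3, 4, 5, 2]
```

Ties in F2 are broken by input order here, much as `row_number()` picks one arbitrary order among equal values; keeping all tied rows would need a rank-style comparison against the N-th value instead.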

Best,



