Re: [Spark SQL] How to select first row in each GROUP BY group?

2014-08-25 Thread Michael Armbrust
In our case, the ROW has about 80 columns, which exceeds the case class limit. Starting with Spark 1.1 you'll also be able to use the applySchema API: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L126
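A minimal sketch of the applySchema route (Spark 1.1 API): the schema is built programmatically from StructField entries instead of a case class, so the 22-field case class limit does not apply. The column names, the input path, and the SparkContext `sc` below are assumptions for illustration, and the snippet needs a Spark runtime to execute.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql._

// Build the schema programmatically -- no case class, so no 22-field limit.
// (Column names are hypothetical; a real schema would list all ~80 fields.)
val schema = StructType(Seq(
  StructField("a", StringType, nullable = true),
  StructField("b", StringType, nullable = true),
  StructField("time", LongType, nullable = false)))

// An RDD[Row] whose values line up with the schema, field by field.
// (The input path and parsing are placeholders.)
val rowRDD: RDD[Row] = sc.textFile("path/to/data")
  .map(_.split("\t"))
  .map(p => Row(p(0), p(1), p(2).toLong))

val sqlContext = new SQLContext(sc)
val table = sqlContext.applySchema(rowRDD, schema)
table.registerTempTable("t")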

Re: [Spark SQL] How to select first row in each GROUP BY group?

2014-08-21 Thread Fengyun RAO
on the SchemaRDD, then for each group just take the first record.

From: Fengyun RAO raofeng...@gmail.com
Date: Thursday, August 21, 2014 at 8:26 AM
To: user@spark.apache.org
Subject: Re: [Spark SQL] How to select first row in each GROUP BY group?

Could anybody help? I googled

[Spark SQL] How to select first row in each GROUP BY group?

2014-08-20 Thread Fengyun RAO
I have a table with 4 columns: a, b, c, time. What I need is something like:

SELECT a, b, GroupFirst(c)
FROM t
GROUP BY a, b

GroupFirst means the first item of column c in each group, and by "first" I mean the one with the minimal time in that group. In Oracle/SQL Server, we could write:

WITH summary AS (
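The GroupFirst logic can be sketched on plain Scala collections (the Record case class and sample data below are hypothetical stand-ins for table t): group by (a, b), then keep the row with the minimal time in each group. On a Spark RDD, the equivalent would be a groupBy followed by a per-group minBy, or a reduceByKey that keeps the record with the smaller time.

```scala
// Hypothetical rows of table t with columns (a, b, c, time).
case class Record(a: String, b: String, c: String, time: Long)

val t = Seq(
  Record("x", "y", "c1", 3L),
  Record("x", "y", "c2", 1L),
  Record("x", "z", "c3", 2L))

// GroupFirst: for each (a, b) group, take c from the row
// with the minimal time in that group.
val firstPerGroup = t
  .groupBy(r => (r.a, r.b))                              // one group per (a, b) pair
  .map { case (key, rows) => key -> rows.minBy(_.time).c }

// firstPerGroup == Map(("x","y") -> "c2", ("x","z") -> "c3")
```

On a Spark RDD, `reduceByKey` with a "keep the earlier record" function avoids materializing each whole group the way `groupBy` does, which matters when groups are large.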