In our case, the row has about 80 columns, which exceeds the 22-field case
class limit in Scala 2.10.
Starting with Spark 1.1 you'll also be able to use the applySchema API
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L126)
to build a SchemaRDD without a case class; then on that SchemaRDD, for
each group just take the first record.
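
For illustration, a minimal sketch of that approach on Spark 1.1 (the input
path, the tab delimiter, and the column types are assumptions, and only 4 of
the ~80 columns are shown): the schema is declared programmatically and
applied with applySchema, so the case class field limit never comes into
play, and the earliest row per (a, b) group is picked with reduceByKey.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits (needed pre-1.3)
    import org.apache.spark.sql._

    object FirstRowPerGroup {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("first-row-per-group"))
        val sqlContext = new SQLContext(sc)

        // Declare the schema programmatically instead of via a case class,
        // so the 22-field limit does not apply. Only 4 columns shown here.
        val schema = StructType(Seq(
          StructField("a", StringType, nullable = false),
          StructField("b", StringType, nullable = false),
          StructField("c", StringType, nullable = false),
          StructField("time", LongType, nullable = false)))

        // Hypothetical input: one tab-separated record per line.
        val rows = sc.textFile("hdfs:///path/to/t")  // assumed path
          .map(_.split("\t"))
          .map(p => Row(p(0), p(1), p(2), p(3).toLong))

        val t: SchemaRDD = sqlContext.applySchema(rows, schema)

        // "First" row per (a, b) group = the row with the minimal time.
        val firstPerGroup = t
          .map(r => ((r.getString(0), r.getString(1)), r))
          .reduceByKey((x, y) => if (x.getLong(3) <= y.getLong(3)) x else y)
          .values

        firstPerGroup.collect().foreach(println)
      }
    }

reduceByKey keeps only one Row per key as it aggregates, which scales better
than groupByKey followed by a minBy over each group.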
From: Fengyun RAO <raofeng...@gmail.com>
Date: Thursday, August 21, 2014 at 8:26 AM
To: user@spark.apache.org
Subject: Re: [Spark SQL] How to select first row in each GROUP BY group?
Could anybody help? I googled but didn't find a solution.
I have a table with 4 columns: a, b, c, time
What I need is something like:
SELECT a, b, GroupFirst(c)
FROM t
GROUP BY a, b
GroupFirst means the first item of column c in each group,
and by "first" I mean the row with the minimal time in that group.
In Oracle/SQL Server, we could write:
WITH summary AS (
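    -- (the message is truncated here in the archive; a typical completion
    -- of this ROW_NUMBER pattern, reconstructed as an assumption, would be:)
    SELECT a, b, c, time,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY time) AS rk
    FROM t)
SELECT a, b, c
FROM summary
WHERE rk = 1

Spark SQL 1.1 has no window functions, so on Spark the same result needs
either the RDD approach sketched above or a join against
SELECT a, b, MIN(time) FROM t GROUP BY a, b.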