Hi Richard,

There are several values of time per id. Is there a way to group by id and then
sort by time within each group in Spark?
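A minimal sketch of that shape using plain Scala collections (the record type and
sample values are made up; Spark's RDD API has analogous groupBy/sortBy operators,
e.g. rdd.groupBy(_.id).mapValues(_.toSeq.sortBy(_.time)) for an RDD[Event]):

```scala
// Hypothetical record standing in for a row of the "sample" table.
case class Event(id: Int, time: Long)

// Group records by id, then sort each group's events by time.
def groupAndSort(events: Seq[Event]): Map[Int, Seq[Event]] =
  events.groupBy(_.id).map { case (id, vs) => id -> vs.sortBy(_.time) }

val sample = Seq(Event(1, 30L), Event(2, 10L), Event(1, 5L))
println(groupAndSort(sample)(1).map(_.time))  // List(5, 30)
```

Note that on a real RDD the per-group sort happens after collecting each group's
values, so very large groups can be memory-heavy.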

Best regards, Alexander

From: Richard Marscher [mailto:rmarsc...@localytics.com]
Sent: Monday, April 27, 2015 12:20 PM
To: Ulanov, Alexander
Cc: user@spark.apache.org
Subject: Re: Group by order by

Hi,

That error indicates the query itself isn't well-formed. If you group by just id,
Spark must aggregate all the time values into a single value per id, so it can't
sort by the raw time column. That's why the error suggests wrapping time in an
aggregate function: once there is one value per id, the sort is well-defined.
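The point above can be sketched with plain Scala collections (the (id, time)
pairs are made up, and max is just one possible aggregate; first() as the error
suggests would work the same way):

```scala
// One time value per id (here: the max), then sort by that value.
// Roughly: SELECT id, max(time) AS time FROM sample GROUP BY id ORDER BY time
def maxTimePerId(rows: Seq[(Int, Long)]): Seq[(Int, Long)] =
  rows.groupBy(_._1)
      .map { case (id, vs) => (id, vs.map(_._2).max) }
      .toSeq
      .sortBy(_._2)

println(maxTimePerId(Seq((1, 30L), (2, 10L), (1, 5L))))  // List((2,10), (1,30))
```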

On Mon, Apr 27, 2015 at 3:07 PM, Ulanov, Alexander 
<alexander.ula...@hp.com<mailto:alexander.ula...@hp.com>> wrote:
Hi,

Could you suggest what is the best way to do “group by x order by y” in Spark?

When I try to perform it with Spark SQL I get the following error (Spark 1.3):

val results = sqlContext.sql("select * from sample group by id order by time")
org.apache.spark.sql.AnalysisException: expression 'time' is neither present in 
the group by, nor is it an aggregate function. Add to group by or wrap in 
first() if you don't care which value you get.;
        at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:37)

Is there a way to do it with just RDD?

Best regards, Alexander
