Re: Group by order by

2015-04-27 Thread Richard Marscher
It's not related to Spark but to the concept of what you are trying to do
with the data. Grouping by ID means consolidating the data for each ID down
to 1 row per ID. You can sort by time after this point, yes, but you would
need to either group on each (ID, time) pair OR do some aggregate operation
on the time. That's what the error message is explaining. Maybe you can
describe what you want your results to look like?

Here is some detail about the underlying operations here:

Example Data:
ID | Time     | SomeVal
1  | 02-02-15 | 4
1  | 02-03-15 | 5
2  | 02-02-15 | 4
2  | 02-02-15 | 5
2  | 02-05-15 | 2

A.

So if you do Group By ID this means 1 row per ID like below:

ID

1
2

To include Time in this projection you need to aggregate it with a function
to a single value. Then and only then can you use it in the projection and
sort on it.

SELECT id, max(time) FROM sample GROUP BY id SORT BY max(time) desc;

ID | max(time)
2  | 02-05-15
1  | 02-03-15
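For intuition, the same group-then-aggregate step can be sketched in plain Python (not Spark code; the rows below are copied from the sample data above):

```python
from collections import defaultdict

# Sample rows as (id, time, some_val), matching the example data above.
rows = [
    (1, "02-02-15", 4),
    (1, "02-03-15", 5),
    (2, "02-02-15", 4),
    (2, "02-02-15", 5),
    (2, "02-05-15", 2),
]

# GROUP BY id, aggregating time with max(): exactly one row per id.
latest = defaultdict(str)
for id_, time, _ in rows:
    latest[id_] = max(latest[id_], time)

# ORDER BY max(time) DESC. Comparing "MM-DD-YY" strings sorts correctly
# here only because all the dates share the same year.
result = sorted(latest.items(), key=lambda kv: kv[1], reverse=True)
# result == [(2, "02-05-15"), (1, "02-03-15")]
```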

B.

Or if you do Group by ID, time then you get 1 row per ID and time pair:

ID | Time
1  | 02-02-15
1  | 02-03-15
2  | 02-02-15
2  | 02-05-15

Notice both rows with ID `2` and time `02-02-15` group down to 1 row in the
results here. In this case you can sort the results by time without using
an aggregate function.

SELECT id, time FROM sample GROUP BY id, time SORT BY time desc;

ID | Time
2  | 02-05-15
1  | 02-03-15
1  | 02-02-15
2  | 02-02-15
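The group-by-(id, time) case can be sketched the same way in plain Python (again not Spark code, using the same sample rows):

```python
rows = [
    (1, "02-02-15", 4),
    (1, "02-03-15", 5),
    (2, "02-02-15", 4),
    (2, "02-02-15", 5),
    (2, "02-05-15", 2),
]

# GROUP BY id, time: the two (2, "02-02-15") rows collapse to one pair.
pairs = {(id_, time) for id_, time, _ in rows}

# Sort by time desc; no aggregate is needed since time is a grouping key.
# Rows tied on time come out in no particular order here.
ordered = sorted(pairs, key=lambda p: p[1], reverse=True)
```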





RE: Group by order by

2015-04-27 Thread Ulanov, Alexander
Thanks, it should be
“select id, time, min(x1), min(x2), … from data group by id, time order by time”

(use “min” or another aggregate function to pick the other fields)

Forgot to mention that (id, time) is my primary key and I took for granted that 
it worked in my MySQL example.
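Since (id, time) is the primary key, each group holds exactly one row, so min() simply picks that row's value. A plain-Python sketch of the corrected query (the x1/x2 values are invented for illustration):

```python
from itertools import groupby
from operator import itemgetter

# (id, time, x1, x2) rows; (id, time) is the primary key, so every
# group produced below contains exactly one row.
data = [
    (1, "02-02-15", 10, 11),
    (1, "02-03-15", 20, 21),
    (2, "02-02-15", 30, 31),
]

data.sort(key=itemgetter(0, 1))  # GROUP BY id, time
result = [
    # min() over a single-row group just returns that row's value.
    (id_, t, min(r[2] for r in grp), min(r[3] for r in grp))
    for (id_, t), g in groupby(data, key=itemgetter(0, 1))
    for grp in [list(g)]
]
result.sort(key=itemgetter(1))   # ORDER BY time
```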

Best regards, Alexander






Group by order by

2015-04-27 Thread Ulanov, Alexander
Hi,

Could you suggest what is the best way to do “group by x order by y” in Spark?

When I try to perform it with Spark SQL I get the following error (Spark 1.3):

val results = sqlContext.sql("select * from sample group by id order by time")
org.apache.spark.sql.AnalysisException: expression 'time' is neither present in 
the group by, nor is it an aggregate function. Add to group by or wrap in 
first() if you don't care which value you get.;
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:37)

Is there a way to do it with just RDD?

Best regards, Alexander


Re: Group by order by

2015-04-27 Thread Richard Marscher
Hi,

that error seems to indicate the basic query is not properly expressed. If
you group by just ID, then that means it would need to aggregate all the
time values into one value per ID, so you can't sort by it. Thus it tries
to suggest an aggregate function for time so you can have 1 value per ID
and properly sort it.



RE: Group by order by

2015-04-27 Thread Ulanov, Alexander
Hi Richard,

There are several values of time per id. Is there a way to perform group by id 
and sort by time in Spark?

Best regards, Alexander
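Keeping all the time values per id, grouped by id and sorted within each group, can be sketched with the RDD-style groupBy-then-sort-within-group pattern. Plain Python stands in for the RDD operations here (in Spark this would be roughly a groupBy on id followed by sorting each group's records; the tuples reuse the sample data from earlier in the thread):

```python
from itertools import groupby
from operator import itemgetter

rows = [
    (1, "02-03-15", 5),
    (2, "02-02-15", 4),
    (1, "02-02-15", 4),
    (2, "02-05-15", 2),
    (2, "02-02-15", 5),
]

# Stand-in for an RDD groupBy on id: bring equal ids together first...
rows.sort(key=itemgetter(0))
# ...then sort each id's records by time, keeping every time value.
by_id = {
    id_: sorted(group, key=itemgetter(1))
    for id_, group in groupby(rows, key=itemgetter(0))
}
```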




RE: group by order by fails

2015-02-27 Thread Tridib Samanta
Thanks Michael! It worked. Somehow my mails are not getting accepted by the
spark user mailing list. :(
 

Re: group by order by fails

2015-02-27 Thread iceback
String query = "select s.name, count(s.name) as tally from sample s group by
s.name order by tally";



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/group-by-order-by-fails-tp21815p21854.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: group by order by fails

2015-02-26 Thread Michael Armbrust
Assign an alias to the count in the select clause and use that alias in the
order by clause.
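The effect of the alias fix can be sketched in plain Python (not Spark code; the sample names are invented):

```python
from collections import Counter

# Stand-in for: SELECT s.name, count(s.name) AS tally FROM sample s
#               GROUP BY s.name ORDER BY tally DESC
names = ["widget", "gear", "widget", "bolt", "widget", "gear"]

tally = Counter(names)  # GROUP BY name + COUNT(name)
# ORDER BY the aliased count, descending ("top 10 most selling" wants desc).
top = sorted(tally.items(), key=lambda kv: kv[1], reverse=True)
# top[:10] would be the top 10 most selling samples.
# top == [("widget", 3), ("gear", 2), ("bolt", 1)]
```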


Re: group by order by fails

2015-02-25 Thread Akhil Das
Which version of Spark are you running? It seems there was a similar Jira:
https://issues.apache.org/jira/browse/SPARK-2474

Thanks
Best Regards


RE: group by order by fails

2015-02-25 Thread Tridib Samanta
Actually I just realized, I am using 1.2.0.
 
Thanks
Tridib
 

group by order by fails

2015-02-25 Thread tridib
Hi,
I need to find the top 10 most selling samples, so the query looks like:
select s.name, count(s.name) from sample s group by s.name order by
count(s.name)

This query fails with following error:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort, tree:
Sort [COUNT(name#0) ASC], true
 Exchange (RangePartitioning [COUNT(name#0) ASC], 200)
  Aggregate false, [name#0], [name#0 AS name#1,Coalesce(SUM(PartialCount#4L),0) AS count#2L,name#0]
   Exchange (HashPartitioning [name#0], 200)
    Aggregate true, [name#0], [name#0,COUNT(name#0) AS PartialCount#4L]
     PhysicalRDD [name#0], MapPartitionsRDD[1] at mapPartitions at JavaSQLContext.scala:102

at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
at org.apache.spark.sql.execution.Sort.execute(basicOperators.scala:206)
at 
org.apache.spark.sql.execution.Project.execute(basicOperators.scala:43)
at
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
at
org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
at
com.edifecs.platform.df.analytics.spark.domain.dao.OrderByTest.testGetVisitDistributionByPrimaryDx(OrderByTest.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
execute, tree:
Exchange (RangePartitioning [COUNT(name#0) ASC], 200)
 Aggregate false, [name#0], [name#0 AS name#1,Coalesce(SUM(PartialCount#4L),0) AS count#2L,name#0]
  Exchange (HashPartitioning [name#0], 200)
   Aggregate true, [name#0], [name#0,COUNT(name#0) AS PartialCount#4L]
    PhysicalRDD [name#0], MapPartitionsRDD[1] at mapPartitions at JavaSQLContext.scala:102

at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
at org.apache.spark.sql.execution.Exchange.execute(Exchange.scala:47)
at
org.apache.spark.sql.execution.Sort$$anonfun$execute$3.apply(basicOperators.scala:207)
at
org.apache.spark.sql.execution.Sort$$anonfun$execute$3.apply(basicOperators.scala:207)
at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
... 37 more
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
No function to evaluate expression. type: Count, tree: COUNT(input[2])
at
org.apache.spark.sql.catalyst.expressions.AggregateExpression.eval(aggregates.scala:41)
at
org.apache.spark.sql.catalyst.expressions.RowOrdering.compare(Row.scala:250)
at
org.apache.spark.sql.catalyst.expressions.RowOrdering.compare(Row.scala:242)
at scala.math.Ordering$$anon$5.compare(Ordering.scala:122