Re: Group by order by
It's not related to Spark, but to the concept of what you are trying to do with the data. Grouping by ID means consolidating the data down to one row per ID. You can sort by time after that point, yes, but you would need to either group by each (ID, time) pair OR do some aggregate operation on the time. That's what the error message is explaining. Maybe you can describe what you want your results to look like? Here is some detail about the underlying operations.

Example data:

ID | Time     | SomeVal
1  | 02-02-15 | 4
1  | 02-03-15 | 5
2  | 02-02-15 | 4
2  | 02-02-15 | 5
2  | 02-05-15 | 2

A. If you GROUP BY id, you get one row per ID, like below:

ID
1
2

To include Time in this projection you need to aggregate it to a single value with a function. Then, and only then, can you use it in the projection and sort on it:

SELECT id, max(time) FROM sample GROUP BY id SORT BY max(time) DESC;

ID | max(time)
2  | 02-05-15
1  | 02-03-15

B. Or if you GROUP BY id, time, you get one row per (ID, time) pair:

ID | Time
1  | 02-02-15
1  | 02-03-15
2  | 02-02-15
2  | 02-05-15

Notice that both input rows with ID `2` and time `02-02-15` group down to one row in the results here. In this case you can sort the results by time without using an aggregate function:

SELECT id, time FROM sample GROUP BY id, time SORT BY time DESC;

ID | Time
2  | 02-05-15
1  | 02-03-15
1  | 02-02-15
2  | 02-02-15

On Mon, Apr 27, 2015 at 3:28 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:
Hi Richard, There are several values of time per id. Is there a way to perform group by id and sort by time in Spark? Best regards, Alexander

From: Richard Marscher [mailto:rmarsc...@localytics.com]
Sent: Monday, April 27, 2015 12:20 PM
To: Ulanov, Alexander
Cc: user@spark.apache.org
Subject: Re: Group by order by

Hi, that error seems to indicate the basic query is not properly expressed. If you group by just ID, then it would need to aggregate all the time values into one value per ID, so you can't sort by it. Thus it suggests an aggregate function for time so you can have one value per ID and properly sort on it.
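Outside of Spark, the two grouping options above can be sketched with plain Python collections (this is an illustration of the SQL semantics only, not Spark API code; the data mirrors the example table):

```python
from collections import defaultdict

# Example data from the table above: (id, time, some_val)
sample = [
    (1, "02-02-15", 4),
    (1, "02-03-15", 5),
    (2, "02-02-15", 4),
    (2, "02-02-15", 5),
    (2, "02-05-15", 2),
]

# A. GROUP BY id: one row per id, so time must be collapsed to a
# single value by an aggregate (here: max), which can then be sorted on.
times_by_id = defaultdict(list)
for id_, time, _ in sample:
    times_by_id[id_].append(time)
by_id = sorted(((i, max(ts)) for i, ts in times_by_id.items()),
               key=lambda r: r[1], reverse=True)
# -> [(2, "02-05-15"), (1, "02-03-15")]

# B. GROUP BY id, time: one row per distinct (id, time) pair; time is a
# grouping key, so it can be sorted on directly with no aggregate.
# (Inner sort makes the tie order between ids deterministic.)
by_id_time = sorted(sorted({(i, t) for i, t, _ in sample}),
                    key=lambda r: r[1], reverse=True)
# -> [(2, "02-05-15"), (1, "02-03-15"), (1, "02-02-15"), (2, "02-02-15")]
```

Note how the two duplicate (2, "02-02-15") input rows collapse to one result row in variant B, exactly as in the SQL output above.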
RE: Group by order by
Thanks, it should be "select id, time, min(x1), min(x2), … from data group by id, time order by time" ("min" or another aggregate function to pick the other fields). I forgot to mention that (id, time) is my primary key, and I took for granted that it worked in my MySQL example. Best regards, Alexander

From: Richard Marscher [mailto:rmarsc...@localytics.com]
Sent: Monday, April 27, 2015 12:47 PM
To: Ulanov, Alexander
Cc: user@spark.apache.org
Subject: Re: Group by order by

It's not related to Spark, but to the concept of what you are trying to do with the data. Grouping by ID means consolidating the data down to one row per ID. You can sort by time after that point, yes, but you would need to either group by each (ID, time) pair OR do some aggregate operation on the time. That's what the error message is explaining.
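As a sanity check of that query shape, here is a sketch using Python's built-in sqlite3 (SQLite, not Spark SQL, so it only demonstrates the semantics; the table and column names follow the query above, but the row values are made up):

```python
import sqlite3

# Hypothetical rows standing in for Alexander's table. In his schema
# (id, time) is the primary key; duplicates are allowed here purely to
# show what min() does when a (id, time) group has several rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, time TEXT, x1 INTEGER, x2 INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", [
    (1, "02-02-15", 4, 7),
    (1, "02-03-15", 5, 6),
    (2, "02-02-15", 4, 9),
    (2, "02-02-15", 5, 3),
    (2, "02-05-15", 2, 8),
])

# The query shape from the message: group by the (id, time) key, pick the
# remaining columns with min(), and order by the grouping column time.
rows = conn.execute(
    "SELECT id, time, min(x1), min(x2) FROM data "
    "GROUP BY id, time ORDER BY time"
).fetchall()
```

Because time is a grouping column, ordering by it needs no aggregate, which is exactly why this variant passes where `group by id order by time` fails.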
Group by order by
Hi, Could you suggest the best way to do "group by x order by y" in Spark? When I try to perform it with Spark SQL I get the following error (Spark 1.3):

val results = sqlContext.sql("select * from sample group by id order by time")

org.apache.spark.sql.AnalysisException: expression 'time' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:37)

Is there a way to do it with just an RDD? Best regards, Alexander
Re: Group by order by
Hi, that error seems to indicate the basic query is not properly expressed. If you group by just ID, then it would need to aggregate all the time values into one value per ID, so you can't sort by it. Thus it suggests an aggregate function for time so you can have one value per ID and properly sort on it.

On Mon, Apr 27, 2015 at 3:07 PM, Ulanov, Alexander alexander.ula...@hp.com wrote:
Hi, Could you suggest the best way to do "group by x order by y" in Spark? When I try to perform it with Spark SQL (Spark 1.3) I get: org.apache.spark.sql.AnalysisException: expression 'time' is neither present in the group by, nor is it an aggregate function. Is there a way to do it with just an RDD? Best regards, Alexander
RE: Group by order by
Hi Richard, There are several values of time per id. Is there a way to perform group by id and sort by time in Spark? Best regards, Alexander

From: Richard Marscher [mailto:rmarsc...@localytics.com]
Sent: Monday, April 27, 2015 12:20 PM
To: Ulanov, Alexander
Cc: user@spark.apache.org
Subject: Re: Group by order by

Hi, that error seems to indicate the basic query is not properly expressed. If you group by just ID, then it would need to aggregate all the time values into one value per ID, so you can't sort by it. Thus it suggests an aggregate function for time so you can have one value per ID and properly sort on it.
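"Several values of time per id, grouped by id and sorted by time" can be sketched as plain data manipulation outside Spark (the records here are hypothetical; an RDD groupBy followed by sorting the values inside each group would yield the same shape):

```python
from collections import defaultdict

# Hypothetical (id, time) records: several time values per id.
records = [
    (1, "02-03-15"), (1, "02-02-15"),
    (2, "02-05-15"), (2, "02-02-15"), (2, "02-02-15"),
]

# Group all time values under their id, then sort within each group.
# No aggregate is needed because nothing is collapsed to one row per id.
grouped = defaultdict(list)
for id_, time in records:
    grouped[id_].append(time)
times_per_id = {i: sorted(ts) for i, ts in grouped.items()}
```

This keeps every time value per id, which is different from SQL's GROUP BY (which forces one row per group); it matches what "group by id and sort by time" usually means when no aggregation is wanted.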
RE: group by order by fails
Thanks Michael! It worked. Somehow my mails are not getting accepted by the spark user mailing list. :(

From: mich...@databricks.com
Date: Thu, 26 Feb 2015 17:49:43 -0800
Subject: Re: group by order by fails
To: tridib.sama...@live.com
CC: ak...@sigmoidanalytics.com; user@spark.apache.org

Assign an alias to the count in the select clause and use that alias in the order by clause.

On Wed, Feb 25, 2015 at 11:17 PM, Tridib Samanta tridib.sama...@live.com wrote:
Actually I just realized, I am using 1.2.0. Thanks, Tridib

Date: Thu, 26 Feb 2015 12:37:06 +0530
Subject: Re: group by order by fails
From: ak...@sigmoidanalytics.com
To: tridib.sama...@live.com
CC: user@spark.apache.org

Which version of Spark are you on? It seems there was a similar Jira: https://issues.apache.org/jira/browse/SPARK-2474
Thanks, Best Regards

On Thu, Feb 26, 2015 at 12:03 PM, tridib tridib.sama...@live.com wrote:
Hi, I need to find the top 10 most-selling samples, so the query looks like: select s.name, count(s.name) from sample s group by s.name order by count(s.name). This query fails with org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort.
Re: group by order by fails
String query = "select s.name, count(s.name) as tally from sample s group by s.name order by tally";

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/group-by-order-by-fails-tp21815p21854.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
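The alias fix can be exercised against any SQL engine; here is a sketch using Python's built-in sqlite3 with hypothetical sample rows (SQLite rather than Spark, so this only demonstrates the query shape Michael suggested: alias the count, then order by the alias):

```python
import sqlite3

# Hypothetical sample table: name "b" sells most, then "c", then "a".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT)")
conn.executemany("INSERT INTO sample VALUES (?)",
                 [("a",), ("b",), ("b",), ("c",), ("b",), ("c",)])

# The count is given the alias "tally" in the select clause, and the
# order by clause refers to the alias instead of repeating count(s.name).
query = ("SELECT s.name, count(s.name) AS tally FROM sample s "
         "GROUP BY s.name ORDER BY tally")
rows = conn.execute(query).fetchall()
```

For the original "top 10 most selling" goal, the same query with `ORDER BY tally DESC LIMIT 10` would return the ten largest groups first.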
Re: group by order by fails
Assign an alias to the count in the select clause and use that alias in the order by clause.

On Wed, Feb 25, 2015 at 11:17 PM, Tridib Samanta tridib.sama...@live.com wrote:
Actually I just realized, I am using 1.2.0. Thanks, Tridib

Date: Thu, 26 Feb 2015 12:37:06 +0530
Subject: Re: group by order by fails
From: ak...@sigmoidanalytics.com
To: tridib.sama...@live.com
CC: user@spark.apache.org

Which version of Spark are you on? It seems there was a similar Jira: https://issues.apache.org/jira/browse/SPARK-2474

On Thu, Feb 26, 2015 at 12:03 PM, tridib tridib.sama...@live.com wrote:
Hi, I need to find the top 10 most-selling samples, so the query looks like: select s.name, count(s.name) from sample s group by s.name order by count(s.name). This query fails with org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort.
Re: group by order by fails
Which version of Spark are you on? It seems there was a similar Jira: https://issues.apache.org/jira/browse/SPARK-2474
Thanks, Best Regards

On Thu, Feb 26, 2015 at 12:03 PM, tridib tridib.sama...@live.com wrote:
Hi, I need to find the top 10 most-selling samples, so the query looks like: select s.name, count(s.name) from sample s group by s.name order by count(s.name). This query fails with org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort.
RE: group by order by fails
Actually I just realized, I am using 1.2.0. Thanks, Tridib

Date: Thu, 26 Feb 2015 12:37:06 +0530
Subject: Re: group by order by fails
From: ak...@sigmoidanalytics.com
To: tridib.sama...@live.com
CC: user@spark.apache.org

Which version of Spark are you on? It seems there was a similar Jira: https://issues.apache.org/jira/browse/SPARK-2474

On Thu, Feb 26, 2015 at 12:03 PM, tridib tridib.sama...@live.com wrote:
Hi, I need to find the top 10 most-selling samples, so the query looks like: select s.name, count(s.name) from sample s group by s.name order by count(s.name). This query fails with org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort.
group by order by fails
Hi, I need to find the top 10 most-selling samples, so the query looks like:

select s.name, count(s.name) from sample s group by s.name order by count(s.name)

This query fails with the following error:

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort, tree:
Sort [COUNT(name#0) ASC], true
 Exchange (RangePartitioning [COUNT(name#0) ASC], 200)
  Aggregate false, [name#0], [name#0 AS name#1,Coalesce(SUM(PartialCount#4L),0) AS count#2L,name#0]
   Exchange (HashPartitioning [name#0], 200)
    Aggregate true, [name#0], [name#0,COUNT(name#0) AS PartialCount#4L]
     PhysicalRDD [name#0], MapPartitionsRDD[1] at mapPartitions at JavaSQLContext.scala:102

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
at org.apache.spark.sql.execution.Sort.execute(basicOperators.scala:206)
at org.apache.spark.sql.execution.Project.execute(basicOperators.scala:43)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)
at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
at com.edifecs.platform.df.analytics.spark.domain.dao.OrderByTest.testGetVisitDistributionByPrimaryDx(OrderByTest.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange (RangePartitioning [COUNT(name#0) ASC], 200)
 Aggregate false, [name#0], [name#0 AS name#1,Coalesce(SUM(PartialCount#4L),0) AS count#2L,name#0]
  Exchange (HashPartitioning [name#0], 200)
   Aggregate true, [name#0], [name#0,COUNT(name#0) AS PartialCount#4L]
    PhysicalRDD [name#0], MapPartitionsRDD[1] at mapPartitions at JavaSQLContext.scala:102
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
at org.apache.spark.sql.execution.Exchange.execute(Exchange.scala:47)
at org.apache.spark.sql.execution.Sort$$anonfun$execute$3.apply(basicOperators.scala:207)
at org.apache.spark.sql.execution.Sort$$anonfun$execute$3.apply(basicOperators.scala:207)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
... 37 more
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: No function to evaluate expression. type: Count, tree: COUNT(input[2])
at org.apache.spark.sql.catalyst.expressions.AggregateExpression.eval(aggregates.scala:41)
at org.apache.spark.sql.catalyst.expressions.RowOrdering.compare(Row.scala:250)
at org.apache.spark.sql.catalyst.expressions.RowOrdering.compare(Row.scala:242)
at scala.math.Ordering$$anon$5.compare(Ordering.scala:122