Also, are the other aggregate functions to be treated as bugs or not?

On Wed, Oct 28, 2015 at 4:08 PM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:

> Wouldn't it be:
>
> + expression[Max]("avg"),
>
> On Wed, Oct 28, 2015 at 4:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Since there is already Average, the simplest change is the following:
>>
>> $ git diff sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>> diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Functi
>> index 3dce6c1..920f95b 100644
>> --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>> +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>> @@ -184,6 +184,7 @@ object FunctionRegistry {
>>        expression[Last]("last"),
>>        expression[Last]("last_value"),
>>        expression[Max]("max"),
>> +      expression[Average]("mean"),
>>        expression[Min]("min"),
>>        expression[Stddev]("stddev"),
>>        expression[StddevPop]("stddev_pop"),
>>
>> FYI
>>
>> On Wed, Oct 28, 2015 at 2:07 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>
>>> I tried adding the aggregate functions in the registry and they work, other than mean, for which Ted has forwarded some code changes. I will try out those changes and update the status here.
>>>
>>> On Wed, Oct 28, 2015 at 9:03 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>
>>>> Yup, avg works fine. So we have alternate functions to use in place of the functions pointed out earlier. But my question remains: are those original aggregate functions not supposed to be used, am I using them the wrong way, or is this a bug, as I asked in my first mail?
>>>>
>>>> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Have you tried using avg in place of mean?
>>>>>
>>>>> (1 to 5).foreach { i =>
>>>>>   val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i")
>>>>> }
>>>>> sqlContext.sql("""
>>>>>   CREATE TEMPORARY TABLE partitionedParquet
>>>>>   USING org.apache.spark.sql.parquet
>>>>>   OPTIONS (
>>>>>     path '/tmp/partitioned'
>>>>>   )""")
>>>>> sqlContext.sql("""select avg(a) from partitionedParquet""").show()
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>>>
>>>>>> So I tried @Reynold's suggestion. I could get countDistinct and sumDistinct running, but mean and approxCountDistinct do not work. (I guess I am using the wrong syntax for approxCountDistinct.) For mean, I think the registry entry is missing. Can someone clarify that as well?
>>>>>>
>>>>>> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>>>>
>>>>>>> Will try in a while when I get back. I assume this applies to all functions other than mean. Also, countDistinct is defined along with all the other SQL functions, so I don't get the "distinct is not part of the function name" part.
>>>>>>>
>>>>>>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>>>>>>
>>>>>>>> Try
>>>>>>>>
>>>>>>>> count(distinct columnname)
>>>>>>>>
>>>>>>>> In SQL, distinct is not part of the function name.
>>>>>>>>
>>>>>>>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Oops, seems I made a mistake. The error message is: Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function countDistinct
>>>>>>>>>
>>>>>>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi! I was trying out some aggregate functions in SparkSql and I noticed that certain aggregate operators are not working. This includes:
>>>>>>>>>>
>>>>>>>>>> approxCountDistinct
>>>>>>>>>> countDistinct
>>>>>>>>>> mean
>>>>>>>>>> sumDistinct
>>>>>>>>>>
>>>>>>>>>> For example, using countDistinct results in an error saying:
>>>>>>>>>>
>>>>>>>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function cosh;*
>>>>>>>>>>
>>>>>>>>>> I had a similar issue with the cosh operator <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html> some time back, and it turned out that it was not registered in the registry:
>>>>>>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>>>>>>
>>>>>>>>>> *I think it is the same issue again and would be glad to send over a PR if someone can confirm that this is an actual bug and not some mistake on my part.*
>>>>>>>>>>
>>>>>>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>>>>>>>> Spark Version: 10.4
>>>>>>>>>> SparkSql Version: 1.5.1
>>>>>>>>>>
>>>>>>>>>> I am using the standard example of (name, age) schema (though I am setting age as Double and not Int, as I am trying out maths functions).
>>>>>>>>>>
>>>>>>>>>> The entire error stack can be found here <http://pastebin.com/G6YzQXnn>.
>>>>>>>>>>
>>>>>>>>>> Thanks!
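[Editor's note] The thread's takeaway — `countDistinct` and `mean` are DataFrame-API names, while the SQL spellings are `count(DISTINCT col)` and `avg(col)` — reflects standard SQL, not something Spark-specific. A minimal sketch of the same two queries, using Python's stdlib sqlite3 rather than Spark (the `people`/`age` schema here is made up to mirror the thread's example):

```python
import sqlite3

# In-memory table with a duplicate age value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("a", 20.0), ("b", 30.0), ("c", 30.0), ("d", 40.0)],
)

# DISTINCT is a SQL keyword inside the aggregate call,
# not part of the function name (Reynold's point).
n_distinct = conn.execute(
    "SELECT count(DISTINCT age) FROM people"
).fetchone()[0]

# The SQL name for the mean is avg, not mean (Ted's point).
average = conn.execute("SELECT avg(age) FROM people").fetchone()[0]

print(n_distinct)  # 3
print(average)     # 30.0
```

A query like `SELECT countDistinct(age) FROM people` fails here too, for the same reason as in SparkSql: no aggregate function is registered under that name.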