I tried adding the aggregate functions to the registry and they work, except for mean, for which Ted has forwarded some code changes. I will try out those changes and post an update here.
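For anyone following along, the SQL forms that worked earlier in this thread (count(DISTINCT ...), sum(DISTINCT ...) and avg) can be sketched roughly as below. This is only a sketch, assuming Spark 1.5.x with a plain SQLContext running locally; the object name AggregateSqlForms, the table name people and the sample rows are made up for illustration and are not from my actual job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical example object; the names and data here are illustrative only.
object AggregateSqlForms {
  // SQL spellings reported as working in this thread, in place of
  // countDistinct(age), sumDistinct(age) and mean(age).
  val queries: Seq[String] = Seq(
    "SELECT count(DISTINCT age) AS data FROM people",
    "SELECT sum(DISTINCT age) AS data FROM people",
    "SELECT avg(age) AS data FROM people"
  )

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, Spark 1.5.x on the classpath.
    val conf = new SparkConf().setAppName("AggregateSqlForms").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // (name, age) schema with age as Double, as in the original report.
      val people = Seq(("alice", 30.0), ("bob", 30.0), ("carol", 41.0)).toDF("name", "age")
      people.registerTempTable("people")

      // Run each query and print its single-row result.
      queries.foreach(q => sqlContext.sql(q).show())
    } finally {
      sc.stop()
    }
  }
}
```

approxCountDistinct is left out on purpose, since I have not found a SQL form for it that works yet.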
On Wed, Oct 28, 2015 at 9:03 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:

> Yup, avg works well. So we have alternate functions to use in place of the
> functions pointed out earlier. But my question is: are those original
> aggregate functions not supposed to be used, am I using them in the wrong
> way, or is it a bug, as I asked in my first mail?
>
> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Have you tried using avg in place of mean?
>>
>> (1 to 5).foreach { i =>
>>   val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i")
>> }
>> sqlContext.sql("""
>> CREATE TEMPORARY TABLE partitionedParquet
>> USING org.apache.spark.sql.parquet
>> OPTIONS (
>>   path '/tmp/partitioned'
>> )""")
>> sqlContext.sql("""select avg(a) from partitionedParquet""").show()
>>
>> Cheers
>>
>> On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>
>>> So I tried @Reynold's suggestion. I could get countDistinct and
>>> sumDistinct running, but mean and approxCountDistinct do not work. (I
>>> guess I am using the wrong syntax for approxCountDistinct.) For mean, I
>>> think the registry entry is missing. Can someone clarify that as well?
>>>
>>> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>
>>>> Will try in a while when I get back. I assume this applies to all
>>>> functions other than mean. Also, countDistinct is defined along with all
>>>> the other SQL functions, so I don't get the "distinct is not part of the
>>>> function name" part.
>>>>
>>>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>>>
>>>>> Try
>>>>>
>>>>> count(distinct columnname)
>>>>>
>>>>> In SQL, distinct is not part of the function name.
>>>>>
>>>>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodh...@gmail.com> wrote:
>>>>>
>>>>>> Oops, seems I made a mistake. The error message is: Exception in
>>>>>> thread "main" org.apache.spark.sql.AnalysisException: undefined function
>>>>>> countDistinct
>>>>>>
>>>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi! I was trying out some aggregate functions in SparkSql and I
>>>>>>> noticed that certain aggregate operators are not working. These include:
>>>>>>>
>>>>>>> approxCountDistinct
>>>>>>> countDistinct
>>>>>>> mean
>>>>>>> sumDistinct
>>>>>>>
>>>>>>> For example, using countDistinct results in an error saying
>>>>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>>>>> undefined function cosh;*
>>>>>>>
>>>>>>> I had a similar issue with the cosh operator
>>>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>>>> some time back, and it turned out that it was not registered in the
>>>>>>> registry:
>>>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>>>
>>>>>>> *I think it is the same issue again and would be glad to send
>>>>>>> over a PR if someone can confirm that this is an actual bug and not some
>>>>>>> mistake on my part.*
>>>>>>>
>>>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>>>>> Spark Version: 10.4
>>>>>>> SparkSql Version: 1.5.1
>>>>>>>
>>>>>>> I am using the standard example of a (name, age) schema (though I am
>>>>>>> setting age as Double and not Int, as I am trying out maths functions).
>>>>>>>
>>>>>>> The entire error stack can be found here
>>>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>>>
>>>>>>> Thanks!