Wouldn't it be:

+    expression[Max]("avg"),

On Wed, Oct 28, 2015 at 4:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Since there is already Average, the simplest change is the following:
>
> $ git diff sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
> diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
> index 3dce6c1..920f95b 100644
> --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
> +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
> @@ -184,6 +184,7 @@ object FunctionRegistry {
>      expression[Last]("last"),
>      expression[Last]("last_value"),
>      expression[Max]("max"),
> +    expression[Average]("mean"),
>      expression[Min]("min"),
>      expression[Stddev]("stddev"),
>      expression[StddevPop]("stddev_pop"),
>
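> A quick sanity check after rebuilding with this change (a sketch for
> spark-shell; the table name t and the data are just examples):
>
>     import sqlContext.implicits._
>     val df = (1 to 10).map(i => Tuple1(i.toDouble)).toDF("a")
>     df.registerTempTable("t")
>     // with the new registry entry, mean should resolve to Average
>     sqlContext.sql("SELECT mean(a) FROM t").show()
>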
> FYI
>
> On Wed, Oct 28, 2015 at 2:07 AM, Shagun Sodhani <sshagunsodh...@gmail.com>
> wrote:
>
>> I tried adding the aggregate functions in the registry and they work,
>> other than mean, for which Ted has forwarded some code changes. I will try
>> out those changes and update the status here.
>>
>> On Wed, Oct 28, 2015 at 9:03 AM, Shagun Sodhani <sshagunsodh...@gmail.com>
>> wrote:
>>
>>> Yup, avg works fine. So we have alternate functions to use in place of
>>> the functions pointed out earlier. But my question from my first mail
>>> still stands: are those original aggregate functions not supposed to be
>>> used, am I using them the wrong way, or is it a bug?
>>>
>>> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> Have you tried using avg in place of mean?
>>>>
>>>> import sqlContext.implicits._  // for toDF
>>>>
>>>> // write five partitions of parquet data under /tmp/partitioned
>>>> (1 to 5).foreach { i =>
>>>>   (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b")
>>>>     .write.parquet(s"/tmp/partitioned/i=$i")
>>>> }
>>>> sqlContext.sql("""
>>>>   CREATE TEMPORARY TABLE partitionedParquet
>>>>   USING org.apache.spark.sql.parquet
>>>>   OPTIONS (
>>>>     path '/tmp/partitioned'
>>>>   )""")
>>>> sqlContext.sql("SELECT avg(a) FROM partitionedParquet").show()
>>>>
>>>> Cheers
>>>>
>>>> On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <
>>>> sshagunsodh...@gmail.com> wrote:
>>>>
>>>>> So I tried @Reynold's suggestion. I could get countDistinct and
>>>>> sumDistinct running, but mean and approxCountDistinct do not work (I
>>>>> guess I am using the wrong syntax for approxCountDistinct). For mean, I
>>>>> think the registry entry is missing. Can someone clarify that as well?
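>>>>>
>>>>> Side note: the camel-case names do exist in the DataFrame API via
>>>>> org.apache.spark.sql.functions, so a sketch along these lines should
>>>>> work there (assuming a DataFrame df with a numeric column "a"):
>>>>>
>>>>>     import org.apache.spark.sql.functions._
>>>>>     df.agg(mean("a"), countDistinct("a"),
>>>>>       approxCountDistinct("a"), sumDistinct("a")).show()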
>>>>>
>>>>> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <
>>>>> sshagunsodh...@gmail.com> wrote:
>>>>>
>>>>>> Will try in a while when I get back. I assume this applies to all
>>>>>> functions other than mean. Also, countDistinct is defined along with all
>>>>>> the other SQL functions, so I don't get the "distinct is not part of the
>>>>>> function name" part.
>>>>>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>>>>>
>>>>>>> Try
>>>>>>>
>>>>>>> count(distinct columnName)
>>>>>>>
>>>>>>> In SQL, distinct is not part of the function name.
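>>>>>>>
>>>>>>> For the query from the first mail, that would look something like this
>>>>>>> (a sketch, reusing the earlier table and column names):
>>>>>>>
>>>>>>>     sqlContext.sql("SELECT count(DISTINCT `age`) AS `data` FROM `table`").show()
>>>>>>>
>>>>>>> As far as I can tell, countDistinct itself is only a DataFrame API
>>>>>>> helper (org.apache.spark.sql.functions.countDistinct), not a name the
>>>>>>> SQL parser knows.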
>>>>>>>
>>>>>>> On Tuesday, October 27, 2015, Shagun Sodhani <
>>>>>>> sshagunsodh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Oops, seems I made a mistake. The error message is: Exception in
>>>>>>>> thread "main" org.apache.spark.sql.AnalysisException: undefined
>>>>>>>> function countDistinct
>>>>>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi! I was trying out some aggregate functions in SparkSql and I
>>>>>>>>> noticed that certain aggregate operators are not working. These
>>>>>>>>> include:
>>>>>>>>>
>>>>>>>>> approxCountDistinct
>>>>>>>>> countDistinct
>>>>>>>>> mean
>>>>>>>>> sumDistinct
>>>>>>>>>
>>>>>>>>> For example using countDistinct results in an error saying
>>>>>>>>> *Exception in thread "main"
>>>>>>>>> org.apache.spark.sql.AnalysisException: undefined function cosh;*
>>>>>>>>>
>>>>>>>>> I had a similar issue with the cosh operator
>>>>>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>>>>>> some time back as well, and it turned out that it was not registered
>>>>>>>>> in the registry:
>>>>>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *I think it is the same issue again and would be glad to send over
>>>>>>>>> a PR if someone can confirm that this is an actual bug and not some
>>>>>>>>> mistake on my part.*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM
>>>>>>>>> `table`
>>>>>>>>> Scala Version: 2.10.4
>>>>>>>>> SparkSql Version: 1.5.1
>>>>>>>>>
>>>>>>>>> I am using the standard example of (name, age) schema (though I am
>>>>>>>>> setting age as Double and not Int as I am trying out maths functions).
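>>>>>>>>>
>>>>>>>>> Roughly what I am running, as a sketch in spark-shell (the actual
>>>>>>>>> code differs slightly):
>>>>>>>>>
>>>>>>>>>     case class Person(name: String, age: Double)
>>>>>>>>>     import sqlContext.implicits._
>>>>>>>>>     val df = sc.parallelize(Seq(Person("a", 25.0), Person("b", 30.0))).toDF()
>>>>>>>>>     df.registerTempTable("table")
>>>>>>>>>     // fails: AnalysisException: undefined function countDistinct
>>>>>>>>>     sqlContext.sql("SELECT countDistinct(`age`) as `data` FROM `table`").show()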
>>>>>>>>>
>>>>>>>>> The entire error stack can be found here
>>>>>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>
>>
>
