Also, are the other aggregate functions to be treated as bugs or not?

On Wed, Oct 28, 2015 at 4:08 PM, Shagun Sodhani <sshagunsodh...@gmail.com>
wrote:

> Wouldn't it be:
>
> +    expression[Max]("avg"),
>
> On Wed, Oct 28, 2015 at 4:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Since there is already Average, the simplest change is the following:
>>
>> $ git diff sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>> diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>> index 3dce6c1..920f95b 100644
>> --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>> +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>> @@ -184,6 +184,7 @@ object FunctionRegistry {
>>      expression[Last]("last"),
>>      expression[Last]("last_value"),
>>      expression[Max]("max"),
>> +    expression[Average]("mean"),
>>      expression[Min]("min"),
>>      expression[Stddev]("stddev"),
>>      expression[StddevPop]("stddev_pop"),
>>
>> FYI
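>>
>> As a quick sanity check after the change (a sketch; it reuses the
>> partitionedParquet table from my earlier mail quoted below):
>>
>>   // before the patch this fails analysis with "undefined function mean";
>>   // with the extra registry entry it should resolve to Average
>>   sqlContext.sql("SELECT mean(a) FROM partitionedParquet").show()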
>>
>> On Wed, Oct 28, 2015 at 2:07 AM, Shagun Sodhani <sshagunsodh...@gmail.com
>> > wrote:
>>
>>> I tried adding the aggregate functions to the registry and they work,
>>> except for mean, for which Ted has forwarded some code changes. I will
>>> try out those changes and update the status here.
>>>
>>> On Wed, Oct 28, 2015 at 9:03 AM, Shagun Sodhani <
>>> sshagunsodh...@gmail.com> wrote:
>>>
>>>> Yup, avg works fine. So we have alternate functions to use in place of
>>>> the functions pointed out earlier. But my point is: are those original
>>>> aggregate functions not supposed to be used, am I using them in the
>>>> wrong way, or is it a bug, as I asked in my first mail?
>>>>
>>>> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Have you tried using avg in place of mean ?
>>>>>
>>>>> // write five partitions of sample data (run in spark-shell, so the
>>>>> // toDF implicits are already in scope)
>>>>> (1 to 5).foreach { i =>
>>>>>   (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i")
>>>>> }
>>>>>
>>>>> // expose the partitioned files as a temporary table
>>>>> sqlContext.sql("""
>>>>>   CREATE TEMPORARY TABLE partitionedParquet
>>>>>   USING org.apache.spark.sql.parquet
>>>>>   OPTIONS (
>>>>>     path '/tmp/partitioned'
>>>>>   )""")
>>>>>
>>>>> sqlContext.sql("""select avg(a) from partitionedParquet""").show()
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <
>>>>> sshagunsodh...@gmail.com> wrote:
>>>>>
>>>>>> So I tried @Reynold's suggestion. I could get countDistinct and
>>>>>> sumDistinct running, but mean and approxCountDistinct do not work
>>>>>> (I guess I am using the wrong syntax for approxCountDistinct). For
>>>>>> mean, I think the registry entry is missing. Can someone clarify
>>>>>> that as well?
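>>>>>>
>>>>>> (For reference, the equivalent DataFrame-API calls do exist in
>>>>>> org.apache.spark.sql.functions; a minimal sketch, assuming a DataFrame
>>>>>> df with a numeric column "a":
>>>>>>
>>>>>>   import org.apache.spark.sql.functions._
>>>>>>   // aggregates built directly as expressions, without going through
>>>>>>   // the SQL parser / FunctionRegistry
>>>>>>   df.agg(countDistinct("a"), sumDistinct("a"), mean("a")).show()
>>>>>> )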
>>>>>>
>>>>>> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <
>>>>>> sshagunsodh...@gmail.com> wrote:
>>>>>>
>>>>>>> Will try in a while when I get back. I assume this applies to all
>>>>>>> functions other than mean. Also, countDistinct is defined along with
>>>>>>> all the other SQL functions, so I don't get the "distinct is not part
>>>>>>> of the function name" part.
>>>>>>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>>>>>>
>>>>>>>> Try
>>>>>>>>
>>>>>>>> count(distinct columnName)
>>>>>>>>
>>>>>>>> In SQL distinct is not part of the function name.
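>>>>>>>>
>>>>>>>> So, for the query from the first mail, something like this (a
>>>>>>>> sketch, keeping the original table and column names):
>>>>>>>>
>>>>>>>>   sqlContext.sql("SELECT count(distinct `age`) AS `data` FROM `table`").show()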
>>>>>>>>
>>>>>>>> On Tuesday, October 27, 2015, Shagun Sodhani <
>>>>>>>> sshagunsodh...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Oops, seems I made a mistake. The error message is: Exception in
>>>>>>>>> thread "main" org.apache.spark.sql.AnalysisException: undefined
>>>>>>>>> function countDistinct
>>>>>>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi! I was trying out some aggregate functions in SparkSql and I
>>>>>>>>>> noticed that certain aggregate operators are not working. These
>>>>>>>>>> include:
>>>>>>>>>>
>>>>>>>>>> approxCountDistinct
>>>>>>>>>> countDistinct
>>>>>>>>>> mean
>>>>>>>>>> sumDistinct
>>>>>>>>>>
>>>>>>>>>> For example, using countDistinct results in an error saying
>>>>>>>>>> *Exception in thread "main"
>>>>>>>>>> org.apache.spark.sql.AnalysisException: undefined function cosh;*
>>>>>>>>>>
>>>>>>>>>> I had a similar issue with the cosh operator
>>>>>>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>>>>>>> some time back, and it turned out that it was not registered in the
>>>>>>>>>> registry:
>>>>>>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
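>>>>>>>>>>
>>>>>>>>>> A minimal way to reproduce (a sketch, using the table described
>>>>>>>>>> below):
>>>>>>>>>>
>>>>>>>>>>   sqlContext.sql("SELECT avg(`age`) FROM `table`").show()   // resolves fine
>>>>>>>>>>   sqlContext.sql("SELECT mean(`age`) FROM `table`").show()  // undefined function mean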
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *I think it is the same issue again and would be glad to send
>>>>>>>>>> over a PR if someone can confirm that this is an actual bug and not
>>>>>>>>>> some mistake on my part.*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM
>>>>>>>>>> `table`
>>>>>>>>>> Spark Version: 10.4
>>>>>>>>>> SparkSql Version: 1.5.1
>>>>>>>>>>
>>>>>>>>>> I am using the standard (name, age) schema example (though I am
>>>>>>>>>> setting age as Double rather than Int, as I am trying out maths
>>>>>>>>>> functions).
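>>>>>>>>>>
>>>>>>>>>> A minimal sketch of that setup (the sample rows are made up):
>>>>>>>>>>
>>>>>>>>>>   import sqlContext.implicits._
>>>>>>>>>>   // age as Double so the maths functions apply
>>>>>>>>>>   val people = Seq(("alice", 23.0), ("bob", 31.0)).toDF("name", "age")
>>>>>>>>>>   people.registerTempTable("table")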
>>>>>>>>>>
>>>>>>>>>> The entire error stack can be found here
>>>>>>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>