So I tried @Reynold's suggestion. I could get countDistinct and sumDistinct
running, but mean and approxCountDistinct do not work (I guess I am using
the wrong syntax for approxCountDistinct). For mean, I think the
registry entry is missing. Can someone clarify that as well?
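
For reference, here is roughly what I am running in spark-shell against the
same (name, age) table; the avg line is only my guess at the SQL spelling of
mean, and I have not verified it against the registry:

    // spark-shell, Spark SQL 1.5.1; sqlContext is the shell's predefined SQLContext
    // and `table` is the registered temp table with a Double `age` column
    sqlContext.sql("SELECT count(DISTINCT `age`) AS `data` FROM `table`").show()  // works now
    sqlContext.sql("SELECT sum(DISTINCT `age`) AS `data` FROM `table`").show()    // works now
    sqlContext.sql("SELECT mean(`age`) AS `data` FROM `table`").show()            // does not work for me
    sqlContext.sql("SELECT avg(`age`) AS `data` FROM `table`").show()             // my guess at the registered name; unverified
    sqlContext.sql("SELECT approxCountDistinct(`age`) AS `data` FROM `table`").show()  // does not work; likely my syntax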

On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <sshagunsodh...@gmail.com>
wrote:

> Will try in a while when I get back. I assume this applies to all
> functions other than mean. Also, countDistinct is defined along with all
> the other SQL functions, so I don't get the "distinct is not part of the
> function name" part.
> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>
>> Try
>>
>> count(distinct columnName)
>>
>> In SQL distinct is not part of the function name.
>>
>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodh...@gmail.com>
>> wrote:
>>
>>> Oops, seems I made a mistake. The error message is: Exception in thread
>>> "main" org.apache.spark.sql.AnalysisException: undefined function
>>> countDistinct
>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com> wrote:
>>>
>>>> Hi! I was trying out some aggregate functions in Spark SQL and I
>>>> noticed that certain aggregate operators are not working. These include:
>>>>
>>>> approxCountDistinct
>>>> countDistinct
>>>> mean
>>>> sumDistinct
>>>>
>>>> For example, using countDistinct results in an error saying
>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>> undefined function cosh;*
>>>>
>>>> I had a similar issue with the cosh operator
>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>> some time back as well, and it turned out that it was simply not registered
>>>> in the registry:
>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
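>>>>
>>>> Going by the existing entries in that file, the cosh fix was just one more
>>>> line in the expressions map, and I would expect the same kind of one-line
>>>> registration for these aggregates (a sketch from memory; I have not
>>>> verified the exact expression class names):
>>>>
>>>>     // in FunctionRegistry.scala's expressions map (sketch; class names unverified)
>>>>     expression[Cosh]("cosh"),
>>>>     expression[Average]("mean"),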
>>>>
>>>>
>>>> *I think it is the same issue again and would be glad to send over a
>>>> PR if someone can confirm that this is an actual bug and not some mistake
>>>> on my part.*
>>>>
>>>>
>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>> Spark Version: 10.4
>>>> SparkSql Version: 1.5.1
>>>>
>>>> I am using the standard example of (name, age) schema (though I am
>>>> setting age as Double and not Int as I am trying out maths functions).
>>>>
>>>> The entire error stack can be found here: <http://pastebin.com/G6YzQXnn>.
>>>>
>>>> Thanks!
>>>>
>>>
