I tried adding the aggregate functions to the registry and they work, except
for mean, for which Ted has forwarded some code changes. I will try out
those changes and update the status here.
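For readers following along, here is a minimal sketch in plain Scala of the idea discussed in this thread. It is hypothetical and is NOT Spark's actual FunctionRegistry API. It assumes the analyzer resolves SQL function calls purely by name. DISTINCT is passed as a modifier rather than being part of the name, which is Reynold's point further down. A name that is not registered, such as mean here, fails with "undefined function" until it is added, for example as an alias of avg. The names ToyRegistry, Aggregate, registerAlias and call are all invented for illustration.

```scala
// Toy model of a SQL function registry. Hypothetical: this is NOT Spark's
// FunctionRegistry API, only an illustration of name-based resolution.
object ToyRegistry {
  // An aggregate maps a column's values to a single result.
  type Aggregate = Seq[Double] => Double

  private val countAgg: Aggregate = xs => xs.size.toDouble
  private val sumAgg: Aggregate = xs => xs.sum
  private val avgAgg: Aggregate = xs =>
    if (xs.isEmpty) Double.NaN else xs.sum / xs.size

  // Only these names resolve; note there is no "countDistinct" or "mean".
  private var functions: Map[String, Aggregate] =
    Map("count" -> countAgg, "sum" -> sumAgg, "avg" -> avgAgg)

  // Register an extra name that reuses an existing aggregate,
  // e.g. registerAlias("mean", "avg").
  def registerAlias(alias: String, target: String): Unit =
    functions = functions.updated(alias, functions(target))

  // Resolve fn(col) or fn(DISTINCT col): DISTINCT is a flag applied to the
  // input, not part of the function name. Unknown names fail, mimicking
  // the "undefined function" error from the thread.
  def call(name: String, values: Seq[Double], distinct: Boolean = false): Double = {
    val f = functions.getOrElse(
      name,
      throw new IllegalArgumentException(s"undefined function $name"))
    f(if (distinct) values.distinct else values)
  }
}
```

Suppose ages = Seq(30.0, 30.0, 40.0). Then ToyRegistry.call("count", ages, distinct = true) returns 2.0, while ToyRegistry.call("countDistinct", ages) throws, mirroring the error in the first mail. After ToyRegistry.registerAlias("mean", "avg"), call("mean", ages) gives the same result as call("avg", ages).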

On Wed, Oct 28, 2015 at 9:03 AM, Shagun Sodhani <sshagunsodh...@gmail.com>
wrote:

> Yup, avg works. So we have alternative functions to use in place of the
> functions pointed out earlier. But my question is: are those original
> aggregate functions not supposed to be used, am I using them the wrong
> way, or is it a bug, as I asked in my first mail?
>
> On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Have you tried using avg in place of mean?
>>
>> import sqlContext.implicits._  // provides toDF; auto-imported in spark-shell
>>
>> // Write five Parquet directories i=1..5, each with 1000 rows of (a, b)
>> (1 to 5).foreach { i =>
>>   (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b")
>>     .write.parquet(s"/tmp/partitioned/i=$i")
>> }
>> sqlContext.sql("""
>>   CREATE TEMPORARY TABLE partitionedParquet
>>   USING org.apache.spark.sql.parquet
>>   OPTIONS (
>>     path '/tmp/partitioned'
>>   )""")
>> sqlContext.sql("""select avg(a) from partitionedParquet""").show()
>>
>> Cheers
>>
>> On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <
>> sshagunsodh...@gmail.com> wrote:
>>
>>> So I tried @Reynold's suggestion. I could get countDistinct and
>>> sumDistinct running, but mean and approxCountDistinct do not work (I
>>> guess I am using the wrong syntax for approxCountDistinct). For mean, I
>>> think the registry entry is missing. Can someone clarify that as well?
>>>
>>> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <
>>> sshagunsodh...@gmail.com> wrote:
>>>
>>>> Will try in a while when I get back. I assume this applies to all
>>>> functions other than mean. Also, countDistinct is defined along with all
>>>> the other SQL functions, so I don't get the "distinct is not part of the
>>>> function name" part.
>>>> On 27 Oct 2015 19:58, "Reynold Xin" <r...@databricks.com> wrote:
>>>>
>>>>> Try
>>>>>
>>>>> count(distinct columnname)
>>>>>
>>>>> In SQL, distinct is not part of the function name.
>>>>>
>>>>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Oops, it seems I made a mistake. The error message is: Exception in
>>>>>> thread "main" org.apache.spark.sql.AnalysisException: undefined function
>>>>>> countDistinct
>>>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodh...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi! I was trying out some aggregate functions in Spark SQL and I
>>>>>>> noticed that certain aggregate operators are not working. These include:
>>>>>>>
>>>>>>> approxCountDistinct
>>>>>>> countDistinct
>>>>>>> mean
>>>>>>> sumDistinct
>>>>>>>
>>>>>>> For example, using countDistinct results in an error saying
>>>>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>>>>> undefined function cosh;*
>>>>>>>
>>>>>>> I had a similar issue with the cosh operator
>>>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>>>> some time back, and it turned out that it was not registered in the
>>>>>>> registry:
>>>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>>>
>>>>>>>
>>>>>>> *I think it is the same issue again and would be glad to send
>>>>>>> over a PR if someone can confirm that this is an actual bug and not
>>>>>>> some mistake on my part.*
>>>>>>>
>>>>>>>
>>>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>>>>> Spark Version: 10.4
>>>>>>> SparkSql Version: 1.5.1
>>>>>>>
>>>>>>> I am using the standard (name, age) example schema (though I am
>>>>>>> setting age as Double and not Int, as I am trying out maths functions).
>>>>>>>
>>>>>>> The entire error stack can be found here
>>>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>
>>>
>>
>
