Hi,

An argument for `functions.count` is needed for per-column counting;

  df.groupBy($"a").agg(count($"b"))
// maropu

On Thu, Jun 23, 2016 at 1:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> See the first example in:
>
> http://www.w3schools.com/sql/sql_func_count.asp
>
> On Wed, Jun 22, 2016 at 9:21 AM, Jakub Dubovsky <
> spark.dubovsky.ja...@gmail.com> wrote:
>
>> Hey Ted,
>>
>> thanks for reacting.
>>
>> I am referring to both of them. They both take a column as a parameter
>> regardless of its type. The intuition here is that count should take no
>> parameter. Or am I missing something?
>>
>> Jakub
>>
>> On Wed, Jun 22, 2016 at 6:19 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Are you referring to the following method in
>>> sql/core/src/main/scala/org/apache/spark/sql/functions.scala :
>>>
>>>   def count(e: Column): Column = withAggregateFunction {
>>>
>>> Did you notice this method?
>>>
>>>   def count(columnName: String): TypedColumn[Any, Long] =
>>>
>>> On Wed, Jun 22, 2016 at 9:06 AM, Jakub Dubovsky <
>>> spark.dubovsky.ja...@gmail.com> wrote:
>>>
>>>> Hey sparkers,
>>>>
>>>> an aggregate function *count* in the *org.apache.spark.sql.functions*
>>>> package takes a *column* as an argument. Is this needed for something?
>>>> I find it confusing that I need to supply a column there. It feels like it
>>>> might be a distinct count or something. This can be seen in the latest
>>>> documentation
>>>> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$>.
>>>>
>>>> I am considering filing this in the Spark bug tracker. Any opinions on
>>>> this?
>>>>
>>>> Thanks
>>>>
>>>> Jakub

--
---
Takeshi Yamamuro
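(For anyone following along: the column argument matters because, as in standard SQL, count over a column skips nulls, while counting rows needs a constant expression. Below is a minimal sketch of that difference; it assumes a running SparkSession bound to a val named `spark`, which is not part of the thread above.)

```scala
// Sketch: count($"b") vs. counting all rows in a group.
// Assumes `spark` is an existing SparkSession (hypothetical setup).
import org.apache.spark.sql.functions.{count, lit}
import spark.implicits._

val df = Seq(
  ("x", Some(1)),
  ("x", None),      // null in column b
  ("y", Some(3))
).toDF("a", "b")

df.groupBy($"a")
  .agg(
    count($"b").as("non_null_b"),  // nulls in b are not counted
    count(lit(1)).as("row_count")  // counts every row in the group
  )
  .show()
// Group "x" has row_count = 2 but non_null_b = 1, since one b is null.
```

So a zero-argument count could only ever express the row count; taking a column is what enables per-column (null-skipping) counting.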