Re: Confusing argument of sql.functions.count

2016-06-22 Thread Xinh Huynh
I can see how the linked documentation could be confusing: "Aggregate function: returns the number of items in a group." What it doesn't mention is that it returns the number of rows for which the given column is non-null. Xinh On Wed, Jun 22, 2016 at 9:31 AM, Takeshi Yamamuro
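To make the non-null behaviour concrete, here is a minimal sketch. It assumes a local Spark 2.x session; the data, the column names "a" and "b", and the object name CountNullsDemo are made up for illustration and do not come from the thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

// Hypothetical demo object; not part of Spark.
object CountNullsDemo {
  // Returns (group, rows in group, rows where "b" is non-null), sorted by group.
  def run(): Seq[(String, Long, Long)] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("count-nulls-demo")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: "b" is null in one of the two "x" rows.
    val data: Seq[(String, Option[Int])] = Seq(
      ("x", Some(1)),
      ("x", None),
      ("y", Some(3))
    )
    val df = data.toDF("a", "b")

    val result = df.groupBy($"a")
      .agg(
        count("*").as("rows"),       // counts every row in the group
        count($"b").as("non_null_b") // counts only rows where b is not null
      )
      .orderBy($"a")
      .collect()
      .map(r => (r.getString(0), r.getLong(1), r.getLong(2)))
      .toSeq

    spark.stop()
    result
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

For group "x" the two counts differ (2 rows, 1 non-null value of b), which is the case the documentation sentence does not cover.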

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Jakub Dubovsky
Nice reactions. My comments: @Ted.Yu: I see now that count(*) works for what I want @Takeshi: I understand this is the syntax but it was not clear to me what this $"b" column will be used for... My line of thinking was this: I started with 1) someDF.groupBy("colA").count() and then I realized

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Takeshi Yamamuro
Hi, An argument for `functions.count` is needed for per-column counting; df.groupBy($"a").agg(count($"b")) // maropu On Thu, Jun 23, 2016 at 1:27 AM, Ted Yu wrote: > See the first example in: > > http://www.w3schools.com/sql/sql_func_count.asp > > On Wed, Jun 22, 2016 at

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Ted Yu
See the first example in: http://www.w3schools.com/sql/sql_func_count.asp On Wed, Jun 22, 2016 at 9:21 AM, Jakub Dubovsky < spark.dubovsky.ja...@gmail.com> wrote: > Hey Ted, > > thanks for reacting. > > I am referring to both of them. They both take column as parameter > regardless of its type.

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Jakub Dubovsky
Hey Ted, thanks for reacting. I am referring to both of them. They both take a column as a parameter regardless of its type. The intuition here is that count should take no parameter. Or am I missing something? Jakub On Wed, Jun 22, 2016 at 6:19 PM, Ted Yu wrote: > Are you

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Ted Yu
Are you referring to the following method in sql/core/src/main/scala/org/apache/spark/sql/functions.scala: `def count(e: Column): Column = withAggregateFunction {` Did you notice this method? `def count(columnName: String): TypedColumn[Any, Long] =` On Wed, Jun 22, 2016 at 9:06 AM, Jakub
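As a rough illustration of the two overloads quoted above, the sketch below calls both on the same column. It assumes a local Spark 2.x session; the data, the column names, and the object name CountOverloadsDemo are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

// Hypothetical demo object; not part of Spark.
object CountOverloadsDemo {
  // Returns the results of count(Column) and count(String) on column "b".
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("count-overloads-demo")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: one null value in column "b".
    val data: Seq[(String, String)] = Seq(("x", "p"), ("x", null), ("y", "q"))
    val df = data.toDF("a", "b")

    val row = df.agg(
      count($"b"), // overload taking a Column
      count("b")   // overload taking a column name
    ).head()

    spark.stop()
    (row.getLong(0), row.getLong(1))
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Both calls should give the same number here, since each counts the rows in which b is not null.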

Confusing argument of sql.functions.count

2016-06-22 Thread Jakub Dubovsky
Hey sparkers, an aggregate function *count* in *org.apache.spark.sql.functions* package takes a *column* as an argument. Is this needed for something? I find it confusing that I need to supply a column there. It feels like it might be distinct count or something. This can be seen in latest