I can see how the linked documentation could be confusing:
"Aggregate function: returns the number of items in a group."
What it doesn't mention is that it returns the number of rows for which the
given column is non-null.
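That null-skipping behavior can be checked directly in the shell. A minimal sketch (the session setup and sample data below are mine, for illustration only):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder().master("local[*]").appName("count-nulls").getOrCreate()
import spark.implicits._

// Column "b" is null in one of the "x" rows.
val df = Seq(("x", Some(1)), ("x", None), ("y", Some(3))).toDF("a", "b")

// count($"b") skips the null row, so group "x" yields 1;
// groupBy("a").count() (i.e. count(*)) would yield 2 for "x".
df.groupBy($"a").agg(count($"b")).show()
```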
Xinh
On Wed, Jun 22, 2016 at 9:31 AM, Takeshi Yamamuro
Thanks for the reactions. My comments:
@Ted.Yu: I see now that count(*) works for what I want
@Takeshi: I understand that this is the syntax, but it was not clear to me
what the $"b" column would be used for...
My line of thinking was this:
I started with
1) someDF.groupBy("colA").count()
and then I realized
Hi,
The argument to `functions.count` is there for per-column counting;
df.groupBy($"a").agg(count($"b"))
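Expanding that one-liner into something runnable, and contrasting it with the parameterless groupBy().count() (the sample data and session setup are my own, not from the thread):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, countDistinct}

val spark = SparkSession.builder().master("local[*]").appName("per-column-count").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("a", 1), ("b", 2)).toDF("k", "v")

df.groupBy($"k").count().show()                  // rows per group: a -> 2, b -> 1
df.groupBy($"k").agg(count($"v")).show()         // same here, since "v" has no nulls
df.groupBy($"k").agg(countDistinct($"v")).show() // distinct values: a -> 1, b -> 1
```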
// maropu
On Thu, Jun 23, 2016 at 1:27 AM, Ted Yu wrote:
> See the first example in:
>
> http://www.w3schools.com/sql/sql_func_count.asp
>
> On Wed, Jun 22, 2016 at
See the first example in:
http://www.w3schools.com/sql/sql_func_count.asp
On Wed, Jun 22, 2016 at 9:21 AM, Jakub Dubovsky <
spark.dubovsky.ja...@gmail.com> wrote:
> Hey Ted,
>
> thanks for the reply.
>
> I am referring to both of them. They both take a column as a parameter,
> regardless of its type.
Hey Ted,
thanks for the reply.
I am referring to both of them. They both take a column as a parameter,
regardless of its type. My intuition says that count should take no
parameter. Or am I missing something?
Jakub
On Wed, Jun 22, 2016 at 6:19 PM, Ted Yu wrote:
> Are you
Are you referring to the following method in
sql/core/src/main/scala/org/apache/spark/sql/functions.scala:
def count(e: Column): Column = withAggregateFunction {
Did you notice this method?
def count(columnName: String): TypedColumn[Any, Long] =
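So both overloads exist; a small side-by-side sketch (the data and session setup below are illustrative, not from the thread):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder().master("local[*]").appName("count-overloads").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("b", 2)).toDF("x", "y")

// Column overload: count(e: Column): Column
df.groupBy($"x").agg(count($"y")).show()
// String overload: count(columnName: String): TypedColumn[Any, Long]
df.groupBy("x").agg(count("y")).show()
```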
On Wed, Jun 22, 2016 at 9:06 AM, Jakub
Hey sparkers,
the aggregate function *count* in the *org.apache.spark.sql.functions* package
takes a *column* as an argument. Is this needed for something? I find it
confusing that I need to supply a column there. It feels like it might be a
distinct count or something. This can be seen in latest