Hi,

An argument for `functions.count` is needed for per-column counting;

  df.groupBy($"a").agg(count($"b"))
// maropu

On Thu, Jun 23, 2016 at 1:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> See the first example in:
>
> http://www.w3schools.com/sql/sql_func_count.asp
>
> On Wed, Jun 22, 2016 at 9:21 AM, Jakub Dubovsky <
> spark.dubovsky.ja...@gmail.com> wrote:
>
>> Hey Ted,
>>
>> thanks for reacting.
>>
>> I am referring to both of them. They both take a column as a parameter
>> regardless of its type. The intuition here is that count should take no
>> parameter. Or am I missing something?
>>
>> Jakub
>>
>> On Wed, Jun 22, 2016 at 6:19 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Are you referring to the following method in
>>> sql/core/src/main/scala/org/apache/spark/sql/functions.scala :
>>>
>>>   def count(e: Column): Column = withAggregateFunction {
>>>
>>> Did you notice this method?
>>>
>>>   def count(columnName: String): TypedColumn[Any, Long] =
>>>
>>> On Wed, Jun 22, 2016 at 9:06 AM, Jakub Dubovsky <
>>> spark.dubovsky.ja...@gmail.com> wrote:
>>>
>>>> Hey sparkers,
>>>>
>>>> an aggregate function *count* in the *org.apache.spark.sql.functions*
>>>> package takes a *column* as an argument. Is this needed for something?
>>>> I find it confusing that I need to supply a column there. It feels like it
>>>> might be a distinct count or something. This can be seen in the latest
>>>> documentation
>>>> <http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$>.
>>>>
>>>> I am considering filing this in the Spark bug tracker. Any opinions on
>>>> this?
>>>>
>>>> Thanks
>>>>
>>>> Jakub

--
---
Takeshi Yamamuro
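(For anyone following along: the column argument matters because, as in standard SQL, count over a column skips nulls, while counting rows needs a constant expression. Below is a minimal sketch of that difference; it assumes a running SparkSession bound to a val named `spark`, which is not part of the thread above.)

```scala
// Sketch: count($"b") vs. counting all rows in a group.
// Assumes `spark` is an existing SparkSession (hypothetical setup).
import org.apache.spark.sql.functions.{count, lit}
import spark.implicits._

val df = Seq(
  ("x", Some(1)),
  ("x", None),      // null in column b
  ("y", Some(3))
).toDF("a", "b")

df.groupBy($"a")
  .agg(
    count($"b").as("non_null_b"),  // nulls in b are not counted
    count(lit(1)).as("row_count")  // counts every row in the group
  )
  .show()
// Group "x" has row_count = 2 but non_null_b = 1, since one b is null.
```

So a zero-argument count could only ever express the row count; taking a column is what enables per-column (null-skipping) counting.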