Dear list,

I am trying to calculate sum and count on the same column:

user_id_books_clicks = (
    sqlContext.read.parquet('hdfs:///projects/kaggle-expedia/input/train.parquet')
    .groupby('user_id')
    .agg({'is_booking': 'count', 'is_booking': 'sum'})
    .orderBy(fn.desc('count(user_id)'))
    .cache()
)

If I write it like that, I only get one aggregate back, the last one:
sum(is_booking)

But if I change it to .agg({'user_id':'count', 'is_booking':'sum'}), it
gives me both. I am on Spark 1.6.1. Is this fixed in 2.x, or should I
report it in JIRA?
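
One possible explanation that does not involve Spark at all: a Python dict
literal cannot hold the same key twice, so the second 'is_booking' entry may
simply be replacing the first before agg() ever sees it. A minimal sketch in
plain Python (the names aggs and pairs are only for illustration):

```python
# The dict literal from the snippet above, evaluated on its own (no Spark).
# Python keeps only the last value given for a repeated key.
aggs = {'is_booking': 'count', 'is_booking': 'sum'}
print(aggs)
print(len(aggs))

# A list of (column, function) pairs, by contrast, keeps both entries.
pairs = [('is_booking', 'count'), ('is_booking', 'sum')]
print(pairs)
```

If that is the cause, passing column expressions instead of a dict, e.g.
.agg(fn.count('is_booking'), fn.sum('is_booking')), should avoid the
collision, assuming fn is pyspark.sql.functions as in the snippet above. I
have not verified this on 1.6.1.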
