Dear list, I am trying to calculate sum and count on the same column:

    user_id_books_clicks = (
        sqlContext.read.parquet('hdfs:///projects/kaggle-expedia/input/train.parquet')
        .groupby('user_id')
        .agg({'is_booking': 'count', 'is_booking': 'sum'})
        .orderBy(fn.desc('count(user_id)'))
        .cache()
    )

If I do it like that, it only gives me one (the last) aggregate: sum(is_booking).

But if I change it to

    .agg({'user_id': 'count', 'is_booking': 'sum'})

it gives me both.

I am on 1.6.1. Is it fixed in 2.+? Or should I report it to JIRA?
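
One thing worth checking before filing anything: the dict passed to agg() repeats the key 'is_booking', and a Python dict literal keeps only the last value for a repeated key. The sketch below is plain Python with no Spark involved; the variable names are only for illustration.

```python
# A dict literal with a repeated key silently keeps the last value,
# so only one aggregate is left before agg() is ever called.
same_column = {'is_booking': 'count', 'is_booking': 'sum'}
print(same_column)

# With two different keys, both entries survive.
two_columns = {'user_id': 'count', 'is_booking': 'sum'}
print(two_columns)
```

If that is the cause, one possible way around it (untested here, and assuming fn is pyspark.sql.functions) would be to pass column expressions instead of a dict, e.g. .agg(fn.count('is_booking'), fn.sum('is_booking')).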