Hi guys,

I'm having trouble figuring out the semantics of using the alias function on
the final sum and count aggregations. The problem is that the column names
Spark generates can't be referenced in SQL or DataFrame operations
(e.g. SUM(cool_cnt#725)). Any idea how to alias these final aggregate
columns? The syntax below doesn't make sense as written, but this is what
I'd ideally want to do:
cool_summary = reviews.select(reviews.user_id,
    cool_cnt(votes.cool).alias("cool_cnt")) \
    .groupBy("user_id").agg({"cool_cnt": "sum", "*": "count"})
cool_summary
DataFrame[user_id: string,