Hi All,

[PySpark 2.3, Python 2.7]

I would like to achieve something like the following. Could you please
suggest the best way to implement it (and perhaps highlight the pros & cons
of the approach in terms of performance)?

df = df.groupby('grp_col').agg(
    max('file_date').alias('max_date'),
    count(when(col('file_date') == col('max_date'), True)).alias('max_date_count'))

Please note that 'max_date' is the result of the aggregate function max
inside the same agg call, so it isn't available as a column when the count
is evaluated. I can definitely use multiple groupbys to achieve this, but
is there a better way, ideally performance-wise? A sketch of one idea I had
is below.
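
One approach I was considering is a window function: compute the per-group
max once over a partition, then count the rows that match it. This is just
a rough, untested sketch against the same 'grp_col' and 'file_date' columns
as above; 'max_date_count' and 'result' are only illustrative names:

    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # Per-group window: max('file_date') over it attaches each
    # group's max date to every row in that group.
    w = Window.partitionBy('grp_col')

    result = (df
        .withColumn('max_date', F.max('file_date').over(w))
        .groupBy('grp_col', 'max_date')
        # Count rows whose file_date equals the group's max_date.
        .agg(F.sum(F.when(F.col('file_date') == F.col('max_date'), 1)
                   .otherwise(0)).alias('max_date_count')))

The window avoids a second aggregation over the full data for the max, but
it still materializes max_date on every row, so I'm not sure whether it
actually beats two groupbys plus a join here.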

Appreciate your help!

-- 
Regards,

Rishi Shah
