Thanks Michael.
If I understand correctly, this is the expected behaviour then, and there
is no ordering guarantee within a grouped DataFrame. I'll comment on that
blog post to report that its message is inaccurate.
Your first suggestion is to use window functions. As I understand them,
window functions let me order the rows within each partition explicitly,
so the row I pick is well defined rather than dependent on an implicit
ordering.
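The window-function idea can be sketched outside Spark in plain Python: partition the rows by `col1`, order each partition by `col2` descending, and take the first row's `col3` (the equivalent of `row_number() OVER (PARTITION BY col1 ORDER BY col2 DESC) = 1` in SQL). The `rows` data below is made up for illustration; the column names come from the original question.

```python
# Plain-Python sketch of the window-function approach: within each col1
# partition, order by col2 descending and keep the first row's col3.
from itertools import groupby
from operator import itemgetter

rows = [
    # (col1, col2, col3) -- illustrative data, not from the thread
    ("a", 1, "x"),
    ("a", 3, "y"),
    ("b", 5, "p"),
    ("b", 4, "q"),
]

# Sort by the partition key, then by col2 descending within each partition.
rows.sort(key=lambda r: (r[0], -r[1]))

# For each partition, the first row after sorting holds the wanted col3.
result = {col1: next(group)[2] for col1, group in groupby(rows, key=itemgetter(0))}
print(result)  # {'a': 'y', 'b': 'p'}
```

Unlike orderBy followed by groupBy, the ordering here is applied per partition by construction, so the result cannot be scrambled by how groups are assembled.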
You need to use window functions to get this kind of behavior. Or use max
and a struct (
http://stackoverflow.com/questions/13523049/hive-sql-find-the-latest-record)
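The max-and-struct trick from that Stack Overflow answer relies on structs comparing field by field, so `max(struct(col2, col3))` returns the struct whose `col2` is largest, carrying the matching `col3` along. A plain-Python sketch of the same idea, using tuples (which also compare lexicographically) in place of structs; the data is made up for illustration:

```python
# Plain-Python sketch of the "max and a struct" trick: per group, take the
# max over (col2, col3) pairs; tuple comparison is lexicographic, so the
# winning pair carries the col3 belonging to the highest col2.
from collections import defaultdict

rows = [
    # (col1, col2, col3) -- illustrative data, not from the thread
    ("a", 1, "x"),
    ("a", 3, "y"),
    ("a", 2, "z"),
    ("b", 5, "p"),
    ("b", 4, "q"),
]

best = defaultdict(lambda: (float("-inf"), None))
for col1, col2, col3 in rows:
    # Keep the pair with the largest col2 seen so far in this group.
    best[col1] = max(best[col1], (col2, col3))

result = {col1: pair[1] for col1, pair in best.items()}
print(result)  # {'a': 'y', 'b': 'p'}
```

In Spark this would correspond to something roughly like `df.groupBy('col1).agg(max(struct('col2, 'col3)))` (an untested sketch), which needs no ordering guarantee at all because the aggregation itself selects the row.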
On Thu, Dec 17, 2015 at 11:55 PM, Timothée Carayol <
timothee.cara...@gmail.com> wrote:
Hi all,
I tried to do something like the following in Spark
df.orderBy('col1, 'col2).groupBy('col1).agg(first('col3))
I was hoping to get, within each col1 value, the value of col3 that
corresponds to the highest value of col2 within that col1 group. This only
works if the ordering on col2 from orderBy is preserved through the
groupBy, which does not seem to be guaranteed.