Re: Is DataFrame.groupBy supposed to preserve order within groups?

2015-12-19 Thread Timothée Carayol
Thanks Michael. If I understand correctly, this is the expected behaviour then, and there is no ordering guarantee within a grouped DataFrame. I'll comment on that blog post to report that its message is inaccurate. Your first suggestion is to use window functions. As I understand, window functions

Re: Is DataFrame.groupBy supposed to preserve order within groups?

2015-12-18 Thread Michael Armbrust
You need to use window functions to get this kind of behavior. Or use max and a struct (http://stackoverflow.com/questions/13523049/hive-sql-find-the-latest-record).

On Thu, Dec 17, 2015 at 11:55 PM, Timothée Carayol <timothee.cara...@gmail.com> wrote:
> Hi all,
>
> I tried to do something
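The two suggestions in this reply can be sketched roughly as follows. This is only an illustration, not code from the thread: the object name LatestPerGroup, the helper names viaWindow and viaMaxStruct, and the sample rows are made up, and it assumes a Spark version that provides SparkSession and row_number (the thread predates both, so an older release would need SQLContext and the older function name). The column names col1, col2 and col3 come from the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max, row_number, struct}

// Hypothetical helper object; names are illustrative only.
object LatestPerGroup {

  // Window-function approach: number the rows inside each col1 group by
  // col2 descending, then keep only the first row of every group.
  def viaWindow(df: DataFrame): DataFrame = {
    val byCol1 = Window.partitionBy("col1").orderBy(col("col2").desc)
    df.withColumn("rn", row_number().over(byCol1))
      .where(col("rn") === 1)
      .select("col1", "col3")
  }

  // max-of-struct approach: structs are compared field by field, so the
  // largest struct(col2, col3) carries the col3 paired with the largest col2.
  def viaMaxStruct(df: DataFrame): DataFrame =
    df.groupBy("col1")
      .agg(max(struct(col("col2"), col("col3"))).as("best"))
      .select(col("col1"), col("best.col3").as("col3"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only to make the sketch runnable.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("latest-per-group")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq(("a", 1, "x"), ("a", 3, "y"), ("b", 2, "z"))
      .toDF("col1", "col2", "col3")

    viaWindow(df).show()
    viaMaxStruct(df).show()
    spark.stop()
  }
}
```

Neither version relies on the row order surviving groupBy. The two differ when col2 has ties inside a group: the window version keeps an arbitrary one of the tied rows, while the struct version falls back to comparing col3 and keeps the largest.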

Is DataFrame.groupBy supposed to preserve order within groups?

2015-12-17 Thread Timothée Carayol
Hi all,

I tried to do something like the following in Spark:

    df.orderBy('col1, 'col2).groupBy('col1).agg(first('col3))

I was hoping to get, within each col1 value, the value for col3 that corresponds to the highest value for col2 within that col1 group. This only works if the order on col2 is