Raghavendra,

Thanks for the quick reply! I don’t think I included enough information in my question. I am hoping to get fields that are not directly part of the aggregation. Imagine a dataframe representing website views with a userID, a datetime, and a webpage address. How could I find the oldest or newest webpage address that a user visited? As I understand it, you can only access fields that are part of the aggregation itself, so taking the min or max of datetime per userID would give me the timestamp but not the webpage address that goes with it.
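To make the question concrete, here is a rough sketch in plain Python (no Spark involved) of the result I am after. The rows, the field order (user id, datetime, url), and the helper name reduce_by_key are all made up for illustration; the point is only that the whole row with the smallest or largest timestamp survives for each key, not just the timestamp.

```python
from datetime import datetime

# Hypothetical page-view rows: (user_id, viewed_at, url)
views = [
    ("u1", datetime(2015, 8, 20, 9, 0), "/home"),
    ("u1", datetime(2015, 8, 21, 10, 30), "/pricing"),
    ("u2", datetime(2015, 8, 19, 14, 0), "/docs"),
    ("u1", datetime(2015, 8, 18, 8, 15), "/signup"),
    ("u2", datetime(2015, 8, 21, 16, 45), "/blog"),
]


def reduce_by_key(rows, key, pick):
    """Keep one whole row per key by combining rows pairwise with `pick`."""
    kept = {}
    for row in rows:
        k = key(row)
        kept[k] = pick(kept[k], row) if k in kept else row
    return kept


# Oldest view per user: keep whichever row has the earlier timestamp.
oldest = reduce_by_key(
    views,
    key=lambda r: r[0],
    pick=lambda a, b: a if a[1] <= b[1] else b,
)

# Newest view per user: keep whichever row has the later timestamp.
newest = reduce_by_key(
    views,
    key=lambda r: r[0],
    pick=lambda a, b: a if a[1] >= b[1] else b,
)

for user in sorted(oldest):
    print(user, "oldest:", oldest[user][2], "newest:", newest[user][2])
```

What I am looking for is the DataFrame equivalent of this: one row per userID, with the webpage address carried along with the selected datetime.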
Thanks,
Impact

> On Aug 21, 2015, at 11:11 AM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:
>
> Impact,
> You can group by the data and then sort it by timestamp and take max to
> select the oldest value.
>
> On Aug 21, 2015 11:15 PM, "Impact" <nat...@skone.org> wrote:
> I am also looking for a way to achieve the reducebykey functionality on data
> frames. In my case I need to select one particular row (the oldest, based on
> a timestamp column value) by key.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Aggregate-to-array-or-slice-by-key-with-DataFrames-tp23636p24399.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.