You are trying to get the top-k most recent records for each user (k=1 in
your case). You should avoid groupBy here: it is an expensive operation in
Spark because it shuffles every record for a key across the network before
any reduction happens (see [1] for more details). Instead, you can use the
combineByKey function with a custom combiner, which reduces records within
each partition before the shuffle.
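A minimal sketch of such a custom combiner, assuming each value is a hypothetical `(timestamp, row)` tuple and the user id is the pair-RDD key. `make_top_k` and the record shape are illustrative names, not part of Spark; the three returned functions are what `combineByKey(createCombiner, mergeValue, mergeCombiners)` expects. The combiner keeps at most k records, newest first, so only k records per user ever cross the shuffle.

```python
import heapq
from typing import Any, List, Tuple

# Hypothetical value shape: (timestamp, row); the user id is the pair-RDD key.
Record = Tuple[int, Any]


def make_top_k(k: int):
    """Return the three functions combineByKey expects, keeping the k newest records."""

    def newest(records: List[Record]) -> List[Record]:
        # Newest first; at most k entries survive.
        return heapq.nlargest(k, records, key=lambda r: r[0])

    def create_combiner(rec: Record) -> List[Record]:
        # First record seen for a key within a partition.
        return [rec]

    def merge_value(acc: List[Record], rec: Record) -> List[Record]:
        # Fold one more record into a partition-local combiner.
        return newest(acc + [rec])

    def merge_combiners(a: List[Record], b: List[Record]) -> List[Record]:
        # Merge partial results coming from different partitions.
        return newest(a + b)

    return create_combiner, merge_value, merge_combiners


if __name__ == "__main__":
    # Local check without Spark: one user's records split over two "partitions".
    create, merge_value, merge_combiners = make_top_k(1)
    acc1 = merge_value(create((10, "a")), (30, "b"))
    acc2 = create((20, "c"))
    print(merge_combiners(acc1, acc2))  # → [(30, 'b')]
```

With PySpark, and assuming hypothetical field names `user` and `ts`, the call would look like `rdd.map(lambda r: (r["user"], (r["ts"], r))).combineByKey(*make_top_k(1))`. For k=1 alone, `reduceByKey` with a "keep the newer one" function would also work; the list-based combiner is what generalizes to k > 1.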
I asked the same question a few days ago but did not receive an answer.
You may want to look into UDAFs (user-defined aggregate functions) for that.
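To illustrate what a "latest row per group" UDAF has to do, here is a plain-Python model of its buffer logic. The class name `LatestRowAgg` is hypothetical, and this is not the Spark API: it only mirrors the initialize/update/merge/evaluate lifecycle of Spark's Scala `UserDefinedAggregateFunction`, and for simplicity it returns new buffers instead of mutating a Spark aggregation buffer. The buffer holds the newest `(timestamp, row)` seen so far, so merging partial aggregates is a single comparison.

```python
from typing import Any, Optional, Tuple

Buffer = Optional[Tuple[int, Any]]


class LatestRowAgg:
    """Pure-Python model of a 'latest row per group' aggregate (not the Spark API)."""

    def initialize(self) -> Buffer:
        # Empty buffer: no row seen yet.
        return None

    def update(self, buf: Buffer, ts: int, row: Any) -> Buffer:
        # Keep the incoming row only if it is newer than the buffered one.
        if buf is None or ts > buf[0]:
            return (ts, row)
        return buf

    def merge(self, b1: Buffer, b2: Buffer) -> Buffer:
        # Combine buffers produced by two partial aggregations.
        if b1 is None:
            return b2
        if b2 is None:
            return b1
        return b1 if b1[0] >= b2[0] else b2

    def evaluate(self, buf: Buffer) -> Any:
        # Final result: the latest row, or None for an empty group.
        return None if buf is None else buf[1]
```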
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Get-complete-row-with-latest-timestamp-after-a-groupBy-tp25304p25308.html