Re: Get complete row with latest timestamp after a groupBy?

2015-11-06 Thread bghit
You are trying to get the top-k most recent records for each user (k=1 in your case). You should avoid groupBy, because it is an expensive operation in Spark that will hurt performance; see [1] for more details. Instead, you can use the combineByKey function with a custom combiner
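
A minimal sketch of what such a combiner could look like for k=1, written in plain Python so it runs without a cluster. The record layout (user_id, timestamp, payload), the sample data, and the helper combine_by_key_local are assumptions made for illustration and are not from the original thread; the helper only imitates the per-partition and cross-partition merging that combineByKey performs.

```python
# Hypothetical record layout: (user_id, timestamp, payload).
records = [
    ("alice", 3, "c"),
    ("bob", 1, "x"),
    ("alice", 7, "d"),
    ("bob", 5, "y"),
    ("alice", 5, "e"),
]

def create_combiner(row):
    # First row seen for a key in a partition becomes the running "latest".
    return row

def merge_value(best, row):
    # Keep whichever row has the larger timestamp (field 1).
    return row if row[1] > best[1] else best

def merge_combiners(a, b):
    # Merge the per-partition winners for the same key.
    return b if b[1] > a[1] else a

def combine_by_key_local(partitions, create, merge_val, merge_comb):
    """Local stand-in (assumption) for the combineByKey data flow:
    combine within each partition first, then merge the partial results."""
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = merge_val(acc[key], value) if key in acc else create(value)
        partials.append(acc)
    result = {}
    for acc in partials:
        for key, comb in acc.items():
            result[key] = merge_comb(result[key], comb) if key in result else comb
    return result

# Key each record by user, then split into two pretend partitions.
pairs = [(r[0], r) for r in records]
partitions = [pairs[:2], pairs[2:]]

latest = combine_by_key_local(partitions, create_combiner, merge_value, merge_combiners)
for user, row in sorted(latest.items()):
    print(user, row)
```

With PySpark, the same three functions could be passed to an RDD of (user_id, row) pairs as rdd.combineByKey(create_combiner, merge_value, merge_combiners). Only one row per user per partition is kept before the shuffle, which is the point of using a combiner here.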

Re: Get complete row with latest timestamp after a groupBy?

2015-11-06 Thread bghit
I asked the same question a few days ago, but I did not receive any answer. You may want to look into UDAFs for that.