I optimized a Spark SQL script but have come to the conclusion that the SQL
API is not ideal, as the generated tasks are slow and require too much
shuffling.

So the script should be converted to the RDD API:
http://stackoverflow.com/q/41445571/2587904

How can I formulate this more efficiently using the RDD API? aggregateByKey
should be a good fit, but it is still not clear to me how to apply it here
to substitute the window functions.
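
To make the question concrete, here is a rough sketch of what I imagine
(using a made-up (key, value) schema, not the actual columns from the
linked question): replacing something like
min(value) OVER (PARTITION BY key) with aggregateByKey, which folds each
partition locally before merging, instead of shuffling full rows to a
window.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("aggregateByKeySketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input; the real data comes from the linked question.
    val data = sc.parallelize(Seq(("a", 3), ("a", 1), ("b", 7), ("b", 2)))

    // Per-key minimum without a window function: fold values into a local
    // accumulator per partition, then merge the per-partition accumulators.
    val minPerKey = data.aggregateByKey(Int.MaxValue)(
      (acc, v) => math.min(acc, v), // merge one value into the accumulator
      (a, b)   => math.min(a, b)    // merge two partition-level accumulators
    )

    minPerKey.collect().foreach(println) // prints (a,1) and (b,2)

Is this the right direction, and how would it extend to the more involved
window logic in the question?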

Cheers 
Georg 


