Re: Need Spark(Scala) Performance Tuning tips

2017-06-09 Thread Debabrata Ghosh
Thanks a lot for your quick help! Further, I have two more points: a) I heard from my colleagues that if my Scala code uses RDDs, I need to replace them with Datasets / DataFrames. Why is that? b) One of the operators, saveAsTextFile, is taking a long time. What would be the probable cause and

Re: Need Spark(Scala) Performance Tuning tips

2017-06-09 Thread Gerard Maas
Also, read Holden's newest book, High Performance Spark: http://shop.oreilly.com/product/0636920046967.do

On Fri, Jun 9, 2017 at 5:38 PM, Alonso Isidoro Roman wrote:
> a quick search on google:
>
> https://www.cloudera.com/documentation/enterprise/5-9-

Re: Need Spark(Scala) Performance Tuning tips

2017-06-09 Thread Alonso Isidoro Roman
A quick search on Google:
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_spark_tuning.html
https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
and of course,

Need Spark(Scala) Performance Tuning tips

2017-06-09 Thread Debabrata Ghosh
Hi, I need some help / guidance with performance tuning Spark code written in Scala. Can you please help? Thanks.