If you are running locally, I do not see the point of starting with 32 executors with 2 cores each.
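One thing worth checking either way is how many partitions the training data has: with 32 executors x 2 cores you have 64 task slots, and fewer partitions than that leaves cores idle no matter how many executors you request. A rough sketch in Scala (the input path, the DataFrame name, and the target of 128 partitions are all hypothetical, just illustrating roughly 2 tasks per core):

  import org.apache.spark.sql.SparkSession

  object PartitionCheck {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("PartitionCheck").getOrCreate()

      // Hypothetical input path; substitute your actual training data.
      val trainingData = spark.read.parquet("hdfs:///path/to/training-data")

      // Parallelism is bounded by the partition count: with 32 executors
      // x 2 cores = 64 task slots, fewer than 64 partitions wastes cores.
      println(s"training data partitions: ${trainingData.rdd.getNumPartitions}")

      // Aim for roughly 2 tasks per core, e.g. 64 cores * 2 = 128 partitions.
      val repartitioned = trainingData.repartition(128).cache()
      println(s"after repartition: ${repartitioned.rdd.getNumPartitions}")

      spark.stop()
    }
  }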
Also, you can check the Spark web console to find out where the time is spent. You may also want to read http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ .

On Thu, Oct 20, 2016 at 6:21 PM 陈哲 <czhenj...@gmail.com> wrote:

> I'm training a random forest model using Spark 2.0 on YARN with a command
> like:
>
> $SPARK_HOME/bin/spark-submit \
>   --class com.netease.risk.prediction.HelpMain \
>   --master yarn --deploy-mode client \
>   --driver-cores 1 --num-executors 32 --executor-cores 2 \
>   --driver-memory 10g --executor-memory 6g \
>   --conf spark.rpc.askTimeout=3000 --conf spark.rpc.lookupTimeout=3000 \
>   --conf spark.rpc.message.maxSize=2000 --conf spark.driver.maxResultSize=0 \
>   ....
>
> The training process takes almost 8 hours.
>
> I also tried training the model on my local machine with master local[4],
> and the whole process still takes 8-9 hours.
>
> My question is: why doesn't running on YARN save time? Isn't this
> supposed to be distributed, with 32 executors? Am I missing anything,
> or what can I do to improve this and save time?
>
> Thanks

--
Thanks,
David S.