Hi all, I am very new to Spark and YARN.
I am running the BroadcastTest example application with Spark 1.6.0 and Hadoop/YARN 2.7.1 on a 5-node cluster. I configured everything for dynamic resource allocation according to https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation:

1. copied ./spark-1.6.0/network/yarn/target/scala-2.10/spark-1.6.0-yarn-shuffle.jar to /hadoop-2.7.1/share/hadoop/yarn/lib/
2. yarn-site.xml: http://www.owlnet.rice.edu/~xs6/yarn-site.xml
3. spark-defaults.conf: http://www.owlnet.rice.edu/~xs6/spark-defaults.conf
4. spark-env.sh: http://www.owlnet.rice.edu/~xs6/spark-env.sh
5. submitted the application with:

./bin/spark-submit --class org.apache.spark.examples.BroadcastTest --master yarn --deploy-mode cluster ./examples/target/spark-examples_2.10-1.6.0.jar 1 10000000 Http

However, the job is stuck in the RUNNING state, and the log shows that executors fail or are cancelled repeatedly. The full log is at http://www.owlnet.rice.edu/~xs6/stderr; it contains warnings like:

16/03/02 02:07:35 WARN yarn.YarnAllocator: Container marked as failed: container_1456905762620_0002_01_000002 on host: bold-x.rice.edu. Exit status: 1. Diagnostics: Exception from container-launch.

Does anybody know what the problem might be?

Best,
Xiaoye
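P.S. In case the links above go stale: the external-shuffle-service part of my yarn-site.xml follows the standard pattern from the dynamic-allocation docs (sketched here from those docs rather than pasted verbatim from my file):

```xml
<!-- Register Spark's external shuffle service as a NodeManager aux service,
     alongside the default MapReduce shuffle. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

and spark-defaults.conf has both spark.shuffle.service.enabled and spark.dynamicAllocation.enabled set to true, as the docs require.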
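P.S. To get at the actual container-launch exception, I have been pulling the aggregated container logs with the yarn CLI. A sketch of what I run (the application id is from my failing run; the PATH guard is just so the snippet doesn't error out where the CLI is absent):

```shell
# Fetch the full logs (stdout/stderr of every container) for the failed
# application, which usually contain the real exception behind "Exit status: 1".
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId application_1456905762620_0002
else
  # Fallback when the Hadoop/YARN CLI is not installed on this machine.
  echo "yarn CLI not found on PATH"
fi
```

If anyone can suggest what to look for in those container logs, that would help too.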