Could you try yarn-cluster mode? Also make sure your cluster nodes can reach your client machine and that no firewall is blocking the connection.
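For reference, a cluster-mode submission of the same example from the command line would look roughly like this (paths taken from your mail; adjust for your environment):

```shell
# yarn-cluster mode: the driver runs inside the AM on the cluster,
# so the AM never needs to connect back to your client machine.
./bin/spark-submit \
  --master yarn-cluster \
  --name "Test Submit Python To Yarn From Java" \
  --num-executors 5 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  $SPARK_HOME/examples/src/main/python/pi.py 10
```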
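As for why you see "10:0": in yarn-client mode the ExecutorLauncher AM expects, if I read the 1.6 behavior correctly, the first user argument to be the driver's host:port, not an argument for pi.py. A string with no colon parses to port 0. A minimal, self-contained re-creation of that parsing (my own sketch, not Spark's actual source) shows the effect:

```java
// Sketch of host:port parsing as the AM appears to apply it to its first
// user argument. HostPortDemo and parseHostPort are illustrative names,
// not Spark APIs.
public class HostPortDemo {
    static String[] parseHostPort(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        if (idx < 0) {
            // No port given: everything is the host, port defaults to 0.
            return new String[]{hostPort, "0"};
        }
        return new String[]{hostPort.substring(0, idx), hostPort.substring(idx + 1)};
    }

    public static void main(String[] unused) {
        // Passing the bare application argument "10" where a driver address
        // is expected yields host "10", port 0 -- the "10:0" in your log.
        String[] hp = parseHostPort("10");
        System.out.println(hp[0] + ":" + hp[1]); // prints 10:0
    }
}
```

So the AM spends two minutes retrying a connection to host "10" on port 0 and then fails, which matches your log exactly.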
On Wed, Mar 16, 2016 at 10:54 AM, <sychu...@tsmc.com> wrote:

> Hi all,
>
> We're trying to submit a Python file, pi.py in this case, to YARN from Java
> code, but this keeps failing (Spark 1.6.0).
> It seems the AM uses the argument we passed to pi.py as the driver IP address.
> Could someone help me figure out how to get the job done? Thanks in advance.
>
> The Java code looks like below:
>
>     String[] args = new String[]{
>         "--name",
>         "Test Submit Python To Yarn From Java",
>         "--primary-py-file",
>         SPARK_HOME + "/examples/src/main/python/pi.py",
>         "--num-executors",
>         "5",
>         "--driver-memory",
>         "512m",
>         "--executor-memory",
>         "512m",
>         "--executor-cores",
>         "1",
>         "--arg",
>         args[0]
>     };
>
>     Configuration config = new Configuration();
>     SparkConf sparkConf = new SparkConf();
>     ClientArguments clientArgs = new ClientArguments(args, sparkConf);
>     Client client = new Client(clientArgs, config, sparkConf);
>     client.run();
>
> The jar is submitted by spark-submit:
>
>     ./bin/spark-submit --class SubmitPyYARNJobFromJava --master yarn-client TestSubmitPythonFromJava.jar 10
>
> The job submitted to YARN just stays in ACCEPTED before it fails.
> What I can't figure out is that the YARN log shows the AM couldn't reach the
> driver at 10:0, which is my argument passed to pi.py:
>
>     SLF4J: Class path contains multiple SLF4J bindings.
>     SLF4J: Found binding in [jar:file:/data/1/yarn/local/usercache/root/filecache/2084/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>     SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>     SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>     SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>     16/03/15 17:54:44 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
>     16/03/15 17:54:45 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1458023046377_0499_000001
>     16/03/15 17:54:45 INFO spark.SecurityManager: Changing view acls to: yarn,root
>     16/03/15 17:54:45 INFO spark.SecurityManager: Changing modify acls to: yarn,root
>     16/03/15 17:54:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
>     16/03/15 17:54:45 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
>     16/03/15 17:54:45 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
>     16/03/15 17:54:46 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
>     16/03/15 17:54:46 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
>     .........
>     16/03/15 17:56:25 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
>     16/03/15 17:56:26 ERROR yarn.ApplicationMaster: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!
>         at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:484)
>         at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:345)
>         at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:187)
>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
>         at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
>         at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:674)
>         at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
>     16/03/15 17:56:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
>     16/03/15 17:56:26 INFO util.ShutdownHookManager: Shutdown hook called
>
> Best regards,
>
> S.Y. Chung 鍾學毅
> F14MITD
> Taiwan Semiconductor Manufacturing Company, Ltd.
> Tel: 06-5056688 Ext: 734-6325
--
Best Regards

Jeff Zhang