Could you try yarn-cluster mode? Make sure your cluster nodes can reach
your client machine and that no firewall is blocking the connection.
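
If I remember right, in yarn-client mode the yarn Client passes the
driver's host:port down to the AM as the first --arg, which is why the AM
tries to connect to "10:0": your application argument is being parsed as
the driver address. In yarn-cluster mode the --arg values are handed to
the application itself. A rough, untested sketch of what that could look
like from your Java code (the --class value
org.apache.spark.deploy.PythonRunner is what spark-submit sets internally
for python applications in cluster mode; please verify against the 1.6
source):

    // Untested sketch: yarn-cluster submission of pi.py (Spark 1.6).
    // Supplying --class should switch ClientArguments to cluster mode;
    // PythonRunner is the main class spark-submit uses for python apps.
    String[] clusterArgs = new String[]{
        "--name", "Test Submit Python To Yarn From Java",
        "--class", "org.apache.spark.deploy.PythonRunner",
        "--primary-py-file", SPARK_HOME + "/examples/src/main/python/pi.py",
        "--num-executors", "5",
        "--driver-memory", "512m",
        "--executor-memory", "512m",
        "--executor-cores", "1",
        "--arg", args[0]  // now delivered to pi.py, not parsed as driver host
    };
    SparkConf sparkConf = new SparkConf();
    ClientArguments clientArgs = new ClientArguments(clusterArgs, sparkConf);
    new Client(clientArgs, new Configuration(), sparkConf).run();

In cluster mode the driver runs inside the AM on the cluster, so nothing
needs to connect back to your client machine.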

On Wed, Mar 16, 2016 at 10:54 AM, <sychu...@tsmc.com> wrote:

>
> Hi all,
>
> We're trying to submit a python file, pi.py in this case, to YARN from
> Java code, but it keeps failing (Spark 1.6.0).
> It seems the AM uses the argument we passed to pi.py as the driver IP
> address.
> Could someone help me figure out how to get the job done? Thanks in
> advance.
>
> The Java code looks like the following:
>
>           // args[0] here is this launcher's own command-line argument
>           // ("10" from the spark-submit invocation below)
>           String[] clientArgList = new String[]{
>                 "--name",
>                 "Test Submit Python To Yarn From Java",
>                 "--primary-py-file",
>                 SPARK_HOME + "/examples/src/main/python/pi.py",
>                 "--num-executors",
>                 "5",
>                 "--driver-memory",
>                 "512m",
>                 "--executor-memory",
>                 "512m",
>                 "--executor-cores",
>                 "1",
>                 "--arg",
>                 args[0]
>             };
>
>             Configuration config = new Configuration();
>             SparkConf sparkConf = new SparkConf();
>             ClientArguments clientArgs = new ClientArguments(clientArgList, sparkConf);
>             Client client = new Client(clientArgs, config, sparkConf);
>             client.run();
>
>
> The jar is submitted with spark-submit:
> ./bin/spark-submit --class SubmitPyYARNJobFromJava --master yarn-client
> TestSubmitPythonFromJava.jar 10
>
>
> The job submitted to YARN just stays in ACCEPTED before it fails.
> What I can't figure out is that the YARN log shows the AM couldn't reach
> the driver at 10:0, where 10 is the argument I passed to pi.py:
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/data/1/yarn/local/usercache/root/filecache/2084/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/03/15 17:54:44 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
> 16/03/15 17:54:45 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1458023046377_0499_000001
> 16/03/15 17:54:45 INFO spark.SecurityManager: Changing view acls to: yarn,root
> 16/03/15 17:54:45 INFO spark.SecurityManager: Changing modify acls to: yarn,root
> 16/03/15 17:54:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
> 16/03/15 17:54:45 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
> 16/03/15 17:54:45 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
> 16/03/15 17:54:46 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
> 16/03/15 17:54:46 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
> .........
> 16/03/15 17:56:25 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10:0, retrying ...
> 16/03/15 17:56:26 ERROR yarn.ApplicationMaster: Uncaught exception:
> org.apache.spark.SparkException: Failed to connect to driver!
>         at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:484)
>         at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:345)
>         at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:187)
>         at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
>         at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
>         at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:674)
>         at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> 16/03/15 17:56:26 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
> 16/03/15 17:56:26 INFO util.ShutdownHookManager: Shutdown hook called
>
> Best regards,
>
> S.Y. Chung 鍾學毅
> F14MITD
> Taiwan Semiconductor Manufacturing Company, Ltd.
> Tel: 06-5056688 Ext: 734-6325
>
>  ---------------------------------------------------------------------------
>                                                          TSMC PROPERTY
>  This email communication (and any attachments) is proprietary information
>  for the sole use of its
>  intended recipient. Any unauthorized review, use or distribution by anyone
>  other than the intended
>  recipient is strictly prohibited.  If you are not the intended recipient,
>  please notify the sender by
>  replying to this email, and then delete this email and any copies of it
>  immediately. Thank you.
>
>  ---------------------------------------------------------------------------
>
>
>


-- 
Best Regards

Jeff Zhang
