Re: Job failed while submitting python to yarn programmatically

2016-03-15 Thread sychungd
Hi Jeff,
Sorry, I forgot to mention that the same Java code works fine if we replace
the Python pi.py file with the jar version of the pi example.



On 2016/03/16 at 11:05 AM, Jeff Zhang <zjf...@gmail.com> wrote (To:
sychu...@tsmc.com; cc: user <user@spark.apache.org>):




Could you try yarn-cluster mode? Make sure your cluster nodes can reach
your client machine and that no firewall is blocking them.

On Wed, Mar 16, 2016 at 10:54 AM, <sychu...@tsmc.com> wrote:

  Hi all,

  We're trying to submit a Python file, pi.py in this case, to YARN from
  Java code, but it keeps failing (Spark 1.6.0).
  It seems the AM uses the argument we passed to pi.py as the driver IP
  address.
  Could someone help me figure out how to get the job done? Thanks in
  advance.

  The Java code looks like this:

              // Renamed from "args" to avoid shadowing main's parameter;
              // args[0] is main's argument ("10", the slice count for pi.py).
              String[] clientArgStrings = new String[]{
                  "--name", "Test Submit Python To Yarn From Java",
                  "--primary-py-file", SPARK_HOME + "/examples/src/main/python/pi.py",
                  "--num-executors", "5",
                  "--driver-memory", "512m",
                  "--executor-memory", "512m",
                  "--executor-cores", "1",
                  "--arg", args[0]
              };

              Configuration config = new Configuration();
              SparkConf sparkConf = new SparkConf();
              ClientArguments clientArgs = new ClientArguments(clientArgStrings, sparkConf);
              Client client = new Client(clientArgs, config, sparkConf);
              client.run();


  The jar is submitted with spark-submit:
  ./bin/spark-submit --class SubmitPyYARNJobFromJava --master yarn-client
  TestSubmitPythonFromJava.jar 10


  The job submitted to YARN just stays in ACCEPTED before it fails.
  What I can't figure out is that the YARN log shows the AM couldn't reach
  the driver at 10:0, which is the argument we passed to pi.py.

  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in
  
[jar:file:/data/1/yarn/local/usercache/root/filecache/2084/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]


  SLF4J: Found binding in
  
[jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]


  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
  explanation.
  SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  16/03/15 17:54:44 INFO yarn.ApplicationMaster: Registered signal handlers
  for [TERM, HUP, INT]
  16/03/15 17:54:45 INFO yarn.ApplicationMaster: ApplicationAttemptId:
  appattempt_1458023046377_0499_01
  16/03/15 17:54:45 INFO spark.SecurityManager: Changing view acls to:
  yarn,root
  16/03/15 17:54:45 INFO spark.SecurityM

Re: Job failed while submitting python to yarn programmatically

2016-03-15 Thread Saisai Shao
You cannot directly invoke a Spark application by using yarn#client the way
you describe; that path is deprecated and not supported. You have to use
spark-submit to submit a Spark application to YARN.

The specific problem here is that you're invoking yarn#client to run the
Spark app in yarn-client mode (the default), in which the AM expects the
driver to have already started. Here it apparently has not, so the AM throws
this exception.

In any case, this way of submitting a Spark application is currently not
supported; please refer to the docs for spark-submit.

Thanks
Saisai
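
[Editor's note: the supported programmatic route since Spark 1.4 is the
SparkLauncher API (org.apache.spark.launcher), which shells out to
spark-submit under the hood. The sketch below shows roughly how the snippet
from the original message could be rewritten with it; the class name and
exact settings are illustrative, not from the thread, and it assumes
SPARK_HOME points at a Spark installation with YARN configured.]

```java
// Sketch only: submits pi.py via the supported SparkLauncher API
// (available since Spark 1.4) instead of the internal yarn#client.
import org.apache.spark.launcher.SparkLauncher;

public class SubmitPyWithLauncher {
    public static void main(String[] args) throws Exception {
        String sparkHome = System.getenv("SPARK_HOME");
        Process spark = new SparkLauncher()
                .setSparkHome(sparkHome)
                .setMaster("yarn-cluster")           // driver runs inside the cluster
                .setAppName("Test Submit Python To Yarn From Java")
                .setAppResource(sparkHome + "/examples/src/main/python/pi.py")
                .setConf("spark.executor.instances", "5")
                .setConf(SparkLauncher.DRIVER_MEMORY, "512m")
                .setConf(SparkLauncher.EXECUTOR_MEMORY, "512m")
                .setConf(SparkLauncher.EXECUTOR_CORES, "1")
                .addAppArgs("10")                    // number of slices for pi.py
                .launch();
        int exitCode = spark.waitFor();
        System.out.println("spark-submit exited with " + exitCode);
    }
}
```

Because SparkLauncher drives the regular spark-submit script, the launched
process needs a working Spark install and cluster to actually run.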

On Wed, Mar 16, 2016 at 11:05 AM, Jeff Zhang  wrote:

> Could you try yarn-cluster mode? Make sure your cluster nodes can reach
> your client machine and that no firewall is blocking them.

Re: Job failed while submitting python to yarn programmatically

2016-03-15 Thread Jeff Zhang
Could you try yarn-cluster mode? Make sure your cluster nodes can reach
your client machine and that no firewall is blocking them.
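
[Editor's note: for instance, pi.py can be submitted in yarn-cluster mode
directly with spark-submit; the paths and resource settings below follow the
stock Spark 1.6 layout and the values used in the thread, so adjust them for
your install.]

```shell
# Runs the Pi example in yarn-cluster mode, so the driver starts inside the
# cluster and the AM does not need to reach back to the client machine.
./bin/spark-submit \
  --master yarn-cluster \
  --name "Test Submit Python To Yarn From Java" \
  --num-executors 5 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  $SPARK_HOME/examples/src/main/python/pi.py 10
```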

On Wed, Mar 16, 2016 at 10:54 AM,  wrote:

>
> Hi all,
>
> We're trying to submit a Python file, pi.py in this case, to YARN from Java
> code, but it keeps failing (Spark 1.6.0).
> It seems the AM uses the argument we passed to pi.py as the driver IP
> address.
> Could someone help me figure out how to get the job done? Thanks in
> advance.
>
> The Java code looks like this:
>
>   // Renamed from "args" to avoid shadowing main's parameter;
>   // args[0] is main's argument ("10", the slice count for pi.py).
>   String[] clientArgStrings = new String[]{
>       "--name", "Test Submit Python To Yarn From Java",
>       "--primary-py-file", SPARK_HOME + "/examples/src/main/python/pi.py",
>       "--num-executors", "5",
>       "--driver-memory", "512m",
>       "--executor-memory", "512m",
>       "--executor-cores", "1",
>       "--arg", args[0]
>   };
>
>   Configuration config = new Configuration();
>   SparkConf sparkConf = new SparkConf();
>   ClientArguments clientArgs = new ClientArguments(clientArgStrings, sparkConf);
>   Client client = new Client(clientArgs, config, sparkConf);
>   client.run();
>
>
> The jar is submitted with spark-submit:
> ./bin/spark-submit --class SubmitPyYARNJobFromJava --master yarn-client
> TestSubmitPythonFromJava.jar 10
>
>
> The job submitted to YARN just stays in ACCEPTED before it fails.
> What I can't figure out is that the YARN log shows the AM couldn't reach the
> driver at 10:0, which is the argument we passed to pi.py.
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
>
> [jar:file:/data/1/yarn/local/usercache/root/filecache/2084/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: Found binding in
>
> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/03/15 17:54:44 INFO yarn.ApplicationMaster: Registered signal handlers
> for [TERM, HUP, INT]
> 16/03/15 17:54:45 INFO yarn.ApplicationMaster: ApplicationAttemptId:
> appattempt_1458023046377_0499_01
> 16/03/15 17:54:45 INFO spark.SecurityManager: Changing view acls to:
> yarn,root
> 16/03/15 17:54:45 INFO spark.SecurityManager: Changing modify acls to:
> yarn,root
> 16/03/15 17:54:45 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions: Set
> (yarn, root); users with modify permissions: Set(yarn, root)
> 16/03/15 17:54:45 INFO yarn.ApplicationMaster: Waiting for Spark driver to
> be reachable.
> 16/03/15 17:54:45 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 10:0, retrying ...
> 16/03/15 17:54:46 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 10:0, retrying ...
> 16/03/15 17:54:46 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 10:0, retrying ...
> .
> 16/03/15 17:56:25 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 10:0, retrying ...
> 16/03/15 17:56:26 ERROR yarn.ApplicationMaster: Uncaught exception:
> org.apache.spark.SparkException: Failed to connect to driver!
>  at
> org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver
> (ApplicationMaster.scala:484)
>  at
> org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher
> (ApplicationMaster.scala:345)
>  at org.apache.spark.deploy.yarn.ApplicationMaster.run
> (ApplicationMaster.scala:187)
>  at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun
> $main$1.apply$mcV$sp(ApplicationMaster.scala:653)
>  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run
> (SparkHadoopUtil.scala:69)
>  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run
> (SparkHadoopUtil.scala:68)
>  at java.security.AccessController.doPrivileged(Native
> Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs
> (UserGroupInformation.java:1628)
>  at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser
> (SparkHadoopUtil.scala:68)
>  at org.apache.spark.deploy.yarn.ApplicationMaster$.main
> (ApplicationMaster.scala:651)
>  at org.apache.spark.deploy.yarn.ExecutorLauncher$.main
> (ApplicationMaster.scala:674)
>  at org.apache.spark.deploy.yarn.ExecutorLauncher.main
> (ApplicationMaster.scala)
> 16/03/15 17:56:26 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException:
> Failed to connect to