[ https://issues.apache.org/jira/browse/SPARK-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-6681.
------------------------------
    Resolution: Cannot Reproduce

We can reopen this if there is a follow-up with more detail, but it looks like a 
YARN / environment issue.
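
For reference, when this comes up it is usually worth confirming that the 
JAVA_HOME exported in the Hadoop/YARN env scripts actually exists on every 
NodeManager host. A minimal check, assuming shell access to the nodes (the host 
names below are placeholders):

{code}
# Placeholder NodeManager host names; verify that the JDK path exported in
# yarn-env.sh is actually present on each node.
for h in nm-host-1 nm-host-2 nm-host-3; do
  ssh "$h" 'source /etc/hadoop/conf/yarn-env.sh && ls -l "$JAVA_HOME/bin/java"'
done
{code}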

> JAVA_HOME error with upgrade to Spark 1.3.0
> -------------------------------------------
>
>                 Key: SPARK-6681
>                 URL: https://issues.apache.org/jira/browse/SPARK-6681
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.3.0
>         Environment: Client is Mac OS X version 10.10.2, cluster is running 
> HDP 2.1 stack.
>            Reporter: Ken Williams
>
> I’m trying to upgrade a Spark project, written in Scala, from Spark 1.2.1 to 
> 1.3.0, so I changed my `build.sbt` like so:
> {code}
>     -libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
>     +libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
> {code}
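> (For context, the rest of the build looks roughly like the following; the 
> exact Scala patch version and assembly settings are my guesses, since all I 
> know for sure is that the project targets Scala 2.11 and produces an assembly 
> jar via sbt-assembly:)
> {code}
>     // Rough sketch of the surrounding build.sbt; the Scala patch version
>     // below is illustrative, not necessarily the exact one in use.
>     name := "myapp"
>     version := "1.2"
>     scalaVersion := "2.11.6"
>     libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
> {code}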
> I then build an `assembly` jar and submit it:
> {code}
>     HADOOP_CONF_DIR=/etc/hadoop/conf \
>         spark-submit \
>         --driver-class-path=/etc/hbase/conf \
>         --conf spark.hadoop.validateOutputSpecs=false \
>         --conf spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.3.0-hadoop2.4.0.jar \
>         --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>         --deploy-mode=cluster \
>         --master=yarn \
>         --class=TestObject \
>         --num-executors=54 \
>         target/scala-2.11/myapp-assembly-1.2.jar
> {code}
> The job fails to submit, with the following exception in the terminal:
> {code}
>     15/03/19 10:30:07 INFO yarn.Client: 
>     15/03/19 10:20:03 INFO yarn.Client: 
>        client token: N/A
>        diagnostics: Application application_1420225286501_4698 failed 2 times due to AM Container for appattempt_1420225286501_4698_000002 exited with exitCode: 127 due to: Exception from container-launch: 
>     org.apache.hadoop.util.Shell$ExitCodeException: 
>       at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>       at org.apache.hadoop.util.Shell.run(Shell.java:379)
>       at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>       at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:662)
> {code}
> Finally, I go and check the YARN app master’s web interface (since the job is 
> there, I know it at least made it that far), and the only logs it shows are 
> these:
> {code}
>         Log Type: stderr
>         Log Length: 61
>         /bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory
>         
>         Log Type: stdout
>         Log Length: 0
> {code}
> I’m not sure how to interpret that. Is {{ {{JAVA_HOME}} }} a literal (braces 
> included) that’s somehow making it into a launch script? Is it coming from the 
> worker nodes or the driver? Is there anything I can do to experiment and 
> troubleshoot?
> I do have {{JAVA_HOME}} set in the hadoop config files on all the nodes of 
> the cluster:
> {code}
>     % grep JAVA_HOME /etc/hadoop/conf/*.sh
>     /etc/hadoop/conf/hadoop-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
>     /etc/hadoop/conf/yarn-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
> {code}
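> Two things I can try, assuming YARN log aggregation is enabled on the cluster 
> and that Spark's per-application environment settings apply to this setup, are 
> pulling the full aggregated container logs and forcing an explicit JAVA_HOME 
> for the AM and executors at submit time:
> {code}
>     # Fetch the full aggregated container logs for the failed application
>     # (only works if yarn.log-aggregation-enable is true on the cluster).
>     yarn logs -applicationId application_1420225286501_4698
>
>     # Force an explicit JAVA_HOME for the AM and executors by adding these
>     # two flags to the spark-submit invocation above (JDK path taken from
>     # hadoop-env.sh / yarn-env.sh):
>     #   --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/jdk64/jdk1.6.0_31
>     #   --conf spark.executorEnv.JAVA_HOME=/usr/jdk64/jdk1.6.0_31
> {code}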
> Has this behavior changed between 1.2.1 and 1.3.0? With 1.2.1 and no other 
> changes, the job completes fine.
> (Note: I originally posted this on the Spark mailing list and on Stack 
> Overflow; I'll update both places if/when I find a solution.)



