Re: [PR] [SPARK-48651][DOC] Configuring different JDK for Spark on YARN [spark]

via GitHub Tue, 18 Jun 2024 07:08:40 -0700


pan3793 commented on code in PR #47010:
URL: https://github.com/apache/spark/pull/47010#discussion_r1644535063



##########
docs/running-on-yarn.md:
##########
@@ -1032,3 +1035,34 @@ and one should be configured with:
   spark.shuffle.service.name = spark_shuffle_y
   spark.shuffle.service.port = <other value>
 ```
+
+# Configuring different JDKs for Spark Applications
+
+In some cases it may be desirable to use a different JDK from YARN node manager to run Spark applications,
+this can be achieved by setting the `JAVA_HOME` environment variable for YARN containers and the `spark-submit`
+process.
+
+Note that, Spark assumes that all JVM processes runs in one application use the same version of JDK, otherwise,
+you may encounter JDK serialization issues.
+
+To configure a Spark application to use a JDK which has been pre-installed on all nodes at `/opt/openjdk-17`:
+
+    $ export JAVA_HOME=/opt/openjdk-17
+    $ ./bin/spark-submit --class path.to.your.Class \
+        --master yarn \
+        --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 \
+        --conf spark.executorEnv.JAVA_HOME=/opt/openjdk-17 \
+        <app jar> [app options]
+
+Optionally, the user may want to avoid installing a different JDK on the YARN cluster nodes, in such a case,
+it's also possible to distribute the JDK using YARN's Distributed Cache. For example, to use Java 21 to run
+a Spark application, prepare a JDK 21 tarball `openjdk-21.tar.gz` and untar it to `/opt` on the local node,
+then submit a Spark application:
+
+    $ export JAVA_HOME=/opt/openjdk-21
+    $ ./bin/spark-submit --class path.to.your.Class \
+        --master yarn \
+        --archives path/to/openjdk-21.tar.gz \
+        --conf spark.yarn.appMasterEnv.JAVA_HOME=./openjdk-21.tar.gz/openjdk-21 \
+        --conf spark.executorEnv.JAVA_HOME=./openjdk-21.tar.gz/openjdk-21 \

Review Comment:
   @yaooqinn @tgravescs sorry for correcting this in https://github.com/apache/spark/pull/47010/commits/5bbe2008ead4586383d87d78fda6ea9636687c14 after your approval. I also updated the PR description to add the manual test result on a YARN cluster.
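
The `JAVA_HOME` values in the second example have the form `./<archive file name>/<top-level directory inside the tarball>`, so they only resolve if the tarball really unpacks to a single `openjdk-21` directory. As a minimal sketch of a pre-submit check, the helper below derives that relative path from a tarball. The function name `jdk_archive_java_home` is hypothetical and not part of Spark, and the sketch assumes a gzip-compressed tarball passed to `--archives` without a `#alias` suffix.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): print the container-relative
# JAVA_HOME for a JDK tarball shipped via `--archives`.
# Assumes the tarball has a single top-level directory and that no
# `#alias` is appended to the --archives entry.
jdk_archive_java_home() {
    archive="$1"
    # Take the first path component of the first archive entry,
    # e.g. "openjdk-21" from "openjdk-21/bin/java".
    top_dir=$(tar -tzf "$archive" | head -n 1 | sed -e 's|^\./||' -e 's|/.*$||')
    if [ -z "$top_dir" ]; then
        echo "could not determine top-level directory of $archive" >&2
        return 1
    fi
    # Mirror the path shape used in the example:
    # ./<archive file name>/<top-level directory>
    printf './%s/%s\n' "$(basename "$archive")" "$top_dir"
}
```

The result could then be passed to both settings, for example `--conf spark.executorEnv.JAVA_HOME="$(jdk_archive_java_home path/to/openjdk-21.tar.gz)"`, which keeps the two `JAVA_HOME` values consistent with the tarball's actual layout.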



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

