4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread George Magiros
I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn
using 4.0.0-preview1.  However, I got it to work only after fixing an issue
with the Yarn nodemanagers (Hadoop v3.3.6 and v3.4.0).  The failure depended
on which Java version the nodemanagers ran:
1. If the nodemanagers used Java 11, Yarn threw an error about not finding
the jdk.incubator.vector module.
2. If the nodemanagers used Java 17, which has the jdk.incubator.vector
module, Yarn threw a reflection error about a class not being found.

To resolve the errors and successfully calculate pi, I
1. ran Java 17 on the nodemanagers and
2. added 'export HADOOP_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"'
to their conf/hadoop-env.sh file.
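
The second step could be scripted roughly as follows. This is only a sketch
and not part of the original report: the function name add_hadoop_opens and
the duplicate-line guard are assumptions added for illustration, and the
location of hadoop-env.sh depends on the installation.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): append the --add-opens
# export to a hadoop-env.sh file unless that exact line is already present.
add_hadoop_opens() {
    env_file="$1"
    line='export HADOOP_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED"'
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$line" "$env_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$env_file"
    fi
}

# Example call (the path is an assumption; adjust it to the Hadoop layout):
# add_hadoop_opens "$HADOOP_HOME/conf/hadoop-env.sh"
```

Note that the line as reported assigns HADOOP_OPTS outright, so it replaces
any value set earlier in the same file. Whether to append to an existing
value instead is a local decision.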

George


Re: 4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread Wenchen Fan
Thanks for sharing! Yes, Spark 4.0 is built using Java 17.



Re: 4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread Cheng Pan
You don’t need to upgrade Java for HDFS and YARN. Just keep using Java 8 for 
Hadoop and set JAVA_HOME to Java 17 for Spark applications[1].

0. Install Java 17 on all nodes, for example, under /opt/openjdk-17

1. Modify $SPARK_CONF_DIR/spark-env.sh
export JAVA_HOME=/opt/openjdk-17

2. Modify $SPARK_CONF_DIR/spark-defaults.conf
spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17
spark.executorEnv.JAVA_HOME=/opt/openjdk-17
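
For a single job, the same two properties can presumably be passed on the
command line instead of through spark-defaults.conf. The sketch below is an
illustration, not part of the original post: the helper name java_home_confs
is hypothetical, and the examples jar path in the usage comment is an
assumption about the Spark distribution layout.

```shell
#!/bin/sh
# Hypothetical helper: print the two --conf flags that mirror the
# spark-defaults.conf entries above for a given JDK directory.
java_home_confs() {
    jdk="$1"
    printf '%s %s %s %s\n' \
        --conf "spark.yarn.appMasterEnv.JAVA_HOME=$jdk" \
        --conf "spark.executorEnv.JAVA_HOME=$jdk"
}

# Example usage (assumes the JDK path contains no spaces, since the
# command substitution is deliberately left unquoted for word splitting):
# export JAVA_HOME=/opt/openjdk-17
# spark-submit --master yarn --deploy-mode cluster \
#     $(java_home_confs /opt/openjdk-17) \
#     --class org.apache.spark.examples.SparkPi \
#     "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100
```

The JVM that runs spark-submit itself still comes from JAVA_HOME in the
submitting shell or in spark-env.sh, which is why step 1 sets it separately.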

[1] https://github.com/awesome-kyuubi/hadoop-testing/commit/9f7c0d7388dfc7fbe6e4658515a6c28d5ba93c8e

Thanks,
Cheng Pan





Re: 4.0.0-preview1 test report: running on Yarn

2024-06-18 Thread Cheng Pan
FYI, I have submitted SPARK-48651 (https://github.com/apache/spark/pull/47010)
to update the Spark on YARN docs for JDK configuration. Looking forward to
your feedback.

Thanks,
Cheng Pan





Re: 4.0.0-preview1 test report: running on Yarn

2024-06-18 Thread George Magiros
Thank you all so much for the kind words of encouragement on my first test
report.  As a follow-up, I ran all my HDFS and Yarn nodes on Java 8,
including my nodemanagers.  I then modified Spark's
conf/spark-defaults.conf according to Mr. Pan's prior post, and it worked:
I was able to submit SparkPi and my PySpark code using 4.0.0-preview1 to
Yarn, successfully deploying in both client and cluster mode.  Without the
changes, Yarn would have thrown an UnsupportedClassVersionError about
org/apache/spark/deploy/yarn/ExecutorLauncher.

George
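
That error is consistent with the earlier note that Spark 4.0 is built
using Java 17: every class file records a major version (52 for Java 8, 55
for Java 11, 61 for Java 17), and a JVM rejects class files newer than it
supports. A small sketch for reading that number follows. The helper name
class_major_version is hypothetical, and the jar name in the usage comment
is an assumption about the distribution.

```shell
#!/bin/sh
# Hypothetical helper: print the class-file major version of a .class file.
# Bytes 0-3 hold the magic number, 4-5 the minor version, 6-7 the major.
class_major_version() {
    # od prints the two bytes at offset 6 as unsigned decimal numbers;
    # set -- splits them into $1 (high byte) and $2 (low byte).
    set -- $(od -An -j6 -N2 -tu1 "$1")
    echo $(( $1 * 256 + $2 ))
}

# Example usage (the jar name is an assumption):
# unzip -p "$SPARK_HOME"/jars/spark-yarn_*.jar \
#     org/apache/spark/deploy/yarn/ExecutorLauncher.class > /tmp/el.class
# class_major_version /tmp/el.class
```

A result above 52 would mean a Java 8 runtime cannot load the class.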
