In Spark 3.0, if you use the `with-hadoop` Spark distribution that has
embedded Hadoop 3.2, you can set
`spark.yarn.populateHadoopClasspath=false` so that the cluster's Hadoop
classpath is not populated. In this scenario, Spark will use the Hadoop
3.2 client to connect to Hadoop 2.6, which should work fine.
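For the YARN case, the property can be passed on the `spark-submit` command line with `--conf`. This is a minimal sketch, assuming a Spark 3.0 `with-hadoop` (Hadoop 3.2) distribution. `build_spark_submit_args` and `my_job.py` are hypothetical names used only for illustration.

```python
import shlex


def build_spark_submit_args(app, conf, master="yarn"):
    """Hypothetical helper: build a spark-submit argument list."""
    args = ["spark-submit", "--master", master]
    # Each Spark property is passed as its own "--conf key=value" pair;
    # sorting keeps the command line deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Don't add the cluster's (Hadoop 2.6) classpath; rely on the Hadoop 3.2
# client bundled with the with-hadoop Spark 3.0 distribution instead.
conf = {"spark.yarn.populateHadoopClasspath": "false"}
args = build_spark_submit_args("my_job.py", conf)
print(shlex.join(args))
# prints: spark-submit --master yarn --conf spark.yarn.populateHadoopClasspath=false my_job.py
```

The same key/value pair could equally go into `spark-defaults.conf`; the command-line form just keeps the setting visible per job.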
If it's standalone mode, it's even easier. You should be able to
connect to Hadoop 2.6 HDFS using the 3.2 client. In your K8s cluster, just
don't put Hadoop 2.6 on your classpath.
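As a rough sanity check for the standalone-on-K8s case, the sketch below scans jar file names (for example in the image's `$SPARK_HOME/jars` directory) and groups the Hadoop jars by version, so a stray Hadoop 2.6 jar next to the bundled 3.2 client stands out. It is a hypothetical helper, not a Spark tool; the sample jar names and the `/opt/spark/jars` path are assumptions.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches jar names such as "hadoop-common-3.2.0.jar" or
# "hadoop-hdfs-client-3.2.0.jar"; versions with suffixes (e.g. "-SNAPSHOT")
# are deliberately not matched by this simple pattern.
HADOOP_JAR = re.compile(r"^hadoop-([a-z0-9-]+?)-(\d+(?:\.\d+)+)\.jar$")


def jar_names_in(directory):
    """Return the file names of all *.jar files in a directory, sorted."""
    return sorted(p.name for p in Path(directory).glob("*.jar"))


def hadoop_versions(jar_names):
    """Group Hadoop artifacts by version, e.g. {"3.2.0": ["common", ...]}."""
    versions = defaultdict(list)
    for name in jar_names:
        match = HADOOP_JAR.match(name)
        if match:
            artifact, version = match.groups()
            versions[version].append(artifact)
    return dict(versions)


def has_mixed_hadoop(jar_names):
    """True if jars from more than one Hadoop version are present."""
    return len(hadoop_versions(jar_names)) > 1


# Hypothetical example: a 3.2 client image that also ships a Hadoop 2.6 jar.
sample = [
    "spark-core_2.12-3.0.0.jar",
    "hadoop-common-3.2.0.jar",
    "hadoop-hdfs-client-3.2.0.jar",
    "hadoop-common-2.6.5.jar",
]
print(hadoop_versions(sample))
# On a real image you might instead call (path is an assumption):
#   hadoop_versions(jar_names_in("/opt/spark/jars"))
```

This only looks at file names in one directory; entries added through `SPARK_DIST_CLASSPATH` or `spark.executor.extraClassPath` would need to be checked separately.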
On Sun, Jul 19, 2020 at 10:25 PM Ashika Umanga Umagiliya
wrote:
Hello,
"spark.yarn.populateHadoopClasspath" is used in YARN mode, correct?
However, our Spark cluster is a standalone cluster, not using YARN.
We only connect to HDFS/Hive to access data. Computation is done on our
Spark cluster running on K8s (not YARN).
On Mon, Jul 20, 2020 at 2:04 PM DB Tsai wrote:
Hi Ashika,
Hadoop 2.6 is no longer supported, and since it has not been maintained
in the last two years, it may have unpatched security issues.
From Spark 3.0 onwards, we no longer support it; in other words, we have
modified our codebase in a way that Hadoop 2.6 won't work. However,
Greetings,
Hadoop 2.6 support has been removed, according to this ticket:
https://issues.apache.org/jira/browse/SPARK-25016
We run our Spark cluster on K8s in standalone mode.
We access HDFS/Hive running on a Hadoop 2.6 cluster.
We've been using Spark 2.4.5 and are planning to upgrade to Spark 3.0.0.