Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-20 Thread DB Tsai
In Spark 3.0, if you use the `with-hadoop` Spark distribution that has embedded Hadoop 3.2, you can set `spark.yarn.populateHadoopClasspath=false` so that the cluster's Hadoop classpath is not populated. In this scenario, Spark will use the Hadoop 3.2 client to connect to Hadoop 2.6, which should work fine. In
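As a rough illustration of the setting mentioned in this message, the sketch below assembles a `spark-submit` command line that passes `spark.yarn.populateHadoopClasspath=false` via `--conf`. The helper `build_spark_submit` and the application file `my_job.py` are hypothetical names introduced only for this example; it only builds the argument list and does not launch anything.

```python
import shlex


def build_spark_submit(app, master="yarn", conf=None):
    """Return a spark-submit argument list (hypothetical helper, builds args only)."""
    args = ["spark-submit", "--master", master]
    # Emit one --conf key=value pair per setting, in a stable order.
    for key, value in sorted((conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


cmd = build_spark_submit(
    "my_job.py",  # hypothetical application file
    conf={"spark.yarn.populateHadoopClasspath": "false"},
)
# Print the command in a shell-safe form for copy and paste.
print(shlex.join(cmd))
```

The same key could equally be placed in `spark-defaults.conf`; passing it per job with `--conf` simply keeps the change local to one submission.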

Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-20 Thread DB Tsai
If it's standalone mode, it's even easier. You should be able to connect to Hadoop 2.6 HDFS using the 3.2 client. In your k8s cluster, just don't put Hadoop 2.6 into your classpath.
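One way to sanity-check the advice about keeping Hadoop 2.6 off the classpath is to scan the classpath entries for Hadoop jars of that version. The following is a minimal sketch, assuming jars follow the common `hadoop-<module>-<version>.jar` naming; the helper `find_hadoop_jars` and the sample paths are hypothetical and only illustrate the idea.

```python
import re
from pathlib import PurePosixPath

# Matches names such as hadoop-common-2.6.5.jar or hadoop-hdfs-2.6.0-cdh5.16.2.jar
# (assumes the usual hadoop-<module>-<version>.jar naming convention).
HADOOP_JAR = re.compile(r"^hadoop-[a-z-]+-(\d+)\.(\d+)")


def find_hadoop_jars(classpath_entries, major, minor):
    """Return the classpath entries that look like Hadoop jars of major.minor."""
    hits = []
    for entry in classpath_entries:
        name = PurePosixPath(entry).name
        match = HADOOP_JAR.match(name)
        if match and name.endswith(".jar"):
            if (int(match.group(1)), int(match.group(2))) == (major, minor):
                hits.append(entry)
    return hits


# Hypothetical classpath for illustration only.
sample_classpath = [
    "/opt/spark/jars/hadoop-client-api-3.2.0.jar",
    "/opt/spark/jars/spark-core_2.12-3.0.0.jar",
    "/opt/legacy/hadoop-common-2.6.5.jar",
    "/opt/legacy/hadoop-hdfs-2.6.0-cdh5.16.2.jar",
]

stale = find_hadoop_jars(sample_classpath, 2, 6)
for path in stale:
    print("Hadoop 2.6 jar on classpath:", path)
```

In a real image, the entries would come from the directories actually placed on the driver and executor classpath rather than from a hard-coded list.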

Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Ashika Umanga Umagiliya
Hello, "spark.yarn.populateHadoopClasspath" is used in YARN mode, correct? However, our Spark cluster is a standalone cluster that does not use YARN. We only connect to HDFS/Hive to access data. Computation is done on our Spark cluster running on K8s (not YARN).

Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Prashant Sharma
Hi Ashika, Hadoop 2.6 is no longer supported. Since it has not been maintained for the last 2 years, it may have unpatched security issues. From Spark 3.0 onwards we no longer support it; in other words, we have modified our codebase in a way that Hadoop 2.6 won't work. However,

Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Ashika Umanga
Greetings, Hadoop 2.6 support has been removed according to this ticket: https://issues.apache.org/jira/browse/SPARK-25016 We run our Spark cluster on K8s in standalone mode. We access HDFS/Hive running on a Hadoop 2.6 cluster. We've been using Spark 2.4.5 and are planning to upgrade to Spark 3.0.0