Will pyspark users care much about the Hadoop version? They won't if
running locally. They will if connecting to a Hadoop cluster, but in that
context they're probably using a distro that harmonizes the versions
anyway. Hadoop 3's installed base can't be that large yet; it's been
around for far less time.
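
For what it's worth, here's a rough way a pyspark user can even see which
Hadoop version their install bundles (assuming a plain pip-installed
pyspark running in local mode; _jvm is a py4j internal, so treat this as
a diagnostic sketch rather than a stable API):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
# Hadoop's VersionInfo reports the version of the client jars bundled
# with this build, e.g. a 2.7.x release for the current PyPI default.
hadoop_version = (spark.sparkContext._jvm
                  .org.apache.hadoop.util.VersionInfo.getVersion())
print(f"Spark {spark.version} built against Hadoop {hadoop_version}")
spark.stop()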

The bigger question is indeed dropping Hadoop 2.x / Hive 1.x etc.
eventually, not now.
But if the question right now is just the build defaults, is it a big
deal either way?

On Tue, Jun 23, 2020 at 11:03 PM Xiao Li <lix...@databricks.com> wrote:

> I think we just need to provide two options and let end users choose the
> one they need: Hadoop 3.2 or Hadoop 2.7. Thus, SPARK-32017 (Make Pyspark
> Hadoop 3.2+ Variant available in PyPI) is a high priority task for Spark
> 3.1 release to me.
>
> I do not know how to track the popularity of Hadoop 2 vs Hadoop 3. Based
> on this link
> https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs , it
> sounds like Hadoop 3.x is not as popular as Hadoop 2.7.
