I'm also genuinely curious when PyPI users would care about the bundled Hadoop jars - do we even need two versions? That itself is extra complexity for end users. I do think Hadoop 3 is the better choice for the user who doesn't care, and the better choice long term. OK, but let's at least move ahead with changing the defaults.
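For context on what "caring about the bundled Hadoop jars" would even look like in practice, here is a minimal sketch (mine, not from this thread, and assuming nothing beyond a plain pip install of pyspark) of how an end user could check which Hadoop version their PyPI package actually ships:

    # Sketch: print the Hadoop version bundled with a pip-installed pyspark.
    # Assumes a local-mode session; VersionInfo is a standard Hadoop utility
    # class, reached here through PySpark's Py4J gateway.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("hadoop-version-check")
             .getOrCreate())

    hadoop_version = (spark.sparkContext._jvm
                      .org.apache.hadoop.util.VersionInfo.getVersion())
    print("Bundled Hadoop version:", hadoop_version)

    spark.stop()

A user who never has a reason to run something like this is exactly the "doesn't care" case above.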
On Wed, Jun 24, 2020 at 12:38 PM Xiao Li <lix...@databricks.com> wrote:
>
> Hi, Dongjoon,
>
> Please do not misinterpret my point. I already clearly said "I do not know
> how to track the popularity of Hadoop 2 vs Hadoop 3."
>
> Also, let me repeat my opinion: the top priority is to provide two options
> for PyPi distribution and let the end users choose the ones they need:
> Hadoop 3.2 or Hadoop 2.7. In general, when we want to make any breaking
> change, let us follow our protocol documented in
> https://spark.apache.org/versioning-policy.html.
>
> If you just want to change the Jenkins setup, I am OK about it. If you want
> to change the default distribution, we need more discussions in the
> community for getting an agreement.
>
> Thanks,
>
> Xiao