Hi, All. I am bumping this thread again with the title "Use Hadoop-3.2 as a default Hadoop profile in 3.1.0?" There has been some recent discussion on the following PR. Please let us know your thoughts.
https://github.com/apache/spark/pull/28897

Bests,
Dongjoon.

On Fri, Nov 1, 2019 at 9:41 AM Xiao Li <lix...@databricks.com> wrote:

> Hi, Steve,
>
> Thanks for your comments! My major quality concern is not against Hadoop
> 3.2. In this release, the Hive execution module upgrade [from 1.2 to 2.3],
> the Hive thrift-server upgrade, and JDK 11 support are added to the Hadoop
> 3.2 profile only. Compared with the Hadoop 2.x profile, the Hadoop 3.2
> profile is more risky due to these changes.
>
> To speed up the adoption of Spark 3.0, which has many other highly
> desirable features, I am proposing to keep the Hadoop 2.x profile as the
> default.
>
> Cheers,
>
> Xiao.
>
> On Fri, Nov 1, 2019 at 5:33 AM Steve Loughran <ste...@cloudera.com> wrote:
>
>> What is the current default value? The 2.x releases are becoming EOL:
>> 2.7 is dead, there might be a 2.8.x; for now 2.9 is the branch-2 release
>> getting attention. 2.10.0 shipped yesterday, but the ".0" means there will
>> inevitably be surprises.
>>
>> One issue with using older versions is that any problem reported
>> (especially stack traces you can blame me for) will generally be met by
>> a response of "does it go away when you upgrade?" The other issue is how
>> much test coverage things are getting.
>>
>> W.r.t. Hadoop 3.2 stability, nothing major has been reported. The ABFS
>> client is there, and the big Guava update (HADOOP-16213) went in. People
>> will either love or hate that.
>>
>> No major changes in s3a code between 3.2.0 and 3.2.1; I have a large
>> backport planned though, including changes to better handle AWS caching of
>> 404s generated from HEAD requests before an object was actually created.
>>
>> It would be really good if the Spark distributions shipped with later
>> versions of the Hadoop artifacts.
>>
>> On Mon, Oct 28, 2019 at 7:53 PM Xiao Li <lix...@databricks.com> wrote:
>>
>>> The stability and quality of the Hadoop 3.2 profile are unknown. The
>>> changes are massive, including Hive execution and a new version of the
>>> Hive thriftserver.
>>>
>>> To reduce the risk, I would like to keep the current default version
>>> unchanged. When it becomes stable, we can change the default profile to
>>> Hadoop-3.2.
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>> On Mon, Oct 28, 2019 at 12:51 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> I'm OK with that, but don't have a strong opinion nor info about the
>>>> implications.
>>>> That said my guess is we're close to the point where we don't need to
>>>> support Hadoop 2.x anyway, so, yeah.
>>>>
>>>> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi, All.
>>>> >
>>>> > There was a discussion on publishing artifacts built with Hadoop 3.
>>>> > But, we are still publishing with Hadoop 2.7.3 and `3.0-preview` will
>>>> > be the same because we didn't change anything yet.
>>>> >
>>>> > Technically, we need to change two places for publishing.
>>>> >
>>>> > 1. Jenkins Snapshot Publishing
>>>> >    https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>>>> >
>>>> > 2. Release Snapshot/Release Publishing
>>>> >    https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>>>> >
>>>> > To minimize the change, we need to switch our default Hadoop profile.
>>>> >
>>>> > Currently, the default is the `hadoop-2.7 (2.7.4)` profile and
>>>> > `hadoop-3.2 (3.2.0)` is optional.
>>>> > We had better use the `hadoop-3.2` profile by default and `hadoop-2.7`
>>>> > optionally.
>>>> >
>>>> > Note that this means we use Hive 2.3.6 by default. Only the `hadoop-2.7`
>>>> > distribution will use `Hive 1.2.1` like Apache Spark 2.4.x.
>>>> >
>>>> > Bests,
>>>> > Dongjoon.