Cheng, could you elaborate on your criterion, `Hive 2.3 code paths are
proven to be stable`?
For me, it's difficult to imagine that we can reach any stable state when
we don't use it at all ourselves.
> The Hive 1.2 code paths can only be removed once the Hive 2.3 code
paths are proven to
> I don't think the default Hadoop version matters except for the
spark-hadoop-cloud module, which is only meaningful under the hadoop-3.2
profile.
What do you mean by "only meaningful under the hadoop-3.2 profile"?
On Tue, Nov 19, 2019 at 5:40 PM Cheng Lian wrote:
> Hey Steve,
>
> In terms of
We don't have an official Spark release with Hadoop 3 yet (except the
preview), if I am not mistaken.
I think it's more natural to wait one minor release term before switching this
...
How about we target Hadoop 3 as the default in Spark 3.1?
On Wed, Nov 20, 2019 at 7:40 AM, Cheng Lian wrote:
> Hey Steve,
>
> In terms
> Should Hadoop 2 + Hive 2 be considered to work on JDK 11?
This seems to be under investigation in Yuming's PR
(https://github.com/apache/spark/pull/26533), if I am not mistaken.
Oh, yes, what I meant by (default) was the default profiles we will use in
Spark.
On Wed, Nov 20, 2019 at 10:14 AM, Sean Owen wrote:
Should Hadoop 2 + Hive 2 be considered to work on JDK 11? I wasn't
sure if 2.7 did, but honestly I've lost track.
Anyway, it doesn't matter much as the JDK doesn't cause another build
permutation. All are built targeting Java 8.
I also don't know if we have to declare one binary release the default.
So, can we conclude our plans as below?
1. In Spark 3, we release the following:
- Hadoop 3.2 + Hive 2.3 + JDK 8 build that also works with JDK 11
- Hadoop 2.7 + Hive 2.3 + JDK 8 build that also works with JDK 11
- Hadoop 2.7 + Hive 1.2.1 (fork) + JDK 8 (default)
2. In Spark 3.1, we target:
-
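To sanity-check which versions a given binary actually bundles, here is a
minimal Scala sketch (runnable from spark-shell; HiveUtils.builtinHiveVersion
is assumed to be the field name in Spark 3.0 and requires a Hive-enabled
build):

    // Print the Spark and bundled Hadoop versions, plus the built-in Hive
    // execution version (assumed name; may differ across Spark releases).
    import org.apache.hadoop.util.VersionInfo
    println(s"Spark:  ${org.apache.spark.SPARK_VERSION}")
    println(s"Hadoop: ${VersionInfo.getVersion}")
    println(s"Hive:   ${org.apache.spark.sql.hive.HiveUtils.builtinHiveVersion}")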
Thanks for taking care of this, Dongjoon!
We can target SPARK-20202 for 3.1.0, but I don't think we should do it
immediately after cutting branch-3.0. The Hive 1.2 code paths can only
be removed once the Hive 2.3 code paths are proven to be stable. If they
turn out to be buggy in Spark 3.1, we
Great work, Bo! Would love to hear the details.
On Tue, Nov 19, 2019 at 4:05 PM Ryan Blue wrote:
> I'm interested in remote shuffle services as well. I'd love to hear about
> what you're using in production!
>
> rb
>
> On Tue, Nov 19, 2019 at 2:43 PM bo yang wrote:
>
>> Hi Ben,
>>
>> Thanks
I'm interested in remote shuffle services as well. I'd love to hear about
what you're using in production!
rb
On Tue, Nov 19, 2019 at 2:43 PM bo yang wrote:
> Hi Ben,
>
> Thanks for writing this up! This is Bo from Uber. I am on Felix's team in
> Seattle, working on disaggregated shuffle
Same idea? Support this combo in 3.0 and then remove Hadoop 2 support
in 3.1 or something? Or at least make them non-default, without
necessarily publishing special builds?
On Tue, Nov 19, 2019 at 4:53 PM Dongjoon Hyun wrote:
> For additional `hadoop-2.7 with Hive 2.3` pre-built distribution, how do
Yes. It does. I meant SPARK-20202.
Thanks. I understand that it can be considered like the Scala version issue.
That's the reason I put this forward as a `policy` issue from the beginning.
> First of all, I want to put this as a policy issue instead of a technical
issue.
From the policy perspective,
Hi Ben,
Thanks for writing this up! This is Bo from Uber. I am on Felix's team in
Seattle, working on disaggregated shuffle (we call it remote shuffle
service, RSS, internally). We have had RSS in production for a while, and
learned a lot along the way (tried quite a few techniques to
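For anyone who wants to experiment with a plugin like this, the usual hook is
the spark.shuffle.manager configuration. A minimal sketch, where
com.example.shuffle.RssShuffleManager is a hypothetical implementation of
org.apache.spark.shuffle.ShuffleManager:

    import org.apache.spark.sql.SparkSession

    // Point Spark at an external/remote shuffle implementation. The class
    // name below is hypothetical; any ShuffleManager implementation on the
    // classpath can be plugged in this way.
    val spark = SparkSession.builder()
      .appName("rss-demo")
      .config("spark.shuffle.manager", "com.example.shuffle.RssShuffleManager")
      .getOrCreate()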
Hey Steve,
In terms of Maven artifacts, I don't think the default Hadoop version
matters except for the spark-hadoop-cloud module, which is only meaningful
under the hadoop-3.2 profile. All the other spark-* artifacts published to
Maven Central are Hadoop-version-neutral.
Another issue about
Thank you, Sean, Shane, and Xiao!
Bests,
Dongjoon.
On Tue, Nov 19, 2019 at 2:15 PM Shane Knapp wrote:
> i had a few minutes and everything has been deleted!
>
> On Tue, Nov 19, 2019 at 2:02 PM Shane Knapp wrote:
> >
> > thanks, sean!
> >
> > i am all for moving these jobs to github actions,
i had a few minutes and everything has been deleted!
On Tue, Nov 19, 2019 at 2:02 PM Shane Knapp wrote:
>
> thanks, sean!
>
> i am all for moving these jobs to github actions, and will be doing
> this 'soon' as i'm @ kubecon this week.
>
> btw the R ecosystem definitely needs some attention,
It's kinda like a Scala version upgrade. Historically, we only remove
support for an older Scala version once the newer version has proven to be
stable after one or more Spark minor versions.
On Tue, Nov 19, 2019 at 2:07 PM Cheng Lian wrote:
> Hmm, what exactly did you mean by "remove the usage
Hmm, what exactly did you mean by "remove the usage of forked `hive` in
Apache Spark 3.0 completely officially"? I thought you wanted to remove the
forked Hive 1.2 dependencies completely, no? As long as we still keep
Hive 1.2 in Spark 3.0, I'm fine with that. I personally don't have a
thanks, sean!
i am all for moving these jobs to github actions, and will be doing
this 'soon' as i'm @ kubecon this week.
btw the R ecosystem definitely needs some attention, but that's an issue
for another time. :)
On Tue, Nov 19, 2019 at 1:49 PM Sean Owen wrote:
>
> I would favor
I would favor moving whatever we can to GitHub. It's difficult to modify
the Jenkins instances without Shane's valiant help, and over time it makes
more sense to modernize and integrate this into the project.
On Tue, Nov 19, 2019 at 3:35 PM Dongjoon Hyun wrote:
>
> Hi, All.
>
> Apache Spark community
Hi, All.
The Apache Spark community has used the following dashboard for post-hook
verification.
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
There are six registered jobs.
1. spark-branch-2.4-compile-maven-hadoop-2.6
2. spark-branch-2.4-compile-maven-hadoop-2.7
3.
BTW, `hive.version.short` is a directory name. We are using 2.3.6 only.
For the directory names, we use '1.2.1' and '2.3.5' because we delayed
renaming the directories until the 3.0.0 deadline to minimize the diff.
We can rename them right now if we want.
On Tue, Nov 19, 2019 at
Hi, Cheng.
This is unrelated to JDK 11 and Hadoop 3; I'm talking about the JDK 8 world.
If we consider them, it would be the following.
+---+-----------------+-------------------+
|   | Hive 1.2.1 fork | Apache Hive 2.3.6 |
+---+-----------------+-------------------+
Dongjoon, I'm with Hyukjin. There should be at least one Spark 3.x minor
release to stabilize Hive 2.3 code paths before retiring the Hive 1.2
fork. Even today, the Hive 2.3.6 version bundled in Spark 3.0 is still
buggy in terms of JDK 11 support. (BTW, I just found that our root POM is
referring
Thank you for the feedback, Hyukjin and Sean.
I proposed `preview-2` for that purpose, but I'm also +1 for doing that in
3.1 if we can decide to eliminate the illegitimate Hive fork reference
immediately after the `branch-3.0` cut.
Sean, I'm referencing Cheng Lian's email for the status of
Just to clarify, as even I have lost the details over time: hadoop-2.7
works with hive-2.3? it isn't tied to hadoop-3.2?
Roughly how much risk is there in using the Hive 1.x fork over Hive
2.x, for end users using Hive via Spark?
I don't have a strong opinion, other than sharing the view that we
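One mitigating factor for end users: the metastore client version can be
configured independently of the built-in Hive execution jars, so changing the
execution side doesn't force a metastore upgrade. A minimal sketch of the
relevant settings (the version value and app name are illustrative):

    import org.apache.spark.sql.SparkSession

    // Talk to an existing Hive metastore, regardless of whether the build
    // bundles the Hive 1.2 fork or Hive 2.3 for execution. "maven" tells
    // Spark to download metastore client jars matching the given version.
    val spark = SparkSession.builder()
      .appName("hive-metastore-demo")
      .config("spark.sql.hive.metastore.version", "2.3.6")
      .config("spark.sql.hive.metastore.jars", "maven")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()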
Hi @Sean Owen,
Thanks for your reply and patience.
First, we apologize for the poor wording in the previous emails. We just
want users to be able to see the current support status somewhere in the
Spark community. I really appreciate that you and the Spark community keep
making Spark better on