Cheng, could you elaborate on your criterion, `Hive 2.3 code paths are
proven to be stable`?
For me, it's difficult to imagine that we can reach any stable state when
we don't use it at all ourselves.
> The Hive 1.2 code paths can only be removed once the Hive 2.3 code
paths are proven to
> I don't think the default Hadoop version matters except for the
spark-hadoop-cloud module, which is only meaningful under the hadoop-3.2
profile.
What do you mean by "only meaningful under the hadoop-3.2 profile"?
On Tue, Nov 19, 2019 at 5:40 PM Cheng Lian wrote:
> Hey Steve,
>
> In terms of
We don't have an official Spark release with Hadoop 3 yet (except the
preview), if I am not mistaken.
I think it's more natural to wait one minor release term before switching this
...
How about we target Hadoop 3 as the default in Spark 3.1?
On Wed, Nov 20, 2019 at 7:40 AM, Cheng Lian wrote:
> Hey Steve,
>
> In terms
> Should Hadoop 2 + Hive 2 be considered to work on JDK 11?
This seems to be under investigation in Yuming's PR
(https://github.com/apache/spark/pull/26533), if I am not mistaken.
Oh, yes, what I meant by (default) was the default profiles we will use in
Spark.
On Wed, Nov 20, 2019 at 10:14 AM, Sean Owen wrote:
Should Hadoop 2 + Hive 2 be considered to work on JDK 11? I wasn't
sure if 2.7 did, but honestly I've lost track.
Anyway, it doesn't matter much as the JDK doesn't cause another build
permutation. All are built targeting Java 8.
I also don't know if we have to declare one binary release the default.
So, can we conclude our plans as below?
1. In Spark 3, we release the following:
- Hadoop 3.2 + Hive 2.3 + JDK 8 build that also works with JDK 11
- Hadoop 2.7 + Hive 2.3 + JDK 8 build that also works with JDK 11
- Hadoop 2.7 + Hive 1.2.1 (fork) + JDK 8 (default)
2. In Spark 3.1, we target:
-
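To sanity-check which versions a given binary actually bundles, here is a
minimal Scala sketch (runnable from spark-shell; HiveUtils.builtinHiveVersion
is assumed to be the field name in Spark 3.0 and requires a Hive-enabled
build):

    // Print the Spark and bundled Hadoop versions, plus the built-in Hive
    // execution version (assumed name; may differ across Spark releases).
    import org.apache.hadoop.util.VersionInfo
    println(s"Spark:  ${org.apache.spark.SPARK_VERSION}")
    println(s"Hadoop: ${VersionInfo.getVersion}")
    println(s"Hive:   ${org.apache.spark.sql.hive.HiveUtils.builtinHiveVersion}")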
Thanks for taking care of this, Dongjoon!
We can target SPARK-20202 for 3.1.0, but I don't think we should do it
immediately after cutting branch-3.0. The Hive 1.2 code paths can only
be removed once the Hive 2.3 code paths are proven to be stable. If they
turn out to be buggy in Spark 3.1, we
Great work, Bo! Would love to hear the details.
On Tue, Nov 19, 2019 at 4:05 PM Ryan Blue wrote:
> I'm interested in remote shuffle services as well. I'd love to hear about
> what you're using in production!
>
> rb
>
> On Tue, Nov 19, 2019 at 2:43 PM bo yang wrote:
>
>> Hi Ben,
>>
>> Thanks
I'm interested in remote shuffle services as well. I'd love to hear about
what you're using in production!
rb
On Tue, Nov 19, 2019 at 2:43 PM bo yang wrote:
> Hi Ben,
>
> Thanks for writing this up! This is Bo from Uber. I am on Felix's team in
> Seattle, working on disaggregated shuffle
Same idea? Support this combo in 3.0 and then remove Hadoop 2 support
in 3.1 or something? Or at least make them non-default, without
necessarily publishing special builds?
On Tue, Nov 19, 2019 at 4:53 PM Dongjoon Hyun wrote:
> For additional `hadoop-2.7 with Hive 2.3` pre-built distribution, how do
Yes. It does. I meant SPARK-20202.
Thanks. I understand that it can be considered like the Scala version issue.
That's the reason I put this forward as a `policy` issue from the beginning.
> First of all, I want to put this as a policy issue instead of a technical
issue.
From the policy perspective,
Hi Ben,
Thanks for writing this up! This is Bo from Uber. I am on Felix's team in
Seattle, working on disaggregated shuffle (we call it remote shuffle
service, RSS, internally). We have had RSS in production for a while, and
learned a lot along the way (tried quite a few techniques to
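For anyone who wants to experiment with a plugin like this, the usual hook is
the spark.shuffle.manager configuration. A minimal sketch, where
com.example.shuffle.RssShuffleManager is a hypothetical implementation of
org.apache.spark.shuffle.ShuffleManager:

    import org.apache.spark.sql.SparkSession

    // Point Spark at an external/remote shuffle implementation. The class
    // name below is hypothetical; any ShuffleManager implementation on the
    // classpath can be plugged in this way.
    val spark = SparkSession.builder()
      .appName("rss-demo")
      .config("spark.shuffle.manager", "com.example.shuffle.RssShuffleManager")
      .getOrCreate()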
Hey Steve,
In terms of Maven artifacts, I don't think the default Hadoop version
matters except for the spark-hadoop-cloud module, which is only meaningful
under the hadoop-3.2 profile. All the other spark-* artifacts published to
Maven Central are Hadoop-version-neutral.
Another issue about
Thank you, Sean, Shane, and Xiao!
Bests,
Dongjoon.
On Tue, Nov 19, 2019 at 2:15 PM Shane Knapp wrote:
> i had a few minutes and everything has been deleted!
>
> On Tue, Nov 19, 2019 at 2:02 PM Shane Knapp wrote:
> >
> > thanks, sean!
> >
> > i am all for moving these jobs to github actions,
i had a few minutes and everything has been deleted!
On Tue, Nov 19, 2019 at 2:02 PM Shane Knapp wrote:
>
> thanks, sean!
>
> i am all for moving these jobs to github actions, and will be doing
> this 'soon' as i'm @ kubecon this week.
>
> btw the R ecosystem definitely needs some attention,
It's kinda like a Scala version upgrade. Historically, we only remove
support for an older Scala version once the newer version has proven to be
stable after one or more Spark minor versions.
On Tue, Nov 19, 2019 at 2:07 PM Cheng Lian wrote:
> Hmm, what exactly did you mean by "remove the usage
Hmm, what exactly did you mean by "remove the usage of forked `hive` in
Apache Spark 3.0 completely officially"? I thought you wanted to remove the
forked Hive 1.2 dependencies completely, no? As long as we still keep
Hive 1.2 in Spark 3.0, I'm fine with that. I personally don't have a
thanks, sean!
i am all for moving these jobs to github actions, and will be doing
this 'soon' as i'm @ kubecon this week.
btw the R ecosystem definitely needs some attention, but that's an issue
for another time. :)
On Tue, Nov 19, 2019 at 1:49 PM Sean Owen wrote:
>
> I would favor
I would favor moving whatever we can to GitHub. It's difficult to modify
the Jenkins instances without Shane's valiant help, and over time it makes
more sense to modernize and integrate this into the project.
On Tue, Nov 19, 2019 at 3:35 PM Dongjoon Hyun wrote:
>
> Hi, All.
>
> Apache Spark community
Hi, All.
The Apache Spark community has used the following dashboard for post-hook
verification.
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
There are six registered jobs.
1. spark-branch-2.4-compile-maven-hadoop-2.6
2. spark-branch-2.4-compile-maven-hadoop-2.7
3.
BTW, `hive.version.short` is a directory name. We are using 2.3.6 only.
For the directory names, we use '1.2.1' and '2.3.5' because we delayed
renaming the directories until the 3.0.0 deadline to minimize the diff.
We can rename them right now if we want.
On Tue, Nov 19, 2019 at
Hi, Cheng.
This is unrelated to JDK 11 and Hadoop 3; I'm talking about the JDK 8 world.
If we consider them, it would be the following.
+---+-----------------+-------------------+
|   | Hive 1.2.1 fork | Apache Hive 2.3.6 |
+---+-----------------+-------------------+
Dongjoon, I'm with Hyukjin. There should be at least one Spark 3.x minor
release to stabilize Hive 2.3 code paths before retiring the Hive 1.2
fork. Even today, the Hive 2.3.6 version bundled in Spark 3.0 is still
buggy in terms of JDK 11 support. (BTW, I just found that our root POM is
referring
Thank you for the feedback, Hyukjin and Sean.
I proposed `preview-2` for that purpose, but I'm also +1 for doing that in
3.1 if we can decide to eliminate the illegitimate Hive fork reference
immediately after the `branch-3.0` cut.
Sean, I'm referencing Cheng Lian's email for the status of
Just to clarify, as even I have lost the details over time: hadoop-2.7
works with hive-2.3? it isn't tied to hadoop-3.2?
Roughly how much risk is there in using the Hive 1.x fork over Hive
2.x, for end users using Hive via Spark?
I don't have a strong opinion, other than sharing the view that we
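One mitigating factor for end users: the metastore client version can be
configured independently of the built-in Hive execution jars, so changing the
execution side doesn't force a metastore upgrade. A minimal sketch of the
relevant settings (the version value and app name are illustrative):

    import org.apache.spark.sql.SparkSession

    // Talk to an existing Hive metastore, regardless of whether the build
    // bundles the Hive 1.2 fork or Hive 2.3 for execution. "maven" tells
    // Spark to download metastore client jars matching the given version.
    val spark = SparkSession.builder()
      .appName("hive-metastore-demo")
      .config("spark.sql.hive.metastore.version", "2.3.6")
      .config("spark.sql.hive.metastore.jars", "maven")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()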
Hi @Sean Owen,
Thanks for your reply and patience.
First, we apologize for the poor wording in the previous emails. We just
want users to be able to see the current support status somewhere in the
Spark community. I really appreciate that you and the Spark community keep
making Spark better on