Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-04 Thread Steve Loughran
On Mon, Nov 4, 2019 at 12:39 AM Nicholas Chammas wrote: > On Fri, Nov 1, 2019 at 8:41 AM Steve Loughran > wrote: > >> It would be really good if the spark distributions shipped with later >> versions of the hadoop artifacts. >> > > I second this. If we need to keep a Hadoop 2.x profile around,

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-04 Thread Steve Loughran
I'd move Spark's branch-2 line to 2.9.x as (a) Spark's version of httpclient hits a bug in the AWS SDK used in hadoop-2.8 unless you revert that patch https://issues.apache.org/jira/browse/SPARK-22919 (b) there's only one future version of 2.8.x planned, which is expected once myself or someone

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-11-04 Thread Dongjoon Hyun
Hi, Xingbo. Could you send a vote result email to finalize this vote, please? Bests, Dongjoon. On Fri, Nov 1, 2019 at 2:55 PM Takeshi Yamamuro wrote: > +1, too. > > On Sat, Nov 2, 2019 at 3:36 AM Hyukjin Kwon wrote: > >> +1 >> >> On Fri, 1 Nov 2019, 15:36 Wenchen Fan, wrote: >> >>> The PR

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-04 Thread Koert Kuipers
i get that cdh and hdp backport a lot and in that way left 2.7 behind. but they kept the public apis stable at the 2.7 level, because that's kind of the point. aren't those the hadoop apis spark uses? On Mon, Nov 4, 2019 at 10:07 AM Steve Loughran wrote: > > > On Mon, Nov 4, 2019 at 12:39 AM

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-11-04 Thread Xingbo Jiang
This vote passes! I'll follow up with a formal release announcement soon. +1: Sean Owen (binding) Wenchen Fan (binding) Hyukjin Kwon (binding) Dongjoon Hyun (binding) Takeshi Yamamuro +0: None -1: None Thanks, everyone! Xingbo On Mon, Nov 4, 2019 at 9:35 AM Dongjoon Hyun wrote: > Hi,

[DISCUSS] Remove sorting of fields in PySpark SQL Row construction

2019-11-04 Thread Bryan Cutler
Currently, when a PySpark Row is created with keyword arguments, the fields are sorted alphabetically. This has created a lot of confusion with users because it is not obvious (although it is stated in the pydocs) that they will be sorted alphabetically. Then later when applying a schema and the
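The alphabetical ordering described above can be illustrated without a Spark installation. The plain-Python sketch below assumes only the behavior stated in the thread (field names passed as keyword arguments are sorted alphabetically); `row_fields_from_kwargs` is a hypothetical helper, not part of PySpark. For comparison, in the Spark 2.x releases current at the time of this thread, `Row(name="Alice", age=11)` displays as `Row(age=11, name='Alice')`.

```python
def row_fields_from_kwargs(**kwargs):
    """Hypothetical helper mimicking the described Row behavior:
    field names given as keyword arguments come out in alphabetical
    order, not in the order the caller wrote them."""
    names = sorted(kwargs)  # alphabetical order, as described in the thread
    values = tuple(kwargs[n] for n in names)  # values follow the sorted names
    return names, values


# The caller writes `name` first, but `age` ends up as the first field.
names, values = row_fields_from_kwargs(name="Alice", age=11)
print(names)   # ['age', 'name']
print(values)  # (11, 'Alice')
```

Anyone who later relies on the position of a field, rather than its name, sees the sorted order instead of the order used at construction time.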

A question about skew join hint

2019-11-04 Thread zhangliyun
Hi all: I saw the skew join hint optimization in https://docs.azuredatabricks.net/delta/join-performance/skew-join.html. It is a great feature that helps users avoid the problems caused by skewed data. My questions: 1. In which version will we have this? I have not found the feature in the

Fw: A question about skew join hint

2019-11-04 Thread Mayank Pradhan
Watch this thread, folks. @Eugene Koifman if you recall, this is the same question I was asking you earlier. -Mayank From: zhangliyun Sent: Monday, November 4, 2019 5:21 PM To: Spark Dev List ; u...@spark.apache.org Subject:

Build customized resource manager

2019-11-04 Thread Klaus Ma
Hi team, AFAIK, we have built-in k8s/yarn/mesos resource managers, but I'd like to make some enhancements to them, e.g. integrating with Volcano in k8s. Is it possible to do that without forking the whole Spark project? For example, enabling a customized resource manager