Re: Automatic PR labeling

2020-04-01 Thread Hyukjin Kwon
@Nicholas Chammas Would you be interested in taking a look? I would love this to be done. On Wed, Mar 25, 2020 at 10:30 AM, Hyukjin Kwon wrote: > That should be cool. There was a bit of discussion about which account > should label. If we can replace it, I think it sounds great! > > Mar 25, 2020

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Jungtaek Lim
I "intentionally" didn't point out an actual case, because I want to avoid unnecessary debate and make sure we don't decide with bias. Note that the context would include people. I have been seeing these requests consistently (at least consistently for 1, but I feel I also saw 2 more than a couple of

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Nicholas Chammas
Probably the discussion here about Improvement Jira tickets and the "Affects Version" field: https://github.com/apache/spark/pull/27534#issuecomment-588416416 On Wed, Apr 1, 2020 at 9:59 PM Hyukjin Kwon wrote: > > 2) check with older versions to fill up affects version for bug > I don't agree

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Hyukjin Kwon
> 2) check with older versions to fill up affects version for bug I don't agree with this in general. To me usually it's "For the type of bug, assign one valid version" instead. > The only place where I can see some amount of investigation being required would be for security issues or

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Mridul Muralidharan
I agree with what Sean detailed. The only place where I can see some amount of investigation being required would be for security issues or correctness issues. Knowing the affected versions, particularly if an earlier supported version does not have the bug, will help users understand the

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Sean Owen
I think we discussed this briefly on a PR. It's not as clear what it means for an Improvement to 'affect a version'. Certainly, an improvement to a feature introduced in 1.2.3 can't affect anything earlier, and implicitly affects everything after. It's not wrong to say it affects the latest

[DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Jungtaek Lim
Hi devs, I know we're busy getting Spark 3.0 out, but I think this topic is good to discuss at any time and would actually be better resolved sooner rather than later. On the "Contributing to Spark" page, we describe the guidance for "affects version" as "For Bugs, assign at least one version that is

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Ryan Blue
-1 (non-binding) I agree with Jungtaek. The change to create datasource tables instead of Hive tables by default (no USING or STORED AS clauses) has created confusing behavior and should either be rolled back or fixed before 3.0. On Wed, Apr 1, 2020 at 5:12 AM Sean Owen wrote: > Those are not

Re: Need to order iterator values in spark dataframe

2020-04-01 Thread Ranjan, Abhinav
Enrico, The solution below works but there is a little glitch. It works fine in spark-shell but fails for skewed keys when doing a spark-submit. While looking into the execution plan, the partitioning value is the same for both repartition and groupByKey and is driven by the value

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Sean Owen
Those are not per se release blockers. They are (perhaps important) improvements to functionality. I don't know who is active and able to review that part of the code; I'd look for authors of changes in the surrounding code. The question here isn't so much what one would like to see in this

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Dr. Kent Yao
-1 Do not release this package because v3.0.0 is the 3rd major release since we added Spark On Kubernetes. Can we make it more production-ready as it has been experimental for more than 2 years? The main practical adoption of Spark on Kubernetes is to take on the role of other cluster

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Reynold Xin
The Apache Software Foundation requires voting before any release can be published. On Tue, Mar 31, 2020 at 11:27 PM, Stephen Coy <s...@infomedia.com.au.invalid> wrote: >> On 1 Apr 2020, at 5:20 pm, Sean Owen <sro...@gmail.com> wrote: >> >> It can be

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Stephen Coy
On 1 Apr 2020, at 5:20 pm, Sean Owen <sro...@gmail.com> wrote: It can be published as "3.0.0-rc1" but how do we test that to vote on it without some other RC1 RC1 I’m not sure what you mean by this question?

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Sean Owen
You just mvn -DskipTests install the source release. That is the primary artifact we're testing. But yes you could put the jars in your local repo too. I think this is pretty standard practice. Obviously the RC can't be published as "3.0.0". It can be published as "3.0.0-rc1" but how do we test