Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-22 Thread Hyukjin Kwon
There's another PR open to expose this more publicly on the Python side (https://github.com/apache/spark/pull/27331). To sum up, I would like to make sure we know the following: - Is this expression only for partition or do we plan to expose this to replace other existing expressions as some kind of

Re: Adding Maven Central mirror from Google to the build?

2020-01-22 Thread Tom Graves
+1 for proposal. Tom On Tuesday, January 21, 2020, 04:37:04 PM CST, Sean Owen wrote: See https://github.com/apache/spark/pull/27307 for some context. We've had to add, in at least one place, some settings to resolve artifacts from a mirror besides Maven Central to work around some

[Dataset API] SPARK-27249

2020-01-22 Thread Nick Afshartous
Hello, I'm looking into starting work on this ticket https://issues.apache.org/jira/browse/SPARK-27249 which involves adding an API for transforming Datasets. In the comments I have a question about whether or not this ticket is still necessary. Could someone please review and advise.

Unsubscribe

2020-01-22 Thread sadhana avasarala
From: Dongjoon Hyun Date: Wednesday, January 22, 2020 at 1:57 AM To: Wenchen Fan Cc: dev Subject: Re: Correctness and data loss issues Thank you for checking, Wenchen! Sure, we need to do that. Another question is "What can we do for 2.4.5 release"? Some of the fixes cannot be

Re: Correctness and data loss issues

2020-01-22 Thread Tom Graves
I agree, I think we just need to go through all of them and individually assess each one. If it's really a correctness issue we should hold 3.0 for it. On the 2.4 release I didn't see an explanation on https://issues.apache.org/jira/browse/SPARK-26154 why it can't be backported, I think in the

Re: Correctness and data loss issues

2020-01-22 Thread Dongjoon Hyun
Hi, Tom. Then, along with the following, do you think we need to hold the 2.4.5 release, too? > If it's really a correctness issue we should hold 3.0 for it. Recently, (1) 2.4.4 delivered 9 correctness patches. (2) 2.4.5 RC1 aimed to deliver the following 9 correctness patches, too.

Re: Correctness and data loss issues

2020-01-22 Thread Tom Graves
My thoughts on your list; it would be good to get input from the people who worked on these issues. Obviously we can weigh the importance of these vs getting 2.4.5 out, which has a bunch of other correctness fixes you mention as well. I think you have already pinged on most of the JIRAs to get feedback.

Re: Correctness and data loss issues

2020-01-22 Thread Dongjoon Hyun
Hi, All. BTW, based on the AS-IS feedback, I updated all open `correctness` and `dataloss` issues as follows. 1. Raised the issue priority to `Blocker`. 2. Set the target version to `3.0.0`. It's time to give more visibility to those issues in order to close or resolve them.