Re: code freeze and branch cut for Apache Spark 2.4

2018-09-05 Thread Hyukjin Kwon
Oops, one more - https://github.com/apache/spark/pull/6. I just read this thread. On Thu, Sep 6, 2018 at 12:12 PM Sean Owen wrote: > (I slipped https://github.com/apache/spark/pull/22340 in for Scala 2.12. > Maybe it really is the last one. In any event, yes go ahead with a 2.4 RC) > > On Wed, Sep

Re: code freeze and branch cut for Apache Spark 2.4

2018-09-05 Thread Sean Owen
(I slipped https://github.com/apache/spark/pull/22340 in for Scala 2.12. Maybe it really is the last one. In any event, yes go ahead with a 2.4 RC) On Wed, Sep 5, 2018 at 8:14 PM Wenchen Fan wrote: > The repartition correctness bug fix is merged. The Scala 2.12 PRs > mentioned in this thread

Re: code freeze and branch cut for Apache Spark 2.4

2018-09-05 Thread Wenchen Fan
tps://github.com/apache/spark/pull/22308 > > https://github.com/apache/spark/pull/22310 > > > These two might be the last fixes for Scala 2.12 :) > > > Please review. > > Original Message > *From:* Sean Owen > *To:* antonkulaga > *Cc:* dev > *Sent:* Friday, August 31, 2018 05:00 > *Subject:*

Re: code freeze and branch cut for Apache Spark 2.4

2018-09-01 Thread sadhen
freeze and branch cut for Apache Spark 2.4 I know it's famous last words, but we really might be down to the last fix: https://github.com/apache/spark/pull/22264 More a question of making tests happy at this point I think than fundamental problems. My goal is to make sure we can release a usable

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-30 Thread shane knapp
+1 on beta support for scala 2.12 On Thu, Aug 30, 2018 at 2:33 PM, Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > +1 that would be great Sean, also you put a lot of effort in there, would > make sense to wait a bit. > > Stavros > > On Fri, Aug 31, 2018 at 12:00 AM, Sean Owen

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-30 Thread Stavros Kontopoulos
+1 that would be great Sean, also you put a lot of effort in there, would make sense to wait a bit. Stavros On Fri, Aug 31, 2018 at 12:00 AM, Sean Owen wrote: > I know it's famous last words, but we really might be down to the last > fix: https://github.com/apache/spark/pull/22264 More a

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-30 Thread Sean Owen
I know it's famous last words, but we really might be down to the last fix: https://github.com/apache/spark/pull/22264 More a question of making tests happy at this point I think than fundamental problems. My goal is to make sure we can release a usable, but beta-quality, 2.12 release of Spark in

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-30 Thread Reynold Xin
Let's see how they go. At some point we do need to cut the release. That argument can be made on every feature, and different people place different value / importance on different features, so we could just end up never making a release. On Thu, Aug 30, 2018 at 1:56 PM antonkulaga wrote: >

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-29 Thread Wenchen Fan
A few updates on this thread: We still have a blocking issue, the repartition correctness bug: https://github.com/apache/spark/pull/22112 It's close to merging. There are a few PRs to fix Scala 2.12 issues. I think they will keep coming up and we don't need to block Spark 2.4 on this. All other

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-13 Thread Xingbo Jiang
I'm working on the fix for SPARK-23243 and should be able to push another commit in 1-2 days. More detailed discussion can go to the PR. Thanks for pushing this issue forward! I really appreciate the efforts of everyone submitting PRs or joining the discussions

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-13 Thread Tom Graves
I agree with Imran, we need to fix SPARK-23243 and any correctness issues for that matter. Tom On Wednesday, August 8, 2018, 9:06:43 AM CDT, Imran Rashid wrote: On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan wrote: SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-11 Thread Petar Zečević
Hi, I made some changes to SPARK-24020 (https://github.com/apache/spark/pull/21109) and implemented spill-over to disk. I believe there are no objections to the implementation left and that this can now be merged. Please take a look. Thanks, Petar Zečević Wenchen Fan @ 1970-01-01 01:00

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp
> > > I also think it's a good idea to test against newer Python versions. But I > don't know how difficult it is and whether or not it's feasible to resolve > that between branch cut and RC cut. > > unless someone pops in to this thread and tells me w/o a doubt that all spark branches will

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread Li Jin
I agree with Bryan. If it's acceptable to have another job to test with Python 3.5 and pyarrow 0.10.0, I am leaning towards upgrading arrow. Arrow 0.10.0 has tons of bug fixes and improvements since 0.8.0, including important memory leak fixes such as https://issues.apache.org/jira/browse/ARROW-1973.

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp
python 3.5/pyarrow 0.10.0 build: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/ On Fri, Aug 10, 2018 at 10:44 AM, shane knapp wrote: > see:

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp
see: https://github.com/apache/spark/pull/21939#issuecomment-412154343 yes, i can set up a build. have some Qs in the PR about building the spark package before running the python tests. On Fri, Aug 10, 2018 at 10:41 AM, Bryan Cutler wrote: > I agree that we should hold off on the Arrow

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread Bryan Cutler
I agree that we should hold off on the Arrow upgrade if it requires major changes to our testing. I did have another thought that maybe we could just add another job to test against Python 3.5 and pyarrow 0.10.0 and keep all current testing the same? I'm not sure how doable that is right now and

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp
On Fri, Aug 10, 2018 at 9:47 AM, Wenchen Fan wrote: > It seems safer to skip the arrow 0.10.0 upgrade for Spark 2.4 and leave it > to Spark 3.0, so that we have more time to test. Any objections? > none here. -- Shane Knapp UC Berkeley EECS Research / RISELab Staff Technical Lead

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread Wenchen Fan
It seems safer to skip the arrow 0.10.0 upgrade for Spark 2.4 and leave it to Spark 3.0, so that we have more time to test. Any objections? On Fri, Aug 10, 2018 at 11:53 PM shane knapp wrote: > quick update from my end: > > SPARK-24433 (SparkR/k8s) depends on SPARK-25087 (move builds to ubuntu)

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread shane knapp
quick update from my end: SPARK-24433 (SparkR/k8s) depends on SPARK-25087 (move builds to ubuntu) SPARK-23874 (arrow -> 0.10.0) now depends on SPARK-25079 (python 3.5 upgrade) both SPARK-25087 and SPARK-25079 are in progress and i'm very very hesitant to do these upgrades before the code

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-08 Thread Mark Hamstra
I'm inclined to agree. Just saying that it is not a regression doesn't really cut it when it is a now-known data correctness issue. We need something a lot more than nothing before releasing 2.4.0. At the barest minimum, that has to be much more complete and publicly highlighted documentation of the

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-08 Thread Imran Rashid
On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan wrote: > > SPARK-23243 : > Shuffle+Repartition > on an RDD could lead to incorrect answers > It turns out to be a very complicated issue, there is no consensus about > what is the right fix yet. Likely

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread John Zhuge
+1 on SPARK-25004. We have found it quite useful to diagnose PySpark OOM. On Tue, Aug 7, 2018 at 1:21 PM Holden Karau wrote: > I'd like to suggest we consider SPARK-25004 (hopefully it goes in soon), > but solving some of the consistent Python memory issues we've had for years > would be

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread Holden Karau
I'd like to suggest we consider SPARK-25004 (hopefully it goes in soon), but solving some of the consistent Python memory issues we've had for years would be really amazing to get in. On Tue, Aug 7, 2018 at 1:07 PM, Tom Graves wrote: > I would like to get clarification on our avro

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread Tom Graves
I would like to get clarification on our avro compatibility story before the release. Anyone interested, please look at https://issues.apache.org/jira/browse/SPARK-24924 . I probably should have filed a separate jira and can if we don't resolve via discussion there. Tom On Tuesday,

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread shane knapp
> > According to the status, I think we should wait a few more days. Any > objections? > > none here. i'm also pretty certain that waiting until after the code freeze to start testing the GHPRB on ubuntu is the wisest course of action for us. shane -- Shane Knapp UC Berkeley EECS Research /

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread Wenchen Fan
Some updates for the JIRA tickets that we want to resolve before Spark 2.4 (green: merged; orange: in progress; red: likely to miss). SPARK-24374 : Support Barrier Execution Mode in Apache Spark The core functionality is finished, but we still need

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-06 Thread Sean Owen
... and we still have a few snags with Scala 2.12 support at https://issues.apache.org/jira/browse/SPARK-25029 There is some hope of resolving it on the order of a week, so for the moment, seems worth holding 2.4 for. On Mon, Aug 6, 2018 at 2:37 PM Bryan Cutler wrote: > Hi All, > > I'd like to

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-06 Thread Bryan Cutler
Hi All, I'd like to request a few days extension to the code freeze to complete the upgrade to Apache Arrow 0.10.0, SPARK-23874. This upgrade includes several key improvements and bug fixes. The RC vote just passed this morning and code changes are complete in

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-01 Thread shane knapp
++ssuchter (who kindly set up the initial k8s builds while i hammered on the backend) while i'm pretty confident (read: 99%) that the pull request builds will work on the new ubuntu workers: 1) i'd like to do more stress testing of other spark builds (in progress) 2) i'd like to reimage more

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-01 Thread Erik Erlandson
The PR for SparkR support on the kube back-end is completed, but waiting for Shane to make some tweaks to the CI machinery for full testing support. If the code freeze is being delayed, this PR could be merged as well. On Fri, Jul 6, 2018 at 9:47 AM, Reynold Xin wrote: > FYI 6 mo is coming up

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-01 Thread Erik Erlandson
I agree that looking at it from the pov of "code paths where isBarrier tests were introduced" seems right. From pr-21758 (the one already merged) there are 13 files touched under core/src/main/scala/org/apache/spark/scheduler/, although most of

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-01 Thread Imran Rashid
I still would like to do more review on barrier mode changes, but from what I've seen so far I agree. I dunno if it'll really be ready for use, but it should not pose much risk for code which doesn't touch the new features. of course, every change has some risk, especially in the scheduler which

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-01 Thread Xingbo Jiang
Speaking of the code from hydrogen PRs, actually we didn't remove any of the existing logic, and I tried my best to hide almost all of the newly added logic behind an `isBarrier` tag (or something similar). I had to add some new variables and new methods to the core code paths, but I think they

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-01 Thread Xiangrui Meng
Sorry for late response on Hydrogen discussions! I was traveling last week. On Tue, Jul 31, 2018 at 1:20 PM Reynold Xin wrote: > I actually totally agree that we should make sure it should have no impact > on existing code if the feature is not used. > > > On Tue, Jul 31, 2018 at 1:18 PM Erik

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Imran Rashid
I'd like to add SPARK-24296, replicating large blocks over 2GB. It's been up for review for a while, and would end the 2GB block limit (well ... subject to a couple of caveats on SPARK-6235). On Mon, Jul 30, 2018 at 9:01 PM, Wenchen Fan wrote: > I went through the open JIRA tickets and here is

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Reynold Xin
I actually totally agree that we should make sure it has no impact on existing code if the feature is not used. On Tue, Jul 31, 2018 at 1:18 PM Erik Erlandson wrote: > I don't have a comprehensive knowledge of the project hydrogen PRs, > however I've perused them, and they make

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Erik Erlandson
I don't have a comprehensive knowledge of the project hydrogen PRs, however I've perused them, and they make substantial modifications to Spark's core DAG scheduler code. What I'm wondering is: how high is the confidence level that the "traditional" code paths are still stable. Put another way,

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Mark Hamstra
No reasonable amount of time is likely going to be sufficient to fully vet the code as a PR. I'm not entirely happy with the design and code as they currently are (and I'm still trying to find the time to more publicly express my thoughts and concerns), but I'm fine with them going into 2.4 much

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Erik Erlandson
Barrier mode seems like a high impact feature on Spark's core code: is one additional week enough time to properly vet this feature? On Tue, Jul 31, 2018 at 7:10 AM, Joseph Torres wrote: > Full continuous processing aggregation support ran into unanticipated > scalability and scheduling

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Joseph Torres
Full continuous processing aggregation support ran into unanticipated scalability and scheduling problems. We’re planning to overcome those by using some of the barrier execution machinery, but since barrier execution itself is still in progress the full support isn’t going to make it into 2.4.

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Stavros Kontopoulos
I have a PR out for SPARK-14540 (Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner). This should allow us to add support for Scala 2.12; I think we can resolve this long-standing issue with 2.4. Best, Stavros On Tue, Jul 31, 2018 at 4:07 PM, Tomasz Gawęda wrote: > Hi, > > what

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Tomasz Gawęda
Hi, what is the status of Continuous Processing + Aggregations? As far as I remember, Jose Torres said it should be easy to perform aggregations if coalesce(1) works. IIRC it's already merged to master. Is this work in progress? If yes, it would be great to have full aggregation/join support

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Petar Zečević
This one is important to us: https://issues.apache.org/jira/browse/SPARK-24020 (Sort-merge join inner range optimization) but I think it could be useful to others too. It is finished and is ready to be merged (was ready a month ago at least). Do you think you could consider including it in

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Thread Marco Gaido
Hi Wenchen, I think it would be great to consider also - SPARK-24598 : Datatype overflow conditions gives incorrect result As it is a correctness bug. What do you think? Thanks, Marco 2018-07-31 4:01 GMT+02:00 Wenchen Fan : > I went through

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Wenchen Fan
I went through the open JIRA tickets and here is a list that we should consider for Spark 2.4: *High Priority*: SPARK-24374 : Support Barrier Execution Mode in Apache Spark This one is critical to the Spark ecosystem for deep learning. It only

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Sean Owen
In theory releases happen on a time-based cadence, so it's pretty much wrap up what's ready by the code freeze and ship it. In practice, the cadence slips frequently, and it's very much a negotiation about what features should push the code freeze out a few weeks every time. So, kind of a hybrid

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Tom Graves
Shouldn't this be a discuss thread? I'm also happy to see more release managers and agree the time is getting close, but we should see what features are in progress and how close things are, and propose a date based on that. Cutting a branch too soon just creates more work for committers

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-29 Thread Holden Karau
I’m excited to have more folks rotate through release manager :) On Sun, Jul 29, 2018 at 3:57 PM Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > +1. That would great! > > Thanks, > Stavros > > On Sun, Jul 29, 2018 at 5:05 PM, Wenchen Fan wrote: > >> If no one objects, how

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-29 Thread Stavros Kontopoulos
+1. That would be great! Thanks, Stavros On Sun, Jul 29, 2018 at 5:05 PM, Wenchen Fan wrote: > If no one objects, how about we make the code freeze one week later (Aug > 8th)? > > BTW I'd like to volunteer to serve as the release manager for Spark 2.4. > I'm familiar with most of the major

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-29 Thread Wenchen Fan
If no one objects, how about we make the code freeze one week later (Aug 8th)? BTW I'd like to volunteer to serve as the release manager for Spark 2.4. I'm familiar with most of the major features targeted for the 2.4 release. I also have a lot of free time during this release timeframe and should

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-27 Thread Stavros Kontopoulos
Extending code freeze date would be great for me too, I am working on a PR for supporting scala 2.12, I am close but need some more time. We could get it into 2.4. Stavros On Fri, Jul 27, 2018 at 9:27 AM, Wenchen Fan wrote: > This seems fine to me. > > BTW Ryan Blue and I are working on some

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-27 Thread Wenchen Fan
This seems fine to me. BTW Ryan Blue and I are working on some data source v2 stuff and hopefully we can get more things done with one more week. Thanks, Wenchen On Thu, Jul 26, 2018 at 1:14 PM Xingbo Jiang wrote: > Xiangrui and I are leading an effort to implement a highly desirable >

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-25 Thread Xingbo Jiang
Xiangrui and I are leading an effort to implement a highly desirable feature, Barrier Execution Mode. https://issues.apache.org/jira/browse/SPARK-24374. This introduces a new scheduling model to Apache Spark so users can properly embed distributed DL training as a Spark stage to simplify the

code freeze and branch cut for Apache Spark 2.4

2018-07-06 Thread Reynold Xin
FYI, it will soon be 6 months since the last release. We will cut the branch and code freeze on Aug 1st in order to get 2.4 out on time.