quick update from my end:

SPARK-24433 (SparkR/k8s) depends on SPARK-25087 (move builds to ubuntu)
SPARK-23874 (arrow -> 0.10.0) now depends on SPARK-25079 (python 3.5 upgrade)

both SPARK-25087 and SPARK-25079 are in progress, and i'm very, very hesitant to do these upgrades before the code freeze/branch cut. i've done a TON of testing, but even as of yesterday afternoon i'm still uncovering bugs and things that need fixing, both on the infrastructure side and in spark itself.

h/t sean owen for helping out on SPARK-24950

On Wed, Aug 8, 2018 at 10:51 AM, Mark Hamstra <m...@clearstorydata.com> wrote:

> I'm inclined to agree. Just saying that it is not a regression doesn't
> really cut it when it is a now-known data correctness issue. We need
> something a lot more than nothing before releasing 2.4.0. At a bare
> minimum, that has to be much more complete and publicly highlighted
> documentation of the issue so that users are less likely to stumble into
> this unaware; but really we need to fix at least the most common cases of
> this bug. Backports to maintenance branches are also probably in order.
>
> On Wed, Aug 8, 2018 at 7:06 AM Imran Rashid <iras...@cloudera.com.invalid>
> wrote:
>
>> On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>
>>> SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>:
>>> Shuffle+Repartition on an RDD could lead to incorrect answers
>>> It turns out to be a very complicated issue, there is no consensus about
>>> what is the right fix yet. Likely to miss it in Spark 2.4 because it's a
>>> long-standing issue, not a regression.
>>>
>>
>> This is a really serious data loss bug. Yes, it's very complex, but we
>> absolutely have to fix this, I really think it should be in 2.4.
>> Has work on it stopped?
>>
>

--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu