quick update from my end:

SPARK-24433 (SparkR/k8s) depends on SPARK-25087 (move builds to ubuntu)

SPARK-23874 (arrow -> 0.10.0) now depends on SPARK-25079 (python 3.5
upgrade)

both SPARK-25087 and SPARK-25079 are in progress and i'm very hesitant
to do these upgrades before the code freeze/branch cut.  i've done a TON of
testing, but even as of yesterday afternoon, i'm still uncovering bugs and
things that need fixing, both on the infrastructure side and in spark itself.

h/t sean owen for helping out on SPARK-24950

On Wed, Aug 8, 2018 at 10:51 AM, Mark Hamstra <m...@clearstorydata.com>
wrote:

> I'm inclined to agree. Just saying that it is not a regression doesn't
> really cut it when it is a now known data correctness issue. We need
> something a lot more than nothing before releasing 2.4.0. At the barest
> minimum, that has to be much more complete and publicly highlighted
> documentation of the issue so that users are less likely to stumble into
> this unaware; but really we need to fix at least the most common cases of
> this bug. Backports to maintenance branches are also probably in order.
>
> On Wed, Aug 8, 2018 at 7:06 AM Imran Rashid <iras...@cloudera.com.invalid>
> wrote:
>
>> On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>
>>> SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>:
>>> Shuffle+Repartition on an RDD could lead to incorrect answers
>>> It turns out to be a very complicated issue, and there is no consensus yet
>>> about the right fix. It is likely to miss Spark 2.4 because it's a
>>> long-standing issue, not a regression.
>>>
>>
>> This is a really serious data loss bug.  Yes, it's very complex, but we
>> absolutely have to fix this; I really think it should be in 2.4.
>> Has work on it stopped?
>>
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
