Hi, All.
BTW, based on the feedback so far,
I updated all open `correctness` and `dataloss` issues as follows.
1. Raised the issue priority to `Blocker`.
2. Set the target version to `3.0.0`.
It's time to give those issues more visibility so that we can resolve or
close them.
The remaining questions are the following:
1. Should we revisit the `3.0.0`-only correctness patches?
2. Should we also set their target version to `2.4.5`? (Specifically, is
that feasible in terms of the timeline?)
Bests,
Dongjoon.
On Wed, Jan 22, 2020 at 9:43 AM Dongjoon Hyun <[email protected]>
wrote:
> Hi, Tom.
>
> Then, in light of the following, do you think we need to hold the 2.4.5
> release, too?
>
> > If it's really a correctness issue we should hold 3.0 for it.
>
> Recently,
>
> (1) 2.4.4 delivered 9 correctness patches.
> (2) 2.4.5 RC1 aimed to deliver the following 9 correctness patches,
> too.
>
> SPARK-29101 CSV datasource returns incorrect .count() from file with malformed records (sketched below)
> SPARK-30447 Constant propagation nullability issue
> SPARK-29708 Different answers in aggregates of duplicate grouping sets
> SPARK-29651 Incorrect parsing of interval seconds fraction
> SPARK-29918 RecordBinaryComparator should check endianness when compared by long
> SPARK-29042 Sampling-based RDD with unordered input should be INDETERMINATE
> SPARK-30082 Zeros are being treated as NaNs
> SPARK-29743 sample should set needCopyResult to true if its child is
> SPARK-26985 Test "access only some column of the all of columns " fails on big endian
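>
> To make SPARK-29101 concrete, here is a rough, hypothetical sketch of the
> shape of that kind of issue. The file path, schema, and data below are
> made up and are not the JIRA's exact reproduction steps:
>
>     // Run in spark-shell, where `spark` is predefined.
>     // Suppose /tmp/data.csv contains one well-formed and one malformed line:
>     //   1,a
>     //   oops-only-one-field
>     val df = spark.read
>       .schema("id INT, name STRING")
>       .option("mode", "DROPMALFORMED")
>       .csv("/tmp/data.csv")
>
>     // For a correctness issue of this kind, the concern is that the two
>     // numbers below can disagree, because count() may skip full parsing
>     // and therefore not drop the malformed line.
>     println(df.count())
>     println(df.collect().length)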
>
> Without official Apache Spark 2.4.5 binaries,
> there is no official way to deliver the 9 correctness fixes in (2) to
> users.
> In addition, correctness fixes are usually independent of each other.
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jan 22, 2020 at 7:02 AM Tom Graves <[email protected]> wrote:
>
>> I agree. I think we just need to go through all of them and individually
>> assess each one. If it's really a correctness issue, we should hold 3.0 for
>> it.
>>
>> On the 2.4 release, I didn't see an explanation on
>> https://issues.apache.org/jira/browse/SPARK-26154 of why it can't be
>> backported. At the very least, I think we need that in a comment on each
>> JIRA.
>>
>> SPARK-29701 looks more like compatibility with Postgres than a purely
>> wrong answer to me. If Spark has been consistent about that, it feels like
>> it can wait for 3.0, but it would be good to get others' input; I'm not an
>> expert on the SQL standard or on what the other SQL engines do in this case.
>>
>> Tom
>>
>> On Monday, January 20, 2020, 12:07:54 AM CST, Dongjoon Hyun <
>> [email protected]> wrote:
>>
>>
>> Hi, All.
>>
>> According to our policy, "Correctness and data loss issues should be
>> considered Blockers".
>>
>> - http://spark.apache.org/contributing.html
>>
>> Since we are close to the branch-3.0 cut,
>> I want to ask your opinions on the following correctness and data loss
>> issues.
>>
>> SPARK-30218 Columns used in inequality conditions for joins not resolved correctly in case of common lineage
>> SPARK-29701 Different answers when empty input given in GROUPING SETS
>> SPARK-29699 Different answers in nested aggregates with window functions
>> SPARK-29419 Seq.toDS / spark.createDataset(Seq) is not thread-safe
>> SPARK-28125 dataframes created by randomSplit have overlapping rows
>> SPARK-28067 Incorrect results in decimal aggregation with whole-stage code gen enabled
>> SPARK-28024 Incorrect numeric values when out of range
>> SPARK-27784 Alias ID reuse can break correctness when substituting foldable expressions
>> SPARK-27619 MapType should be prohibited in hash expressions
>> SPARK-27298 Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment
>> SPARK-27282 Spark incorrect results when using UNION with GROUP BY clause
>> SPARK-27213 Unexpected results when filter is used after distinct
>> SPARK-26836 Columns get switched in Spark SQL using Avro backed Hive table if schema evolves
>> SPARK-25150 Joining DataFrames derived from the same source yields confusing/incorrect results (see the sketch after this list)
>> SPARK-21774 The rule PromoteStrings cast string to a wrong data type
>> SPARK-19248 Regex_replace works in 1.6 but not in 2.0
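>>
>> For anyone less familiar with the "common lineage" issues above
>> (SPARK-30218 and SPARK-25150, sketched here), the general shape is a
>> self-join on DataFrames that trace back to the same source. The data
>> below is hypothetical and only illustrates the shape, not the JIRAs'
>> exact reproductions:
>>
>>     // Run in spark-shell, where `spark` and its implicits are available.
>>     import spark.implicits._
>>
>>     val base     = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")
>>     val filtered = base.filter($"id" > 1)
>>
>>     // Because both DataFrames share lineage, base("id") and filtered("id")
>>     // can resolve to the same attribute, so the join condition may not
>>     // mean what it appears to mean.
>>     val joined = base.join(filtered, base("id") === filtered("id"))
>>     joined.show()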
>>
>> Some of them are targeted at 3.0.0, but the others are not.
>> Although we will keep working on them until 3.0.0,
>> I'm not sure we can reach a state with no known correctness or data
>> loss issues.
>>
>> What do you think about the above issues?
>>
>> Bests,
>> Dongjoon.
>>
>