Hi, All.

BTW, based on the feedback so far, I updated all open `correctness` and `dataloss` issues as follows.
1. Raised the issue priority to `Blocker`.
2. Set the target version to `3.0.0`.

It's time to give more visibility to those issues in order to close or resolve them.

The remaining questions are the following:

1. Should we revisit the `3.0.0`-only correctness patches?
2. Should we set the target version to `2.4.5`? (Specifically, is this feasible in terms of timeline?)

Bests,
Dongjoon.


On Wed, Jan 22, 2020 at 9:43 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Hi, Tom.
>
> Then, along with the following, do you think we need to hold the 2.4.5
> release, too?
>
> > If it's really a correctness issue we should hold 3.0 for it.
>
> Recently,
>
> (1) 2.4.4 delivered 9 correctness patches.
> (2) 2.4.5 RC1 aimed to deliver the following 9 correctness patches, too.
>
>     SPARK-29101 CSV datasource returns incorrect .count() from file with malformed records
>     SPARK-30447 Constant propagation nullability issue
>     SPARK-29708 Different answers in aggregates of duplicate grouping sets
>     SPARK-29651 Incorrect parsing of interval seconds fraction
>     SPARK-29918 RecordBinaryComparator should check endianness when compared by long
>     SPARK-29042 Sampling-based RDD with unordered input should be INDETERMINATE
>     SPARK-30082 Zeros are being treated as NaNs
>     SPARK-29743 sample should set needCopyResult to true if its child is
>     SPARK-26985 Test "access only some column of the all of columns" fails on big endian
>
> Without the official Apache Spark 2.4.5 binaries, there is no official way
> to deliver the 9 correctness fixes in (2) to the users. In addition, the
> correctness fixes are usually independent of each other.
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jan 22, 2020 at 7:02 AM Tom Graves <tgraves...@yahoo.com> wrote:
>
>> I agree. I think we just need to go through all of them and individually
>> assess each one. If it's really a correctness issue, we should hold 3.0
>> for it.
>>
>> On the 2.4 release, I didn't see an explanation on
>> https://issues.apache.org/jira/browse/SPARK-26154 why it can't be
>> backported. I think at the very least we need that in each JIRA comment.
>>
>> SPARK-29701 looks more like compatibility with Postgres than a purely
>> wrong answer to me. If Spark has been consistent about that, it feels like
>> it can wait for 3.0, but it would be good to get others' input; I'm not an
>> expert on the SQL standard or on what the other SQL engines do in this
>> case.
>>
>> Tom
>>
>> On Monday, January 20, 2020, 12:07:54 AM CST, Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>>
>>
>> Hi, All.
>>
>> According to our policy, "Correctness and data loss issues should be
>> considered Blockers".
>>
>> - http://spark.apache.org/contributing.html
>>
>> Since we are close to the branch-3.0 cut, I want to ask your opinions on
>> the following correctness and data loss issues.
>>
>>     SPARK-30218 Columns used in inequality conditions for joins not resolved correctly in case of common lineage
>>     SPARK-29701 Different answers when empty input given in GROUPING SETS
>>     SPARK-29699 Different answers in nested aggregates with window functions
>>     SPARK-29419 Seq.toDS / spark.createDataset(Seq) is not thread-safe
>>     SPARK-28125 dataframes created by randomSplit have overlapping rows
>>     SPARK-28067 Incorrect results in decimal aggregation with whole-stage code gen enabled
>>     SPARK-28024 Incorrect numeric values when out of range
>>     SPARK-27784 Alias ID reuse can break correctness when substituting foldable expressions
>>     SPARK-27619 MapType should be prohibited in hash expressions
>>     SPARK-27298 Dataset except operation gives different results (dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environments
>>     SPARK-27282 Spark incorrect results when using UNION with GROUP BY clause
>>     SPARK-27213 Unexpected results when filter is used after distinct
>>     SPARK-26836 Columns get switched in Spark SQL using Avro backed Hive table if schema evolves
>>     SPARK-25150 Joining DataFrames derived from the same source yields confusing/incorrect results
>>     SPARK-21774 The rule PromoteStrings cast string to a wrong data type
>>     SPARK-19248 Regex_replace works in 1.6 but not in 2.0
>>
>> Some of them are targeted at 3.0.0, but the others are not. Although we
>> will work on them until 3.0.0, I'm not sure we can reach a state with no
>> known correctness and data loss issues.
>>
>> What do you think about the above issues?
>>
>> Bests,
>> Dongjoon.
>>
>