I think we need to go through them during the 3.0 QA period and try to fix
the valid ones.

For example, the first ticket should be fixed already in
https://issues.apache.org/jira/browse/SPARK-28344

On Mon, Jan 20, 2020 at 2:07 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Hi, All.
>
> According to our policy, "Correctness and data loss issues should be
> considered Blockers".
>
>     - http://spark.apache.org/contributing.html
>
> Since we are close to branch-3.0 cut,
> I want to ask your opinions on the following correctness and data loss
> issues.
>
>     SPARK-30218 Columns used in inequality conditions for joins not
> resolved correctly in case of common lineage
>     SPARK-29701 Different answers when empty input given in GROUPING SETS
>     SPARK-29699 Different answers in nested aggregates with window
> functions
>     SPARK-29419 Seq.toDS / spark.createDataset(Seq) is not thread-safe
>     SPARK-28125 dataframes created by randomSplit have overlapping rows
>     SPARK-28067 Incorrect results in decimal aggregation with whole-stage
> code gen enabled
>     SPARK-28024 Incorrect numeric values when out of range
>     SPARK-27784 Alias ID reuse can break correctness when substituting
> foldable expressions
>     SPARK-27619 MapType should be prohibited in hash expressions
>     SPARK-27298 Dataset except operation gives different results(dataset
> count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment
>     SPARK-27282 Spark incorrect results when using UNION with GROUP BY
> clause
>     SPARK-27213 Unexpected results when filter is used after distinct
>     SPARK-26836 Columns get switched in Spark SQL using Avro backed Hive
> table if schema evolves
>     SPARK-25150 Joining DataFrames derived from the same source yields
> confusing/incorrect results
>     SPARK-21774 The rule PromoteStrings cast string to a wrong data type
>     SPARK-19248 Regex_replace works in 1.6 but not in 2.0
>
> Some of them are targeted at 3.0.0, but the others are not.
> Although we will work on them until 3.0.0,
> I'm not sure we can reach a state with no known correctness or data loss
> issues.
>
> What do you think about the above issues?
>
> Bests,
> Dongjoon.
>