Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21699

I merged the `master` branch into my branch `pivot-column`, and the changes break my test. It seems recent changes in pivoting introduced a correctness bug. See the test: https://github.com/apache/spark/pull/21699/files#diff-50aa7d3b7b7934a7df6f414396e74c3cR271

Here is the result without pivoting:
```
val df = trainingSales
  .groupBy($"sales.year", lower($"sales.course"))
  .agg(sum($"sales.earnings"))
df.show(false)
```
```
+----+-------------------+-------------------+
|year|lower(sales.course)|sum(sales.earnings)|
+----+-------------------+-------------------+
|2012|java               |20000.0            |
|2012|dotnet             |15000.0            |
|2013|java               |30000.0            |
|2013|dotnet             |48000.0            |
+----+-------------------+-------------------+
```

With pivoting:
```
val df = trainingSales
  .groupBy($"sales.year")
  .pivot(lower($"sales.course"), Seq("dotNet", "Java").map(_.toLowerCase))
  .agg(sum($"sales.earnings"))
df.show(false)
```
the result should be what the test expects:
```
+----+--------+-------+
|year|dotnet  |java   |
+----+--------+-------+
|2012|15000.0 |20000.0|
|2013|48000.0 |30000.0|
+----+--------+-------+
```
but the returned result for `dotnet` in `2012` is wrong:
```
+----+-------+-------+
|year|dotnet |java   |
+----+-------+-------+
|2012|5000.0 |20000.0|
|2013|48000.0|30000.0|
+----+-------+-------+
```
@maryannxue Please take a look at it. The bug may have been introduced by your recent changes. /cc @gatorsmile
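The expected semantics of `groupBy(...).pivot(...).agg(sum(...))` can be sketched in plain Scala collections, without Spark. The sample rows below are hypothetical stand-ins chosen only so that the grouped sums match the un-pivoted table above; the actual rows in `trainingSales` are not shown in this thread.

```scala
// A plain-Scala sketch of the expected pivot semantics (no Spark required).
// The sample data is hypothetical, picked so that grouping by
// (year, lower(course)) reproduces the un-pivoted sums shown above.
object PivotSketch {
  case class Sale(year: Int, course: String, earnings: Double)

  // Hypothetical rows; 2012 "dotNet" is split into two rows on purpose,
  // which matches the bug symptom where only part of the sum survives.
  val sales = Seq(
    Sale(2012, "Java",   20000.0),
    Sale(2012, "dotNet",  5000.0),
    Sale(2012, "dotNet", 10000.0),
    Sale(2013, "Java",   30000.0),
    Sale(2013, "dotNet", 48000.0)
  )

  // groupBy(year).pivot(lower(course), values).agg(sum(earnings)):
  // one result row per year, one column per pivot value, and each cell
  // must sum ALL rows matching that (year, value) pair.
  def pivot(rows: Seq[Sale], values: Seq[String]): Map[Int, Map[String, Double]] =
    rows.groupBy(_.year).map { case (year, rs) =>
      year -> values.map { v =>
        v -> rs.filter(_.course.toLowerCase == v).map(_.earnings).sum
      }.toMap
    }
}
```

Under these assumed rows, `PivotSketch.pivot(PivotSketch.sales, Seq("dotnet", "java"))` yields `15000.0` for `(2012, dotnet)`, which is what the test expects; returning `5000.0` instead would mean the aggregation dropped some of the matching rows.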