Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21699
  
    I merged the `master` branch into my branch `pivot-column`, and the changes broke my test. It seems recent changes to pivoting introduced a correctness bug. See the test https://github.com/apache/spark/pull/21699/files#diff-50aa7d3b7b7934a7df6f414396e74c3cR271 . Here is the result without pivoting:
    ```
    val df = trainingSales
      .groupBy($"sales.year", lower($"sales.course"))
      .agg(sum($"sales.earnings"))
    df.show(false)
    ```
    ```
    +----+-------------------+-------------------+
    |year|lower(sales.course)|sum(sales.earnings)|
    +----+-------------------+-------------------+
    |2012|java               |20000.0            |
    |2012|dotnet             |15000.0            |
    |2013|java               |30000.0            |
    |2013|dotnet             |48000.0            |
    +----+-------------------+-------------------+
    ```
    with pivoting:
    ```
    val df = trainingSales
      .groupBy($"sales.year")
      .pivot(lower($"sales.course"), Seq("dotNet", "Java").map(_.toLowerCase))
      .agg(sum($"sales.earnings"))
    df.show(false)
    ```
    The expected result, as asserted in the test, is:
    ```
    +----+--------+-------+
    |year|dotnet  |java   |
    +----+--------+-------+
    |2012|15000.0 |20000.0|
    |2013|48000.0 |30000.0|
    +----+--------+-------+
    ```
    However, the actual result returned for `dotnet` in `2012` is wrong:
    ```
    +----+-------+-------+
    |year|dotnet |java   |
    +----+-------+-------+
    |2012|5000.0 |20000.0|
    |2013|48000.0|30000.0|
    +----+-------+-------+
    ```
    @maryannxue Could you please take a look? The bug may have been introduced by your recent changes. /cc @gatorsmile 

