Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174663562
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,37 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // These tests must use nullable attributes from the base relation for the following reason:
    +    // in the 'original' plans below, the Aggregate node produced by groupBy() has a
    +    // nullable AttributeReference to a1, because both array indexing and map lookup are
    +    // nullable expressions. After optimization, the same attribute is now non-nullable,
    +    // but the AttributeReference is not updated to reflect this. In the 'expected' plans,
    +    // the grouping expressions have the same nullability as the original attribute in the
    +    // relation. If that attribute is non-nullable, the tests will fail as the plans will
    +    // compare differently, so for these tests we must use a nullable attribute. See
    +    // SPARK-23634.
    +    val arrayRel = relation
    +      .select(GetArrayItem(CreateArray(Seq('nullable_id, 'nullable_id + 1L)), 0) as "a1")
    +      .groupBy($"a1")("1").analyze
    +    val arrayExpected = relation.select('nullable_id as "a1").groupBy($"a1")("1").analyze
    +    comparePlans(Optimizer execute arrayRel, arrayExpected)
    +
    +    val mapRel = relation
    +      .select(GetMapValue(CreateMap(Seq("id", 'nullable_id)), "id") as "m1")
    +      .groupBy($"m1")("1").analyze
    +    val mapExpected = relation
    +      .select('nullable_id as "m1")
    +      .groupBy($"m1")("1").analyze
    +    comparePlans(Optimizer execute mapRel, mapExpected)
    --- End diff --
    
    @henryr .
    Could you add more of the test cases mentioned today, for example, like the following?
    ```
        val structRel = relation.groupBy(
          CreateNamedStruct(Seq("att1", 'nullable_id)))(
          GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None)).analyze
        comparePlans(Optimizer execute structRel, structRel)
    ```
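    For readers following along without the Catalyst sources at hand, the core simplification these tests exercise can be sketched with a small Spark-free toy. This is only an illustration of the pattern, not Catalyst's actual classes or rule (`Expr`, `Attr`, and `simplify` here are hypothetical stand-ins):
    ```scala
    // Toy stand-ins for Catalyst expression nodes (hypothetical, not Spark's real API).
    sealed trait Expr
    case class Attr(name: String) extends Expr
    case class CreateNamedStruct(fields: Seq[(String, Expr)]) extends Expr
    case class GetStructField(child: Expr, ordinal: Int) extends Expr

    // The idea under test: extracting field i from a freshly constructed struct
    // collapses to the i-th value expression, so the struct is never materialized.
    def simplify(e: Expr): Expr = e match {
      case GetStructField(CreateNamedStruct(fields), i) => simplify(fields(i)._2)
      case other => other
    }

    // GetStructField(CreateNamedStruct("att1" -> nullable_id), 0) simplifies
    // to the bare attribute reference, mirroring the structRel test above.
    val simplified = simplify(
      GetStructField(CreateNamedStruct(Seq("att1" -> Attr("nullable_id"))), 0))
    println(simplified)
    ```
    The suggested test case is interesting precisely because the same struct also appears as a grouping expression, so the rule must not rewrite the extraction in a way that breaks the correspondence with the grouping key.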


---
