[GitHub] spark pull request #22318: [SPARK-25150][SQL] Fix attribute deduplication in...

peter-toth Mon, 03 Sep 2018 05:02:22 -0700

Github user peter-toth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22318#discussion_r214666748
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
    @@ -295,4 +295,17 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
           df.join(df, df("id") <=> df("id")).queryExecution.optimizedPlan
         }
       }
    +
    +  test("SPARK-25150: Attribute deduplication handles attributes in join condition properly") {
    +    withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "false") {
    +      val a = spark.range(1, 5)
    +      val b = spark.range(10)
    +      val c = b.filter($"id" % 2 === 0)
    +
    +      val r = a.join(b, a("id") === b("id"), "inner").join(c, a("id") === c("id"), "inner")
    --- End diff --
    
    I think we do need `a` here.
    If we dropped `a`, the test would become:
    ```scala
        val b = spark.range(1, 5)
        val c = b.filter($"id" % 2 === 0)
        val r = b.join(c, b("id") === c("id"), "inner")
    
        checkAnswer(r, Row(2, 2) :: Row(4, 4) :: Nil)
    ```
    then the test would pass even without the fix. This is because Dataset has a special case that handles `id = id`-like conditions for EqualTo and EqualNullSafe.
    
    My fix comes into play in other cases, where deduplication changes an AttributeReference on the right side of a join.
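    
    As a rough illustration, the three-Dataset scenario from the diff can be written as a standalone program. This is only a sketch: it assumes a local SparkSession, the object name `JoinDedupSketch` is hypothetical, and `spark.sql.crossJoin.enabled` is used as the string key behind `SQLConf.CROSS_JOINS_ENABLED`.
    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical standalone wrapper around the scenario in the test above.
    object JoinDedupSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("SPARK-25150 sketch")
          // Same setting the test applies through withSQLConf.
          .config("spark.sql.crossJoin.enabled", "false")
          .getOrCreate()
        import spark.implicits._

        val a = spark.range(1, 5)           // ids 1..4
        val b = spark.range(10)             // ids 0..9
        val c = b.filter($"id" % 2 === 0)   // derived from b, so it shares b's `id` attribute

        // c sits on the right side of the second join and is derived from b,
        // which is already on the left side, so its attributes get deduplicated.
        val r = a.join(b, a("id") === b("id"), "inner")
          .join(c, a("id") === c("id"), "inner")

        // Print the plans to see how the second join condition was resolved.
        r.explain(true)
        r.show()

        spark.stop()
      }
    }
    ```
    If both conditions are resolved as written, only ids 2 and 4 satisfy them. A build without the fix may behave differently, which is what the test is meant to catch.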



