GitHub user peter-toth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22318#discussion_r214666748

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
    @@ -295,4 +295,17 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
           df.join(df, df("id") <=> df("id")).queryExecution.optimizedPlan
         }
       }
    +
    +  test("SPARK-25150: Attribute deduplication handles attributes in join condition properly") {
    +    withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "false") {
    +      val a = spark.range(1, 5)
    +      val b = spark.range(10)
    +      val c = b.filter($"id" % 2 === 0)
    +
    +      val r = a.join(b, a("id") === b("id"), "inner").join(c, a("id") === c("id"), "inner")
    --- End diff --

I think we do need `a` here. If we dropped `a`, the test would become:

```scala
val b = spark.range(1, 5)
val c = b.filter($"id" % 2 === 0)
val r = b.join(c, b("id") === c("id"), "inner")
checkAnswer(r, Row(2, 2) :: Row(4, 4) :: Nil)
```

and it would then pass even without the fix. This is because Dataset has a special case that handles `id = id`-like conditions for EqualTo and EqualNullSafe. My fix comes into play in other cases, where deduplication changes an AttributeReference on the right side of a join.
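To make the last point concrete, here is a minimal toy model of what "an AttributeReference change on the right side due to deduplication" can mean for a join condition. It is a sketch under stated assumptions, not Spark's Catalyst code and not the implementation in this pull request: `Attr`, `EqualTo`, `dedupRight`, `rewriteRightOperand`, and `referencesBothSides` are all hypothetical names invented for illustration, and the expression ids (0, 2, 5) are made up.

```scala
// Toy model only: every type and method here is hypothetical, not a Spark class.
object DedupSketch {
  // An attribute is identified by its name plus a unique expression id.
  final case class Attr(name: String, exprId: Int)
  // A join condition of the form `left = right`.
  final case class EqualTo(left: Attr, right: Attr)

  // Give each right-side attribute whose exprId also appears on the left a fresh exprId.
  // Returns the new right-side output and the old -> new attribute mapping.
  def dedupRight(left: Seq[Attr], right: Seq[Attr], nextId: Int): (Seq[Attr], Map[Attr, Attr]) = {
    val leftIds = left.map(_.exprId).toSet
    val conflicting = right.filter(a => leftIds.contains(a.exprId))
    val mapping = conflicting.zipWithIndex.map { case (a, i) =>
      a -> a.copy(exprId = nextId + i)
    }.toMap
    (right.map(a => mapping.getOrElse(a, a)), mapping)
  }

  // Assumption for this sketch: the right operand was meant to refer to the right side,
  // so it is the only operand remapped to the deduplicated attribute.
  def rewriteRightOperand(cond: EqualTo, mapping: Map[Attr, Attr]): EqualTo =
    cond.copy(right = mapping.getOrElse(cond.right, cond.right))

  // A condition connects the two join sides only if it mentions an attribute from each.
  def referencesBothSides(cond: EqualTo, left: Seq[Attr], right: Seq[Attr]): Boolean = {
    val refs = Seq(cond.left, cond.right)
    refs.exists(a => left.contains(a)) && refs.exists(a => right.contains(a))
  }

  def main(args: Array[String]): Unit = {
    val aId = Attr("id", 0)
    val bId = Attr("id", 2)
    val leftOutput = Seq(aId, bId) // stands in for the output of a.join(b, ...)
    val cOutput = Seq(bId)         // c is a filter over b, so it reuses b's attribute
    val cond = EqualTo(aId, bId)   // stands in for a("id") === c("id")

    val (newRight, mapping) = dedupRight(leftOutput, cOutput, nextId = 5)
    println(referencesBothSides(cond, leftOutput, newRight))
    println(referencesBothSides(rewriteRightOperand(cond, mapping), leftOutput, newRight))
  }
}
```

In this model the right side's `id` gets a new expression id, so the original condition ends up mentioning only left-side attributes until its right operand is remapped. That is one way to picture why the three-way join with `a` exercises a different path than the plain `b.join(c, ...)` self-join, which the comment above says is already covered by the existing special case.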