[
https://issues.apache.org/jira/browse/SPARK-52498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-52498:
-----------------------------------
Labels: SQL pull-request-available (was: SQL)
> The self joins behaviour is broken and inconsistent in general and
> different between single pass resolver and regular resolver
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-52498
> URL: https://issues.apache.org/jira/browse/SPARK-52498
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Asif
> Priority: Major
> Labels: SQL, pull-request-available
>
> The earlier bug
> [SPARK-47320|https://issues.apache.org/jira/projects/SPARK/issues/SPARK-47320]
> described the problems the regular analyzer has when dealing with self joins.
> This bug highlights the inconsistent behaviour between the regular analyzer and
> the single-pass analyzer.
> There is an existing test in ExpressionIdAssignerSuite:
> {{test("DataFrame Join, same table, several layers") {
>   withTable("t1") {
>     spark.sql("CREATE TABLE t1 (col1 INT, col2 INT, col3 INT)")
>     val result = withSQLConf(
>       SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "true",
>       SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "false"
>     ) {
>       val df1 = spark.sql("SELECT col1, 1 AS a, col2, 2 AS b, col3, 3 AS c FROM t1")
>       val df2 = df1
>         .join(df1, df1("col1") === 0)
>         .select(df1("col1"), df1("a"), df1("col2"), df1("b"), df1("col3"), df1("c"))
>       val df3 = df2
>         .join(df2, df2("col1") === 0)
>         .select(df2("col1"), df2("a"), df2("col2"), df2("b"), df2("col3"), df2("c"))
>       df3
>         .join(df3, df3("col1") === 0)
>         .select(df3("col1"), df3("a"), df3("col2"), df3("b"), df3("col3"), df3("c"))
>     }
>     checkExpressionIdAssignment(result.queryExecution.analyzed)
>   }
> }}}
> With the single-pass resolver enabled, the above test also passes when
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true".
> But the test fails in both of the following combinations:
> Combination 1
> SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "false"
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true"
> Combination 2
> SQLConf.ANALYZER_SINGLE_PASS_RESOLVER_ENABLED.key -> "false"
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "false"
> Ideally, the test should fail if SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key
> -> "true" with both resolvers (single-pass and original), and should pass if
> SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "false" with both resolvers.
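
The four configuration combinations described above can be tabulated to show where the reported behaviour departs from the expected one. The following is a minimal sketch in plain Scala that does not call Spark: the object name SelfJoinMatrix, the Combo case class, and the helper functions are hypothetical, and the pass/fail outcomes are transcribed from the description rather than measured.

```scala
// Hypothetical model of the report's configuration matrix; no Spark involved.
object SelfJoinMatrix {
  // One combination of the two SQLConf flags discussed in the report.
  final case class Combo(singlePass: Boolean, failAmbiguousSelfJoin: Boolean)

  // All four combinations, single-pass resolver enabled first.
  val combos: Seq[Combo] = for {
    singlePass    <- Seq(true, false)
    failAmbiguous <- Seq(true, false)
  } yield Combo(singlePass, failAmbiguous)

  // As reported: the test passes whenever the single-pass resolver is
  // enabled and fails whenever it is disabled, regardless of the other flag.
  def observedPass(c: Combo): Boolean = c.singlePass

  // As requested in the report: the outcome should depend only on the
  // ambiguity check, passing when it is off and failing when it is on.
  def expectedPass(c: Combo): Boolean = !c.failAmbiguousSelfJoin

  // Combinations where the reported outcome differs from the expected one.
  val mismatches: Seq[Combo] =
    combos.filter(c => observedPass(c) != expectedPass(c))

  def main(args: Array[String]): Unit =
    combos.foreach { c =>
      val status = if (mismatches.contains(c)) "MISMATCH" else "ok"
      println(s"singlePass=${c.singlePass} failAmbiguousSelfJoin=${c.failAmbiguousSelfJoin} " +
        s"observedPass=${observedPass(c)} expectedPass=${expectedPass(c)} $status")
    }
}
```

Under these assumptions, two of the four combinations disagree with the expected behaviour: the single-pass resolver with the ambiguity check enabled (passes, should fail) and the regular resolver with the check disabled (fails, should pass).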
--
This message was sent by Atlassian Jira
(v8.20.10#820010)