zzzzming95 created SPARK-42503:
----------------------------------

             Summary: Spark SQL should do further validation on join condition fields
                 Key: SPARK-42503
                 URL: https://issues.apache.org/jira/browse/SPARK-42503
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.2
            Reporter: zzzzming95
             Fix For: 3.4.0
In Spark SQL, a join condition is allowed to reference fields that come from neither the left nor the right side of that join. When this happens, the join silently degenerates into a cross join.

Suppose you have two tables, test1 and test2, which share the same schema:

```
CREATE TABLE `default`.`test1` (
  `id` INT,
  `name` STRING,
  `age` INT,
  `dt` STRING)
USING parquet
PARTITIONED BY (dt)
```

The following SQL joins three subqueries, but in the last left join the condition is `t1.name=t2.name`, and t3.name is never referenced. So the last left join is effectively a cross join:

```
select *
from (select * from test1 where dt="20230215" and age=1) t1
left join (select * from test1 where dt="20230215" and age=2) t2
  on t1.name=t2.name
left join (select * from test2 where dt="20230215") t3
  on t1.name=t2.name;
```

So I think Spark SQL should do further validation on join conditions: the fields in a join condition must come from the left or the right table of that join; otherwise an `AnalysisException` should be thrown.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
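To make the degeneration concrete, here is a minimal sketch in plain Python using sqlite3 (not Spark; the table contents are made up for illustration). It runs the reported query shape twice: once with the repeated `t1.name = t2.name` condition on the t3 join, and once with the presumably intended `t1.name = t3.name`. In the first case the condition is constant with respect to t3, so every matched (t1, t2) pair picks up every row of t3, i.e. a cross join:

```python
import sqlite3

# In-memory illustration: same schema for both tables, as in the report.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test1 (id INT, name TEXT, age INT, dt TEXT)")
cur.execute("CREATE TABLE test2 (id INT, name TEXT, age INT, dt TEXT)")
cur.executemany("INSERT INTO test1 VALUES (?, ?, ?, ?)", [
    (1, "a", 1, "20230215"), (2, "b", 1, "20230215"),   # age=1 -> becomes t1
    (3, "a", 2, "20230215"), (4, "b", 2, "20230215"),   # age=2 -> becomes t2
])
cur.executemany("INSERT INTO test2 VALUES (?, ?, ?, ?)", [
    (5, "a", 3, "20230215"), (6, "b", 3, "20230215"), (7, "c", 3, "20230215"),
])

# Degenerate case: the t3 join condition never mentions t3, so it is
# constant per (t1, t2) pair and all 3 rows of t3 are attached each time.
degenerate = cur.execute("""
    SELECT count(*) FROM
      (SELECT * FROM test1 WHERE dt = '20230215' AND age = 1) t1
      LEFT JOIN (SELECT * FROM test1 WHERE dt = '20230215' AND age = 2) t2
        ON t1.name = t2.name
      LEFT JOIN (SELECT * FROM test2 WHERE dt = '20230215') t3
        ON t1.name = t2.name
""").fetchone()[0]

# Presumably intended condition: references t3, so the join stays keyed.
intended = cur.execute("""
    SELECT count(*) FROM
      (SELECT * FROM test1 WHERE dt = '20230215' AND age = 1) t1
      LEFT JOIN (SELECT * FROM test1 WHERE dt = '20230215' AND age = 2) t2
        ON t1.name = t2.name
      LEFT JOIN (SELECT * FROM test2 WHERE dt = '20230215') t3
        ON t1.name = t3.name
""").fetchone()[0]

# 2 matched (t1, t2) pairs x 3 t3 rows = 6 vs. 2 properly keyed rows.
print(degenerate, intended)  # -> 6 2
```

Both queries parse and run without complaint, which is exactly the point of the report: the engine has no way to tell the user that the second join key was almost certainly a mistake.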