zzzzming95 created SPARK-42503:
----------------------------------

             Summary: Spark SQL should do further validation on join condition 
fields
                 Key: SPARK-42503
                 URL: https://issues.apache.org/jira/browse/SPARK-42503
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.2
            Reporter: zzzzming95
             Fix For: 3.4.0


In Spark SQL, a join condition is allowed to reference fields that come from 
neither the left side nor the right side of that join. In this case, the join 
silently degenerates into a cross join. 

Suppose you have two tables, test1 and test2, which have the same table schema:

```
CREATE TABLE `default`.`test1` (
  `id` INT,
  `name` STRING,
  `age` INT,
  `dt` STRING)
USING parquet
PARTITIONED BY (dt)
```

The following SQL has two joins, but in the last left join the condition is 
`t1.name=t2.name` and no field of t3 is used, so the last left join degenerates 
into a cross join. 
```
select *
from 
(select * from test1 where dt="20230215" and age=1) t1
left join
(select * from test1 where dt="20230215" and age=2) t2
on t1.name=t2.name
left join
(select * from test2 where dt="20230215") t3
on
t1.name=t2.name;
```
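Presumably the intended predicate for the last join was against t3 (an assumption; the report does not state the intended condition), in which case the query would read:

```
select *
from 
(select * from test1 where dt="20230215" and age=1) t1
left join
(select * from test1 where dt="20230215" and age=2) t2
on t1.name=t2.name
left join
(select * from test2 where dt="20230215") t3
on t1.name=t3.name;
```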

So I think Spark SQL should do further validation on the join condition: the 
fields of the join condition must come from the left side or the right side of 
the join; otherwise an `AnalysisException` should be thrown.
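As a sketch of the proposed validation (names such as `validate_join_condition` are illustrative, not Spark's actual analyzer API), the check could require that a join condition references at least one field from each side of the join; a condition whose fields all come from one side is exactly the degenerate case above:

```python
class AnalysisException(Exception):
    """Stand-in for Spark's org.apache.spark.sql.AnalysisException."""
    pass

def validate_join_condition(left_cols, right_cols, condition_cols):
    """Hypothetical check: a join condition should reference at least one
    field from the left side and one from the right side; otherwise the
    join degenerates into a cross join, as in the example above."""
    cond = set(condition_cols)
    if not cond & set(left_cols) or not cond & set(right_cols):
        raise AnalysisException(
            "Join condition does not reference both sides of the join; "
            "the join would degenerate into a cross join")

# The last join of the example: the left side is the output of (t1 JOIN t2),
# the right side is t3, and a valid condition uses fields from both.
left = ["t1.name", "t1.age", "t2.name", "t2.age"]
right = ["t3.name", "t3.age"]
validate_join_condition(left, right, ["t1.name", "t3.name"])  # passes
```

A check along these lines would flag the report's query, where the second join's condition `t1.name=t2.name` uses only left-side fields.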

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
