[jira] [Commented] (SPARK-42503) Spark SQL should do further validation on join condition fields

2023-02-20 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691391#comment-17691391
 ] 

ming95 commented on SPARK-42503:


I test hive and mysql , they also do not have this validation . But I still 
think this restriction should be added, because the condition of non-left and 
right tables in join is meaningless.

> Spark SQL should do further validation on join condition fields
> ---
>
> Key: SPARK-42503
> URL: https://issues.apache.org/jira/browse/SPARK-42503
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
>
> In Spark SQL, the conditions for the join use fields that are allowed to be 
> fields from a non-left table or a non-right table. In this case, the join 
> will degenerate into a cross join. 
> Suppose you have two tables, test1 and test2, which have the same table 
> schema:
> {code:java}
> ```
> CREATE TABLE `default`.`test1` (
>   `id` INT,
>   `name` STRING,
>   `age` INT,
>   `dt` STRING)
> USING parquet
> PARTITIONED BY (dt)
> ```{code}
> The following SQL has three joins, but in the last left join, the conditions 
> is `t1.name=t2.name`, and t3.name is not used. So the last left join will be 
> cross join. 
> {code:java}
> ```
> select *
> from 
> (select * from test1 where dt="20230215"  and age=1 ) t1 
> left join
> (select * from test1 where dt=="20230215"  and age=2) t2
> on t1.name=t2.name
> left join
> (select * from test2 where dt="20230215") t3
> on
> t1.name=t2.name;
> ```{code}
> So i think Spark SQL should do further validation on join condition, the 
> fields of join condition must be a left table or right table field , 
> otherwise it is thrown `AnalysisException`.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42503) Spark SQL should do further validation on join condition fields

2023-02-20 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691321#comment-17691321
 ] 

Yuming Wang commented on SPARK-42503:
-

Do other databases also have this validation?

> Spark SQL should do further validation on join condition fields
> ---
>
> Key: SPARK-42503
> URL: https://issues.apache.org/jira/browse/SPARK-42503
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
>
> In Spark SQL, the conditions for the join use fields that are allowed to be 
> fields from a non-left table or a non-right table. In this case, the join 
> will degenerate into a cross join. 
> Suppose you have two tables, test1 and test2, which have the same table 
> schema:
> {code:java}
> ```
> CREATE TABLE `default`.`test1` (
>   `id` INT,
>   `name` STRING,
>   `age` INT,
>   `dt` STRING)
> USING parquet
> PARTITIONED BY (dt)
> ```{code}
> The following SQL has three joins, but in the last left join, the conditions 
> is `t1.name=t2.name`, and t3.name is not used. So the last left join will be 
> cross join. 
> {code:java}
> ```
> select *
> from 
> (select * from test1 where dt="20230215"  and age=1 ) t1 
> left join
> (select * from test1 where dt=="20230215"  and age=2) t2
> on t1.name=t2.name
> left join
> (select * from test2 where dt="20230215") t3
> on
> t1.name=t2.name;
> ```{code}
> So i think Spark SQL should do further validation on join condition, the 
> fields of join condition must be a left table or right table field , 
> otherwise it is thrown `AnalysisException`.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42503) Spark SQL should do further validation on join condition fields

2023-02-20 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691235#comment-17691235
 ] 

ming95 commented on SPARK-42503:


[~yumwang] [~gurwls223] cc

> Spark SQL should do further validation on join condition fields
> ---
>
> Key: SPARK-42503
> URL: https://issues.apache.org/jira/browse/SPARK-42503
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.2
>Reporter: ming95
>Priority: Major
> Fix For: 3.4.0
>
>
> In Spark SQL, the conditions for the join use fields that are allowed to be 
> fields from a non-left table or a non-right table. In this case, the join 
> will degenerate into a cross join. 
> Suppose you have two tables, test1 and test2, which have the same table 
> schema:
> ```
> CREATE TABLE `default`.`test1` (
>   `id` INT,
>   `name` STRING,
>   `age` INT,
>   `dt` STRING)
> USING parquet
> PARTITIONED BY (dt)
> ```
> The following SQL has three joins, but in the last left join, the conditions 
> is `t1.name=t2.name`, and t3.name is not used. So the last left join will be 
> cross join. 
> ```
> select *
> from 
> (select * from test1 where dt="20230215"  and age=1 ) t1 
> left join
> (select * from test1 where dt=="20230215"  and age=2) t2
> on t1.name=t2.name
> left join
> (select * from test2 where dt="20230215") t3
> on
> t1.name=t2.name;
> ```
> So i think Spark SQL should do further validation on join condition, the 
> fields of join condition must be a left table or right table field , 
> otherwise it is thrown `AnalysisException`.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org