[jira] [Commented] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join

2023-05-30 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727779#comment-17727779
 ] 

Snoot.io commented on SPARK-36612:
--

User 'szehon-ho' has created a pull request for this issue:
https://github.com/apache/spark/pull/41398

> Support left outer join build left or right outer join build right in 
> shuffled hash join
> 
>
> Key: SPARK-36612
> URL: https://issues.apache.org/jira/browse/SPARK-36612
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: mcdull_zhang
> Priority: Major
>
> Currently, Spark SQL does not support building the left side for a left 
> outer join (or building the right side for a right outer join).
> However, in our production environment there are many scenarios where a 
> small table is left-joined to a large table, and the large table often 
> has data skew (currently AQE can't handle this kind of skew).
> Inspired by SPARK-32399, we can use similar ideas to implement left outer 
> join with the build side on the left.
> I think this change would be very valuable, and I would like to hear what 
> other members think.
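
For readers unfamiliar with the proposal, here is a minimal sketch of the idea in plain Python rather than Spark internals: hash the left (small) side of a LEFT OUTER join, probe it with the right side, and track which build rows matched so that unmatched left rows can be emitted with nulls at the end. The function name `left_outer_join_build_left` and the `(key, value)` row layout are hypothetical, and this is not the code from the pull request.

```python
from collections import defaultdict

def left_outer_join_build_left(left_rows, right_rows):
    """LEFT OUTER join of (key, value) rows, hashing the left side."""
    # Build phase: hash the (small) left side by join key.
    table = defaultdict(list)
    for idx, (key, _) in enumerate(left_rows):
        table[key].append(idx)
    matched = [False] * len(left_rows)

    out = []
    # Probe phase: stream the (large) right side through the hash table.
    for r_key, r_val in right_rows:
        for idx in table.get(r_key, ()):
            matched[idx] = True
            out.append((left_rows[idx][0], left_rows[idx][1], r_val))
    # Final phase: left rows that never matched still appear, padded with None.
    for idx, (key, l_val) in enumerate(left_rows):
        if not matched[idx]:
            out.append((key, l_val, None))
    return out

left = [(1, "a"), (2, "b")]
right = [(1, "x"), (1, "y"), (3, "z")]
result = left_outer_join_build_left(left, right)
```

The per-row `matched` flags are the extra state that building on the preserved side requires; a join that builds on the non-preserved side does not need them.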



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join

2021-08-31 Thread Cheng Su (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407823#comment-17407823
 ] 

Cheng Su commented on SPARK-36612:
--

I agree that some queries fit this scenario. For those queries we can skip 
the sort before the join if we are able to run a shuffled hash join instead 
of a sort merge join.

I don't think it solves the AQE skew problem, though. We still cannot split 
the skewed partition on the right side of a LEFT OUTER join, because the 
resulting tasks share no knowledge at runtime of which left rows have 
already been matched.
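
To illustrate the second point, here is a small sketch in plain Python (not Spark code) of what would happen if the right side of one partition were split across two tasks, each holding a full copy of the left side. The helper `join_one_task` and the sample rows are hypothetical.

```python
def join_one_task(left_rows, right_chunk):
    """Build-left LEFT OUTER join as one task would run it in isolation."""
    matched = set()
    out = []
    for r_key, r_val in right_chunk:
        for idx, (l_key, l_val) in enumerate(left_rows):
            if l_key == r_key:
                matched.add(idx)
                out.append((l_key, l_val, r_val))
    # Each task only knows about matches found in its own chunk.
    out += [(k, v, None) for i, (k, v) in enumerate(left_rows) if i not in matched]
    return out

left = [(1, "a"), (2, "b")]
right = [(1, "x"), (2, "y")]

# One task sees the whole right side: every left row finds its match.
whole = join_one_task(left, right)
# Two tasks each see half of the right side and a full copy of the left side.
split = join_one_task(left, right[:1]) + join_one_task(left, right[1:])
```

Here `whole` contains only the two matched rows, while `split` also contains a null-padded row for each left row that happened to match in the other task. Avoiding that would require the tasks to share their matched state, which is the knowledge they lack.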







[jira] [Commented] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join

2021-08-31 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407809#comment-17407809
 ] 

Hyukjin Kwon commented on SPARK-36612:
--

cc [~chengsu] FYI



