[jira] [Commented] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727779#comment-17727779 ]

Snoot.io commented on SPARK-36612:
----------------------------------

User 'szehon-ho' has created a pull request for this issue:
https://github.com/apache/spark/pull/41398

> Support left outer join build left or right outer join build right in shuffled hash join
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-36612
>                 URL: https://issues.apache.org/jira/browse/SPARK-36612
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: mcdull_zhang
>            Priority: Major
>
> Currently, Spark SQL does not support building the left side for a left outer join
> (or building the right side for a right outer join) in shuffled hash join.
> However, in our production environment there are many cases where a small table is
> left-joined to a large table, and the large table often has data skew
> (which AQE currently cannot handle).
> Inspired by SPARK-32399, we can use a similar idea to implement left outer join
> with build left.
> I think this change would be very worthwhile, but I would like to know what other
> members think.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
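The idea in the issue description (hash the small left side of a left outer join and stream the large right side) can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's implementation and not the code in the linked pull request: the function name, the (key, value) row layout and the `matched` list are all hypothetical, and join keys are assumed to be non-null. The point it shows is that the build side must remember which of its rows were matched, so that unmatched left rows can be emitted with NULLs once the whole stream side has been read.

```python
from collections import defaultdict


def left_outer_join_build_left(left_rows, right_rows):
    """Left outer join that hashes the left (preserved) side and streams the right side.

    Rows are (key, value) tuples; output rows are (key, left_value, right_value).
    Hypothetical helper for illustration only.
    """
    # Build phase: hash the small left side, remembering each row's position.
    table = defaultdict(list)
    for idx, (key, _) in enumerate(left_rows):
        table[key].append(idx)

    # One flag per build-side row, set when some stream-side row matches it.
    matched = [False] * len(left_rows)

    out = []
    # Probe phase: stream the large right side through the hash table.
    for key, right_value in right_rows:
        for idx in table.get(key, ()):
            matched[idx] = True
            out.append((key, left_rows[idx][1], right_value))

    # Final pass: left rows that never matched are emitted with a NULL right side.
    for idx, (key, left_value) in enumerate(left_rows):
        if not matched[idx]:
            out.append((key, left_value, None))
    return out


small = [(1, "a"), (2, "b"), (3, "c")]
large = [(1, "x"), (1, "y"), (3, "z"), (4, "w")]
result = left_outer_join_build_left(small, large)
```

Here `result` contains the three matched rows followed by `(2, "b", None)`. The right-side row with key 4 is dropped, as expected for a left outer join.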
[jira] [Commented] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407823#comment-17407823 ]

Cheng Su commented on SPARK-36612:
----------------------------------

I agree that some queries fit this scenario. For these queries, using shuffled hash join instead of sort merge join lets us skip the sort before the join.

I don't think it solves the AQE skew problem, though. We still cannot split the skewed partition on the right side of a LEFT OUTER join, because the tasks have no shared knowledge at runtime of which rows have been matched.
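Cheng Su's objection about splitting the skewed right side can be reproduced with a small Python sketch. It assumes a simplified model in which each task receives a full copy of the left rows and one chunk of the right partition. `join_task` is a hypothetical helper, not a Spark API, and it uses a nested scan in place of a hash lookup to stay short. Each task decides on its own which left rows are unmatched, so rows matched in a different task are still emitted with NULLs.

```python
def join_task(left_rows, right_chunk):
    """One task of a build-left LEFT OUTER join (hypothetical helper).

    The task sees all left rows but only its own chunk of the right side.
    """
    matched = set()
    out = []
    for right_key, right_value in right_chunk:
        for idx, (left_key, left_value) in enumerate(left_rows):
            if left_key == right_key:
                matched.add(idx)
                out.append((left_key, left_value, right_value))
    # The task only knows about matches it saw itself.
    out += [(k, v, None) for i, (k, v) in enumerate(left_rows) if i not in matched]
    return out


left = [(1, "a"), (2, "b")]
skewed_right = [(1, "x"), (1, "y"), (2, "z")]

# Correct result: a single task sees the whole right partition.
single_task = join_task(left, skewed_right)

# Splitting the right partition in two: each task pads rows the other one matched.
split_tasks = join_task(left, skewed_right[:2]) + join_task(left, skewed_right[2:])
spurious = [row for row in split_tasks if row not in single_task]
```

In this model `spurious` holds two NULL-padded rows, one for each left row, that the single-task join does not produce. Avoiding them would require the tasks to share their matched state.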
[jira] [Commented] (SPARK-36612) Support left outer join build left or right outer join build right in shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-36612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407809#comment-17407809 ]

Hyukjin Kwon commented on SPARK-36612:
--------------------------------------

cc [~chengsu] FYI