[ https://issues.apache.org/jira/browse/SPARK-35264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ulysses you updated SPARK-35264:
--------------------------------
    Description: 
The main idea here is to isolate the join configs of the normal planner from those of the AQE planner, since both share the same code path.

Actually, we do not fully trust static statistics when deciding whether a join can be built as a broadcast hash join. In our experience it is very common for Spark to throw a broadcast timeout or a driver-side OOM exception when executing a moderately large plan. And because a broadcast join is not reversible (once a join is converted to a broadcast hash join in the first planning pass, AQE cannot optimize it again), it makes sense to decide whether to broadcast on the AQE side using a separate SQL config.
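To make the config isolation concrete, here is a minimal Scala sketch. It assumes a hypothetical `JoinConf` holding one threshold for the normal planner (analogous to `spark.sql.autoBroadcastJoinThreshold`, where -1 disables broadcasting) and an optional AQE-only threshold. `JoinConf`, `broadcastThreshold` and `canBroadcast` are illustrative names, not Spark internals, and the fallback from the AQE threshold to the static one is an assumption of the sketch.

```scala
// Hypothetical sketch: JoinConf, broadcastThreshold and canBroadcast are
// illustrative names, not Spark's internal API.
object JoinThresholds {
  // Thresholds in bytes; a negative value disables broadcasting.
  final case class JoinConf(staticThreshold: Long, aqeThreshold: Option[Long])

  // The normal planner always uses the static threshold; AQE uses its own
  // threshold when one is set and otherwise falls back to the static one
  // (the fallback is an assumption of this sketch).
  def broadcastThreshold(conf: JoinConf, isAdaptive: Boolean): Long =
    if (isAdaptive) conf.aqeThreshold.getOrElse(conf.staticThreshold)
    else conf.staticThreshold

  // sizeInBytes is an estimate from static stats for the normal planner,
  // but the measured size of a finished query stage for AQE.
  def canBroadcast(sizeInBytes: BigInt, conf: JoinConf, isAdaptive: Boolean): Boolean = {
    val threshold = broadcastThreshold(conf, isAdaptive)
    threshold >= 0 && sizeInBytes.signum >= 0 && sizeInBytes <= BigInt(threshold)
  }
}
```

For example, with `JoinConf(staticThreshold = -1L, aqeThreshold = Some(10L * 1024 * 1024))` a 1 KB side is never broadcast by the normal planner, but it can be broadcast once AQE has seen its runtime size, which is the isolation this issue is after.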


To achieve this, we insert a specific join hint in advance within the AQE framework; JoinSelection then picks up the inserted hint and follows it.

For now we only support selecting the strategy for equi-joins, in the following order:
 1. mark the join as a broadcast hash join if possible
 2. mark the join as a shuffled hash join if possible

Note that we don't override the join strategy if the user specifies a join hint.
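The selection order above can be sketched in Scala as follows. This is a minimal, hypothetical model, not Spark's `JoinSelection` or hint classes: `Strategy`, `JoinStats`, `hintToInsert` and `select` are illustrative names, and the rule that a shuffled hash join is "possible" when the largest build-side partition fits under a per-partition limit is an assumption made only for this sketch.

```scala
// Hypothetical sketch: Strategy, JoinStats, hintToInsert and select are
// illustrative names, not Spark's JoinSelection or join-hint classes.
object AqeJoinHints {
  sealed trait Strategy
  case object BroadcastHash extends Strategy
  case object ShuffledHash extends Strategy

  // Runtime facts AQE has about one equi-join once its inputs have run.
  final case class JoinStats(
      userHint: Option[Strategy],     // hint written by the user, if any
      buildSideBytes: BigInt,         // measured size of the build side
      broadcastThreshold: Long,       // AQE-side broadcast threshold in bytes
      maxBuildPartitionBytes: BigInt, // largest build-side shuffle partition
      shuffledHashThreshold: Long)    // assumed per-partition limit for SHJ

  // The hint AQE would insert, or None to leave the join untouched.
  def hintToInsert(s: JoinStats): Option[Strategy] =
    if (s.userHint.isDefined) None // never override a user-specified hint
    else if (s.broadcastThreshold >= 0 &&
             s.buildSideBytes <= BigInt(s.broadcastThreshold)) Some(BroadcastHash)
    else if (s.shuffledHashThreshold >= 0 &&
             s.maxBuildPartitionBytes <= BigInt(s.shuffledHashThreshold)) Some(ShuffledHash)
    else None

  // JoinSelection side: a user hint wins, then the AQE-inserted hint;
  // otherwise the planner keeps its usual choice.
  def select(userHint: Option[Strategy], aqeHint: Option[Strategy]): String =
    userHint.orElse(aqeHint) match {
      case Some(BroadcastHash) => "broadcast hash join"
      case Some(ShuffledHash)  => "shuffled hash join"
      case None                => "planner default"
    }
}
```

Keeping the decision in a hint rather than in the planner itself means the same JoinSelection code path serves both planners; only the hint it receives differs.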

 

  was:
The main idea here is to isolate the join configs of the normal planner from those of the AQE planner, since both share the same code path.
To achieve this, we insert a specific join hint in advance within the AQE framework; JoinSelection then picks up the inserted hint and follows it.

For now we only support selecting the strategy for equi-joins, in the following order:
 1. mark the join as a broadcast hash join if possible
 2. mark the join as a shuffled hash join if possible


Note that we don't override the join strategy if the user specifies a join hint.

 


> Support AQE side broadcastJoin threshold
> ----------------------------------------
>
>                 Key: SPARK-35264
>                 URL: https://issues.apache.org/jira/browse/SPARK-35264
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: ulysses you
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
