[jira] [Updated] (SPARK-47284) We should ensure enough parallelism when ShuffleExchangeLike join with specs without shuffle

Qi Zhu (Jira) Tue, 05 Mar 2024 01:30:05 -0800


     [ https://issues.apache.org/jira/browse/SPARK-47284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Qi Zhu updated SPARK-47284:
---------------------------
    Description: 
The following behavior was introduced by 
https://issues.apache.org/jira/browse/SPARK-35703:

// When choosing specs, we should consider those children with no `ShuffleExchangeLike` node
// first. For instance, if we have:
// A: (No_Exchange, 100) <---> B: (Exchange, 120)
// it's better to pick A and change B to (Exchange, 100) instead of picking B and insert a
// new shuffle for A.


*However, this choice can hurt in some cases. For example:*
A: (No_Exchange, 2) <---> B: (Exchange, 100)

The current logic changes this to:
A: (No_Exchange, 2) <---> B: (Exchange, 2)

This does not ensure enough parallelism: the join then runs with only 2 partitions, which I think will reduce performance.
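
To make the two cases above concrete, here is a minimal sketch in Python. It is a toy model of the rule described in the quoted comment, not Spark's actual code: the names ChildSpec, pick_spec and min_parallelism are hypothetical, picking the largest no-exchange child is an assumption, and the threshold variant is only one possible policy, not something this ticket specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChildSpec:
    name: str
    has_exchange: bool   # True if the child already has a shuffle exchange
    num_partitions: int


def pick_spec(children: List[ChildSpec],
              min_parallelism: Optional[int] = None) -> ChildSpec:
    """Pick the child whose partition count the other children are shuffled to.

    With min_parallelism=None this models the rule quoted above: children
    without an exchange are preferred (taking the largest of them is an
    assumption). With a threshold (hypothetical), a no-exchange child is
    preferred only if it has at least that many partitions.
    """
    no_exchange = [c for c in children if not c.has_exchange]
    if no_exchange:
        best = max(no_exchange, key=lambda c: c.num_partitions)
        if min_parallelism is None or best.num_partitions >= min_parallelism:
            return best
    # Otherwise fall back to the widest child overall; the other children
    # then need a (possibly new) shuffle to match it.
    return max(children, key=lambda c: c.num_partitions)


# Case from the quoted comment: A is picked, B is changed to (Exchange, 100).
print(pick_spec([ChildSpec("A", False, 100), ChildSpec("B", True, 120)]))

# Case from this ticket: A is picked again, so B drops to 2 partitions.
print(pick_spec([ChildSpec("A", False, 2), ChildSpec("B", True, 100)]))

# Hypothetical threshold of 10: B keeps 100 partitions, A gets a new shuffle.
print(pick_spec([ChildSpec("A", False, 2), ChildSpec("B", True, 100)],
                min_parallelism=10))
```

The threshold variant trades one extra shuffle on A for higher join parallelism; whether that is worthwhile depends on the data size, which this sketch does not model.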


> We should ensure enough parallelism when ShuffleExchangeLike join with specs 
> without shuffle
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-47284
>                 URL: https://issues.apache.org/jira/browse/SPARK-47284
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Qi Zhu
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
