[ https://issues.apache.org/jira/browse/SPARK-47284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Qi Zhu updated SPARK-47284: --------------------------- Description: The following case is introduced by https://issues.apache.org/jira/browse/SPARK-35703 // When choosing specs, we should consider those children with no `ShuffleExchangeLike` node // first. For instance, if we have: // A: (No_Exchange, 100) <---> B: (Exchange, 120) // it's better to pick A and change B to (Exchange, 100) instead of picking B and insert a // new shuffle for A. *But we'd better improve it in some cases, for example:* A: (No_Exchange, 2) <---> B: (Exchange, 100) The current logic will change to: A: (No_Exchange, 2) <---> B: (Exchange,2) It actually not ensure enough parallelism, it will reduce the performance i think. was: The following case is introduced by https://issues.apache.org/jira/browse/SPARK-35703 // When choosing specs, we should consider those children with no `ShuffleExchangeLike` node // first. For instance, if we have: // A: (No_Exchange, 100) <---> B: (Exchange, 120) // it's better to pick A and change B to (Exchange, 100) instead of picking B and insert a // new shuffle for A. But we'd better improve it in some cases, for example: A: (No_Exchange, 2) <---> B: (Exchange, 100) The current logic will change to: A: (No_Exchange, 2) <---> B: (Exchange,2) It actually not ensure enough parallelism, it will reduce the performance i think. > We should ensure enough parallelism when ShuffleExchangeLike join with specs > without shuffle > -------------------------------------------------------------------------------------------- > > Key: SPARK-47284 > URL: https://issues.apache.org/jira/browse/SPARK-47284 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 4.0.0 > Reporter: Qi Zhu > Priority: Major > > The following case is introduced by > https://issues.apache.org/jira/browse/SPARK-35703 > // When choosing specs, we should consider those children with no > `ShuffleExchangeLike` node > // first. For instance, if we have: > // A: (No_Exchange, 100) <---> B: (Exchange, 120) > // it's better to pick A and change B to (Exchange, 100) instead of picking B > and insert a > // new shuffle for A. > *But we'd better improve it in some cases, for example:* > A: (No_Exchange, 2) <---> B: (Exchange, 100) > The current logic will change to: > A: (No_Exchange, 2) <---> B: (Exchange,2) > It actually not ensure enough parallelism, it will reduce the performance i > think. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org