[jira] [Commented] (SPARK-35725) Support repartition expand partitions in AQE

2023-01-31 Thread Penglei Shi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682475#comment-17682475
 ] 

Penglei Shi commented on SPARK-35725:
-

[~ulysses] Thx!

> Support repartition expand partitions in AQE
> 
>
> Key: SPARK-35725
> URL: https://issues.apache.org/jira/browse/SPARK-35725
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.2.0
>
>
> Currently, AQE does not support expanding partitions dynamically, which is 
> unfriendly to jobs with data skew.
> Suppose we have a simple query:
> {code:java}
> SELECT * FROM table DISTRIBUTE BY col
> {code}
> If the column `col` is skewed, some shuffle partitions will handle much more 
> data than others.
> If we have not introduced an extra shuffle, we can optimize this case by 
> expanding partitions in AQE.
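
The skew described in the issue can be illustrated with a small, self-contained sketch. It is not Spark code: the data, the partition count, and the helper names `hash_partition` and `expand_partition` are assumptions made for illustration, and sizes are counted in rows rather than bytes.

```python
from collections import Counter


def hash_partition(keys, num_partitions):
    """Count rows per shuffle partition when rows are distributed by key."""
    counts = Counter()
    for key in keys:
        # Integer keys hash to themselves in Python, so this is deterministic.
        counts[hash(key) % num_partitions] += 1
    return [counts.get(p, 0) for p in range(num_partitions)]


def expand_partition(row_count, target_rows):
    """Split one oversized partition into chunks of at most target_rows rows."""
    chunks = []
    remaining = row_count
    while remaining > 0:
        chunk = min(target_rows, remaining)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


# Hypothetical data: key 0 is hot (900 rows), keys 1..7 have 10 rows each.
keys = [0] * 900 + [k for k in range(1, 8) for _ in range(10)]
sizes = hash_partition(keys, num_partitions=4)
print(sizes)                              # one partition dominates
print(expand_partition(sizes[0], 250))    # the hot partition after expansion
```

Expanding only the oversized partition leaves the other partitions untouched, which is why the description says no extra shuffle is needed.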



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35725) Support repartition expand partitions in AQE

2023-01-31 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682473#comment-17682473
 ] 

XiDuo You commented on SPARK-35725:
---

[~Penglei Shi] The Kyuubi community has a Spark extension that supports 
something similar, `finalStageConfigIsolation`; see the docs: 
[https://kyuubi.readthedocs.io/en/v1.6.1-incubating/extensions/engines/spark/rules.html]
 Hope it helps.
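
The idea behind `finalStageConfigIsolation` can be sketched as a lookup that lets final-stage overrides win in the last stage only. This is a plain-Python illustration, not Kyuubi code: the `spark.sql.finalStage.` prefix is how I understand the linked Kyuubi documentation and should be verified there, and `effective_conf` is a hypothetical helper.

```python
SPARK_SQL_PREFIX = "spark.sql."
# Assumed prefix for final-stage overrides; check the Kyuubi docs before use.
FINAL_STAGE_PREFIX = "spark.sql.finalStage."


def effective_conf(conf, is_final_stage):
    """Resolve settings, applying final-stage overrides only in the final stage."""
    # Start from the regular settings, leaving out the override keys themselves.
    resolved = {
        key: value
        for key, value in conf.items()
        if not key.startswith(FINAL_STAGE_PREFIX)
    }
    if is_final_stage:
        for key, value in conf.items():
            if key.startswith(FINAL_STAGE_PREFIX):
                regular_key = SPARK_SQL_PREFIX + key[len(FINAL_STAGE_PREFIX):]
                resolved[regular_key] = value
    return resolved


conf = {
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "64m",
    "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes": "256m",
}
print(effective_conf(conf, is_final_stage=False))
print(effective_conf(conf, is_final_stage=True))
```

With this kind of isolation, intermediate shuffles keep the small advisory size while the final stage that writes files sees the larger one.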




[jira] [Commented] (SPARK-35725) Support repartition expand partitions in AQE

2023-01-31 Thread Penglei Shi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682469#comment-17682469
 ] 

Penglei Shi commented on SPARK-35725:
-

[~ulysses] Yes, the final shuffle that comes with Rebalance needs a target size 
different from the existing one for optimizing skewed partitions and coalescing 
small partitions. Do you think it's reasonable to use a different target size 
depending on the shuffle origin in CoalesceShufflePartitions and 
OptimizeSkewInRebalancePartitions?
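
The proposal in this comment, choosing a target size by shuffle origin, might look like the sketch below. It is only an illustration of the idea under discussion: the enum mirrors a few of Spark's shuffle origin names, while `REBALANCE_TARGET_SIZE` and `target_size` are hypothetical and do not correspond to an existing Spark config or function.

```python
from enum import Enum

MB = 1024 * 1024


class ShuffleOrigin(Enum):
    """A subset of shuffle origins, named after Spark's ShuffleOrigin values."""
    ENSURE_REQUIREMENTS = "ensure_requirements"
    REPARTITION_BY_COL = "repartition_by_col"
    REBALANCE_PARTITIONS_BY_NONE = "rebalance_partitions_by_none"
    REBALANCE_PARTITIONS_BY_COL = "rebalance_partitions_by_col"


ADVISORY_SIZE = 64 * MB            # default advisory partition size
REBALANCE_TARGET_SIZE = 256 * MB   # hypothetical larger size for rebalance

REBALANCE_ORIGINS = {
    ShuffleOrigin.REBALANCE_PARTITIONS_BY_NONE,
    ShuffleOrigin.REBALANCE_PARTITIONS_BY_COL,
}


def target_size(origin):
    """Pick the target partition size that both rules would use for a shuffle."""
    if origin in REBALANCE_ORIGINS:
        return REBALANCE_TARGET_SIZE
    return ADVISORY_SIZE


for origin in ShuffleOrigin:
    print(origin.name, target_size(origin) // MB, "MB")
```

For the approach to stay consistent, both CoalesceShufflePartitions and OptimizeSkewInRebalancePartitions would have to resolve the size through the same lookup.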




[jira] [Commented] (SPARK-35725) Support repartition expand partitions in AQE

2023-01-31 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682460#comment-17682460
 ] 

XiDuo You commented on SPARK-35725:
---

[~Penglei Shi], that would cause inconsistency. The `rebalance` feature actually 
depends on two rules: 1. optimize skew in rebalance, 2. coalesce shuffle 
partitions. The key point is that rule 2 always uses the advisory size to 
coalesce small partitions, so I think what you need is a config that controls 
both rule 1 and rule 2, rather than rule 1 only, in the final shuffle for 
writing files. That's why I reused the existing advisory size config.
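
The inconsistency mentioned here can be shown with a simplified model of the two rules. This is a sketch only: `split_skewed` and `coalesce_small` are hypothetical stand-ins that work on plain partition sizes in MB, whereas Spark's actual rules operate on map output statistics.

```python
def split_skewed(sizes, target):
    """Rule 1 (sketch): cut each partition larger than target into pieces."""
    out = []
    for size in sizes:
        while size > target:
            out.append(target)
            size -= target
        if size > 0:
            out.append(size)
    return out


def coalesce_small(sizes, target):
    """Rule 2 (sketch): greedily merge adjacent partitions up to target."""
    out = []
    current = 0
    for size in sizes:
        if current > 0 and current + size > target:
            out.append(current)
            current = 0
        current += size
    if current > 0:
        out.append(current)
    return out


# Hypothetical partition sizes in MB: one skewed partition and several small ones.
sizes = [600, 10, 20, 30, 40, 50, 60]

# Both rules share one target size: the output partitions are mostly even.
consistent = coalesce_small(split_skewed(sizes, 256), 256)

# Only rule 1 gets the larger target: the output sizes range from 40 to 256.
inconsistent = coalesce_small(split_skewed(sizes, 256), 64)

print(consistent)
print(inconsistent)
```

When only the skew rule uses the larger size, the small partitions are still merged toward 64 MB, so the written files end up very uneven.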




[jira] [Commented] (SPARK-35725) Support repartition expand partitions in AQE

2023-01-31 Thread Penglei Shi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682455#comment-17682455
 ] 

Penglei Shi commented on SPARK-35725:
-

Hi [~ulysses], the advisory partition size for shuffle, which defaults to 64MB, 
is currently used for splitting skewed partitions. Usually we prefer a small 
value such as 32/64MB for better performance in intermediate shuffles. But in 
the last shuffle, which comes with Rebalance, we prefer a large value such as 
128/256MB so that the written files have an appropriate size. So I think we need 
another target size to split skewed partitions in Rebalance. What's your 
suggestion?
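
The trade-off described here can be made concrete with some rough arithmetic. The sketch below assumes a hypothetical final stage that writes 10 GiB and that each output partition becomes one file of about the target size; the helper name `estimated_file_count` is illustrative.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def estimated_file_count(total_bytes, target_bytes):
    """Estimate the number of written files if each partition is ~target_bytes."""
    return math.ceil(total_bytes / target_bytes)


# Hypothetical final-stage output size.
total = 10 * GB

# The target corresponds to spark.sql.adaptive.advisoryPartitionSizeInBytes.
for target_mb in (32, 64, 128, 256):
    files = estimated_file_count(total, target_mb * MB)
    print(f"{target_mb} MB target: about {files} files")
```

Smaller targets give more parallelism in intermediate shuffles, but in the final stage they translate directly into more, smaller files.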




[jira] [Commented] (SPARK-35725) Support repartition expand partitions in AQE

2021-06-11 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17361566#comment-17361566
 ] 

Apache Spark commented on SPARK-35725:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/32883
