[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597894#comment-17597894
 ] 

Apache Spark commented on SPARK-39915:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/37730

> Dataset.repartition(N) may not create N partitions
> --
>
> Key: SPARK-39915
> URL: https://issues.apache.org/jira/browse/SPARK-39915
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> Looks like there is a behavior change in Dataset.repartition in 3.3.0. For 
> example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 
> in Spark 3.2.0, but 0 in Spark 3.3.0.






[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597138#comment-17597138
 ] 

Apache Spark commented on SPARK-39915:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/37706







[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582864#comment-17582864
 ] 

Apache Spark commented on SPARK-39915:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/37612







[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-21 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582707#comment-17582707
 ] 

XiDuo You commented on SPARK-39915:
---

We may need a stricter mechanism to ensure the output partition number of repartition.







[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-21 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582703#comment-17582703
 ] 

XiDuo You commented on SPARK-39915:
---

Thank you [~yumwang] for pinging me. I see the issue.

This is not only about the empty relation optimization; it also happens with any other unary 
node that sits on top of the repartition, e.g.:
{code:java}
val df1 = spark.range(1).selectExpr("id as c1")
val df2 = spark.range(1).selectExpr("id as c2")
df1.join(df2, col("c1") === col("c2")).repartition(200, col("c1")).rdd.getNumPartitions

-- output
1
{code}
Calling `.rdd` on a Dataset injects a unary `DeserializeToObject` node, so the current AQE 
protection for repartition does not apply; see `AQEUtils`. The protection also does not retain 
the `RoundRobinPartitioning`, which makes this issue more complex.
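A minimal sketch of the implied workaround, under the assumption that it is AQE's partition 
coalescing that reduces the requested 200 partitions to 1 once the extra `DeserializeToObject` 
node hides the user-specified repartition from the protection (not verified against this exact 
build):
{code:java}
import org.apache.spark.sql.functions.col

// Assumption: with coalescing disabled, AQE should keep the user-requested 200 shuffle
// partitions even though `.rdd` wraps the plan in DeserializeToObject.
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")

val df1 = spark.range(1).selectExpr("id as c1")
val df2 = spark.range(1).selectExpr("id as c2")
df1.join(df2, col("c1") === col("c2"))
  .repartition(200, col("c1"))
  .rdd
  .getNumPartitions   // expected: 200
{code}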

 







[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-20 Thread Shixiong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582457#comment-17582457
 ] 

Shixiong Zhu commented on SPARK-39915:
--

Yeah. I would consider this a bug, since the doc of `repartition` explicitly says:

{code:java}
Returns a new Dataset that has exactly `numPartitions` partitions.
{code}
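Stated as a check, the documented contract would read roughly like the following sketch (per 
the report above, it holds on 3.2.0 but fails on 3.3.0 for the empty range):
{code:java}
// Sketch of the documented contract: repartition(N) should yield exactly N partitions.
val n = spark.range(10, 0).repartition(5).rdd.getNumPartitions
assert(n == 5, s"expected exactly 5 partitions, got $n")   // fails on 3.3.0 (n == 0), per the report
{code}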








[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-20 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582402#comment-17582402
 ] 

Yuming Wang commented on SPARK-39915:
-

The reason is that it returns an empty local relation since SPARK-35455:
https://github.com/apache/spark/blob/a077701d4cc36a9a6ce898ddd3b4e5fd506f6162/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala#L129-L130

cc [~ulysses]
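A shell-level way to observe this, as a sketch (the exact plan text is version dependent and 
not reproduced here):
{code:java}
// Sketch: if PropagateEmptyRelation fires, the Repartition over the empty range is replaced
// by an empty LocalRelation, and the resulting RDD ends up with 0 partitions.
val df = spark.range(10, 0).repartition(5)
df.queryExecution.optimizedPlan   // expected to show an empty LocalRelation on 3.3.0
df.rdd.getNumPartitions           // 0 on 3.3.0, 5 on 3.2.0, per the report
{code}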







[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-06 Thread Shixiong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576319#comment-17576319
 ] 

Shixiong Zhu commented on SPARK-39915:
--


{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/

Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.range(10, 0).repartition(5).rdd.getNumPartitions
res0: Int = 0
{code}








[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-08-06 Thread Pablo Langa Blanco (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576282#comment-17576282
 ] 

Pablo Langa Blanco commented on SPARK-39915:


Hi [~zsxwing],

I can't reproduce it; do you have a typo in the range?
{code:java}
scala> spark.range(0, 10).repartition(5).rdd.getNumPartitions
res53: Int = 5
{code}
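For completeness, a small sketch of the difference between the two range calls, assuming the 
default step of 1: with a start greater than the end, `spark.range(10, 0)` produces an empty 
Dataset, which is exactly the input that triggers the reported behavior, whereas 
`spark.range(0, 10)` has 10 rows.
{code:java}
// Sketch, assuming the default step of 1:
spark.range(10, 0).count()   // 0  -> empty input; repartition(5).rdd reports 0 partitions on 3.3.0
spark.range(0, 10).count()   // 10 -> non-empty input; repartition(5).rdd reports 5 partitions
{code}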







[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions

2022-07-28 Thread Shixiong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572685#comment-17572685
 ] 

Shixiong Zhu commented on SPARK-39915:
--

cc [~cloud_fan] 



