[jira] [Commented] (SPARK-38172) Adaptive coalesce not working with df persist

2022-02-24 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497870#comment-17497870
 ] 

XiDuo You commented on SPARK-38172:
---

Thanks [~Naveenmts] for confirming!

> Adaptive coalesce not working with df persist
> -
>
> Key: SPARK-38172
> URL: https://issues.apache.org/jira/browse/SPARK-38172
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.1
> Environment: OS: Linux
> Spark Version: 3.2.3
>Reporter: Naveen Nagaraj
>Priority: Major
> Attachments: image-2022-02-10-15-32-30-355.png, 
> image-2022-02-10-15-33-08-018.png, image-2022-02-10-15-33-32-607.png
>
>
> {code:java}
> val spark = SparkSession.builder().master("local[4]").appName("Test")
>   .config("spark.sql.adaptive.enabled", "true")
>   .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
>   .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "50m")
>   .config("spark.sql.adaptive.coalescePartitions.minPartitionNum", "1")
>   .config("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "1024")
>   .getOrCreate()
>
> val df = spark.read.csv("")
> val df1 = df.distinct()
> df1.persist() // On removing this line, the code works as expected
> df1.write.csv("") {code}
> Without df1.persist(), df1.write.csv writes 4 partition files of ~50 MB each, 
> which is the expected behaviour:
> [https://i.stack.imgur.com/tDxpV.png]
> If I include df1.persist(), Spark writes 200 partition files instead 
> (adaptive coalesce is not applied). With persist:
> [https://i.stack.imgur.com/W13hA.png]
>  
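For reference, a minimal, hypothetical way to observe the partition count the cached DataFrame ends up with before writing, reusing df/df1 from the snippet above. With persist() and the default settings in 3.2, this should report the 200 partitions seen in the issue rather than the coalesced ones:
{code:java}
// Hypothetical check, reusing df/df1 from the report above.
val df1 = df.distinct()
df1.persist()
df1.count() // materialize the cache
// With the cached plan, AQE partition coalescing is skipped by default in 3.2,
// so this is expected to print the 200 partitions reported in the issue.
println(s"Partitions after persist: ${df1.rdd.getNumPartitions}")
{code}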



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38172) Adaptive coalesce not working with df persist

2022-02-23 Thread Naveen Nagaraj (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496652#comment-17496652
 ] 

Naveen Nagaraj commented on SPARK-38172:


Hi [~ulysses], it is working now. Thank you so much!


[jira] [Commented] (SPARK-38172) Adaptive coalesce not working with df persist

2022-02-11 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490773#comment-17490773
 ] 

XiDuo You commented on SPARK-38172:
---

Hi [~Naveenmts], have you tried enabling this config?
{code:java}
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning {code}
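
For completeness, a minimal sketch of how the suggestion could be applied to the builder from the description (assumption: Spark 3.2+, where this config exists and defaults to false; enabling it allows AQE to change the output partitioning of the cached plan, so coalescing can still apply after df1.persist()):
{code:java}
import org.apache.spark.sql.SparkSession

// Sketch only: the reporter's session setup plus the suggested config.
val spark = SparkSession.builder().master("local[4]").appName("Test")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
  .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "50m")
  .config("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "true")
  .getOrCreate()

val df1 = spark.read.csv("").distinct()
df1.persist()     // cached plan may now keep AQE's coalesced partitioning
df1.write.csv("") // expected to produce the coalesced ~50 MB files again
{code}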
