This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new f1fe275e5a85 [SPARK-52174][CORE] Enable `spark.checkpoint.compress` by 
default
f1fe275e5a85 is described below

commit f1fe275e5a85677b2d8f1ab732d55d9e488cf3a8
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Fri May 16 12:28:47 2025 -0700

    [SPARK-52174][CORE] Enable `spark.checkpoint.compress` by default
    
    ### What changes were proposed in this pull request?
    
    This PR aims to enable `spark.checkpoint.compress` by default in Apache Spark 4.1.0.
    
    ### Why are the changes needed?
    
    Apache Spark 4.0.0 officially added the `spark.checkpoint.dir` configuration.
    
    
https://github.com/apache/spark/blob/781031cd716039e7e3034e462e3292f79c000ff6/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1354-L1361
    
    In line with that, `spark.checkpoint.compress` was introduced in Apache Spark 2.2.0 and has been serving well. It would be great if we could enable it by default to save space in the checkpoint location.
    
    
https://github.com/apache/spark/blob/781031cd716039e7e3034e462e3292f79c000ff6/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1363-L1369
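
    As a sketch (not part of this patch), a user who wants to keep the pre-4.1 behavior of uncompressed RDD checkpoints could set the configuration explicitly before creating the context. The application name, master URL, and checkpoint directory below are illustrative assumptions, not values from this PR:

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: restores uncompressed RDD checkpoints (the pre-4.1 default).
    val conf = new SparkConf()
      .setAppName("checkpoint-compress-demo")    // illustrative name
      .setMaster("local[*]")                     // illustrative master
      .set("spark.checkpoint.compress", "false")

    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/checkpoints")      // illustrative path

    val rdd = sc.parallelize(1 to 100)
    rdd.checkpoint()   // marks the RDD for checkpointing
    rdd.count()        // an action materializes the RDD and writes the checkpoint
    sc.stop()
    ```

    With the default left as `true`, the checkpoint files are written through the codec selected by `spark.io.compression.codec`.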
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, but only for users of RDD checkpoints.
    
    ### How was this patch tested?
    
    Pass the CIs.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #50908 from dongjoon-hyun/SPARK-52174.
    
    Authored-by: Dongjoon Hyun <dongj...@apache.org>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 docs/configuration.md                                              | 2 +-
 docs/core-migration-guide.md                                       | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 4e2912b9f803..7cb3d068b676 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1366,7 +1366,7 @@ package object config {
         "spark.io.compression.codec.")
       .version("2.2.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
 
   private[spark] val CACHE_CHECKPOINT_PREFERRED_LOCS_EXPIRE_TIME =
     ConfigBuilder("spark.rdd.checkpoint.cachePreferredLocsExpireTime")
diff --git a/docs/configuration.md b/docs/configuration.md
index 9ee7ea2c9317..634786991647 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1840,7 +1840,7 @@ Apart from these, the following properties are also 
available, and may be useful
 </tr>
 <tr>
   <td><code>spark.checkpoint.compress</code></td>
-  <td>false</td>
+  <td>true</td>
   <td>
     Whether to compress RDD checkpoints. Generally a good idea.
     Compression will use <code>spark.io.compression.codec</code>.
diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 914c48a09582..a560a6da91a4 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -25,6 +25,7 @@ license: |
 ## Upgrading from Core 4.0 to 4.1
 
 - Since Spark 4.1, Spark Master deamon provides REST API by default. To 
restore the behavior before Spark 4.1, you can set `spark.master.rest.enabled` 
to `false`.
+- Since Spark 4.1, Spark will compress RDD checkpoints by default. To restore 
the behavior before Spark 4.1, you can set `spark.checkpoint.compress` to 
`false`.
 
 ## Upgrading from Core 3.5 to 4.0
 

