This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new f1fe275e5a85 [SPARK-52174][CORE] Enable `spark.checkpoint.compress` by default

f1fe275e5a85 is described below

commit f1fe275e5a85677b2d8f1ab732d55d9e488cf3a8
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Fri May 16 12:28:47 2025 -0700

    [SPARK-52174][CORE] Enable `spark.checkpoint.compress` by default

    ### What changes were proposed in this pull request?

    This PR aims to enable `spark.checkpoint.compress` by default at Apache Spark 4.1.0.

    ### Why are the changes needed?

    Apache Spark 4.0.0 officially added the `spark.checkpoint.dir` configuration.

    https://github.com/apache/spark/blob/781031cd716039e7e3034e462e3292f79c000ff6/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1354-L1361

    In line with that, `spark.checkpoint.compress`, introduced in Apache Spark 2.2.0, has been serving well. It would be great if we can enable it by default to save space in the checkpoint location.

    https://github.com/apache/spark/blob/781031cd716039e7e3034e462e3292f79c000ff6/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1363-L1369

    ### Does this PR introduce _any_ user-facing change?

    Yes, but only for users who use RDD checkpoints.

    ### How was this patch tested?

    Pass the CIs.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #50908 from dongjoon-hyun/SPARK-52174.

    Authored-by: Dongjoon Hyun <dongj...@apache.org>
    Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 docs/configuration.md                                              | 2 +-
 docs/core-migration-guide.md                                       | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 4e2912b9f803..7cb3d068b676 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1366,7 +1366,7 @@ package object config {
         "spark.io.compression.codec.")
       .version("2.2.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)

   private[spark] val CACHE_CHECKPOINT_PREFERRED_LOCS_EXPIRE_TIME =
     ConfigBuilder("spark.rdd.checkpoint.cachePreferredLocsExpireTime")

diff --git a/docs/configuration.md b/docs/configuration.md
index 9ee7ea2c9317..634786991647 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1840,7 +1840,7 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.checkpoint.compress</code></td>
-  <td>false</td>
+  <td>true</td>
   <td>
     Whether to compress RDD checkpoints. Generally a good idea. Compression will use
     <code>spark.io.compression.codec</code>.

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 914c48a09582..a560a6da91a4 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -25,6 +25,7 @@ license: |
 ## Upgrading from Core 4.0 to 4.1

 - Since Spark 4.1, Spark Master deamon provides REST API by default. To restore the behavior before Spark 4.1, you can set `spark.master.rest.enabled` to `false`.
+- Since Spark 4.1, Spark will compress RDD checkpoints by default. To restore the behavior before Spark 4.1, you can set `spark.checkpoint.compress` to `false`.
 ## Upgrading from Core 3.5 to 4.0
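
For context, the snippet below is a minimal sketch (not part of this commit) of how the new default plays out for an RDD checkpointing job, and how to restore the pre-4.1 behavior per the migration guide. The application name, checkpoint directory, and data are hypothetical placeholders.

```scala
// Minimal sketch, assuming a local Spark 4.1 build; the app name, checkpoint
// directory, and data below are illustrative placeholders.
import org.apache.spark.sql.SparkSession

object CheckpointCompressDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpoint-compress-demo")
      .master("local[*]")
      // Since Spark 4.1 this is true by default; uncomment to restore the
      // pre-4.1 (uncompressed) checkpoint behavior.
      // .config("spark.checkpoint.compress", "false")
      .getOrCreate()
    val sc = spark.sparkContext

    // Checkpoint files are written under this directory; with compression
    // enabled they use the codec configured via spark.io.compression.codec.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    val rdd = sc.parallelize(1 to 1000000)
    rdd.checkpoint() // mark the RDD for checkpointing
    rdd.count()      // materialize the RDD, which writes the checkpoint files

    spark.stop()
  }
}
```

The same switch can be applied without code changes, e.g. via `spark-submit --conf spark.checkpoint.compress=false`.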