This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 87cae7bc7870 [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing
87cae7bc7870 is described below

commit 87cae7bc7870bacafc6afad99ba86a6efca2a464
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Mon Mar 25 16:06:03 2024 -0700

    [SPARK-47552][CORE] Set `spark.hadoop.fs.s3a.connection.establish.timeout` to 30s if missing

    ### What changes were proposed in this pull request?

    This PR aims to handle HADOOP-19097 from the Apache Spark side. We can remove this when Apache Hadoop `3.4.1` is released.
    - https://github.com/apache/hadoop/pull/6601

    ### Why are the changes needed?

    Apache Hadoop logs a warning about its own default configuration. This default-value issue is fixed in Apache Hadoop 3.4.1.

    ```
    24/03/25 14:46:21 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
    ```

    This change suppresses the Apache Hadoop default warning in a way that is consistent with future Hadoop releases.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs. Manually.

    **BUILD**
    ```
    $ dev/make-distribution.sh -Phadoop-cloud
    ```

    **BEFORE**
    ```
    scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
    ...
    24/03/25 15:50:46 WARN ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead
    ```

    **AFTER**
    ```
    scala> spark.range(10).write.mode("overwrite").orc("s3a://express-1-zone--***--x-s3/orc/")
    ...(ConfigurationHelper warning is gone)...
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45710 from dongjoon-hyun/SPARK-47552.
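[Editor's note, not part of the original commit message: on Spark builds that do not yet include this patch, the same warning can be suppressed by setting the option explicitly. A hedged sketch; the `30s` value simply mirrors what this commit chooses, and any `spark.hadoop.*` property is forwarded to the Hadoop configuration.]

```
# spark-defaults.conf: raise the S3A connection-establish timeout so that
# Hadoop's ConfigurationHelper no longer clamps it and logs a warning
spark.hadoop.fs.s3a.connection.establish.timeout   30s
```

The same key can also be passed per job, e.g. `spark-submit --conf spark.hadoop.fs.s3a.connection.establish.timeout=30s ...`.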
Authored-by: Dongjoon Hyun <dh...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 core/src/main/scala/org/apache/spark/SparkContext.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index d519617c4095..f8f0107ed139 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -417,6 +417,9 @@ class SparkContext(config: SparkConf) extends Logging {
     if (!_conf.contains("spark.app.name")) {
       throw new SparkException("An application name must be set in your configuration")
     }
+    // HADOOP-19097 Set fs.s3a.connection.establish.timeout to 30s
+    // We can remove this after Apache Hadoop 3.4.1 releases
+    conf.setIfMissing("spark.hadoop.fs.s3a.connection.establish.timeout", "30s")

     // This should be set as early as possible.
     SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org