This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new dd4f112a3f3 [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

dd4f112a3f3 is described below

commit dd4f112a3f375ccc762a9bb7fa674f32a2adb6e1
Author: yangjie01 <yangji...@baidu.com>
AuthorDate: Sat Jan 21 22:03:04 2023 -0600

    [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

### What changes were proposed in this pull request?

The env `SPARK_USE_CONC_INCR_GC` is used to enable CMS GC for the Yarn AM, but users can achieve the same goal by directly configuring `spark.driver.extraJavaOptions` (cluster mode) or `spark.yarn.am.extraJavaOptions` (client mode), and this env is not exposed to users in any user document. At the same time, Java 14 removed support for CMS ([JEP 363](https://openjdk.org/jeps/363)). So when we run Spark on Java 17 with `SPARK_USE_CONC_INCR_GC=true`, CMS does not take effect (no error is reported because `-XX:+IgnoreUnrecognizedVMOptions` is configured by default).

This pr aims to remove this env, and a similar block of commented-out code in `yarn.ExecutorRunnable`, for code cleanup.

### Why are the changes needed?

Clean up an env `SPARK_USE_CONC_INCR_GC` that is not exposed to users in any document.

### Does this PR introduce _any_ user-facing change?

Yes, the env `SPARK_USE_CONC_INCR_GC` is no longer effective, but users can still enable CMS GC (on Java versions earlier than 14) by directly configuring `spark.driver.extraJavaOptions` (cluster mode) or `spark.yarn.am.extraJavaOptions` (client mode).

### How was this patch tested?

Pass GitHub Actions

Closes #39674 from LuciferYang/remove-SPARK_USE_CONC_INCR_GC.
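As the description notes, the removed env can be replaced by passing the GC flags explicitly through Spark configuration. A minimal sketch of what that looks like with `spark-submit` (the application jar name is a placeholder, and the single `-XX:+UseConcMarkSweepGC` flag stands in for whatever CMS tuning a user needs; this only works on Java versions earlier than 14, where CMS still exists):

```shell
# Hedged sketch, not part of this commit: enabling CMS GC for the Yarn AM
# via Spark configuration instead of SPARK_USE_CONC_INCR_GC.
# "app.jar" is a placeholder application.

# Cluster mode: the AM hosts the driver, so driver options apply to the AM.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC" \
  app.jar

# Client mode: the AM is separate from the driver, so use the AM-specific key.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf "spark.yarn.am.extraJavaOptions=-XX:+UseConcMarkSweepGC" \
  app.jar
```

The same keys can equally be set in `spark-defaults.conf`; the point of the PR is that this documented, user-visible path already covers the undocumented env.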
Authored-by: yangjie01 <yangji...@baidu.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
---
 .../org/apache/spark/deploy/yarn/Client.scala      | 20 -----------------
 .../spark/deploy/yarn/ExecutorRunnable.scala       | 26 ----------------------
 2 files changed, 46 deletions(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 2daf9cd1df4..d8e9cd8b47d 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -1005,26 +1005,6 @@ private[spark] class Client(
     val tmpDir = new Path(Environment.PWD.$$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)
     javaOpts += "-Djava.io.tmpdir=" + tmpDir

-    // TODO: Remove once cpuset version is pushed out.
-    // The context is, default gc for server class machines ends up using all cores to do gc -
-    // hence if there are multiple containers in same node, Spark GC affects all other containers'
-    // performance (which can be that of other Spark containers)
-    // Instead of using this, rely on cpusets by YARN to enforce "proper" Spark behavior in
-    // multi-tenant environments. Not sure how default Java GC behaves if it is limited to subset
-    // of cores on a node.
-    val useConcurrentAndIncrementalGC = launchEnv.get("SPARK_USE_CONC_INCR_GC").exists(_.toBoolean)
-    if (useConcurrentAndIncrementalGC) {
-      // In our expts, using (default) throughput collector has severe perf ramifications in
-      // multi-tenant machines
-      javaOpts += "-XX:+UseConcMarkSweepGC"
-      javaOpts += "-XX:MaxTenuringThreshold=31"
-      javaOpts += "-XX:SurvivorRatio=8"
-      javaOpts += "-XX:+CMSIncrementalMode"
-      javaOpts += "-XX:+CMSIncrementalPacing"
-      javaOpts += "-XX:CMSIncrementalDutyCycleMin=0"
-      javaOpts += "-XX:CMSIncrementalDutyCycle=10"
-    }
-
     // Include driver-specific java options if we are launching a driver
     if (isClusterMode) {
       sparkConf.get(DRIVER_JAVA_OPTIONS).foreach { opts =>
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
index dbf4a0a8052..0148b6f3c95 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala
@@ -160,32 +160,6 @@ private[yarn] class ExecutorRunnable(
       .filter { case (k, v) => SparkConf.isExecutorStartupConf(k) }
       .foreach { case (k, v) => javaOpts += YarnSparkHadoopUtil.escapeForShell(s"-D$k=$v") }

-    // Commenting it out for now - so that people can refer to the properties if required. Remove
-    // it once cpuset version is pushed out.
-    // The context is, default gc for server class machines end up using all cores to do gc - hence
-    // if there are multiple containers in same node, spark gc effects all other containers
-    // performance (which can also be other spark containers)
-    // Instead of using this, rely on cpusets by YARN to enforce spark behaves 'properly' in
-    // multi-tenant environments. Not sure how default java gc behaves if it is limited to subset
-    // of cores on a node.
-    /*
-    else {
-      // If no java_opts specified, default to using -XX:+CMSIncrementalMode
-      // It might be possible that other modes/config is being done in
-      // spark.executor.extraJavaOptions, so we don't want to mess with it.
-      // In our expts, using (default) throughput collector has severe perf ramifications in
-      // multi-tenant machines
-      // The options are based on
-      // http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#0.0.0.%20When%20to%20Use
-      // %20the%20Concurrent%20Low%20Pause%20Collector|outline
-      javaOpts += "-XX:+UseConcMarkSweepGC"
-      javaOpts += "-XX:+CMSIncrementalMode"
-      javaOpts += "-XX:+CMSIncrementalPacing"
-      javaOpts += "-XX:CMSIncrementalDutyCycleMin=0"
-      javaOpts += "-XX:CMSIncrementalDutyCycle=10"
-    }
-    */
-
     // For log4j configuration to reference
     javaOpts += ("-Dspark.yarn.app.container.log.dir=" + ApplicationConstants.LOG_DIR_EXPANSION_VAR)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org