This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 4bb2967  [SPARK-38194][FOLLOWUP] Update executor config description for memoryOverheadFactor
4bb2967 is described below

commit 4bb2967ea321dd656a28ec685fecc2f97391968e
Author: Adam Binford <adam...@gmail.com>
AuthorDate: Tue Mar 22 18:10:41 2022 -0500

    [SPARK-38194][FOLLOWUP] Update executor config description for memoryOverheadFactor

    Follow up for https://github.com/apache/spark/pull/35912#pullrequestreview-915755215, update the executor memoryOverheadFactor to mention the 0.4 default for non-JVM jobs as well.

    ### What changes were proposed in this pull request?

    Doc update

    ### Why are the changes needed?

    To clarify new configs

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Existing UTs

    Closes #35934 from Kimahriman/memory-overhead-executor-docs.

    Authored-by: Adam Binford <adam...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
    (cherry picked from commit 768ab55e00cb0ec639db1444250ef40471c4a417)
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 6 +++++-
 docs/configuration.md                                              | 4 ++++
 docs/running-on-kubernetes.md                                      | 2 +-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index ffe4501..fa048f5 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -336,7 +336,11 @@ package object config {
       .doc("Fraction of executor memory to be allocated as additional non-heap memory per " +
         "executor process. This is memory that accounts for things like VM overheads, " +
         "interned strings, other native overheads, etc. This tends to grow with the container " +
-        "size. This value is ignored if spark.executor.memoryOverhead is set directly.")
+        "size. This value defaults to 0.10 except for Kubernetes non-JVM jobs, which defaults " +
+        "to 0.40. This is done as non-JVM tasks need more non-JVM heap space and such tasks " +
+        "commonly fail with \"Memory Overhead Exceeded\" errors. This preempts this error " +
+        "with a higher default. This value is ignored if spark.executor.memoryOverhead is set " +
+        "directly.")
       .version("3.3.0")
       .doubleConf
       .checkValue(factor => factor > 0,
diff --git a/docs/configuration.md b/docs/configuration.md
index a2e6797..a2cf233 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -309,6 +309,10 @@ of the most common options to set are:
     Fraction of executor memory to be allocated as additional non-heap memory per
     executor process. This is memory that accounts for things like VM overheads,
     interned strings, other native overheads, etc. This tends to grow with the container size.
+    This value defaults to 0.10 except for Kubernetes non-JVM jobs, which defaults to
+    0.40. This is done as non-JVM tasks need more non-JVM heap space and such tasks
+    commonly fail with "Memory Overhead Exceeded" errors. This preempts this error
+    with a higher default.
     This value is ignored if <code>spark.executor.memoryOverhead</code> is set directly.
   </td>
   <td>3.3.0</td>
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index de37e22..6fec9ba 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -359,7 +359,7 @@ If no volume is set as local storage, Spark uses temporary scratch space to spil
 `emptyDir` volumes use the nodes backing storage for ephemeral storage by default, this behaviour may not be appropriate for some compute environments. For example if you have diskless nodes with remote storage mounted over a network, having lots of executors doing IO to this remote storage may actually degrade performance.
-In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes. When configured like this Spark's local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests by increasing the value of `spark.kubernetes.memoryOverheadFactor` as appropriate.
+In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes. When configured like this Spark's local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests by increasing the value of `spark.{driver,executor}.memoryOverheadFactor` as appropriate.

 ## Introspection and Debugging

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
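[Editor's note, not part of the commit] The configs documented above would typically be set at submit time. A minimal sketch of overriding the non-JVM overhead factor explicitly, assuming a placeholder Kubernetes master URL, executor memory size, and application path:

```shell
# Hypothetical submission of a non-JVM (Python) app on Kubernetes.
# With spark.executor.memoryOverheadFactor=0.4 and 4g executor memory,
# roughly 0.4 * 4g of additional non-heap memory is requested per executor,
# unless spark.executor.memoryOverhead is set directly (which wins).
spark-submit \
  --master k8s://https://example.com:6443 \
  --conf spark.executor.memory=4g \
  --conf spark.executor.memoryOverheadFactor=0.4 \
  --conf spark.kubernetes.local.dirs.tmpfs=true \
  local:///opt/spark/app.py
```

Since `spark.kubernetes.local.dirs.tmpfs=true` makes local-storage writes count against pod memory, raising the overhead factor as shown is the adjustment the running-on-kubernetes doc change points at.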