This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 4bb2967  [SPARK-38194][FOLLOWUP] Update executor config description for memoryOverheadFactor
4bb2967 is described below

commit 4bb2967ea321dd656a28ec685fecc2f97391968e
Author: Adam Binford <adam...@gmail.com>
AuthorDate: Tue Mar 22 18:10:41 2022 -0500

    [SPARK-38194][FOLLOWUP] Update executor config description for memoryOverheadFactor
    
    Follow-up for https://github.com/apache/spark/pull/35912#pullrequestreview-915755215: update the executor memoryOverheadFactor description to mention the 0.4 default for non-JVM jobs as well.
    
    ### What changes were proposed in this pull request?
    
    Doc update
    
    ### Why are the changes needed?
    
    To clarify new configs
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Existing UTs
    
    Closes #35934 from Kimahriman/memory-overhead-executor-docs.
    
    Authored-by: Adam Binford <adam...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
    (cherry picked from commit 768ab55e00cb0ec639db1444250ef40471c4a417)
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 6 +++++-
 docs/configuration.md                                              | 4 ++++
 docs/running-on-kubernetes.md                                      | 2 +-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index ffe4501..fa048f5 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -336,7 +336,11 @@ package object config {
       .doc("Fraction of executor memory to be allocated as additional non-heap 
memory per " +
         "executor process. This is memory that accounts for things like VM 
overheads, " +
         "interned strings, other native overheads, etc. This tends to grow 
with the container " +
-        "size. This value is ignored if spark.executor.memoryOverhead is set 
directly.")
+        "size. This value defaults to 0.10 except for Kubernetes non-JVM jobs, 
which defaults " +
+        "to 0.40. This is done as non-JVM tasks need more non-JVM heap space 
and such tasks " +
+        "commonly fail with \"Memory Overhead Exceeded\" errors. This preempts 
this error " +
+        "with a higher default. This value is ignored if 
spark.executor.memoryOverhead is set " +
+        "directly.")
       .version("3.3.0")
       .doubleConf
       .checkValue(factor => factor > 0,
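(Editorial aside: a minimal sketch of how the settings described above interact when building a job configuration. The 4g memory and 0.2 factor are arbitrary illustrative values, and the max(384 MiB, factor * executor memory) sizing reflects Spark's usual overhead computation, not something introduced by this patch.)

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")
      // With no explicit overhead set, Spark derives it as roughly
      // max(384 MiB, factor * executor memory): 0.2 * 4g ~= 819 MiB here.
      .set("spark.executor.memoryOverheadFactor", "0.2")
    // Setting the overhead directly makes the factor above a no-op,
    // exactly as the updated description says:
    // conf.set("spark.executor.memoryOverhead", "1g")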
diff --git a/docs/configuration.md b/docs/configuration.md
index a2e6797..a2cf233 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -309,6 +309,10 @@ of the most common options to set are:
     Fraction of executor memory to be allocated as additional non-heap memory per executor process.
     This is memory that accounts for things like VM overheads, interned strings,
     other native overheads, etc. This tends to grow with the container size.
+    This value defaults to 0.10 except for Kubernetes non-JVM jobs, which defaults to
+    0.40. This is done as non-JVM tasks need more non-JVM heap space and such tasks
+    commonly fail with "Memory Overhead Exceeded" errors. This preempts this error
+    with a higher default.
     This value is ignored if <code>spark.executor.memoryOverhead</code> is set directly.
   </td>
   <td>3.3.0</td>
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index de37e22..6fec9ba 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -359,7 +359,7 @@ If no volume is set as local storage, Spark uses temporary scratch space to spil
 
 `emptyDir` volumes use the nodes backing storage for ephemeral storage by default, this behaviour may not be appropriate for some compute environments.  For example if you have diskless nodes with remote storage mounted over a network, having lots of executors doing IO to this remote storage may actually degrade performance.
 
-In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes.  When configured like this Spark's local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests by increasing the value of `spark.kubernetes.memoryOverheadFactor` as appropriate.
+In this case it may be desirable to set `spark.kubernetes.local.dirs.tmpfs=true` in your configuration which will cause the `emptyDir` volumes to be configured as `tmpfs` i.e. RAM backed volumes.  When configured like this Spark's local storage usage will count towards your pods memory usage therefore you may wish to increase your memory requests by increasing the value of `spark.{driver,executor}.memoryOverheadFactor` as appropriate.
 
 
 ## Introspection and Debugging

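(Editorial aside: a minimal sketch of the tmpfs setup the changed paragraph describes, assuming a Kubernetes deployment; the 0.25 factor is purely illustrative headroom for RAM-backed scratch space, not a recommendation.)

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // emptyDir volumes become tmpfs (RAM-backed), so local scratch space
      // now counts against the pod's memory usage.
      .set("spark.kubernetes.local.dirs.tmpfs", "true")
      // Leave extra non-heap headroom for that RAM-backed storage.
      .set("spark.executor.memoryOverheadFactor", "0.25")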