[ https://issues.apache.org/jira/browse/SPARK-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Vogelbacher updated SPARK-23825:
--------------------------------------
    Description: 
We currently request {{spark.{driver,executor}.memory}} as memory from 
Kubernetes (e.g., 
[here|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala#L95]).
The limit is set to {{spark.{driver,executor}.memory + 
spark.kubernetes.{driver,executor}.memoryOverhead}}.
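
For concreteness, here is a minimal sketch (not the actual Spark code path; the 10% overhead factor and 384MiB floor are the assumed defaults) of how the current request/limit split works out for a 4g executor:

{code:scala}
// Standalone sketch of today's behaviour: request = memory, limit = memory + overhead.
// The 10% factor and 384MiB minimum are assumed defaults for the memory overhead.
object CurrentMemorySketch {
  val MemoryOverheadFactor = 0.10
  val MemoryOverheadMinMiB = 384L

  def overheadMiB(memoryMiB: Long): Long =
    math.max((MemoryOverheadFactor * memoryMiB).toLong, MemoryOverheadMinMiB)

  def main(args: Array[String]): Unit = {
    val executorMemoryMiB = 4096L                                     // spark.executor.memory=4g
    val requestMiB = executorMemoryMiB                                // what gets requested today
    val limitMiB = executorMemoryMiB + overheadMiB(executorMemoryMiB) // what the limit is set to
    println(s"request=${requestMiB}Mi limit=${limitMiB}Mi")           // request=4096Mi limit=4505Mi
  }
}
{code}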
This seems to be using Kubernetes incorrectly. 
[How Pods with resource limits are 
run|https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run] 
states:

{noformat}
If a Container exceeds its memory request, it is likely that its Pod will be 
evicted whenever the node runs out of memory.
{noformat}

Thus, if the Spark driver/executor uses {{memory + memoryOverhead}} memory, it 
can be evicted. While an evicted executor might get restarted (still very bad 
performance-wise), the driver would be hard to recover.

I think Spark should be able to run with the requested (and, thus, guaranteed) 
resources from Kubernetes. It shouldn't rely on optional resources above the 
request and, therefore, be in danger of termination under high cluster utilization.

Thus, we should request {{memory + memoryOverhead}} memory from Kubernetes (and 
this should also be the limit).
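
A minimal sketch of the proposed change, using the fabric8 builder style the K8s config steps already use (the object and method names here are illustrative, not the actual patch):

{code:scala}
import io.fabric8.kubernetes.api.model.{Quantity, QuantityBuilder, ResourceRequirements, ResourceRequirementsBuilder}

// Illustrative only: set the memory request and the memory limit to the same
// memory + memoryOverhead quantity, so the full amount is guaranteed by the scheduler.
object ProposedMemoryResources {
  def memoryResources(memoryWithOverheadMiB: Long): ResourceRequirements = {
    val memoryWithOverhead: Quantity = new QuantityBuilder(false)
      .withAmount(s"${memoryWithOverheadMiB}Mi")
      .build()
    new ResourceRequirementsBuilder()
      .addToRequests("memory", memoryWithOverhead) // guaranteed, so no eviction for exceeding the request
      .addToLimits("memory", memoryWithOverhead)   // hard cap stays at memory + memoryOverhead
      .build()
  }
}
{code}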

> [K8s] Spark pods should request memory + memoryOverhead as resources
> --------------------------------------------------------------------
>
>                 Key: SPARK-23825
>                 URL: https://issues.apache.org/jira/browse/SPARK-23825
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.3.0
>            Reporter: David Vogelbacher
>            Priority: Major
>


