[ https://issues.apache.org/jira/browse/SPARK-36060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483504#comment-17483504 ]
Weiwei Yang commented on SPARK-36060:
-------------------------------------

hi [~holden] We have seen the same issue. The flag {{spark.kubernetes.allocation.maxPendingPods}} introduced by SPARK-36052 helps to mitigate it. Do you think there is anything else we can do besides this?

> Support backing off dynamic allocation increases if resources are "stuck"
> -------------------------------------------------------------------------
>
>                 Key: SPARK-36060
>                 URL: https://issues.apache.org/jira/browse/SPARK-36060
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Holden Karau
>            Priority: Major
>
> In an over-subscribed environment we may enter a situation where our
> requests for more pods are not going to be fulfilled. Adding more requests
> for more pods is not going to help and may slow down the scheduler. We
> should detect this situation and hold off on increasing pod requests until
> the scheduler allocates more pods to us. We have a limited version of this
> in the Kube scheduler itself, but it would be better to plumb this all the
> way through to the DA logic.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
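(For reference, a minimal sketch of applying the mitigation flag discussed above. The property name comes from SPARK-36052; the value 30 is purely illustrative and should be tuned to your cluster, and the other settings shown are just the usual dynamic-allocation-on-Kubernetes prerequisites, not a recommendation.)

```
# Hypothetical spark-submit invocation; cap how many pending pod
# requests the Kubernetes allocator keeps outstanding at once.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.kubernetes.allocation.maxPendingPods=30 \
  ...
```

This bounds the damage in an over-subscribed cluster but does not make the DA logic itself back off, which is what this ticket asks for.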