[ https://issues.apache.org/jira/browse/YUNIKORN-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoran Yu updated YUNIKORN-1596:
---------------------------------
    Affects Version/s: 1.2.0

> Pods marked unschedulable when dynamic PVC times out
> ----------------------------------------------------
>
>                 Key: YUNIKORN-1596
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1596
>             Project: Apache YuniKorn
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Praveen
>            Priority: Major
>
> We are seeing the following behavior: when volume binding for a scheduled pod 
> that requests a PVC times out, the pod is marked as unschedulable. The pod is 
> never retried and remains in the 'Pending' state. While pods are pending, the 
> autoscaler does not scale down nodes.
> This seems similar to the issue discussed here:
> [https://github.com/kubernetes/autoscaler/issues/3409]
>  
> {quote}Error from YuniKorn logs:
> ERROR cache/context.go:527 Failed to bind pod volumes \{"podName": 
> "<PODNAME>", "nodeName": "<IP>", "dynamicProvisions": 1, "staticBindings": 0}
> ...
> ...
> /workspace/pkg/cache/task.go:382
> 2023-02-20T00:02:22.368Z ERROR cache/task.go:265 task failed \{"appID": 
> "<APPID>", "taskID": "45981d91-e543-459b-9657-bdc03b57e26f", "reason": "bind 
> pod volumes failed, name: <NS/PODNAME>, binding volumes: timed out waiting 
> for the condition"}
> {quote}
>  
> From autoscaler logs:
> {quote}I0220 20:47:01.775653 1 static_autoscaler.go:502] Scale down status: 
> unneededOnly=true lastScaleUpTime=2023-02-20 19:20:56.429598603 +0000 UTC 
> m=+249612.380355315 lastScaleDownDeleteTime=2023-02-20 06:36:50.929515212 
> +0000 UTC m=+203766.880271921 lastScaleDownFailTime=2023-02-17 
> 22:01:33.693397034 +0000 UTC m=+49.644153730 scaleDownForbidden=true 
> isDeleteInProgress=false scaleDownInCooldown=true
> I0220 20:47:11.787999 1 static_autoscaler.go:228] Starting main loop
> I0220 20:47:11.792789 1 filter_out_schedulable.go:65] Filtering out 
> schedulables
> I0220 20:47:11.792953 1 scheduler_binder.go:829] All bound volumes for Pod 
> "<podname>" match with Node <node>"
> I0220 20:47:11.792981 1 filter_out_schedulable.go:118] Pod <podname> marked 
> as unschedulable can be scheduled on node <node> (based on hinting). Ignoring 
> in scale up.
> {quote}
>  
>  # Can YuniKorn retry volume binding in such scenarios?
>  # Can pods be moved to an error state once the retries are exhausted?
> {{Note: pod names, node names and IPs are masked above.}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org
