[ https://issues.apache.org/jira/browse/YUNIKORN-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chaoran Yu updated YUNIKORN-1596:
---------------------------------
    Affects Version/s: 1.2.0

> Pods marked unschedulable when dynamic PVC times out
> ----------------------------------------------------
>
>                 Key: YUNIKORN-1596
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1596
>             Project: Apache YuniKorn
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Praveen
>            Priority: Major
>
> We are seeing a behavior where, when a scheduled pod requesting a PVC times
> out, it is marked as unschedulable. There are no retries for such pods, and
> they remain in the 'pending' state. With pods pending, the autoscaler does
> not scale down nodes. This seems similar to the issue discussed here:
> [https://github.com/kubernetes/autoscaler/issues/3409]
>
> {quote}Error from YuniKorn logs:
> ERROR cache/context.go:527 Failed to bind pod volumes \{"podName":
> "<PODNAME>", "nodeName": "<IP>", "dynamicProvisions": 1, "staticBindings": 0}
> ...
> ...
> /workspace/pkg/cache/task.go:382
> 2023-02-20T00:02:22.368Z ERROR cache/task.go:265 task failed \{"appID":
> "<APPID>", "taskID": "45981d91-e543-459b-9657-bdc03b57e26f", "reason": "bind
> pod volumes failed, name: <NS/PODNAME>, binding volumes: timed out waiting
> for the condition"}
> {quote}
>
> {{From Autoscaler logs}}
> {quote}I0220 20:47:01.775653 1 static_autoscaler.go:502] Scale down status:
> unneededOnly=true lastScaleUpTime=2023-02-20 19:20:56.429598603 +0000 UTC
> m=+249612.380355315 lastScaleDownDeleteTime=2023-02-20 06:36:50.929515212
> +0000 UTC m=+203766.880271921 lastScaleDownFailTime=2023-02-17
> 22:01:33.693397034 +0000 UTC m=+49.644153730 scaleDownForbidden=true
> isDeleteInProgress=false scaleDownInCooldown=true
> I0220 20:47:11.787999 1 static_autoscaler.go:228] Starting main loop
> I0220 20:47:11.792789 1 filter_out_schedulable.go:65] Filtering out
> schedulables
> I0220 20:47:11.792953 1 scheduler_binder.go:829] All bound volumes for Pod
> "<podname>" match with Node <node>"
> I0220 20:47:11.792981 1 filter_out_schedulable.go:118] Pod <podname> marked
> as unschedulable can be scheduled on node <node> (based on hinting). Ignoring
> in scale up.
> {quote}
>
> # Can YuniKorn introduce retries for such scenarios?
> # Can pods be set to an error state after retries?
> {{Note: pod name, node name and IP are masked above}}