[ https://issues.apache.org/jira/browse/YUNIKORN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko resolved YUNIKORN-1197.
------------------------------------
    Target Version: 1.1.0
        Resolution: Fixed

> Placeholders are immediately replaced during recovery
> -----------------------------------------------------
>
>                 Key: YUNIKORN-1197
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1197
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: shim - kubernetes
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> When we restart YK, some placeholders that are running are immediately replaced, despite the fact that the timeout has not yet expired.
> Example:
> {noformat}
> 2022-04-27T11:43:47.145Z INFO cache/context_recovery.go:182 node state {"nodeName": "minikube", "nodeState": "Healthy"}
> 2022-04-27T11:43:47.145Z INFO cache/context_recovery.go:196 nodes recovery is successful {"recoveredNodes": 1}
> 2022-04-27T11:43:47.145Z INFO shim/scheduler.go:226 scheduler recovery succeed
> 2022-04-27T11:43:47.145Z INFO cache/nodes.go:238 scheduler node event {"name": "minikube", "current state ": "New", "transition to ": "RecoverNode"}
> 2022-04-27T11:43:47.145Z INFO shim/scheduler.go:356 No outstanding apps found for a while {"timeout": "2m0s"}
> 2022-04-27T11:43:47.145Z INFO cache/application.go:557 Skip the reservation stage {"appID": "batch-sleep-job"}
> 2022-04-27T11:43:47.145Z INFO cache/context.go:318 trigger scheduler configuration reloading
> 2022-04-27T11:43:48.148Z INFO objects/application.go:585 Ask added successfully to application {"appID": "batch-sleep-job", "ask": "ce3558cd-2a02-47d8-9bb7-93b2aadf9cc8", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
> 2022-04-27T11:43:48.148Z INFO objects/application.go:585 Ask added successfully to application {"appID": "batch-sleep-job", "ask": "c88d0bba-ef94-4728-ad54-da30f72646ee", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
> 2022-04-27T11:43:48.148Z INFO objects/application.go:585 Ask added successfully to application {"appID": "batch-sleep-job", "ask": "412d750d-f8c2-4b9c-a4cf-c7077c5384e1", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
> 2022-04-27T11:43:48.148Z INFO objects/application.go:585 Ask added successfully to application {"appID": "batch-sleep-job", "ask": "54607708-a8f3-4ff3-b73c-210111a54625", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
> 2022-04-27T11:43:48.148Z INFO objects/application.go:585 Ask added successfully to application {"appID": "batch-sleep-job", "ask": "f080aad1-6b08-4d83-8802-8dbf853a89cd", "placeholder": false, "pendingDelta": "map[memory:10000000 vcore:10]"}
> 2022-04-27T11:43:48.156Z INFO scheduler/partition.go:863 scheduler replace placeholder processed {"appID": "batch-sleep-job", "allocationKey": "ce3558cd-2a02-47d8-9bb7-93b2aadf9cc8", "UUID": "5f0d5e0d-0668-4297-82ba-c8ebb585b0f7", "placeholder released UUID": "312d7df9-000c-4035-9170-9ea96ef9e718"}
> 2022-04-27T11:43:48.156Z INFO scheduler/partition.go:863 scheduler replace placeholder processed {"appID": "batch-sleep-job", "allocationKey": "c88d0bba-ef94-4728-ad54-da30f72646ee", "UUID": "a80035cc-9751-4dc2-9a36-9a649ae50922", "placeholder released UUID": "a2c072c7-3814-4464-bcd8-64f3e3b79b4e"}
> 2022-04-27T11:43:48.156Z INFO scheduler/partition.go:863 scheduler replace placeholder processed {"appID": "batch-sleep-job", "allocationKey": "412d750d-f8c2-4b9c-a4cf-c7077c5384e1", "UUID": "126f996f-ca79-4895-a593-ffa51a6fc40e", "placeholder released UUID": "a48d2a0a-c9cc-446b-8f33-bf7952e5771c"}
> 2022-04-27T11:43:48.156Z INFO scheduler/partition.go:863 scheduler replace placeholder processed {"appID": "batch-sleep-job", "allocationKey": "54607708-a8f3-4ff3-b73c-210111a54625", "UUID": "d82c103e-85ce-4375-9448-75d251549326", "placeholder released UUID": "84e6a8bc-42ab-45da-9af6-c4067b2a3561"}
> 2022-04-27T11:43:48.156Z INFO scheduler/partition.go:863 scheduler replace placeholder processed {"appID": "batch-sleep-job", "allocationKey": "f080aad1-6b08-4d83-8802-8dbf853a89cd", "UUID": "2441b758-4c12-44d8-ab70-6c6b3fb100de", "placeholder released UUID": "ec4f534c-628a-4dc3-87d1-73f782da8c46"}
> 2022-04-27T11:43:48.156Z INFO cache/application.go:675 try to release pod from application {"appID": "batch-sleep-job", "allocationUUID": "312d7df9-000c-4035-9170-9ea96ef9e718", "terminationType": "PLACEHOLDER_REPLACED"}
> 2022-04-27T11:43:48.168Z INFO cache/application.go:675 try to release pod from application {"appID": "batch-sleep-job", "allocationUUID": "a2c072c7-3814-4464-bcd8-64f3e3b79b4e", "terminationType": "PLACEHOLDER_REPLACED"}
> 2022-04-27T11:43:48.174Z INFO cache/application.go:675 try to release pod from application {"appID": "batch-sleep-job", "allocationUUID": "a48d2a0a-c9cc-446b-8f33-bf7952e5771c", "terminationType": "PLACEHOLDER_REPLACED"}
> 2022-04-27T11:43:48.180Z INFO cache/application.go:675 try to release pod from application {"appID": "batch-sleep-job", "allocationUUID": "84e6a8bc-42ab-45da-9af6-c4067b2a3561", "terminationType": "PLACEHOLDER_REPLACED"}
> 2022-04-27T11:43:48.199Z INFO cache/application.go:675 try to release pod from application {"appID": "batch-sleep-job", "allocationUUID": "ec4f534c-628a-4dc3-87d1-73f782da8c46", "terminationType": "PLACEHOLDER_REPLACED"}
> 2022-04-27T11:43:49.671Z INFO general/general.go:285 task completes {"appType": "general", "namespace": "default", "podName": "tg-groupa-batch-sleep-job-3", "podUID": "84e6a8bc-42ab-45da-9af6-c4067b2a3561", "podStatus": "Failed"}
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: issues-h...@yunikorn.apache.org