[ https://issues.apache.org/jira/browse/YUNIKORN-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493801#comment-17493801 ]
Anuraag Nalluri commented on YUNIKORN-946:
------------------------------------------

We can conclude that this bug is caused by the issue fixed in YUNIKORN-776. To reach this conclusion, we built the scheduler at two commits: the ones immediately preceding and following the merge of YUNIKORN-776. We ran spark-pi applications on both schedulers and supplied custom applicationIds that conflict with the default Spark-generated job IDs.

Before YUNIKORN-776, we can see that the application is initially created under the Spark-generated job ID, while the completion event surfaces on the dashboard for the custom applicationId we provided. This means the api-server's pod-delete event was attributed to the incorrect application, leaving a hanging allocation under the Spark job ID's application.

On the commit following YUNIKORN-776, we started 3 spark-pi applications with custom applicationIds. In _all_ cases the allocation was both issued for and freed from the provided applicationId. This makes sense because the logic now always checks for the applicationId label first, before falling back to the Spark-generated app ID (see the sketch after the quoted issue below): [https://github.com/apache/incubator-yunikorn-k8shim/pull/288/files]

Screenshots of both scenarios are attached to this ticket. Thank you [~ashutosh-pepper] for reporting and [~wilfreds] for providing additional context.

> Accounting resources for deleted executor pods
> ----------------------------------------------
>
> Key: YUNIKORN-946
> URL: https://issues.apache.org/jira/browse/YUNIKORN-946
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 0.11
> Reporter: Ashutosh Singh
> Assignee: Anuraag Nalluri
> Priority: Critical
> Attachments: image-2021-11-16-23-17-42-819.png, image-2021-11-16-23-18-28-349.png
>
> Even when executors are deleted, the YK UI shows resources still being consumed by the pod (one that has already been deleted). _kubectl get pods_ does not show the executor, but the YK UI shows the deleted pod consuming resources even after a few hours. This results in leaked cluster resources.
>
> Steps:
> # Run a Spark application using the k8s Spark operator.
> # Wait for the executors to be in the running state.
> # Delete the application using `kubectl delete sparkapplications <appName>` OR `kubectl delete -f <yaml-file>`.
> # All the driver and executor pods are deleted; check `kubectl get pods`.
> # However, the YK UI still shows some of the executors running and consuming resources. This leads to a resource leak, as those resources are counted as used and cannot be allocated to pending pods.
>
> More details:
> [https://yunikornworkspace.slack.com/archives/CLNUW68MU/p1637126093006900]
>
> !image-2021-11-16-23-18-28-349.png|width=534,height=323!
>
> !image-2021-11-16-23-17-42-819.png|width=583,height=353!
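For readers following the fix, below is a minimal sketch of the application ID resolution order described in the comment above. It is not the actual yunikorn-k8shim code from the linked PR; the label keys ("applicationId" and "spark-app-selector") and the helper name getApplicationID are illustrative assumptions.

{code:go}
package main

import "fmt"

// getApplicationID resolves the application ID for a pod from its labels.
// An explicit applicationId label always wins; the Spark-generated
// identifier is only used as a fallback when no applicationId is supplied.
// Label keys are illustrative, not the shim's actual constants.
func getApplicationID(labels map[string]string) string {
	if id := labels["applicationId"]; id != "" {
		return id
	}
	if id := labels["spark-app-selector"]; id != "" {
		return id
	}
	return ""
}

func main() {
	// A pod carrying both a user-supplied applicationId and a
	// Spark-generated selector, as in the spark-pi tests described above.
	labels := map[string]string{
		"applicationId":      "custom-app-001",
		"spark-app-selector": "spark-8d3f9c0e1b2a4cd",
	}
	fmt.Println(getApplicationID(labels)) // prints: custom-app-001
}
{code}

Because pod-add and pod-delete events go through the same resolution order, the allocation and its release are attributed to the same application, which is why the hanging allocation no longer appears after YUNIKORN-776.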