[
https://issues.apache.org/jira/browse/YUNIKORN-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492936#comment-17492936
]
Wilfred Spiegelenburg commented on YUNIKORN-946:
------------------------------------------------
[~anuraagn] it would be good to revisit this with a build that includes the fix
for YUNIKORN-876. That change fixed a memory leak on the shim side which left
pod references around. That leak could also have left the pods in the core, and
thus in the web UI.
However, there is a more likely candidate: YUNIKORN-766, which fixed the
applicationID ordering. The symptom there was two different app IDs for
different executors which really belonged to the same app. That fact was
mentioned in the Slack channel when [~ashutosh-pepper] saw the leak. I think
that was dismissed way too quickly.
> Accounting resources for deleted executor pods
> ----------------------------------------------
>
> Key: YUNIKORN-946
> URL: https://issues.apache.org/jira/browse/YUNIKORN-946
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 0.11
> Reporter: Ashutosh Singh
> Priority: Critical
> Attachments: image-2021-11-16-23-17-42-819.png,
> image-2021-11-16-23-18-28-349.png
>
>
> Even after executors are deleted, the YK UI still shows resources being
> consumed by the deleted pod. _kubectl get pods_ does not show the executor,
> but the YK UI keeps showing the deleted pod as consuming resources, even
> after a few hours.
> This results in leaked cluster resources.
> Steps:
> # Run a spark application using k8s spark operator
> # Wait for executors to be in running state.
> # Delete the application using `kubectl delete sparkapplications <appName>`
> OR `kubectl delete -f <yaml-file>`
> # All the driver and executor pods are deleted. Check with `kubectl get pods`.
> # However, the YK UI still shows some of the executors as running and
> consuming resources. This leaks resources: they are still counted as used
> and cannot be used by pending pods.
> More details:
> [https://yunikornworkspace.slack.com/archives/CLNUW68MU/p1637126093006900]
> !image-2021-11-16-23-18-28-349.png|width=534,height=323!
>
> !image-2021-11-16-23-17-42-819.png|width=583,height=353!