[ 
https://issues.apache.org/jira/browse/MYRIAD-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131200#comment-15131200
 ] 

DarinJ commented on MYRIAD-153:
-------------------------------

I've spent quite a bit of time debugging this issue and believe I've found the
root cause and a solution.  The root cause is that if a YARN container goes
from ALLOCATED to RELEASED, the AuxService class is never made aware of the
container, so its placeholder task is never killed.  The proposed solution is
to intercept the (public) method completeContainers in the scheduler and check
whether we need to remove a Myriad task when receiving a RELEASED event.  I
already added the intercept while debugging, so I should be able to have this
patched soon.
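
For illustration only, the intercept idea can be sketched in plain Java as
below.  This is not the actual patch and does not use the real YARN or Myriad
classes: BaseScheduler, ContainerEvent, PlaceholderTasks, the single-container
signature of completeContainers, and the "yarn_" + container id task naming are
all assumptions made up for the sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReleasedContainerSketch {

    /** Hypothetical event types; not the real YARN enum. */
    enum ContainerEvent { FINISHED, KILLED, RELEASED }

    /** Hypothetical stand-in for the scheduler whose public method is intercepted. */
    static class BaseScheduler {
        public void completeContainers(String containerId, ContainerEvent event) {
            // The real scheduler would do its normal bookkeeping here.
        }
    }

    /** Hypothetical tracker for placeholder tasks running on Mesos. */
    static class PlaceholderTasks {
        private final Set<String> running = new HashSet<>();
        final List<String> killed = new ArrayList<>();

        // Assumed naming: placeholder task id is "yarn_" + container id.
        static String taskIdFor(String containerId) {
            return "yarn_" + containerId;
        }

        void launched(String containerId) {
            running.add(taskIdFor(containerId));
        }

        boolean isRunning(String containerId) {
            return running.contains(taskIdFor(containerId));
        }

        // Records a kill only if the task is still tracked, so repeated
        // events for the same container are harmless.
        void killIfRunning(String containerId) {
            String taskId = taskIdFor(containerId);
            if (running.remove(taskId)) {
                killed.add(taskId);
            }
        }
    }

    /** Intercepts completion and cleans up on RELEASED before delegating. */
    static class InterceptingScheduler extends BaseScheduler {
        private final PlaceholderTasks tasks;

        InterceptingScheduler(PlaceholderTasks tasks) {
            this.tasks = tasks;
        }

        @Override
        public void completeContainers(String containerId, ContainerEvent event) {
            if (event == ContainerEvent.RELEASED) {
                tasks.killIfRunning(containerId);
            }
            super.completeContainers(containerId, event);
        }
    }

    public static void main(String[] args) {
        PlaceholderTasks tasks = new PlaceholderTasks();
        InterceptingScheduler scheduler = new InterceptingScheduler(tasks);
        tasks.launched("container_01");
        scheduler.completeContainers("container_01", ContainerEvent.RELEASED);
        System.out.println("killed: " + tasks.killed);
    }
}
```

In this sketch only RELEASED triggers the cleanup, on the assumption that the
other completion paths are already handled elsewhere (for example by the
AuxService).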

> Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.
> -----------------------------------------------------------------------------
>
>                 Key: MYRIAD-153
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-153
>             Project: Myriad
>          Issue Type: Bug
>            Reporter: Sarjeet Singh
>            Assignee: DarinJ
>             Fix For: Myriad 0.2.0
>
>         Attachments: Mesos_UI_screeshot_placeholder_tasks_running.png
>
>
> Observed that the placeholder tasks for containers launched on FGS are still
> in the RUNNING state on Mesos. These container tasks are not cleaned up
> properly after the job has finished completely.
> See the attached screenshot of the Mesos UI with placeholder tasks still
> running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
