[ https://issues.apache.org/jira/browse/MESOS-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marco Massenzio updated MESOS-2601:
-----------------------------------
    Comment: was deleted

(was: [~air] today mentioned that fixing this one is critical for the DCOS release: can [~tnachen] provide us with an update on progress? I see that [his fix|https://reviews.apache.org/r/33257/] was committed; is this going to be part of 0.22 or 0.22.1?)

> Tasks are not removed after recovery from slave and mesos containerizer
> -----------------------------------------------------------------------
>
>                 Key: MESOS-2601
>                 URL: https://issues.apache.org/jira/browse/MESOS-2601
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, slave
>    Affects Versions: 0.22.1
>            Reporter: Timothy Chen
>            Assignee: Timothy Chen
>
> We've seen in our test cluster that tasks launched with the Mesos containerizer
> are recovered after a slave restart, but the actual command process is no longer
> running and the checkpointed executor is never marked as completed.
> The Mesos containerizer recovers, and none of the isolators can recover the
> task, but the containerizer itself is somehow never removed and the monitor
> keeps calling usage on it.
> Relevant log lines from the beginning of slave recovery:
> I0408 18:06:33.261379 32504 slave.cpp:577] Successfully attached file '/hdd/mesos/slave/slaves/20150401-160104-251662508-5050-2197-S1/frameworks/20141222-194154-218108076-5050-4125-0004/executors/ct:1427921848104:0:EM DataDog Uploader:/runs/990741ed-909e-49cc-83f8-be63298872da'
> ...
> I0408 18:06:36.583277 32511 containerizer.cpp:350] Recovering container '990741ed-909e-49cc-83f8-be63298872da' for executor 'ct:1427921848104:0:EM DataDog Uploader:' of framework 20141222-194154-218108076-5050-4125-0004
> ...
> I0408 18:06:37.017122 32511 linux_launcher.cpp:162] Couldn't find freezer cgroup for container 990741ed-909e-49cc-83f8-be63298872da, assuming already destroyed
> W0408 18:06:37.074916 32496 cpushare.cpp:199] Couldn't find cgroup for container 990741ed-909e-49cc-83f8-be63298872da
> I0408 18:06:37.075173 32486 mem.cpp:158] Couldn't find cgroup for container 990741ed-909e-49cc-83f8-be63298872da
> E0408 18:06:37.092279 32496 containerizer.cpp:1136] Error in a resource limitation for container 990741ed-909e-49cc-83f8-be63298872da: Unknown container
> I0408 18:06:37.092643 32496 containerizer.cpp:906] Destroying container '990741ed-909e-49cc-83f8-be63298872da'
> W0408 18:06:37.229626 32501 containerizer.cpp:807] Ignoring update for currently being destroyed container: 990741ed-909e-49cc-83f8-be63298872da
> W0408 18:06:38.129873 32484 containerizer.cpp:844] Skipping resource statistic for container 990741ed-909e-49cc-83f8-be63298872da because: Unknown container
> W0408 18:06:38.129909 32484 containerizer.cpp:844] Skipping resource statistic for container 990741ed-909e-49cc-83f8-be63298872da because: Unknown container

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
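To make the failure mode described above concrete, here is a minimal, self-contained C++ sketch. It is not Mesos code and not the fix from r/33257; all names (ToyContainerizer, ToyMonitor, watch, poll) are hypothetical. It only illustrates the shape of the bug report: once a container is destroyed and forgotten by the containerizer, a monitor that was never told to stop watching it keeps calling usage() and gets "Unknown container" back on every poll, matching the repeated warnings in the log excerpt.

{code:cpp}
// Toy illustration only: hypothetical types, not the Mesos containerizer API.
#include <iostream>
#include <set>
#include <string>

struct Usage {
  double cpuSecs = 0.0;  // placeholder statistic
};

class ToyContainerizer {
public:
  void recover(const std::string& id) { containers_.insert(id); }

  // destroy() removes the container from the containerizer's bookkeeping...
  void destroy(const std::string& id) { containers_.erase(id); }

  // ...so any later usage() call for that ID can only report an error.
  bool usage(const std::string& id, Usage* out) {
    if (containers_.count(id) == 0) {
      std::cerr << "Skipping resource statistic for container " << id
                << " because: Unknown container\n";
      return false;
    }
    *out = Usage{};
    return true;
  }

private:
  std::set<std::string> containers_;
};

class ToyMonitor {
public:
  void watch(const std::string& id) { watched_.insert(id); }

  // In the buggy scenario nothing ever tells the monitor to stop watching,
  // so every poll keeps asking about the destroyed container.
  void poll(ToyContainerizer& containerizer) {
    for (const std::string& id : watched_) {
      Usage usage;
      containerizer.usage(id, &usage);
    }
  }

private:
  std::set<std::string> watched_;
};

int main() {
  const std::string id = "990741ed-909e-49cc-83f8-be63298872da";

  ToyContainerizer containerizer;
  ToyMonitor monitor;

  containerizer.recover(id);    // slave restart: container is recovered
  monitor.watch(id);            // monitor starts polling it
  containerizer.destroy(id);    // isolators find nothing, container destroyed

  monitor.poll(containerizer);  // each poll now logs the warning above
  monitor.poll(containerizer);
  return 0;
}
{code}

The report suggests the missing step is on the destroy/cleanup path: when the container cannot be recovered and is destroyed, it should also be dropped from whatever keeps polling it (and the checkpointed executor marked as completed), so the "Unknown container" warnings do not repeat indefinitely.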