Jie Yu created MESOS-1649:
-----------------------------

             Summary: Network isolator should tolerate slave crashes while doing isolate/cleanup.
                 Key: MESOS-1649
                 URL: https://issues.apache.org/jira/browse/MESOS-1649
             Project: Mesos
          Issue Type: Bug
            Reporter: Jie Yu
            Assignee: Jie Yu


A slave may crash while the network isolator is installing or removing 
filters. Slave recovery for the network isolator should tolerate the 
partially installed filters left behind by such a crash. It should also 
avoid leaking filters on host eth0 and host lo.

The current code does not tolerate this, so slave recovery can fail with the 
following error:

{noformat}
Failed to perform recovery: Collect failed: Failed to recover container 
d409a100-2afb-497c-864f-fe3002cf65d9 with pid 50405: No ephemeral ports found
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
       This ensures slave doesn't recover old live executors.
Step 2: Restart the slave.
{noformat}
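
To illustrate the intended behavior, here is a minimal sketch of a tolerant 
recovery pass. It is not the actual isolator code: HostState, 
ContainerFilters, RecoveryResult, retain() and recover() are hypothetical 
names, and filters are modeled as plain port ranges held in memory. The 
sketch also assumes that a fully isolated container has every filter on its 
veth mirrored on both host eth0 and host lo, and that a partially isolated 
container is scheduled for cleanup instead of failing recovery.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A filter is identified here only by the port range it matches.
using PortRange = std::pair<int, int>;  // [begin, end]
using Filters = std::set<PortRange>;

// Hypothetical snapshot of the host filters found after a slave restart.
struct HostState {
  Filters eth0;  // filters found on host eth0
  Filters lo;    // filters found on host lo
};

// Hypothetical per-container view: filters found on the container's veth.
// An empty set models a crash in the middle of isolate() or cleanup().
using ContainerFilters = std::map<std::string, Filters>;

struct RecoveryResult {
  std::vector<std::string> recovered;  // fully isolated containers
  std::vector<std::string> cleanedUp;  // partially isolated containers
};

// Erase every element of 'table' that is not present in 'keep'.
static void retain(Filters& table, const Filters& keep) {
  for (auto it = table.begin(); it != table.end();) {
    if (keep.count(*it) == 0) {
      it = table.erase(it);
    } else {
      ++it;
    }
  }
}

// Recover without failing on partial state. A container counts as fully
// isolated only if its veth has filters and each one is mirrored on both
// host eth0 and host lo. Host filters that belong to no fully isolated
// container are removed so that they do not leak.
RecoveryResult recover(HostState& host, const ContainerFilters& containers) {
  RecoveryResult result;
  Filters live;

  for (const auto& entry : containers) {
    const Filters& veth = entry.second;
    bool complete = !veth.empty();
    for (const PortRange& range : veth) {
      if (host.eth0.count(range) == 0 || host.lo.count(range) == 0) {
        complete = false;
      }
    }
    if (complete) {
      live.insert(veth.begin(), veth.end());
      result.recovered.push_back(entry.first);
    } else {
      result.cleanedUp.push_back(entry.first);
    }
  }

  retain(host.eth0, live);
  retain(host.lo, live);
  return result;
}
```

In this sketch, a container whose veth has no filters ends up in cleanedUp 
rather than aborting the whole recovery, and any host filter without a fully 
isolated owner is dropped from eth0 and lo.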



--
This message was sent by Atlassian JIRA
(v6.2#6252)
