Jie Yu created MESOS-1649:
-----------------------------

             Summary: Network isolator should tolerate slave crashes while doing isolate/cleanup.
                 Key: MESOS-1649
                 URL: https://issues.apache.org/jira/browse/MESOS-1649
             Project: Mesos
          Issue Type: Bug
            Reporter: Jie Yu
            Assignee: Jie Yu

A slave may crash while we are installing or removing filters. Slave recovery for the network isolator should tolerate such partially installed filters. We also want to avoid leaking filters on host eth0 and host lo.

The current code cannot tolerate this, so recovery may fail with the following error:

{noformat}
Failed to perform recovery: Collect failed: Failed to recover container d409a100-2afb-497c-864f-fe3002cf65d9 with pid 50405: No ephemeral ports found
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
        This ensures slave doesn't recover old live executors.
Step 2: Restart the slave.
{noformat}
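
To illustrate the behavior requested above, here is a minimal C++ sketch of crash-tolerant cleanup. It is not the Mesos implementation: `Link`, `ContainerInfo`, `cleanup` and `removeLeakedFilters` are hypothetical names, and filters are modeled as plain integers in memory rather than real filters on a network link. The assumption is that removing a filter which is already gone should be a no-op, so cleanup can be re-run after a crash, and that any filter on host eth0 or host lo not owned by a recovered container can be treated as leaked.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one link (e.g. host eth0 or host lo).
// Each integer represents one installed filter.
struct Link {
  std::set<int> filters;
};

// Hypothetical record of the filters a container is expected to own.
struct ContainerInfo {
  std::set<int> ports;
};

// Remove a container's filters from both links. Erasing a filter that is
// not present is a no-op, so this is safe to repeat after a crash that
// left only some of the filters installed or removed.
void cleanup(Link& eth0, Link& lo, const ContainerInfo& info) {
  for (int port : info.ports) {
    eth0.filters.erase(port);
    lo.filters.erase(port);
  }
}

// Remove every filter on the link that no recovered container owns.
// Returns the filters that were treated as leaked.
std::vector<int> removeLeakedFilters(
    Link& link,
    const std::map<std::string, ContainerInfo>& containers) {
  std::set<int> owned;
  for (const auto& entry : containers) {
    owned.insert(entry.second.ports.begin(), entry.second.ports.end());
  }

  std::vector<int> leaked;
  for (int filter : link.filters) {
    if (owned.count(filter) == 0) {
      leaked.push_back(filter);
    }
  }
  for (int filter : leaked) {
    link.filters.erase(filter);
  }
  return leaked;
}
```

In this sketch, recovery would call `removeLeakedFilters` once per host link after the set of live containers is known, instead of failing when a container's filters are only partially present.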