Andrei Budnik created MESOS-8568:
------------------------------------

             Summary: Command checks should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`
                 Key: MESOS-8568
                 URL: https://issues.apache.org/jira/browse/MESOS-8568
             Project: Mesos
          Issue Type: Task
            Reporter: Andrei Budnik

After a nested container is successfully launched via `LAUNCH_NESTED_CONTAINER_SESSION`, the checker library calls [waitNestedContainer|https://github.com/apache/mesos/blob/0a40243c6a35dc9dc41774d43ee3c19cdf9e54be/src/checks/checker_process.cpp#L657] for that container. Before launching a nested container for a subsequent check, the library [calls|https://github.com/apache/mesos/blob/0a40243c6a35dc9dc41774d43ee3c19cdf9e54be/src/checks/checker_process.cpp#L466-L487] `REMOVE_NESTED_CONTAINER` to remove the previous one. In the success path, the `REMOVE_NESTED_CONTAINER` call therefore follows `WAIT_NESTED_CONTAINER`, which ensures that the nested container has terminated and can be removed and cleaned up.

If the launch fails, the library [doesn't call|https://github.com/apache/mesos/blob/0a40243c6a35dc9dc41774d43ee3c19cdf9e54be/src/checks/checker_process.cpp#L627-L636] `WAIT_NESTED_CONTAINER`. Despite the failure, the container might have been launched, and the subsequent attempt to remove it without first calling `WAIT_NESTED_CONTAINER` leads to errors like:

{code:java}
W0202 20:03:08.895830 7 checker_process.cpp:503] Received '500 Internal Server Error' (Nested container has not terminated yet) while removing the nested container '2b0c542c-1f5f-42f7-b914-2c1cadb4aeca.da0a7cca-516c-4ec9-b215-b34412b670fa.check-49adc5f1-37a3-4f26-8708-e27d2d6cd125' used for the COMMAND check for task 'node-0-server__e26a82b0-fbab-46a0-a1ea-e7ac6cfa4c91
{code}

The checker library should always call `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
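
The ordering requested in this issue can be illustrated with a small, self-contained sketch. It is not Mesos code: `FakeAgent`, `State`, and `cleanupPreviousCheck` are hypothetical names, the fake `wait` returns immediately instead of blocking, and treating an unknown container as already removed is an assumption of this sketch rather than documented agent behavior.

```cpp
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-in for the agent's nested container calls.
enum class State { RUNNING, TERMINATED };

struct FakeAgent {
  std::map<std::string, State> containers;

  // Models WAIT_NESTED_CONTAINER: returns once the container has terminated.
  // The fake marks the container as terminated right away instead of blocking.
  bool wait(const std::string& id) {
    auto it = containers.find(id);
    if (it == containers.end()) {
      return false;  // Unknown container: nothing to wait for.
    }
    it->second = State::TERMINATED;
    return true;
  }

  // Models REMOVE_NESTED_CONTAINER: refuses to remove a running container,
  // mirroring the "has not terminated yet" error quoted in the issue.
  bool remove(const std::string& id) {
    auto it = containers.find(id);
    if (it == containers.end()) {
      return true;  // Assumption of this sketch: nothing left to remove.
    }
    if (it->second != State::TERMINATED) {
      std::cerr << "Nested container has not terminated yet" << std::endl;
      return false;
    }
    containers.erase(it);
    return true;
  }
};

// Cleanup for the previous check container: always wait first, then remove,
// regardless of whether the earlier launch call reported success or failure.
bool cleanupPreviousCheck(FakeAgent& agent, const std::string& id) {
  agent.wait(id);
  return agent.remove(id);
}
```

With a container that was launched despite a reported failure, calling `remove` directly fails in this model, while `cleanupPreviousCheck` succeeds because the wait step runs unconditionally before the removal.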