[ https://issues.apache.org/jira/browse/MESOS-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gastón Kleiman updated MESOS-8468:
----------------------------------
    Shepherd: Vinod Kone  (was: Gilbert Song)

> `LAUNCH_GROUP` failure tears down the default executor.
> -------------------------------------------------------
>
>                 Key: MESOS-8468
>                 URL: https://issues.apache.org/jira/browse/MESOS-8468
>             Project: Mesos
>          Issue Type: Bug
>   Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>            Reporter: Chun-Hung Hsiao
>            Assignee: Gastón Kleiman
>            Priority: Critical
>              Labels: default-executor, mesosphere
>
> The following code in the default executor
> (https://github.com/apache/mesos/blob/12be4ba002f2f5ff314fbc16af51d095b0d90e56/src/launcher/default_executor.cpp#L525-L535)
> shows that if a `LAUNCH_NESTED_CONTAINER` call fails (say, due to a
> fetcher failure), the whole executor is shut down:
> {code:cpp}
> // Check if we received a 200 OK response for all the
> // `LAUNCH_NESTED_CONTAINER` calls. Shutdown the executor
> // if this is not the case.
> foreach (const Response& response, responses.get()) {
>   if (response.code != process::http::Status::OK) {
>     LOG(ERROR) << "Received '" << response.status << "' ("
>                << response.body << ") while launching child container";
>     _shutdown();
>     return;
>   }
> }
> {code}
> Users do not expect this. Instead, one would expect that a failed
> `LAUNCH_GROUP` does not affect other task groups launched by the same
> executor, similar to how a task failure only takes down its own task group.
> We should adjust the semantics so that a failed `LAUNCH_GROUP` neither takes
> down the executor nor affects other task groups.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
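
For illustration only, the proposed semantics could look roughly like the following standalone sketch. It is not the actual Mesos patch: `Response`, `TaskGroup`, and `handleLaunchResponses` are hypothetical stand-ins introduced here, and the real default executor would also need to send status updates and clean up any child containers that did launch.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for an HTTP response; NOT the Mesos `Response` type.
struct Response {
  int code;
  std::string body;
};

// Hypothetical per-task-group bookkeeping (an assumption for this sketch).
struct TaskGroup {
  std::string id;
  bool failed = false;
};

// Returns true if every launch call for `group` got a 200 OK.
// On the first non-200 response, only `group` is marked as failed;
// nothing here shuts down the executor or touches other task groups.
bool handleLaunchResponses(
    TaskGroup& group,
    const std::vector<Response>& responses) {
  for (const Response& response : responses) {
    if (response.code != 200) {
      std::cerr << "Received " << response.code << " (" << response.body
                << ") while launching child container for task group '"
                << group.id << "'" << std::endl;
      group.failed = true;
      return false;
    }
  }
  return true;
}
```

In this sketch, a caller would invoke `handleLaunchResponses` once per task group and, on a `false` result, fail only the tasks in that group while leaving the executor and its other task groups running.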