> On May 10, 2017, 11:24 p.m., Neil Conway wrote:
> > My apologies for the delay in reviewing this.
> >
> > High-level comments:
> >
> > (a) Can we improve the description of the problem in the commit summary? It took me quite a while to figure out what is actually going on here. My understanding is:
> >
> > (1) Agent re-registers.
> > (2) Master sends `ShutdownFrameworkMessage` for non-PA frameworks on the agent.
> > (3) Master offers agent resources to framework.
> > (4) Framework launches new task on agent _before the agent has finished shutting down the framework_.
> > (5) Agent ignores the launch task because it believes the framework is still terminating.
> >
> > This is a race condition, because if the agent had finished shutting down the framework before the launch task (in #4) was received, there would not be a problem. Is my understanding correct?
> >
> > (b) I'd prefer a new unit test that constructs exactly this scenario, rather than changing existing unit tests.
> >
> > (c) The new behavior is that the framework will receive `TASK_KILLED` for any non-PA tasks running on a partitioned agent that re-registers. This does not seem ideal, because `TASK_KILLED` _normally_ corresponds to a framework-initiated `killTask` operation.
> >
> > (d) Thinking out loud: what if a custom executor ignores the `killTask` request? When shutting down a framework, we forcibly destroy the container (via `containerizer->destroy()`) if the executor doesn't exit promptly upon receiving the framework shutdown request. AFAIK we don't have similar logic for `killTask` requests.
> >
> > (c) + (d) suggest that maybe we want a different way to terminate the task on the agent -- let's discuss.

> Megha Sharma wrote:
> Summarizing the two approaches we talked about.
>
> Approach 1: ShutdownFrameworkMessage
>
> 1. Upon agent re-registration, the master will add tasks even for non-PA frameworks on this agent.
> This is needed by the master to do correct resource accounting and to avoid offering resources already in use on this agent. We need to mutate the TaskState on the Task before adding these tasks to the master's data structures, since the TaskState might be non-terminal when the agent sends them in ReregisterSlaveMessage. The master has already sent TASK_LOST for these tasks to the frameworks, so we need to set the TaskState to TASK_LOST so that future reconciliations with the framework don't have the task transitioning from TASK_LOST to TASK_RUNNING/TASK_STARTING. This avoids unnecessary confusion about the task state as observed by the framework, although this could also have happened with the non-strict registry, where a framework can receive a non-terminal task state update after having received a TASK_LOST for the same task.
>
> 2. When the agent re-registers, the master will continue to send a ShutdownFrameworkMessage to the agent to kill the tasks of non-PA frameworks on the agent, as it does today. An additional optional field will be added to ShutdownFrameworkMessage to indicate whether or not the shutdown was initiated internally.
>
> 3. During framework shutdown, the state of the framework is set to Framework::TERMINATING, which prevents it from launching new tasks. Since the framework here is not really terminating, the agent will not set the state to terminating if the ShutdownFrameworkMessage was generated internally, allowing the framework to launch new tasks.
>
> 4. Framework shutdown today doesn't generate any status updates, which needs to change. Status updates will be sent if the framework shutdown was triggered internally; this is needed to remove the tasks of non-PA frameworks that were added when the agent re-registered (step 1).
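Steps 2 and 3 of Approach 1 can be reduced to a small standalone sketch. The types below (`State`, the `internal` field, `handleShutdown`) are illustrative stand-ins for the proposed behavior, not actual Mesos code:

```cpp
#include <cassert>

// Simplified model of the agent-side framework state.
enum class State { ACTIVE, TERMINATING };

struct Framework {
  State state = State::ACTIVE;
  bool acceptsTaskLaunches() const { return state == State::ACTIVE; }
};

struct ShutdownFrameworkMessage {
  // Proposed optional field (step 2): true when the master generated the
  // shutdown internally upon agent re-registration.
  bool internal = false;
};

// Step 3: only mark the framework TERMINATING for a real shutdown, so an
// internally generated shutdown does not block subsequent task launches.
void handleShutdown(Framework& framework, const ShutdownFrameworkMessage& msg)
{
  if (!msg.internal) {
    framework.state = State::TERMINATING;
  }
  // ...in both cases the agent proceeds to kill the framework's tasks...
}
```

With this flag, the race in Neil's steps (1)-(5) disappears: a launch that arrives while an internally triggered shutdown is in flight is no longer rejected for a framework that is still live.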
>
> Approach 2: Do not shut down non-PA frameworks when the agent re-registers, and let the frameworks decide what needs to be done when they receive non-terminal status updates for tasks for which they have already received a TASK_LOST. This hopefully won't break any frameworks, since it could also have happened in the past with the non-strict registry, and frameworks should be resilient enough to handle this scenario.
>
> Let me know if I have missed anything. Personally, I am in favor of approach 1, as it seems less catastrophic for the frameworks. What do you think?

> Megha Sharma wrote:
> The proposed solution to fix the race between new task launches and framework shutdown on the agent was to not kill the non-partition-aware tasks when an unreachable agent re-registers with the master. So now, as far as Mesos internals are concerned, tasks from non-partition-aware frameworks are treated the same way as partition-aware tasks, and one way to do this cleanly in Mesos is to transition such non-partition-aware tasks to the TASK_UNREACHABLE state instead of TASK_LOST when the agent becomes unreachable. Internally, such tasks will be kept in the Framework#unreachableTasks cache instead of Framework#completedTasks and their state will be TASK_UNREACHABLE, but to be backwards compatible we will do some transformations when the data is exposed to users, so there is no difference in how these tasks are presented before and after this patch. I will post the patch soon, but do let me know what you think of the approach.

> Megha Sharma wrote:
> Since we are now adding the tasks from non-partition-aware frameworks to the unreachableTasks cache instead of completedTasks when the agent goes unreachable, we need to do some transformations when the completed_tasks and unreachable_tasks metrics are presented to the user.
> To achieve this, we need an additional cache, nonPartitionAwareTasks, to remember the taskIDs of non-partition-aware tasks that have been added to unreachableTasks, so they can be filtered out when unreachable_tasks is shown and added back when we present completed_tasks (code snippet below).
>
> ```
> writer->field("completed_tasks", [this](JSON::ArrayWriter* writer) {
>   foreach (const Owned<Task>& task, framework_->completedTasks) {
>     // Skip unauthorized tasks.
>     if (!authorizeTask_->accept(*task.get(), framework_->info)) {
>       continue;
>     }
>
>     writer->element(*task.get());
>   }
>
>   foreach (const TaskID& taskID, framework_->nonPartitionAwareTasks) {
>     if (framework_->unreachableTasks.contains(taskID)) {
>       Task task =
>         framework_->transformNonPartitionAwareTask(
>             *framework_->unreachableTasks.at(taskID).get());
>
>       // Skip unauthorized tasks.
>       if (!authorizeTask_->accept(task, framework_->info)) {
>         continue;
>       }
>
>       writer->element(task);
>     }
>   }
> });
> ```
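The `transformNonPartitionAwareTask` helper referenced in the snippet is not shown in the thread. A minimal standalone sketch of what it presumably does, using simplified stand-in types rather than actual Mesos code, might look like:

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for Mesos' TaskState and Task.
enum class TaskState { TASK_RUNNING, TASK_LOST, TASK_UNREACHABLE };

struct Task {
  std::string taskId;
  TaskState state;
};

// Internally the task is stored as TASK_UNREACHABLE, but for backwards
// compatibility it is presented to users of a non-partition-aware
// framework as the TASK_LOST they already received.
Task transformNonPartitionAwareTask(Task task)
{
  if (task.state == TaskState::TASK_UNREACHABLE) {
    task.state = TaskState::TASK_LOST;
  }
  return task;
}
```

The transformation is applied only at presentation time, so the master's internal bookkeeping stays uniform across partition-aware and non-partition-aware frameworks.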
I have created a new Review Request: https://reviews.apache.org/r/61473. We can continue discussing the approach here.

- Megha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review174194
-----------------------------------------------------------


On Aug. 6, 2017, 4:35 a.m., Megha Sharma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58898/
> -----------------------------------------------------------
>
> (Updated Aug. 6, 2017, 4:35 a.m.)
>
>
> Review request for mesos, Neil Conway and Jiang Yan Xu.
>
>
> Bugs: MESOS-7215
>     https://issues.apache.org/jira/browse/MESOS-7215
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The master will no longer kill the tasks of non-partition-aware
> frameworks when an unreachable agent re-registers with the master.
> Previously, the master sent a ShutdownFrameworkMessage to the agent
> to kill the tasks of non-partition-aware frameworks, including
> frameworks that are still registered. This was problematic because an
> offer from this agent could still go to the same framework, which
> could then launch new tasks. The agent would receive those tasks and
> ignore them, because it thinks the framework is shutting down. The
> framework is of course not shutting down, so from the master's and the
> scheduler's perspective the task is stuck in STAGING until the next
> agent re-registration, which could happen much later.
> This commit fixes the problem by not shutting down the
> non-partition-aware frameworks on such an agent.
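The change described in the summary above can be modeled as a small standalone sketch. The `FrameworkInfo` fields and function names below are illustrative assumptions about the master-side decision, not actual Mesos code:

```cpp
#include <cassert>

// Simplified stand-in for what the master knows about a framework.
struct FrameworkInfo {
  bool partitionAware;  // opted into the PARTITION_AWARE capability
  bool completed;       // already torn down from the master's perspective
};

// Before the patch (racy): on re-registration of an unreachable agent,
// every non-partition-aware framework on the agent was shut down, even
// registered ones that might immediately receive new offers.
bool shouldShutdownBefore(const FrameworkInfo& f)
{
  return !f.partitionAware;
}

// After the patch: live frameworks keep their tasks; only frameworks the
// master already considers gone are shut down on the agent.
bool shouldShutdownAfter(const FrameworkInfo& f)
{
  return f.completed;
}
```

With the old predicate, a still-registered non-partition-aware framework could be mid-shutdown on the agent while launching a new task there, leaving that task in STAGING; the new predicate removes that window entirely.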
>
>
> Diffs
> -----
>
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995
>   src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99
>   src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668
>
>
> Diff: https://reviews.apache.org/r/58898/diff/3/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Megha Sharma
>