> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote: > > src/slave/slave.cpp > > Lines 8195 (patched) > > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195> > > > > Let's validate that there is no task using the resources provided by > > this RP before doing the removal, or fail the call. > > Benjamin Bannier wrote: > I believe this should be treated as orthogonal issue. It might e.g., > happen that the user triggering the RP removal cannot kill tasks or prevent > them from being rescheduled. I'd suggest to leaves as is. > > WDYT? > > Chun-Hung Hsiao wrote: > The scenario I am worried about is that when the RP is marked as gone, > the RP may fail to tear itself down if there is any task using its resources. > > Benjamin Bannier wrote: > We don't seem to have a good story on incompatible resource changes ATM. > If we e.g., update a RP config in an incompatible way, we make not attempt to > detect or rectify this in any way, either. At least in the agent we were in > general we were operating under the assumption that tasks would die on > incompatible changes to their resources. > > I'd suggest to drop this issue for now and potentially file a JIRA to > track draining and situations where it might be needed. > > Benjamin Bannier wrote: > Dropping this, see above comment. > > Chun-Hung Hsiao wrote: > Reopening this. Here's my suggestion: > > 1. Create a ticket for a proper design to drain the resource provider. > 2. For now, let's fail the call here if > `slave->resourceProviders.at(resourceProviderId)->totalResources` is not > empty. In other words, the operator must first reconfigure the resource > provider to "drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` > call.
Okay. 1. I created https://issues.apache.org/jira/browse/MESOS-9522. 2. We had discussed this some time ago, and I think agreed that preventing operators from marking resource providers as gone whenever a task was running on them was not good (especially if operators cannot drain the RP). We have prior art in `MARK_AGENT_GONE` which asks operators to drain agent before marking as gone. I would suggest for now with MESOS-9522 not implemented we allow operators to mark resource providers as gone even if some of their resources are in use; if needed they can drain them by filtering all tasks using their resources. This also wouldn't expose a race. Does this address your concerns? - Benjamin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68147/#review207559 ----------------------------------------------------------- On Jan. 15, 2019, 3:53 p.m., Benjamin Bannier wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68147/ > ----------------------------------------------------------- > > (Updated Jan. 15, 2019, 3:53 p.m.) > > > Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan > Schlicht. > > > Bugs: MESOS-8403 > https://issues.apache.org/jira/browse/MESOS-8403 > > > Repository: mesos > > > Description > ------- > > This patch adds support for triggering permanent removal of local > resource providers. We also add authorization and tests as part of this > patch. > > > Diffs > ----- > > include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 > include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f > src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a > src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 > src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 > src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c > src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 > src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb > src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e > src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 > > > Diff: https://reviews.apache.org/r/68147/diff/12/ > > > Testing > ------- > > `make check` > > > Thanks, > > Benjamin Bannier > >