-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45961/#review143040
-----------------------------------------------------------
Haven't commented on tests. Will look at them together with other tests.

---

Chatted with BenM and Anindya. I think we are able to simplify the code and minimize the special handling of shared resources by separating the responsibilities between master, allocator and sorter this way:

- Invariants: All usage of resources (shared or not) should have directly corresponding allocations in the allocator and sorter.

- Allocator:
  - The only thing special about shared resources in `allocate()` is that a shared resource is *available* once created, regardless of past allocations (i.e., `total - allocated`). This adheres to the definition of shared resources and is easily explainable. When multiple instances of the same shared resource are allocated, all instances/copies are maintained in `slave.allocated` and other data structures. (A rough sketch of this availability idea follows the inline comments below.)
  - We just need to tweak `updateAllocation()` for the master's special allocation.

- Sorter: Shared resources only need a little special handling in `createStrippedScalarQuantity()`: multiple instances can appear in a framework's allocation but they don't affect sorting.

- Master: The only special handling is that
  - Task validation allows tasks that consume shared resources not in `_offeredResources` as long as they are in the original `offeredResources` or they are created in the same ACCEPT call.
  - Tasks may use more shared resources than are allocated; this leads to a special allocation for additional instances of these shared resources.

These semantics combined should result in simpler and smaller changes than maintaining other additional variables which are adjusted in special ways due to shared resources.


src/common/resources.cpp (line 1224)
<https://reviews.apache.org/r/45961/#comment209250>

Add a comment: // Because we currently only allow persistent volumes to be shared, the original resource must be non-shared.


src/common/resources.cpp (line 1266)
<https://reviews.apache.org/r/45961/#comment209251>

Add a comment: // Because we currently only allow persistent volumes to be shared, we return the resource to the non-shared state after the volume is destroyed.


src/master/allocator/mesos/hierarchical.hpp (line 327)
<https://reviews.apache.org/r/45961/#comment209379>

s/each copy/the number of copies/


src/master/allocator/mesos/hierarchical.hpp (line 328)
<https://reviews.apache.org/r/45961/#comment209382>

s/allocated/allocated to/
s/recovered/recovered from/


src/master/allocator/mesos/hierarchical.cpp (lines 604 - 607)
<https://reviews.apache.org/r/45961/#comment209390>

Now we have to modify this method to process the LAUNCH call and adjust the API docs accordingly.


src/master/allocator/mesos/hierarchical.cpp (line 1429)
<https://reviews.apache.org/r/45961/#comment209386>

s/=/-/


src/master/allocator/sorter/drf/sorter.cpp (line 164)
<https://reviews.apache.org/r/45961/#comment209389>

s/no more/no existing/


src/master/allocator/sorter/drf/sorter.cpp (lines 165 - 166)
<https://reviews.apache.org/r/45961/#comment209388>

For this you want to exclude shared resources already allocated to the client, but `resources` can have more instances of the same shared resources than `allocations[name].resources[slaveId].shared()`.


src/master/master.hpp (line 324)
<https://reviews.apache.org/r/45961/#comment209066>

s/shared resources/each shared resource/ (because later in this sentence you are referencing "this shared resource")


src/master/master.hpp (line 325)
<https://reviews.apache.org/r/45961/#comment209065>

Say "each copy corresponds to a running task using it" instead?
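Going back to the allocator bullet in the summary above, here is a minimal sketch of the availability idea, just to make it concrete. The `available()` helper and the `shared()`/`nonShared()` filters are my assumptions about this patch chain, not the code in this diff:

```
// Sketch only: what is available for allocation on an agent, assuming the
// Resources::shared()/nonShared() filters from this chain.
Resources available(const Resources& total, const Resources& allocated)
{
  // Non-shared resources are available only if they are not allocated yet.
  Resources nonShared = total.nonShared() - allocated.nonShared();

  // Shared resources are available as long as they exist in the total, even
  // if one or more copies of them have already been allocated.
  return nonShared + total.shared();
}
```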
src/master/master.hpp (line 331)
<https://reviews.apache.org/r/45961/#comment208840>

s/assigned/requested/ ("assigned" is a bit ambiguous; at this point the resources are still not given to the task yet).


src/master/master.hpp (line 340)
<https://reviews.apache.org/r/45961/#comment208745>

s/accept/ACCEPT/


src/master/master.hpp (line 341)
<https://reviews.apache.org/r/45961/#comment208746>

s/removing it/removing them/


src/master/master.hpp (line 344)
<https://reviews.apache.org/r/45961/#comment209069>

With `hashmap<FrameworkID, Resources> pendingResources` we are preventing DESTROY from affecting pending tasks of the same framework but letting DESTROY go through despite pending tasks of other frameworks (of the same role). If we use `pendingResources` it should be across frameworks, right?

But because we already need to add

```
multihashmap<FrameworkID, TaskID> pendingTasks;
```

for /r/47082/, we can just do that instead in this review; the number of tasks pending for each slave should be small enough to loop through quickly. Overall this should be more straightforward than maintaining `pendingResources`.


src/master/master.hpp (lines 817 - 822)
<https://reviews.apache.org/r/45961/#comment209084>

As commented below, I realized that we may not need this. We shouldn't calculate total task resources as it's done in `Master::addTask()`.


src/master/master.hpp (lines 879 - 887)
<https://reviews.apache.org/r/45961/#comment209229>

We may not need this either if we can directly recover all resources.


src/master/master.hpp (line 2483)
<https://reviews.apache.org/r/45961/#comment209085>

s/shared resources/each shared resource/ (because later in this sentence you are referencing "this shared resource")


src/master/master.hpp (line 2484)
<https://reviews.apache.org/r/45961/#comment209086>

Say "each copy corresponds to a running task using it" instead?


src/master/master.hpp (lines 2501 - 2502)
<https://reviews.apache.org/r/45961/#comment208845>

Probably can't store them here as we don't aggregate identity-based resources across agents. Additionally, it looks like we don't need them per the comments below.


src/master/master.cpp (line 3352)
<https://reviews.apache.org/r/45961/#comment209254>

Leave this method unchanged now that we no longer use `totalTaskResources()`.


src/master/master.cpp (lines 3514 - 3517)
<https://reviews.apache.org/r/45961/#comment209255>

Move over /r/47082/'s `slave->pendingTasks.put()` logic.


src/master/master.cpp (lines 3664 - 3665)
<https://reviews.apache.org/r/45961/#comment209301>

Shouldn't need to keep track of used shared resources.


src/master/master.cpp (line 3672)
<https://reviews.apache.org/r/45961/#comment208867>

s/offer/offers/


src/master/master.cpp (line 3674)
<https://reviews.apache.org/r/45961/#comment209137>

s/cycle/call/
s/a shared resource/the same shared resources/


src/master/master.cpp (lines 3674 - 3675)
<https://reviews.apache.org/r/45961/#comment209307>

"Note that an offer has a single copy of a shared resource." - This is true, but since multiple offers are combined here, there can be multiple instances of the same offered resources appearing in `offeredResources` and `offeredSharedResources`.
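To illustrate the previous point about combined offers, here is a hypothetical sketch (`getOffer()` is an assumed lookup helper here, not necessarily what `Master::accept()` uses):

```
// Sketch only: summing the offers of a single ACCEPT call. Each offer
// carries its own copy of a shared persistent volume, so the sums below
// can end up with multiple instances of the same shared resource.
Resources offeredResources;

foreach (const OfferID& offerId, accept.offer_ids()) {
  Offer* offer = getOffer(offerId);  // Assumed lookup helper.
  if (offer != nullptr) {
    offeredResources += offer->resources();
  }
}

Resources offeredSharedResources = offeredResources.shared();
```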
src/master/master.cpp (lines 3676 - 3678)
<https://reviews.apache.org/r/45961/#comment209139>

It's not just persistent volume creation, right? If the framework chooses to destroy a volume and then use it (by mistake), the mechanism is the same. What matters here is that `offeredSharedResources` is updated by CREATE/DESTROY offer operations. We can just say:

```
Since only persistent volumes can be shared, `offeredSharedResources` can be
updated by CREATE/DESTROY but not by RESERVE/UNRESERVE. Also, unlike
`_offeredResources`, shared resources are not removed from
`offeredSharedResources` when tasks are launched.
```


src/master/master.cpp (lines 3841 - 3842)
<https://reviews.apache.org/r/45961/#comment209159>

This should be safe for now, but since we have delegated the logic of applying changes to the resources to `Resources::apply()`, it's most consistent to use `offeredSharedResources = _offeredResources.shared()` here.

Also, 2 spaces.


src/master/master.cpp (lines 3906 - 3907)
<https://reviews.apache.org/r/45961/#comment209160>

2 spaces. Ditto: `offeredSharedResources = _offeredResources.shared()`.


src/master/master.cpp (line 3936)
<https://reviews.apache.org/r/45961/#comment209131>

`slave->pendingResources[framework->id()]`: We shouldn't be looking only at the same framework for resources that are in use.


src/master/master.cpp (line 3937)
<https://reviews.apache.org/r/45961/#comment209130>

Using `totalTaskResources` here is problematic: while this task is pending, the master could be informed that the executor has exited, so `totalTaskResources()` would not include the executor's resources, therefore leaving some resources in `pendingResources`.

Taking a step back, it seems we actually **always** care about the executor's resources (i.e., the **intention** to use these resources). Consider the following example:

-> DESTROY arrives at `accept()` and gets sent to the authorizer.
-> Task 1 arrives at `accept()` and requests some shared resources in the executor, but the executor is already running so the executor's resources are not added to `pendingResources`.
-> Task 1 enters the authorizer.
-> Master receives ExitedExecutorMessage for the executor.
-> DESTROY arrives at `_accept()`, sees the executor is gone, and destroys the volume.
-> Task 1 enters `_accept()` and finds that the volume is destroyed.

To prevent the DESTROY from going through we do need to always check the pending tasks' executor resources. As discussed, let's just iterate the list of pending tasks and look at their task resources and executor resources.


src/master/master.cpp (line 3992)
<https://reviews.apache.org/r/45961/#comment209309>

s/accept/ACCEPT/


src/master/master.cpp (lines 4025 - 4027)
<https://reviews.apache.org/r/45961/#comment209228>

As discussed, I think we can do something like this (just a sketch):

```
Offer::Operation operation;
operation.set_type(Offer::Operation::LAUNCH);

// To allow tasks to use more instances of shared resources
// than being offered, we need to allocate more instances
// of the shared resources when they are requested by the
// task but no longer in `_offeredResources`. For now we
// use `allocator->updateAllocation()` to pass these
// additional resources as part of the LAUNCH operation.
// TODO: This solution is temporary as the
// allocator should be the entity that accepts and manages offers
// after MESOS-4553 is implemented and then the logic that
// allocates additional shared resources when accepting
// the LAUNCH call should be moved to the allocator.

Resources additionalShared;
foreach (const Resource& resource, taskResources.shared()) {
  if (!_offeredResources.contains(resource)) {
    additionalShared += resource;
  }
}

task_.mutable_resources()->CopyFrom(additionalShared);
operation.mutable_launch()->add_task_infos()->CopyFrom(task_);

allocator->updateAllocation(framework->id(), slave->id, {operation});
```

We can do this after sending RunTaskMessage so we can freely modify `task_`. Note that an ACCEPT call can have multiple LAUNCH calls but `allocator->updateAllocation()` is invoked per Call.
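On the allocator side (per the hierarchical.cpp comment above about processing the LAUNCH call in `updateAllocation()`), the counterpart could look roughly like the sketch below. The member and sorter names are assumptions to show the shape of the change, not the final code:

```
// Sketch only: inside the operations loop of updateAllocation(). A LAUNCH
// operation carries the additional copies of shared resources that tasks
// use beyond what was offered; they are added to the existing allocation
// like any other resources so that usage and allocation stay in sync.
case Offer::Operation::LAUNCH: {
  foreach (const TaskInfo& task, operation.launch().task_infos()) {
    // Per the master sketch above, `task.resources()` only carries the
    // additional shared copies that were not in `_offeredResources`.
    const Resources additional = Resources(task.resources()).shared();

    if (!additional.empty()) {
      slaves[slaveId].allocated += additional;    // Assumed member name.
      frameworkSorter->add(slaveId, additional);  // Assumed sorter calls.
      frameworkSorter->allocated(frameworkId.value(), slaveId, additional);
      roleSorter->allocated(role, slaveId, additional);
    }
  }
  break;
}
```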
src/master/master.cpp (line 4070)
<https://reviews.apache.org/r/45961/#comment209180>

So here, besides recovering unused resources, we can add the logic to "allocate additional instances of shared resources in the master":

```
allocator->updateAllocation(framework->id(), slave->id, launches);
```


src/master/master.cpp (lines 7326 - 7331)
<https://reviews.apache.org/r/45961/#comment209018>

So we first decided to have a wrapper method before it became apparent that we needed to recover all resources (with one exception) to the allocator. Under the new circumstances it seems that this wrapper would be unnecessary.


src/master/validation.hpp (line 158)
<https://reviews.apache.org/r/45961/#comment209378>

This will now take `pendingTasks` on the agent as the last argument.


- Jiang Yan Xu


On July 19, 2016, 3:52 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45961/
> -----------------------------------------------------------
> 
> (Updated July 19, 2016, 3:52 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Joris Van Remoortere, Michael Park, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Each shared resource is accounted via its share count. This count is
>   updated based on the resource operations (such as add and subtract)
>   in various scenarios such as task launch and terminate at multiple
>   modules such as master, allocator, sorter, etc.
> o Only allow DESTROY if there are no running or pending tasks using
>   the volume. However, if the volume is in a pending offer to one or
>   more frameworks, rescind that offer before processing the DESTROY.
> 
> 
> Diffs
> -----
> 
>   src/common/resources.cpp b1bd2784aefdebf91892638b40b215373b998574
>   src/common/resources_utils.cpp 8e881a09500b0966b577e514077460f724868a8d
>   src/master/allocator/mesos/hierarchical.hpp b72ba16277a3210e4d309b616d185a10e2029a66
>   src/master/allocator/mesos/hierarchical.cpp 7d4064535a20b93950f5a95eef1ad3f0d37d305b
>   src/master/allocator/sorter/drf/sorter.hpp bc6bfb2d5d3b32d55be055a0514861b4e7d889bb
>   src/master/allocator/sorter/drf/sorter.cpp ac85b327fc33d34246788e6a8c8bf5a486c61434
>   src/master/http.cpp 0293e4956afb5968f84b516e3d4b5f6f858ef7a3
>   src/master/master.hpp bade8af69f567341667b9907368207189d0dab0e
>   src/master/master.cpp 370fd8712062dc75bb81824cb99ccc7920acbf78
>   src/master/quota_handler.cpp bf6a613a7bb3c62fd77d1ffde3170749d6c21fa2
>   src/master/validation.hpp 43d876b00511d4ef652c2685d4950c994b5eef31
>   src/master/validation.cpp 50ba372d1301d16d1738a3e4f4882b51f9ce06cd
>   src/master/weights_handler.cpp 9a901e71ba2fbf2ca1c02f658a72d44cfaa5ec62
>   src/tests/master_validation_tests.cpp 9eb82a569f7c48caa96d4d54a93b199ccac74385
>   src/tests/sorter_tests.cpp b0e5ef8a55bbcca3553a221bd5691a9c801a04f7
>   src/v1/resources.cpp 6d4ec75fbb7edab013563142aaf13d5302cc50af
> 
> Diff: https://reviews.apache.org/r/45961/diff/
> 
> 
> Testing
> -------
> 
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 