-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47082/#review134535
-----------------------------------------------------------



I think for this patch we should at least have one test case where a slave lost 
message is not sent.
Perhaps two frameworks connected the master and the slave lost message is only 
sent to one because the other doesn't have any business with that slave?


src/master/master.hpp (line 24)
<https://reviews.apache.org/r/47082/#comment199611>

    Use a hashset (already included)?



src/master/master.hpp (line 666)
<https://reviews.apache.org/r/47082/#comment199610>

    Use a hashset since no ordering is required?



src/master/master.cpp (line 96)
<https://reviews.apache.org/r/47082/#comment199612>

    Remove this if we use a hashset.



src/master/master.cpp (lines 6545 - 6546)
<https://reviews.apache.org/r/47082/#comment199622>

    For 1: Also there are inverse offers.
    
    And there is also 
    3. They have running tasks on this slave.
    4. They have pending tasks for this slave.
    
    You did add framworks for running tasks below when you go through 
`slave->tasks` but I think here we should list all cases for which we send the 
message in one place.
    
    I commented on the pending tasks in one of the tests.



src/master/master.cpp (line 6550)
<https://reviews.apache.org/r/47082/#comment199598>

    This is about a single slave so instead of looping through all registered 
frameworks, we can look for things in this particular slave's data members.



src/master/master.cpp (line 6551)
<https://reviews.apache.org/r/47082/#comment199603>

    I don't feel that this boolean is necessary, we can easily check if a 
framework is added by checking if it's in the set (O(1) if hashset), right?



src/master/master.cpp (lines 6555 - 6561)
<https://reviews.apache.org/r/47082/#comment199602>

    We can loop through `slave->offers` for this and we should look at 
`slave->inverseOffers` as well.
    
    Actually, we already go through offers and inverseOffers below so we can 
just add `affectedFrameworks.insert(framework->id());` there (each with a 
comment).



src/master/master.cpp (lines 6566 - 6570)
<https://reviews.apache.org/r/47082/#comment199614>

    Looks like this is no longer applicable in the suggested approach.



src/master/master.cpp (lines 6576 - 6579)
<https://reviews.apache.org/r/47082/#comment199613>

    From `slave->checkpointedResource.reservations()` we can get the list of 
roles and from `activeRoles` we can get the list of frameworks for each role.



src/master/master.cpp (line 6722)
<https://reviews.apache.org/r/47082/#comment199712>

    Also put the slaveID in this log?
    
    e.g., something like this
    
    ```
    LOG(WARNING) << "Dropping LostSlaveMessage about agent " << slaveInfo.id() 
                 << " for unknown framework " << frameworkId;
    ```



src/tests/master_authorization_tests.cpp (lines 338 - 339)
<https://reviews.apache.org/r/47082/#comment199711>

    If we don't expect it to be call, we should do
    
    ```
    EXPECT_CALL(sched, slaveLost(&driver, _))
      .Times(0);
    ```
    
    Here and elsewhere.
    
    But here I think we should fix this to have the framework receive the slave 
lost messsage. See the comment below.



src/tests/master_authorization_tests.cpp (lines 346 - 348)
<https://reviews.apache.org/r/47082/#comment199675>

    So if the task is stuck in the `pendingTasks` a slave lost message is not 
sent but later a TASK_LOST is sent with reason `REASON_SLAVE_REMOVED`... We 
should handle this case the same way as the other cases where we do send slave 
lost IMO.
    
    We probably need to add a `pendingTasks` hashmap in the slave as well and 
check against that map during `removeSlave()`, thoughts?


- Jiang Yan Xu


On May 6, 2016, 7:27 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/47082/
> -----------------------------------------------------------
> 
> (Updated May 6, 2016, 7:27 p.m.)
> 
> 
> Review request for mesos and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5143
>     https://issues.apache.org/jira/browse/MESOS-5143
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When a slave is removed, master sends a LostSlaveMessage to affected
> frameworks only (instead of all registered frameworks). An affected
> framework is a framework which satisfied one or more conditions of
> the following:
> 
> 1. There are running tasks on this slave belonging to the framework.
> 2. Reserved resources on the slave have a matching role with the
>    role of the framework.
> 3. There are pending offers from this slave for the framework.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 3e55114ee7866e06513071e86e15608099dae052 
>   src/master/master.cpp 6d3e0f7c634690a35eec1ce827b705e04c3af87e 
>   src/tests/master_authorization_tests.cpp 
> 804b39a269c09df9f6c0bbdf6f8b53921ac09ce8 
>   src/tests/master_tests.cpp 8e00753fcbcd0cae1d08aad387b08aa17c7f63ad 
> 
> Diff: https://reviews.apache.org/r/47082/diff/
> 
> 
> Testing
> -------
> 
> All existing and modified tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>

Reply via email to