Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-10 Thread Joris Van Remoortere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review123064
---


Ship it!




Ship It!

- Joris Van Remoortere


On March 9, 2016, 5:38 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 9, 2016, 5:38 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp a3ad57a1c3f8a01aa609b28c12825670bb243387 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-08 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/
---

(Updated 三月 9, 2016, 5:38 a.m.)


Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.


Bugs: MESOS-4831
https://issues.apache.org/jira/browse/MESOS-4831


Repository: mesos


Description
---

There is a bug when setting host maintain with http endpoint: 
https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
The logic is as this:
1) Get all host list from maintain window and put it to updated hashmap.
2) If the machine in was in updated was also in master->machines, call master 
updateUnavailability to trigger recoverResources, updateUnavailability etc in 
allocator
3) Otherwise, clear the unavailabity time window for the machine.
4) Update each new machines in updated to call master updateUnavailability

But the logic in step 4) is getting all machines from the schedule windows but 
not the machines that is new to the cluster, this caused master get two 
updateUnavailability calls for a machine in the updated hashmap.

The fix is filter machines in updated hashmap when handling new machines.


Diffs (updated)
-

  src/master/http.cpp a3ad57a1c3f8a01aa609b28c12825670bb243387 
  src/tests/master_maintenance_tests.cpp 
3faa8136cf57276295553910319480028f433e4c 

Diff: https://reviews.apache.org/r/44258/diff/


Testing
---

make
make check
 ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose


Thanks,

Guangya Liu



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-08 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/
---

(Updated 三月 9, 2016, 1:47 a.m.)


Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.


Bugs: MESOS-4831
https://issues.apache.org/jira/browse/MESOS-4831


Repository: mesos


Description
---

There is a bug when setting host maintain with http endpoint: 
https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
The logic is as this:
1) Get all host list from maintain window and put it to updated hashmap.
2) If the machine in was in updated was also in master->machines, call master 
updateUnavailability to trigger recoverResources, updateUnavailability etc in 
allocator
3) Otherwise, clear the unavailabity time window for the machine.
4) Update each new machines in updated to call master updateUnavailability

But the logic in step 4) is getting all machines from the schedule windows but 
not the machines that is new to the cluster, this caused master get two 
updateUnavailability calls for a machine in the updated hashmap.

The fix is filter machines in updated hashmap when handling new machines.


Diffs (updated)
-

  src/master/http.cpp a3ad57a1c3f8a01aa609b28c12825670bb243387 
  src/tests/master_maintenance_tests.cpp 
3faa8136cf57276295553910319480028f433e4c 

Diff: https://reviews.apache.org/r/44258/diff/


Testing
---

make
make check
 ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose


Thanks,

Guangya Liu



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-08 Thread Anand Mazumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review122623
---




src/tests/master_maintenance_tests.cpp (line 481)


hmmm .. shouldn't we do a `Clock::pause/settle` here to ensure that the 
master does not send another inverse offer to us i.e. the original issue after 
L481? 

Something like this:

```cpp
event = events.get();

Clock::pause();
Clock::settle();

ASSERT_TRUE(event.isPending());
```

What do you think?


- Anand Mazumdar


On March 5, 2016, 1:23 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 5, 2016, 1:23 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-08 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review122610
---


Ship it!




Ship It!

- Joseph Wu


On March 4, 2016, 5:23 p.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 4, 2016, 5:23 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-04 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/
---

(Updated March 5, 2016, 1:23 a.m.)


Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.


Bugs: MESOS-4831
https://issues.apache.org/jira/browse/MESOS-4831


Repository: mesos


Description
---

There is a bug when setting host maintain with http endpoint: 
https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
The logic is as this:
1) Get all host list from maintain window and put it to updated hashmap.
2) If the machine in was in updated was also in master->machines, call master 
updateUnavailability to trigger recoverResources, updateUnavailability etc in 
allocator
3) Otherwise, clear the unavailabity time window for the machine.
4) Update each new machines in updated to call master updateUnavailability

But the logic in step 4) is getting all machines from the schedule windows but 
not the machines that is new to the cluster, this caused master get two 
updateUnavailability calls for a machine in the updated hashmap.

The fix is filter machines in updated hashmap when handling new machines.


Diffs (updated)
-

  src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
  src/tests/master_maintenance_tests.cpp 
3faa8136cf57276295553910319480028f433e4c 

Diff: https://reviews.apache.org/r/44258/diff/


Testing
---

make
make check
 ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose


Thanks,

Guangya Liu



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-04 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review122161
---




src/master/http.cpp (lines 2029 - 2030)


s/due to/since/
s/trigger inverse/trigger an inverse/


- Joseph Wu


On March 4, 2016, 4:34 p.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 4, 2016, 4:34 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-04 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/
---

(Updated March 5, 2016, 12:34 a.m.)


Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.


Bugs: MESOS-4831
https://issues.apache.org/jira/browse/MESOS-4831


Repository: mesos


Description
---

There is a bug when setting host maintain with http endpoint: 
https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
The logic is as this:
1) Get all host list from maintain window and put it to updated hashmap.
2) If the machine in was in updated was also in master->machines, call master 
updateUnavailability to trigger recoverResources, updateUnavailability etc in 
allocator
3) Otherwise, clear the unavailabity time window for the machine.
4) Update each new machines in updated to call master updateUnavailability

But the logic in step 4) is getting all machines from the schedule windows but 
not the machines that is new to the cluster, this caused master get two 
updateUnavailability calls for a machine in the updated hashmap.

The fix is filter machines in updated hashmap when handling new machines.


Diffs (updated)
-

  src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
  src/tests/master_maintenance_tests.cpp 
3faa8136cf57276295553910319480028f433e4c 

Diff: https://reviews.apache.org/r/44258/diff/


Testing
---

make
make check
 ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose


Thanks,

Guangya Liu



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-04 Thread Guangya Liu


> On March 4, 2016, 8:28 p.m., Joris Van Remoortere wrote:
> > src/master/http.cpp, lines 2026-2027
> > 
> >
> > Can you explain how machines going from `UP` to `DOWN` are handled in 
> > the next loop?
> > I see logic for `UP` to `DRAINING` in the next loop.
> > 
> > Also missing a backtick after `UP`

My bad, it is from `UP` to `DRAINING`.


> On March 4, 2016, 8:28 p.m., Joris Van Remoortere wrote:
> > src/master/http.cpp, lines 2031-2033
> > 
> >
> > For some of these early exit conditions, does it make sense to add 
> > `CHECK`s (and maybe event comments) to document why we are exiting?
> > Stating *that* we are exiting less helpful to readers than *why*.
> > 
> > I think the implied invariant here (which we should call out 
> > explicitly) is that any machine should only be "touched" by 1 of the 2 
> > loops here. The exit conditions between them are meant to enforce this 
> > exclusion?

I was adding some comments to address any machine should only be "touched" by 1 
of the 2 loops here. Can you please show more for how we can add `CHECK` here? 
I did not found a good way to add `CHECK` for here but only by checking via 
some `if` conditions.


- Guangya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review122146
---


On March 4, 2016, 2:10 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 4, 2016, 2:10 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-04 Thread Joris Van Remoortere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review122146
---




src/master/http.cpp (line 2016)


can we rename this to `unavailabilities`?
`updated` suggests only ones that have been modified, which may be why we 
missed this problem before.



src/master/http.cpp (lines 2026 - 2027)


Can you explain how machines going from `UP` to `DOWN` are handled in the 
next loop?
I see logic for `UP` to `DRAINING` in the next loop.

Also missing a backtick after `UP`



src/master/http.cpp (line 2028)


Comments should be in sentence form: `merge` -> `Merge`.



src/master/http.cpp (lines 2031 - 2033)


For some of these early exit conditions, does it make sense to add `CHECK`s 
(and maybe event comments) to document why we are exiting?
Stating *that* we are exiting less helpful to readers than *why*.

I think the implied invariant here (which we should call out explicitly) is 
that any machine should only be "touched" by 1 of the 2 loops here. The exit 
conditions between them are meant to enforce this exclusion?


- Joris Van Remoortere


On March 4, 2016, 2:10 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 4, 2016, 2:10 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-03 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/
---

(Updated 三月 4, 2016, 2:10 a.m.)


Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.


Bugs: MESOS-4831
https://issues.apache.org/jira/browse/MESOS-4831


Repository: mesos


Description
---

There is a bug when setting host maintain with http endpoint: 
https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
The logic is as this:
1) Get all host list from maintain window and put it to updated hashmap.
2) If the machine in was in updated was also in master->machines, call master 
updateUnavailability to trigger recoverResources, updateUnavailability etc in 
allocator
3) Otherwise, clear the unavailabity time window for the machine.
4) Update each new machines in updated to call master updateUnavailability

But the logic in step 4) is getting all machines from the schedule windows but 
not the machines that is new to the cluster, this caused master get two 
updateUnavailability calls for a machine in the updated hashmap.

The fix is filter machines in updated hashmap when handling new machines.


Diffs (updated)
-

  src/master/http.cpp 8276baa538eb4d2aaf54cc1aa516bffaadacc4dd 
  src/tests/master_maintenance_tests.cpp 
3faa8136cf57276295553910319480028f433e4c 

Diff: https://reviews.apache.org/r/44258/diff/


Testing
---

make
make check
 ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose


Thanks,

Guangya Liu



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-03 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review121923
---




src/master/http.cpp (lines 1996 - 1999)


Instead of these two inner comments, can you instead have an overall 
comment like:
```
// Update the unavailability for each existing machine, except for machines 
going from `UP to `DOWN` (handled in the next loop).  
```

Also, can you add a TODO here, to merge this logic with 
`Master::updateUnavailability`?  (Having it in two places results in more 
conditionals.)



src/master/http.cpp (lines 2014 - 2016)


These two bullet points say the same thing.


- Joseph Wu


On March 2, 2016, 10:05 p.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 2, 2016, 10:05 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-02 Thread Klaus Ma


> On March 2, 2016, 10:43 p.m., Klaus Ma wrote:
> > src/master/http.cpp, lines 1999-2000
> > 
> >
> > I think we can just remove `master->updateUnavailability(id, 
> > updated[id]);` here, so other machine will `UP` and scheduler will be 
> > updated in next loop.
> > 
> > 
> > Futhur more, we can avoid the loop for update. The draft code will be:
> > 
> > ```
> > hashmap updated;
> > 
> > // delete the loop for "updated[id] = window.unavailability();"
> > 
> > foreach (const mesos::maintenance::Window& window, schedule.windows()) {
> >   foreach (const MachineID& id, window.machine_ids()) {
> > ...
> > master->updateUnavailability(id, window.unavailability());
> > updated[id] = window.unavailability();
> >   }
> > }
> > 
> > foreachkey (const MachineID& id, utils::copy(master->machines)) {
> >   if (updated.contains(id)) {
> > continue;
> >   }
> > 
> >   // Transition each removed machine back to the `UP` mode and remove 
> > the
> >   // unavailability.
> >   master->machines[id].info.set_mode(MachineInfo::UP);
> >   master->updateUnavailability(id, None());
> > }
> > ```
> 
> Joseph Wu wrote:
> (Wish I could comment inline on a comment...)
> 
> Your two loops won't update the `Unavailability` for machines that have 
> been brought DOWN previously.  That's why we had 
> `master->updateUnavailability(id, updated[id]);` in two places.

@Joseph, I think `schedule.windows` will also include DOWN machines. We did 
check here: 
https://github.com/apache/mesos/blob/master/src/master/maintenance.cpp#L241 ; 
if the machine is DOWN but not in updated list, master'll reject the request.

I have filed a JIRA (MESOS-4837) on whether this check is necessary.


- Klaus


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review121643
---


On March 3, 2016, 2:05 p.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 3, 2016, 2:05 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-02 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/
---

(Updated 三月 3, 2016, 6:05 a.m.)


Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.


Bugs: MESOS-4831
https://issues.apache.org/jira/browse/MESOS-4831


Repository: mesos


Description
---

There is a bug when setting host maintain with http endpoint: 
https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
The logic is as this:
1) Get all host list from maintain window and put it to updated hashmap.
2) If the machine in was in updated was also in master->machines, call master 
updateUnavailability to trigger recoverResources, updateUnavailability etc in 
allocator
3) Otherwise, clear the unavailabity time window for the machine.
4) Update each new machines in updated to call master updateUnavailability

But the logic in step 4) is getting all machines from the schedule windows but 
not the machines that is new to the cluster, this caused master get two 
updateUnavailability calls for a machine in the updated hashmap.

The fix is filter machines in updated hashmap when handling new machines.


Diffs (updated)
-

  src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
  src/tests/master_maintenance_tests.cpp 
3faa8136cf57276295553910319480028f433e4c 

Diff: https://reviews.apache.org/r/44258/diff/


Testing
---

make
make check
 ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose


Thanks,

Guangya Liu



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-02 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review121770
---




src/master/http.cpp (lines 1999 - 2000)


We originally did not set the mode here because this loop updates the 
"unavailability" regardless of the "mode".

If you'd like, you can add a note here.  i.e.
```
// NOTE: We do not update the `MachineInfo::MODE` here because the machine 
may be in any mode.
```

---

If we really want to prevent the double-inverse-offer, you'd need to add 
something like this at inside the existing `if`:
```
if (master->machines[id].info.mode() == MachineInfo::UP) {
  continue;
}
```

Although at this point, we might as well consider folding the body of this 
function into `Master::updateUnavailability`.



src/master/http.cpp (lines 2014 - 2016)


Nice catch.  This loop is in fact not representative of the comment above.

However, to perform the update correctly (including the issue above), you'd 
need to account for any existing machines that are DRAINING or DOWN.
```
if (master->machines.contains(id) &&
master->machines[id].info.mode() != MachineInfo::UP) {
  continue;
}
```

---

Nit: Newline after the `if`.



src/tests/master_maintenance_tests.cpp (lines 427 - 429)


Can you also change this to:
```
EXPECT_FALSE(event.get().offers().offers(0).has_unavailability());
```



src/tests/master_maintenance_tests.cpp (lines 469 - 478)


Can you also change this:

```
EXPECT_TRUE(event.get().offers().offers(0).has_unavailability());
EXPECT_EQ(
unavailability.start().nanoseconds(),
event.get().offers().offers(0).unavailability().start().nanoseconds());

EXPECT_EQ(
unavailability.duration().nanoseconds(),

event.get().offers().offers(0).unavailability().duration().nanoseconds());
```


- Joseph Wu


On March 1, 2016, 11:45 p.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 1, 2016, 11:45 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-02 Thread Joseph Wu


> On March 2, 2016, 6:43 a.m., Klaus Ma wrote:
> > src/master/http.cpp, lines 1999-2000
> > 
> >
> > I think we can just remove `master->updateUnavailability(id, 
> > updated[id]);` here, so other machine will `UP` and scheduler will be 
> > updated in next loop.
> > 
> > 
> > Futhur more, we can avoid the loop for update. The draft code will be:
> > 
> > ```
> > hashmap updated;
> > 
> > // delete the loop for "updated[id] = window.unavailability();"
> > 
> > foreach (const mesos::maintenance::Window& window, schedule.windows()) {
> >   foreach (const MachineID& id, window.machine_ids()) {
> > ...
> > master->updateUnavailability(id, window.unavailability());
> > updated[id] = window.unavailability();
> >   }
> > }
> > 
> > foreachkey (const MachineID& id, utils::copy(master->machines)) {
> >   if (updated.contains(id)) {
> > continue;
> >   }
> > 
> >   // Transition each removed machine back to the `UP` mode and remove 
> > the
> >   // unavailability.
> >   master->machines[id].info.set_mode(MachineInfo::UP);
> >   master->updateUnavailability(id, None());
> > }
> > ```

(Wish I could comment inline on a comment...)

Your two loops won't update the `Unavailability` for machines that have been 
brought DOWN previously.  That's why we had `master->updateUnavailability(id, 
updated[id]);` in two places.


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review121643
---


On March 1, 2016, 11:45 p.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 1, 2016, 11:45 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-02 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review121659
---



Patch looks great!

Reviews applied: [44258]

Passed command: export OS='ubuntu:14.04' CONFIGURATION='--verbose' 
COMPILER='gcc' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On March 2, 2016, 7:45 a.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 2, 2016, 7:45 a.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Re: Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-02 Thread Klaus Ma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review121643
---




src/master/http.cpp (lines 1999 - 2000)


I think we can just remove `master->updateUnavailability(id, updated[id]);` 
here, so other machine will `UP` and scheduler will be updated in next loop.

Futhur more, we can avoid the loop for update. The draft code will be:

```
hashmap updated;

// delete the loop for "updated[id] = window.unavailability();"

foreach (const mesos::maintenance::Window& window, schedule.windows()) {
  foreach (const MachineID& id, window.machine_ids()) {
...
master->updateUnavailability(id, window.unavailability());
updated[id] = window.unavailability();
  }
}

foreachkey (const MachineID& id, utils::copy(master->machines)) {
  if (updated.contains(id)) {
continue;
  }

  // Transition each removed machine back to the `UP` mode and remove the
  // unavailability.
  master->machines[id].info.set_mode(MachineInfo::UP);
  master->updateUnavailability(id, None());
}
```


- Klaus Ma


On March 2, 2016, 3:45 p.m., Guangya Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> ---
> 
> (Updated March 2, 2016, 3:45 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
> https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> ---
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>



Review Request 44258: Fixed http endpoint trigger two inverse offer calls.

2016-03-01 Thread Guangya Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/
---

Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.


Bugs: MESOS-4831
https://issues.apache.org/jira/browse/MESOS-4831


Repository: mesos


Description
---

There is a bug when setting host maintain with http endpoint: 
https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
The logic is as this:
1) Get all host list from maintain window and put it to updated hashmap.
2) If the machine in was in updated was also in master->machines, call master 
updateUnavailability to trigger recoverResources, updateUnavailability etc in 
allocator
3) Otherwise, clear the unavailabity time window for the machine.
4) Update each new machines in updated to call master updateUnavailability

But the logic in step 4) is getting all machines from the schedule windows but 
not the machines that is new to the cluster, this caused master get two 
updateUnavailability calls for a machine in the updated hashmap.

The fix is filter machines in updated hashmap when handling new machines.


Diffs
-

  src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
  src/tests/master_maintenance_tests.cpp 
3faa8136cf57276295553910319480028f433e4c 

Diff: https://reviews.apache.org/r/44258/diff/


Testing
---

make
make check
 ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose


Thanks,

Guangya Liu