Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)

2015-09-25 Thread Klaus Ma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
---

(Updated Sept. 26, 2015, 2:52 a.m.)


Review request for mesos, Jie Yu and Vinod Kone.


Changes
---

Merged with the latest code and re-checked for any potential issues. 
I'll add more unit test cases for "kill duplicated tasks" and "show duplicated 
tasks in metrics".


Bugs: MESOS-3070
https://issues.apache.org/jira/browse/MESOS-3070


Repository: mesos


Description
---

__Phenomenon:__
The master crashes because of a duplicated task ID.

__Root Cause:__
The task IDs are stored on the slave (agent); if the master fails over, there 
is a time window in which a new slave can launch a task with the same task ID. 
If the old task is then re-registered, the master crashes because of the 
duplicated task ID.

__Solution:__
Store task info in Master::Framework keyed by SlaveID to avoid the duplicated 
task ID issue.
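
A minimal sketch of the indexing idea, not the actual patch: the types below 
are hypothetical stand-ins (plain strings instead of the Mesos protobuf IDs), 
and the names addTask and countTasks are assumptions made for illustration. 
It only shows why keying tasks by SlaveID first lets two slaves report the 
same task ID without a collision.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the Mesos ID types (the real ones are protobufs).
using SlaveID = std::string;
using TaskID = std::string;

struct Task {
  TaskID id;
  SlaveID slaveId;
};

// Illustrative only; not the real Master::Framework.
struct Framework {
  // Tasks are indexed by slave first, then by task ID, so the same task ID
  // may appear once per slave instead of once per framework.
  std::map<SlaveID, std::map<TaskID, Task>> tasks;

  // Returns false only if this slave already holds a task with that ID.
  bool addTask(const Task& task) {
    return tasks[task.slaveId].emplace(task.id, task).second;
  }

  // Number of slaves currently holding a task with this ID.
  std::size_t countTasks(const TaskID& id) const {
    std::size_t n = 0;
    for (const auto& entry : tasks) {
      n += entry.second.count(id);
    }
    return n;
  }
};
```

With a flat map keyed only by task ID, the second insertion of the same ID 
would be the point where an invariant check could fail; with the nested map it 
is simply a second entry under a different slave.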


Diffs (updated)
---

  src/master/http.cpp cd37c91 
  src/master/master.hpp 4bb65f0 
  src/master/master.cpp 6bee4f3 
  src/tests/master_tests.cpp ee24739 

Diff: https://reviews.apache.org/r/37531/diff/


Testing
---

make
make check


Thanks,

Klaus Ma



Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)

2015-09-25 Thread Mesos ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/#review100737
---


Patch looks great!

Reviews applied: [37531]

All tests passed.

- Mesos ReviewBot





Re: Review Request 37531: MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)

2015-09-04 Thread Klaus Ma

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37531/
---

(Updated Sept. 5, 2015, 3:27 a.m.)


Review request for mesos and Vinod Kone.


Changes
---

Add summary & description


Summary (updated)
---

MESOS-3070 (Master CHECK failure if a framework uses duplicated task id)


Bugs: MESOS-3070
https://issues.apache.org/jira/browse/MESOS-3070


Repository: mesos


Description (updated)
---

__Phenomenon:__
The master crashes because of a duplicated task ID.

__Root Cause:__
The task IDs are stored on the slave (agent); if the master fails over, there 
is a time window in which a new slave can launch a task with the same task ID. 
If the old task is then re-registered, the master crashes because of the 
duplicated task ID.

__Solution:__
Store task info in Master::Framework keyed by SlaveID to avoid the duplicated 
task ID issue.


Diffs
---

  src/master/http.cpp 37d76ee 
  src/master/master.hpp 36c6759 
  src/master/master.cpp 95207d2 
  src/tests/master_tests.cpp 8a6b98b 

Diff: https://reviews.apache.org/r/37531/diff/


Testing
---

make
make check


Thanks,

Klaus Ma