Jenkins build is back to normal : Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME #1766

2013-11-19 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/1766/



Re: Review Request 15653: Adds systemLoad() convenience method to stout

2013-11-19 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15653/#review29124
---



3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp
https://reviews.apache.org/r/15653/#comment56255

How about Try<vector<double> > instead? We typically return results as the 
return value rather than through a function parameter.
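
For illustration only, here is a rough Python analogue of the suggested shape. The actual patch is C++ in stout/os.hpp and is not shown in this thread; the `Try` class and the `system_load()` name below are hypothetical stand-ins, and only `os.getloadavg()` is a real standard-library call.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Try:
    """Hypothetical stand-in for stout's Try<T>: holds a value or an error."""
    value: Optional[List[float]] = None
    error: Optional[str] = None

    def is_error(self) -> bool:
        return self.error is not None


def system_load() -> Try:
    """Return the 1, 5 and 15 minute load averages as the return value."""
    try:
        return Try(value=list(os.getloadavg()))
    except (OSError, AttributeError) as e:
        # OSError: the load average is unobtainable.
        # AttributeError: the platform does not provide os.getloadavg().
        return Try(error=str(e))


load = system_load()
if load.is_error():
    print("could not read load averages:", load.error)
else:
    print("load averages:", load.value)
```

The caller checks one returned object instead of passing in an array to be filled, which is the convention the comment asks for.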


- Vinod Kone


On Nov. 18, 2013, 7:19 p.m., Niklas Nielsen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15653/
 ---
 
 (Updated Nov. 18, 2013, 7:19 p.m.)
 
 
 Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 This patch includes a wrapper to get system load averages in uptime(1)
 format. This is used by an upcoming patch that exposes these averages
 over master and slave stats.json endpoints.
 
 
 Diffs
 -
 
   3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp 
 f6bbf5e00a810affd8cb6f828d1f306dc8bf3051 
 
 Diff: https://reviews.apache.org/r/15653/diff/
 
 
 Testing
 ---
 
 make check and functional testing with endpoints.
 
 
 Thanks,
 
 Niklas Nielsen
 




Re: Review Request 15653: Adds systemLoad() convenience method to stout

2013-11-19 Thread Niklas Nielsen


 On Nov. 19, 2013, 6:48 p.m., Vinod Kone wrote:
  3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp, line 836
  https://reviews.apache.org/r/15653/diff/1/?file=388010#file388010line836
 
  How about Try<vector<double> > instead? We typically return results as the 
  return value rather than through a function parameter.

SGTM - Will get that in.


- Niklas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15653/#review29124
---






Re: Review Request 14669: launchTasks on list of offers

2013-11-19 Thread Vinod Kone

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14669/#review29127
---


Sorry for the delay on this. Mind rebasing it? I will get this committed today. 
Thanks.

- Vinod Kone


On Nov. 14, 2013, 10:31 p.m., Niklas Nielsen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/14669/
 ---
 
 (Updated Nov. 14, 2013, 10:31 p.m.)
 
 
 Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.
 
 
 Bugs: MESOS-749
 https://issues.apache.org/jira/browse/MESOS-749
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 Running tasks on more than one offer belonging to a single slave can be 
 useful in situations with multiple outstanding offers.
 
 This patch extends the usual launchTasks() to accept a vector of OfferIDs. 
 The previous launchTasks() (accepting a single OfferID) has been kept for 
 backward compatibility, but it now calls the new launchTasks() with a 
 one-element list.
 The same applies to the JNI and Python interfaces, which accept both 
 formats as well.
 
 Offers are verified to belong to the same slave and framework, before 
 resources are merged and used.
 
 
 Diffs
 -
 
   include/mesos/scheduler.hpp 380e087 
   src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 9869929 
   src/java/src/org/apache/mesos/MesosSchedulerDriver.java ed4b4a3 
   src/java/src/org/apache/mesos/SchedulerDriver.java 5b0ca39 
   src/master/master.hpp e377af8 
   src/master/master.cpp abab6ce 
   src/messages/messages.proto 71f68a0 
   src/python/native/mesos_scheduler_driver_impl.cpp 059ed5d 
   src/sched/sched.cpp 3abe72f 
   src/tests/master_tests.cpp bf790d2 
   src/tests/resource_offers_tests.cpp 2864c9a 
 
 Diff: https://reviews.apache.org/r/14669/diff/
 
 
 Testing
 ---
 
 Three new tests have been added: LaunchCombinedOfferTest, 
 LaunchAcrossSlavesTest and LaunchDuplicateOfferTest.
 These tests ensure that:
 1) Multiple offers can be used to run a single task (requesting the sum of 
 offer resources).
 2) Offers cannot span multiple slaves.
 3) No offer can appear more than once in the offer list.
 
 $ make check
 ...
 [ RUN  ] MasterTest.LaunchCombinedOfferTest
 [   OK ] MasterTest.LaunchCombinedOfferTest (2010 ms)
 [ RUN  ] MasterTest.LaunchAcrossSlavesTest
 [   OK ] MasterTest.LaunchAcrossSlavesTest (3 ms)
 [ RUN  ] MasterTest.LaunchDuplicateOfferTest
 [   OK ] MasterTest.LaunchDuplicateOfferTest (3 ms)
 ...
 
 
 Thanks,
 
 Niklas Nielsen
 




[jira] [Created] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky

2013-11-19 Thread Yan Xu (JIRA)
Yan Xu created MESOS-822:


    Summary: AllocatorTest/0.SchedulerFailover is flaky
        Key: MESOS-822
        URL: https://issues.apache.org/jira/browse/MESOS-822
    Project: Mesos
 Issue Type: Bug
   Reporter: Yan Xu
    Fix For: 0.16.0


Log output: 
http://sfo2-aad-36-sr1.perf.twttr.net:8080/job/mesos-centos-6-gcc/119/console

I1119 12:01:33.126309 17083 master.hpp:438] Removing offer 
201311191201-16777343-52448-17056-1 with resources cpus(*):2; mem(*):768; 
disk(*):22668; ports(*):[31000-32000] on slave 
201311191201-16777343-52448-17056-0 (localhost.localdomain)
tests/allocator_tests.cpp:993: Failure
Mock function called more times than expected - taking default action specified 
at:
./tests/mesos.hpp:412:
Function call: resourcesUnused(@0x7f6f80025e58 
201311191201-16777343-52448-17056-, @0x7f6f80025e38 
201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, 
disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object 00-00 
00-00 00-00 00-00 30-10 03-80 6F-7F 00-00)
 Expected: to be called once
   Actual: called twice - over-saturated and active
I1119 12:01:33.126698 17083 hierarchical_allocator_process.hpp:547] Framework 
201311191201-16777343-52448-17056- left cpus(*):2; mem(*):768; 
disk(*):22668; ports(*):[31000-32000] unused on slave 
201311191201-16777343-52448-17056-0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky

2013-11-19 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-822:
-

Description: 
slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
tests/allocator_tests.cpp:993: Failure
Mock function called more times than expected - taking default action specified 
at:
./tests/mesos.hpp:412:
Function call: resourcesUnused(@0x7f6f80025e58 
201311191201-16777343-52448-17056-, @0x7f6f80025e38 
201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, 
disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object 00-00 
00-00 00-00 00-00 30-10 03-80 6F-7F 00-00)
 Expected: to be called once
   Actual: called twice - over-saturated and active

Full Log:

[ RUN  ] AllocatorTest/0.SchedulerFailover
I1119 12:01:32.106143 19009 exec.cpp:84] Committing suicide by killing the 
process group
I1119 12:01:32.106276 19017 exec.cpp:84] Committing suicide by killing the 
process group
I1119 12:01:32.108185 18999 exec.cpp:84] Committing suicide by killing the 
process group
I1119 12:01:32.113991 17076 master.cpp:285] Master started on 127.0.0.1:52448
I1119 12:01:32.114038 17076 master.cpp:299] Master ID: 
201311191201-16777343-52448-17056
I1119 12:01:32.114047 17076 master.cpp:302] Master only allowing authenticated 
frameworks to register!
I1119 12:01:32.114109 17082 slave.cpp:112] Slave started on 127)@127.0.0.1:52448
I1119 12:01:32.114209 17082 slave.cpp:212] Slave resources: cpus(*):3; 
mem(*):1024; disk(*):22668; ports(*):[31000-32000]
I1119 12:01:32.114393 17080 sched.cpp:207] New master detected at 
master@127.0.0.1:52448
I1119 12:01:32.114413 17080 sched.cpp:260] Authenticating with master 
master@127.0.0.1:52448
I1119 12:01:32.114461 17080 sched.cpp:229] Detecting new master
I1119 12:01:32.114497 17080 authenticatee.hpp:124] Creating new client SASL 
connection
I1119 12:01:32.118248 17082 state.cpp:33] Recovering state from 
'/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/meta'
I1119 12:01:32.118343 17082 status_update_manager.cpp:180] Recovering status 
update manager
I1119 12:01:32.118407 17082 slave.cpp:2743] Finished recovery
I1119 12:01:32.118463 17082 slave.cpp:497] New master detected at 
master@127.0.0.1:52448
I1119 12:01:32.118517 17082 slave.cpp:524] Detecting new master
I1119 12:01:32.118538 17082 status_update_manager.cpp:158] New master detected 
at master@127.0.0.1:52448
I1119 12:01:32.118906 17076 master.cpp:1734] Authenticating framework at 
scheduler(119)@127.0.0.1:52448
W1119 12:01:32.118986 17076 master.cpp:1235] Ignoring register slave message 
from localhost.localdomain since not elected yet
I1119 12:01:32.119091 17076 master.cpp:85] No whitelist given. Advertising 
offers for all slaves
I1119 12:01:32.119155 17076 authenticator.hpp:140] Creating new server SASL 
connection
I1119 12:01:32.119243 17076 hierarchical_allocator_process.hpp:302] 
Initializing hierarchical allocator process with master : master@127.0.0.1:52448
I1119 12:01:32.119279 17076 authenticatee.hpp:212] Received SASL authentication 
mechanisms: CRAM-MD5
I1119 12:01:32.119293 17076 authenticatee.hpp:238] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I1119 12:01:32.119312 17076 master.cpp:744] The newly elected leader is 
master@127.0.0.1:52448
I1119 12:01:32.119321 17076 master.cpp:748] Elected as the leading master!
I1119 12:01:32.119343 17076 authenticator.hpp:243] Received SASL authentication 
start
I1119 12:01:32.119390 17076 authenticator.hpp:325] Authentication requires more 
steps
I1119 12:01:32.119417 17076 authenticatee.hpp:258] Received SASL authentication 
step
I1119 12:01:32.119447 17076 authenticator.hpp:271] Received SASL authentication 
step
I1119 12:01:32.119463 17076 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 
'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: 
false 
I1119 12:01:32.119472 17076 auxprop.cpp:153] Looking up auxiliary property 
'*userPassword'
I1119 12:01:32.119482 17076 auxprop.cpp:153] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'
I1119 12:01:32.119490 17076 auxprop.cpp:81] Request to lookup properties for 
user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 
'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I1119 12:01:32.119498 17076 auxprop.cpp:103] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true
I1119 12:01:32.119503 17076 auxprop.cpp:103] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I1119 12:01:32.119514 17076 authenticator.hpp:317] Authentication success
I1119 12:01:32.119532 17076 authenticatee.hpp:298] Authentication success
I1119 12:01:32.119547 17076 master.cpp:1774] Successfully authenticated 
framework at scheduler(119)@127.0.0.1:52448
I1119 12:01:32.119604 17076 

Re: Review Request 14669: launchTasks on list of offers

2013-11-19 Thread Niklas Nielsen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14669/
---

(Updated Nov. 19, 2013, 10:11 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
---

Rebased to master.


Bugs: MESOS-749
https://issues.apache.org/jira/browse/MESOS-749


Repository: mesos-git


Description
---

Running tasks on more than one offer belonging to a single slave can be useful 
in situations with multiple outstanding offers.

This patch extends the usual launchTasks() to accept a vector of OfferIDs. The 
previous launchTasks() (accepting a single OfferID) has been kept for backward 
compatibility, but it now calls the new launchTasks() with a one-element list.
The same applies to the JNI and Python interfaces, which accept both formats 
as well.

Offers are verified to belong to the same slave and framework, before resources 
are merged and used.
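
The validate-then-merge step described above can be sketched as follows. This is a simplified illustration, not the code in the patch: the `Offer` class, `merge_offers()`, and the restriction to scalar resources (cpus, mem) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Offer:
    # Hypothetical, simplified offer carrying only scalar resources.
    id: str
    slave_id: str
    framework_id: str
    resources: Dict[str, float]


def merge_offers(offers: List[Offer], framework_id: str) -> Dict[str, float]:
    """Check that all offers share one slave and framework, then sum resources."""
    if not offers:
        raise ValueError("no offers given")
    seen = set()
    merged: Dict[str, float] = {}
    for offer in offers:
        if offer.id in seen:
            raise ValueError("offer %s appears more than once" % offer.id)
        seen.add(offer.id)
        if offer.framework_id != framework_id:
            raise ValueError("offer %s belongs to another framework" % offer.id)
        if offer.slave_id != offers[0].slave_id:
            raise ValueError("offers span more than one slave")
        for name, amount in offer.resources.items():
            merged[name] = merged.get(name, 0.0) + amount
    return merged


a = Offer("o1", "s1", "f1", {"cpus": 1, "mem": 256})
b = Offer("o2", "s1", "f1", {"cpus": 2, "mem": 512})
print(merge_offers([a, b], "f1"))
```

A task can then request up to the merged total, while a duplicate offer or an offer from a different slave is rejected before any resources are combined.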


Diffs (updated)
-

  include/mesos/scheduler.hpp 161cc65 
  src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 9869929 
  src/java/src/org/apache/mesos/MesosSchedulerDriver.java ed4b4a3 
  src/java/src/org/apache/mesos/SchedulerDriver.java 5b0ca39 
  src/master/master.hpp c86c1f1 
  src/master/master.cpp f65b344 
  src/messages/messages.proto 1f264d5 
  src/python/native/mesos_scheduler_driver_impl.cpp 059ed5d 
  src/sched/sched.cpp 51f95bb 
  src/tests/master_tests.cpp 37ee7a0 
  src/tests/resource_offers_tests.cpp 2864c9a 

Diff: https://reviews.apache.org/r/14669/diff/


Testing
---

Three new tests have been added: LaunchCombinedOfferTest, LaunchAcrossSlavesTest 
and LaunchDuplicateOfferTest.
These tests ensure that:
1) Multiple offers can be used to run a single task (requesting the sum of 
offer resources).
2) Offers cannot span multiple slaves.
3) No offer can appear more than once in the offer list.

$ make check
...
[ RUN  ] MasterTest.LaunchCombinedOfferTest
[   OK ] MasterTest.LaunchCombinedOfferTest (2010 ms)
[ RUN  ] MasterTest.LaunchAcrossSlavesTest
[   OK ] MasterTest.LaunchAcrossSlavesTest (3 ms)
[ RUN  ] MasterTest.LaunchDuplicateOfferTest
[   OK ] MasterTest.LaunchDuplicateOfferTest (3 ms)
...


Thanks,

Niklas Nielsen



Review Request 15684: Python CLI helper 'http.get' should not assume JSON.

2013-11-19 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15684/
---

Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and 
Vinod Kone.


Repository: mesos-git


Description
---

See summary.


Diffs
-

  src/cli/mesos-cat bb1e19750083f9e2680a5a22bd3bd3f7b2bc8656 
  src/cli/mesos-ps aff8423040a4ba5ce7f41da73a4c70a4d76da93f 
  src/cli/mesos-tail 33acee4f92a1fa0cbf65568537e24f822e083717 
  src/cli/python/mesos/http.py e65701bee92dcad2af4e871394df5c60a7150659 

Diff: https://reviews.apache.org/r/15684/diff/


Testing
---


Thanks,

Benjamin Hindman



Re: Review Request 15684: Python CLI helper 'http.get' should not assume JSON.

2013-11-19 Thread Du Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15684/#review29144
---

Ship it!


Let the client check it even if the content is empty.
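
A minimal sketch of that split, assuming a helper that returns the raw body and leaves JSON parsing to the caller. The function names and signatures here are hypothetical and are not taken from src/cli/python/mesos/http.py; only the standard-library calls are real.

```python
import json
import urllib.request


def get(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return the body as text, without assuming it is JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def get_json(url: str, timeout: float = 5.0):
    """Caller-side parsing: decide here what an empty body means."""
    body = get(url, timeout)
    return json.loads(body) if body else None
```

With this shape, a command that reads plain files can call `get()` directly, while a command that reads a JSON endpoint handles the empty-content case itself.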

- Du Li


On Nov. 19, 2013, 10:22 p.m., Benjamin Hindman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15684/
 ---
 
 (Updated Nov. 19, 2013, 10:22 p.m.)
 
 
 Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, 
 and Vinod Kone.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 See summary.
 
 
 Diffs
 -
 
   src/cli/mesos-cat bb1e19750083f9e2680a5a22bd3bd3f7b2bc8656 
   src/cli/mesos-ps aff8423040a4ba5ce7f41da73a4c70a4d76da93f 
   src/cli/mesos-tail 33acee4f92a1fa0cbf65568537e24f822e083717 
   src/cli/python/mesos/http.py e65701bee92dcad2af4e871394df5c60a7150659 
 
 Diff: https://reviews.apache.org/r/15684/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Benjamin Hindman
 




Re: Review Request 15684: Python CLI helper 'http.get' should not assume JSON.

2013-11-19 Thread Shingo Omura

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15684/#review29145
---

Ship it!


Ship It!

- Shingo Omura






Re: Review Request 15684: Python CLI helper 'http.get' should not assume JSON.

2013-11-19 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15684/#review29146
---

Ship it!


Ship It!

- Ben Mahler






Re: Review Request 15653: Adds systemLoad() convenience method to stout

2013-11-19 Thread Niklas Nielsen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15653/
---

(Updated Nov. 19, 2013, 11:08 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone.


Changes
---

Returns a vector instead of filling an output-parameter array.


Repository: mesos-git


Description
---

This patch includes a wrapper to get system load averages in uptime(1)
format. This is used by an upcoming patch that exposes these averages
over master and slave stats.json endpoints.


Diffs (updated)
-

  3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp f6bbf5e 

Diff: https://reviews.apache.org/r/15653/diff/


Testing
---

make check and functional testing with endpoints.


Thanks,

Niklas Nielsen



Review Request 15691: Bug fix in Python CLI futures.

2013-11-19 Thread Benjamin Hindman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15691/
---

Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and 
Vinod Kone.


Repository: mesos-git


Description
---

See summary.


Diffs
-

  src/cli/python/mesos/futures.py 9c36823c94ba26bbe5d17c52c055df0a361f9645 

Diff: https://reviews.apache.org/r/15691/diff/


Testing
---


Thanks,

Benjamin Hindman



Re: Review Request 15691: Bug fix in Python CLI futures.

2013-11-19 Thread Du Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15691/#review29148
---

Ship it!


Ship It!

- Du Li


On Nov. 19, 2013, 11:13 p.m., Benjamin Hindman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15691/
 ---
 
 (Updated Nov. 19, 2013, 11:13 p.m.)
 
 
 Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, 
 and Vinod Kone.
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 See summary.
 
 
 Diffs
 -
 
   src/cli/python/mesos/futures.py 9c36823c94ba26bbe5d17c52c055df0a361f9645 
 
 Diff: https://reviews.apache.org/r/15691/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Benjamin Hindman
 




Re: Review Request 15691: Bug fix in Python CLI futures.

2013-11-19 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15691/#review29149
---

Ship it!


Ship It!

- Ben Mahler






Re: Review Request 15653: Adds systemLoad() convenience method to stout

2013-11-19 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15653/#review29151
---



3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp
https://reviews.apache.org/r/15653/#comment56284

Hey Nik, I see your gist here:
https://gist.github.com/nqn/7493244

More interesting than the node-wide load average would be the total CPU time for 
the master. Can we expose the same CPU time information that we report in 
ProcessIsolator::usage instead of the load average?


- Ben Mahler






Re: Review Request 14960: implementation of CLI mesos-status

2013-11-19 Thread Du Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14960/
---

(Updated Nov. 19, 2013, 11:43 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Shingo Omura.


Changes
---

This commit implements the CLI command mesos-status, which reports three 
categories of hosts:

(1) Hosts that are reported by /master/state.json and responded to the 
/slave(1)/health query;
(2) Hosts that are reported by the master but failed to respond to the health 
query before the timeout.


Repository: mesos-git


Description
---

This commit implements the CLI command mesos-status, which reports three 
categories of hosts:

(1) Hosts that are reported by /master/state.json and responded to the 
/slave(1)/health query;
(2) Hosts that are reported by the master but failed to respond to the health 
query before the timeout;
(3) Hosts that are listed in the var/mesos/deploy/slaves files but not 
reported by the master.
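
The three categories can be sketched as a small classification function. This is only an illustration, not the code under review: `classify_hosts`, its parameters, and the `is_healthy` callback (standing in for the /slave(1)/health query with a timeout) are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List


def classify_hosts(
    reported: Iterable[str],
    configured: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Split hosts into the three categories described above."""
    reported_set = set(reported)
    healthy: List[str] = []
    unresponsive: List[str] = []
    for host in sorted(reported_set):
        # is_healthy is assumed to return False on a failed or timed-out query.
        if is_healthy(host):
            healthy.append(host)
        else:
            unresponsive.append(host)
    # Hosts listed in the deploy file that the master does not know about.
    missing = sorted(set(configured) - reported_set)
    return {"healthy": healthy, "unresponsive": unresponsive, "missing": missing}


print(classify_hosts(["a", "b"], ["a", "b", "c"], lambda host: host == "a"))
```

Passing the health check in as a callback keeps the classification logic independent of how the query and its timeout are implemented.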


Diffs (updated)
-

  src/cli/mesos-status PRE-CREATION 

Diff: https://reviews.apache.org/r/14960/diff/


Testing
---

has been tested on a local cluster of 12 servers.


Thanks,

Du Li



Re: Review Request 14960: implementation of CLI mesos-status

2013-11-19 Thread Du Li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14960/
---

(Updated Nov. 19, 2013, 11:47 p.m.)


Review request for mesos, Benjamin Hindman, Ben Mahler, and Shingo Omura.


Changes
---

This commit implements the CLI command mesos-status, which reports three 
categories of hosts:

(1) Hosts that are reported by /master/state.json and responded to the 
/slave(1)/health query;
(2) Hosts that are reported by the master but failed to respond to the health 
query before the timeout.

This diff is rebased on latest code on master and has temporarily removed code 
for checking configuration file.


Repository: mesos-git


Description
---

This commit implements the CLI command mesos-status, which reports three 
categories of hosts:

(1) Hosts that are reported by /master/state.json and responded to the 
/slave(1)/health query;
(2) Hosts that are reported by the master but failed to respond to the health 
query before the timeout;
(3) Hosts that are listed in the var/mesos/deploy/slaves files but not 
reported by the master.


Diffs (updated)
-

  src/cli/mesos-status PRE-CREATION 

Diff: https://reviews.apache.org/r/14960/diff/


Testing
---

has been tested on a local cluster of 12 servers.


Thanks,

Du Li



[jira] [Updated] (MESOS-818) Bump up the minimum number threads libprocess creates to accommodate new tests

2013-11-19 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-818:
-

Component/s: (was: general)
 libprocess

 Bump up the minimum number threads libprocess creates to accommodate new tests
 --

         Key: MESOS-818
         URL: https://issues.apache.org/jira/browse/MESOS-818
     Project: Mesos
  Issue Type: Improvement
  Components: libprocess
    Reporter: Yan Xu
    Assignee: Yan Xu
      Labels: twitter
     Fix For: 0.16.0


 Currently the minimum number of threads libprocess creates is 4. Some newly 
 written tests have more libprocess processes waiting on latches than 
 libprocess has threads, so those tests are starved.
 See:
 https://github.com/apache/mesos/blob/dd89ea359ec55fbc90b5718d9cdbf021f189c2fa/3rdparty/libprocess/src/process.cpp#L1367
 Need to bump the minimum number to 8.
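
 The starvation can be reproduced with a thread-pool analogy. This sketch is not libprocess code: it uses Python's ThreadPoolExecutor as a stand-in for the worker threads and a threading.Barrier as a stand-in for the latches, and `all_waiters_released` is a hypothetical name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def all_waiters_released(workers: int, waiters: int, timeout: float = 1.0) -> bool:
    """Return True if every waiter got past the latch, False if any was starved."""
    barrier = threading.Barrier(waiters)

    def wait_on_latch() -> bool:
        try:
            # Blocks a worker thread until all waiters have arrived.
            barrier.wait(timeout=timeout)
            return True
        except threading.BrokenBarrierError:
            # Not all waiters could run at once, so the wait timed out.
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(wait_on_latch) for _ in range(waiters)]
        return all(future.result() for future in futures)


print(all_waiters_released(workers=8, waiters=8))
print(all_waiters_released(workers=4, waiters=8, timeout=0.3))
```

 With 4 workers and 8 blocked waiters, the remaining waiters never get a thread until the first ones time out; with 8 workers all of them can wait at the same time and are released together.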





[jira] [Updated] (MESOS-818) Bump up the minimum number threads libprocess creates to accommodate new tests

2013-11-19 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-818:
-

Component/s: general



[jira] [Updated] (MESOS-818) Bump up the minimum number threads libprocess creates to accommodate new tests

2013-11-19 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-818:
-

Labels: twitter  (was: )



Review Request 15706: Fixed Group to retry when authentication failed due to retryable errors.

2013-11-19 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15706/
---

Review request for mesos, Benjamin Hindman, Ben Mahler, Ian Downes, Jie Yu, and 
Vinod Kone.


Bugs: MESOS-814
https://issues.apache.org/jira/browse/MESOS-814


Repository: mesos-git


Description
---

See summary.


Diffs
-

  src/zookeeper/group.hpp 04068e357cec95457d1f24c166d0b60f86d997d2 
  src/zookeeper/group.cpp 12c781b29f4300ca8a29660adc3f1e55e03d5d04 

Diff: https://reviews.apache.org/r/15706/diff/


Testing
---

make check and mesos-tests.sh 
--gtest_filter=GroupTest*:ZooKeeperTest*:ZooKeeperMasterContenderDetectorTest* 
with a high number of iterations.

This fix is for a problem not easy to expose through unit tests so no new tests 
were written.


Thanks,

Jiang Yan Xu



[jira] [Commented] (MESOS-814) Retry retryable authentication failures

2013-11-19 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827152#comment-13827152
 ] 

Yan Xu commented on MESOS-814:
--

https://reviews.apache.org/r/15706

 Retry retryable authentication failures
 ---

         Key: MESOS-814
         URL: https://issues.apache.org/jira/browse/MESOS-814
     Project: Mesos
  Issue Type: Improvement
    Reporter: Yan Xu
    Assignee: Yan Xu

 Currently Group puts every unsuccessful operation except authentication into a 
 retry queue if the first attempt fails (and if the error indicates the 
 operation is retryable).
 Authentication should be retried as well.
 See: https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp#L393
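
 The requested behaviour can be sketched as a retry queue that treats authentication like any other operation. This is an illustration under assumptions, not the Group implementation: `RetryableError`, `run_with_retries`, and the `max_attempts` limit are hypothetical.

```python
from collections import deque
from typing import Callable, List


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a lost connection)."""


def run_with_retries(
    operations: List[Callable[[], None]], max_attempts: int = 3
) -> List[Callable[[], None]]:
    """Run each operation, requeueing it on RetryableError.

    Returns the operations that still failed after max_attempts tries.
    """
    queue = deque((operation, 1) for operation in operations)
    failed = []
    while queue:
        operation, attempt = queue.popleft()
        try:
            operation()
        except RetryableError:
            if attempt < max_attempts:
                # Put the operation back at the end of the retry queue.
                queue.append((operation, attempt + 1))
            else:
                failed.append(operation)
    return failed


attempts = {"authenticate": 0}


def authenticate() -> None:
    # Fails twice with a retryable error, then succeeds.
    attempts["authenticate"] += 1
    if attempts["authenticate"] < 3:
        raise RetryableError("connection lost")


print(run_with_retries([authenticate]))
```

 Errors that are not retryable propagate to the caller unchanged, so only failures explicitly marked as retryable are queued again.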





[jira] [Assigned] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky

2013-11-19 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-822:
-

Assignee: Benjamin Mahler

 AllocatorTest/0.SchedulerFailover is flaky
 --

         Key: MESOS-822
         URL: https://issues.apache.org/jira/browse/MESOS-822
     Project: Mesos
  Issue Type: Bug
    Reporter: Yan Xu
    Assignee: Benjamin Mahler
     Fix For: 0.16.0


 slave 201311191201-16777343-52448-17056-0 (localhost.localdomain)
 tests/allocator_tests.cpp:993: Failure
 Mock function called more times than expected - taking default action 
 specified at:
 ./tests/mesos.hpp:412:
 Function call: resourcesUnused(@0x7f6f80025e58 
 201311191201-16777343-52448-17056-, @0x7f6f80025e38 
 201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, 
 disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object 
 00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00)
  Expected: to be called once
Actual: called twice - over-saturated and active
 Full Log:
 [ RUN  ] AllocatorTest/0.SchedulerFailover
 I1119 12:01:32.106143 19009 exec.cpp:84] Committing suicide by killing the 
 process group
 I1119 12:01:32.106276 19017 exec.cpp:84] Committing suicide by killing the 
 process group
 I1119 12:01:32.108185 18999 exec.cpp:84] Committing suicide by killing the 
 process group
 I1119 12:01:32.113991 17076 master.cpp:285] Master started on 127.0.0.1:52448
 I1119 12:01:32.114038 17076 master.cpp:299] Master ID: 
 201311191201-16777343-52448-17056
 I1119 12:01:32.114047 17076 master.cpp:302] Master only allowing 
 authenticated frameworks to register!
 I1119 12:01:32.114109 17082 slave.cpp:112] Slave started on 
 127)@127.0.0.1:52448
 I1119 12:01:32.114209 17082 slave.cpp:212] Slave resources: cpus(*):3; 
 mem(*):1024; disk(*):22668; ports(*):[31000-32000]
 I1119 12:01:32.114393 17080 sched.cpp:207] New master detected at 
 master@127.0.0.1:52448
 I1119 12:01:32.114413 17080 sched.cpp:260] Authenticating with master 
 master@127.0.0.1:52448
 I1119 12:01:32.114461 17080 sched.cpp:229] Detecting new master
 I1119 12:01:32.114497 17080 authenticatee.hpp:124] Creating new client SASL 
 connection
 I1119 12:01:32.118248 17082 state.cpp:33] Recovering state from 
 '/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/meta'
 I1119 12:01:32.118343 17082 status_update_manager.cpp:180] Recovering status 
 update manager
 I1119 12:01:32.118407 17082 slave.cpp:2743] Finished recovery
 I1119 12:01:32.118463 17082 slave.cpp:497] New master detected at 
 master@127.0.0.1:52448
 I1119 12:01:32.118517 17082 slave.cpp:524] Detecting new master
 I1119 12:01:32.118538 17082 status_update_manager.cpp:158] New master 
 detected at master@127.0.0.1:52448
 I1119 12:01:32.118906 17076 master.cpp:1734] Authenticating framework at 
 scheduler(119)@127.0.0.1:52448
 W1119 12:01:32.118986 17076 master.cpp:1235] Ignoring register slave message 
 from localhost.localdomain since not elected yet
 I1119 12:01:32.119091 17076 master.cpp:85] No whitelist given. Advertising 
 offers for all slaves
 I1119 12:01:32.119155 17076 authenticator.hpp:140] Creating new server SASL 
 connection
 I1119 12:01:32.119243 17076 hierarchical_allocator_process.hpp:302] 
 Initializing hierarchical allocator process with master : 
 master@127.0.0.1:52448
 I1119 12:01:32.119279 17076 authenticatee.hpp:212] Received SASL 
 authentication mechanisms: CRAM-MD5
 I1119 12:01:32.119293 17076 authenticatee.hpp:238] Attempting to authenticate 
 with mechanism 'CRAM-MD5'
 I1119 12:01:32.119312 17076 master.cpp:744] The newly elected leader is 
 master@127.0.0.1:52448
 I1119 12:01:32.119321 17076 master.cpp:748] Elected as the leading master!
 I1119 12:01:32.119343 17076 authenticator.hpp:243] Received SASL 
 authentication start
 I1119 12:01:32.119390 17076 authenticator.hpp:325] Authentication requires 
 more steps
 I1119 12:01:32.119417 17076 authenticatee.hpp:258] Received SASL 
 authentication step
 I1119 12:01:32.119447 17076 authenticator.hpp:271] Received SASL 
 authentication step
 I1119 12:01:32.119463 17076 auxprop.cpp:81] Request to lookup properties for 
 user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 
 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: 
 false 
 I1119 12:01:32.119472 17076 auxprop.cpp:153] Looking up auxiliary property 
 '*userPassword'
 I1119 12:01:32.119482 17076 auxprop.cpp:153] Looking up auxiliary property 
 '*cmusaslsecretCRAM-MD5'
 I1119 12:01:32.119490 17076 auxprop.cpp:81] Request to lookup properties for 
 user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 
 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: 
 true 
 I1119 12:01:32.119498 17076 auxprop.cpp:103] Skipping auxiliary property 
 '*userPassword' since 

Review Request 15707: Fixed a flaky test: AllocatorTest/SchedulerFailover.

2013-11-19 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15707/
---

Review request for mesos and Vinod Kone.


Bugs: MESOS-822
https://issues.apache.org/jira/browse/MESOS-822


Repository: mesos-git


Description
---

See MESOS-822. The timing in CI was such that a subsequent offer was sent to 
the scheduler before it could fail over.


Diffs
-

  src/tests/allocator_tests.cpp 61ab235c148e7b380b0de148c9ca7bd9fa6563f2 

Diff: https://reviews.apache.org/r/15707/diff/


Testing
---

make check with 20,000 iterations


Thanks,

Ben Mahler



[jira] [Commented] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky

2013-11-19 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827169#comment-13827169
 ] 

Benjamin Mahler commented on MESOS-822:
---

https://reviews.apache.org/r/15707/

 AllocatorTest/0.SchedulerFailover is flaky
 --

 Key: MESOS-822
 URL: https://issues.apache.org/jira/browse/MESOS-822
 Project: Mesos
  Issue Type: Bug
Reporter: Yan Xu
Assignee: Benjamin Mahler
 Fix For: 0.16.0



Review Request 15708: Improved exit status printing in the CgroupsIsolator.

2013-11-19 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15708/
---

Review request for mesos and Vinod Kone.


Repository: mesos-git


Description
---

See summary (improved exit status printing in the CgroupsIsolator).


Diffs
-

  src/slave/cgroups_isolator.cpp c769ae045783125013989b12f8aa61dfda687ce8 

Diff: https://reviews.apache.org/r/15708/diff/


Testing
---

make check


Thanks,

Ben Mahler



[jira] [Created] (MESOS-823) ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky

2013-11-19 Thread Yan Xu (JIRA)
Yan Xu created MESOS-823:


 Summary: 
ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky
 Key: MESOS-823
 URL: https://issues.apache.org/jira/browse/MESOS-823
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Yan Xu
 Fix For: 0.16.0


This was never captured on faster build servers...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Review Request 15710: Fixed a bug in ZooKeeperMasterContenderDetectorTest that prevented the local timeout in Group from being triggered.

2013-11-19 Thread Jiang Yan Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15710/
---

Review request for mesos and Ben Mahler.


Bugs: MESOS-823
https://issues.apache.org/jira/browse/MESOS-823


Repository: mesos-git


Description
---

See summary.


Diffs
-

  src/tests/master_contender_detector_tests.cpp 
5e4237454133edc155e74ffa04aec24ccd04c1b4 

Diff: https://reviews.apache.org/r/15710/diff/


Testing
---

make check 100 iterations


Thanks,

Jiang Yan Xu



[jira] [Assigned] (MESOS-823) ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky

2013-11-19 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu reassigned MESOS-823:


Assignee: Yan Xu

 ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky
 --

 Key: MESOS-823
 URL: https://issues.apache.org/jira/browse/MESOS-823
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Yan Xu
Assignee: Yan Xu
 Fix For: 0.16.0


 This was never captured on faster build servers...





[jira] [Commented] (MESOS-823) ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky

2013-11-19 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827209#comment-13827209
 ] 

Yan Xu commented on MESOS-823:
--

https://reviews.apache.org/r/15710/

 ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky
 --

 Key: MESOS-823
 URL: https://issues.apache.org/jira/browse/MESOS-823
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Yan Xu
 Fix For: 0.16.0


 This was never captured on faster build servers...





Re: Review Request 15710: Fixed a bug in ZooKeeperMasterContenderDetectorTest that prevented the local timeout in Group from being triggered.

2013-11-19 Thread Ben Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15710/#review29156
---



src/tests/master_contender_detector_tests.cpp
https://reviews.apache.org/r/15710/#comment56297

Looks good. We should consider creating a testing abstraction to make sure 
that our tests do not run forever:

DO_FOR (Seconds(10)) {
  ...
}
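
One way such a bounded loop could be sketched is shown here. This is only an
illustration and not anything that exists in Mesos or stout: the Deadline
class, the DO_FOR macro body, and the pollFor helper are all hypothetical,
and std::chrono durations stand in for stout's Seconds.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not part of Mesos or stout): remembers a deadline so
// that a polling loop can stop once the allotted time has elapsed.
class Deadline
{
public:
  explicit Deadline(std::chrono::milliseconds timeout)
    : end_(std::chrono::steady_clock::now() +
           std::chrono::duration_cast<std::chrono::steady_clock::duration>(
               timeout)) {}

  bool expired() const
  {
    return std::chrono::steady_clock::now() >= end_;
  }

private:
  std::chrono::steady_clock::time_point end_;
};

// Hypothetical macro: repeats the following block until the timeout elapses
// or the block leaves the loop itself (break / return).
#define DO_FOR(timeout) \
  for (Deadline deadline_{timeout}; !deadline_.expired(); )

// Example use: polls `condition` until it holds or `timeout` elapses.
// Returns true if the condition held before the deadline, false otherwise.
template <typename Condition>
bool pollFor(std::chrono::milliseconds timeout, Condition condition)
{
  DO_FOR (timeout) {
    if (condition()) {
      return true;
    }
    // Back off briefly between checks instead of busy-waiting.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return false;
}
```

A test could then fail explicitly when pollFor returns false instead of
hanging, which is the property the suggestion above is after.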


- Ben Mahler


On Nov. 20, 2013, 1:20 a.m., Jiang Yan Xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15710/
 ---
 
 (Updated Nov. 20, 2013, 1:20 a.m.)
 
 
 Review request for mesos and Ben Mahler.
 
 
 Bugs: MESOS-823
 https://issues.apache.org/jira/browse/MESOS-823
 
 
 Repository: mesos-git
 
 
 Description
 ---
 
 See summary.
 
 
 Diffs
 -
 
   src/tests/master_contender_detector_tests.cpp 
 5e4237454133edc155e74ffa04aec24ccd04c1b4 
 
 Diff: https://reviews.apache.org/r/15710/diff/
 
 
 Testing
 ---
 
 make check 100 iterations
 
 
 Thanks,
 
 Jiang Yan Xu
 




[jira] [Created] (MESOS-824) export running config via http+json

2013-11-19 Thread David Robinson (JIRA)
David Robinson created MESOS-824:


 Summary: export running config via http+json 
 Key: MESOS-824
 URL: https://issues.apache.org/jira/browse/MESOS-824
 Project: Mesos
  Issue Type: Improvement
Reporter: David Robinson
Priority: Minor


Currently there's no way of knowing whether a slave is actually checkpointing 
(except for grepping through logs, which isn't ideal). The --checkpoint flag on 
the command line can't be used to detect this since checkpointing could be 
enabled on the slave but not in the framework. Because of this we cannot detect 
whether slave recovery is actually enabled and therefore can't tell whether 
it's safe to restart a slave.

Please export the running config, preferably via a json endpoint.
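
As a rough sketch of what the requested export might produce, the snippet
below serializes a set of flag name/value pairs into a JSON object. It is an
assumption-laden illustration only: flagsToJson and escapeJson are
hypothetical names, the flags are modeled as a plain string map rather than
Mesos's own flag types, and how the result would be served over HTTP is left
open.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Escapes the characters that are not allowed verbatim in a JSON string.
std::string escapeJson(const std::string& s)
{
  std::string out;
  for (char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (static_cast<unsigned char>(c) < 0x20) {
          // Remaining control characters use the \u00XX form.
          char buffer[7];
          std::snprintf(
              buffer,
              sizeof(buffer),
              "\\u%04x",
              static_cast<unsigned int>(static_cast<unsigned char>(c)));
          out += buffer;
        } else {
          out += c;
        }
        break;
    }
  }
  return out;
}

// Hypothetical helper: renders the running flags as one JSON object whose
// values are all strings. Keys appear in the map's sorted order.
std::string flagsToJson(const std::map<std::string, std::string>& flags)
{
  std::string json = "{";
  bool first = true;
  for (const auto& flag : flags) {
    if (!first) {
      json += ",";
    }
    first = false;
    json += "\"" + escapeJson(flag.first) + "\":\"" +
            escapeJson(flag.second) + "\"";
  }
  json += "}";
  return json;
}
```

With something along these lines, an operator could query the slave and
inspect the effective settings directly rather than grepping through logs.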


