Jenkins build is back to normal : Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME #1766
See https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Set-JAVA_HOME/1766/
Re: Review Request 15653: Adds systemLoad() convenience method to stout
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15653/#review29124 --- 3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp https://reviews.apache.org/r/15653/#comment56255 How about Try<vector<double> > instead? We typically return values by return value rather than function parameter. - Vinod Kone On Nov. 18, 2013, 7:19 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15653/ --- (Updated Nov. 18, 2013, 7:19 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Repository: mesos-git Description --- This patch includes a wrapper to get system load averages in uptime(1) format. This is used by an upcoming patch which exposes these averages over master and slave stats.json endpoints. Diffs - 3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp f6bbf5e00a810affd8cb6f828d1f306dc8bf3051 Diff: https://reviews.apache.org/r/15653/diff/ Testing --- make check and functional testing with endpoints. Thanks, Niklas Nielsen
Re: Review Request 15653: Adds systemLoad() convenience method to stout
On Nov. 19, 2013, 6:48 p.m., Vinod Kone wrote: 3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp, line 836 https://reviews.apache.org/r/15653/diff/1/?file=388010#file388010line836 How about Try<vector<double> > instead? We typically return values by return value rather than function parameter. SGTM - Will get that in. - Niklas --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15653/#review29124 --- On Nov. 18, 2013, 7:19 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15653/ --- (Updated Nov. 18, 2013, 7:19 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Repository: mesos-git Description --- This patch includes a wrapper to get system load averages in uptime(1) format. This is used by an upcoming patch which exposes these averages over master and slave stats.json endpoints. Diffs - 3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp f6bbf5e00a810affd8cb6f828d1f306dc8bf3051 Diff: https://reviews.apache.org/r/15653/diff/ Testing --- make check and functional testing with endpoints. Thanks, Niklas Nielsen
Re: Review Request 14669: launchTasks on list of offers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14669/#review29127 --- Sorry for the delay on this. Mind rebasing it? I will get this committed today. Thanks. - Vinod Kone On Nov. 14, 2013, 10:31 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14669/ --- (Updated Nov. 14, 2013, 10:31 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Bugs: MESOS-749 https://issues.apache.org/jira/browse/MESOS-749 Repository: mesos-git Description --- Running tasks on more than one offer belonging to a single slave can be useful in situations with multiple outstanding offers. This patch extends the usual launchTasks() to accept a vector of OfferIDs. The previous launchTasks() (accepting a single OfferID) has been kept for backward compatibility, but it now calls the new launchTasks() with a one-element list. This also applies to the JNI and Python interfaces, which accept both formats as well. Offers are verified to belong to the same slave and framework before resources are merged and used. Diffs - include/mesos/scheduler.hpp 380e087 src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 9869929 src/java/src/org/apache/mesos/MesosSchedulerDriver.java ed4b4a3 src/java/src/org/apache/mesos/SchedulerDriver.java 5b0ca39 src/master/master.hpp e377af8 src/master/master.cpp abab6ce src/messages/messages.proto 71f68a0 src/python/native/mesos_scheduler_driver_impl.cpp 059ed5d src/sched/sched.cpp 3abe72f src/tests/master_tests.cpp bf790d2 src/tests/resource_offers_tests.cpp 2864c9a Diff: https://reviews.apache.org/r/14669/diff/ Testing --- Three new tests have been added: LaunchCombinedOfferTest, LaunchAcrossSlavesTest and LaunchDuplicateOfferTest. These tests ensure that: 1) Multiple offers can be used to run a single task (requesting the sum of offer resources). 2) Offers cannot span multiple slaves. 3) No offer can appear more than once in the offer list. $ make check ... [ RUN ] MasterTest.LaunchCombinedOfferTest [ OK ] MasterTest.LaunchCombinedOfferTest (2010 ms) [ RUN ] MasterTest.LaunchAcrossSlavesTest [ OK ] MasterTest.LaunchAcrossSlavesTest (3 ms) [ RUN ] MasterTest.LaunchDuplicateOfferTest [ OK ] MasterTest.LaunchDuplicateOfferTest (3 ms) ... Thanks, Niklas Nielsen
[jira] [Created] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky
Yan Xu created MESOS-822: Summary: AllocatorTest/0.SchedulerFailover is flaky Key: MESOS-822 URL: https://issues.apache.org/jira/browse/MESOS-822 Project: Mesos Issue Type: Bug Reporter: Yan Xu Fix For: 0.16.0 Log output: http://sfo2-aad-36-sr1.perf.twttr.net:8080/job/mesos-centos-6-gcc/119/console I1119 12:01:33.126309 17083 master.hpp:438] Removing offer 201311191201-16777343-52448-17056-1 with resources cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] on slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) tests/allocator_tests.cpp:993: Failure Mock function called more times than expected - taking default action specified at: ./tests/mesos.hpp:412: Function call: resourcesUnused(@0x7f6f80025e58 201311191201-16777343-52448-17056-, @0x7f6f80025e38 201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object 00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00) Expected: to be called once Actual: called twice - over-saturated and active I1119 12:01:33.126698 17083 hierarchical_allocator_process.hpp:547] Framework 201311191201-16777343-52448-17056- left cpus(*):2; mem(*):768; disk(*):22668; ports(*):[31000-32000] unused on slave 201311191201-16777343-52448-17056-0 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky
[ https://issues.apache.org/jira/browse/MESOS-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-822: - Description: slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) tests/allocator_tests.cpp:993: Failure Mock function called more times than expected - taking default action specified at: ./tests/mesos.hpp:412: Function call: resourcesUnused(@0x7f6f80025e58 201311191201-16777343-52448-17056-, @0x7f6f80025e38 201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object 00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00) Expected: to be called once Actual: called twice - over-saturated and active Full Log: [ RUN ] AllocatorTest/0.SchedulerFailover I1119 12:01:32.106143 19009 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.106276 19017 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.108185 18999 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.113991 17076 master.cpp:285] Master started on 127.0.0.1:52448 I1119 12:01:32.114038 17076 master.cpp:299] Master ID: 201311191201-16777343-52448-17056 I1119 12:01:32.114047 17076 master.cpp:302] Master only allowing authenticated frameworks to register! 
I1119 12:01:32.114109 17082 slave.cpp:112] Slave started on 127)@127.0.0.1:52448 I1119 12:01:32.114209 17082 slave.cpp:212] Slave resources: cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000] I1119 12:01:32.114393 17080 sched.cpp:207] New master detected at master@127.0.0.1:52448 I1119 12:01:32.114413 17080 sched.cpp:260] Authenticating with master master@127.0.0.1:52448 I1119 12:01:32.114461 17080 sched.cpp:229] Detecting new master I1119 12:01:32.114497 17080 authenticatee.hpp:124] Creating new client SASL connection I1119 12:01:32.118248 17082 state.cpp:33] Recovering state from '/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/meta' I1119 12:01:32.118343 17082 status_update_manager.cpp:180] Recovering status update manager I1119 12:01:32.118407 17082 slave.cpp:2743] Finished recovery I1119 12:01:32.118463 17082 slave.cpp:497] New master detected at master@127.0.0.1:52448 I1119 12:01:32.118517 17082 slave.cpp:524] Detecting new master I1119 12:01:32.118538 17082 status_update_manager.cpp:158] New master detected at master@127.0.0.1:52448 I1119 12:01:32.118906 17076 master.cpp:1734] Authenticating framework at scheduler(119)@127.0.0.1:52448 W1119 12:01:32.118986 17076 master.cpp:1235] Ignoring register slave message from localhost.localdomain since not elected yet I1119 12:01:32.119091 17076 master.cpp:85] No whitelist given. 
Advertising offers for all slaves I1119 12:01:32.119155 17076 authenticator.hpp:140] Creating new server SASL connection I1119 12:01:32.119243 17076 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@127.0.0.1:52448 I1119 12:01:32.119279 17076 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5 I1119 12:01:32.119293 17076 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I1119 12:01:32.119312 17076 master.cpp:744] The newly elected leader is master@127.0.0.1:52448 I1119 12:01:32.119321 17076 master.cpp:748] Elected as the leading master! I1119 12:01:32.119343 17076 authenticator.hpp:243] Received SASL authentication start I1119 12:01:32.119390 17076 authenticator.hpp:325] Authentication requires more steps I1119 12:01:32.119417 17076 authenticatee.hpp:258] Received SASL authentication step I1119 12:01:32.119447 17076 authenticator.hpp:271] Received SASL authentication step I1119 12:01:32.119463 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1119 12:01:32.119472 17076 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1119 12:01:32.119482 17076 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1119 12:01:32.119490 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1119 12:01:32.119498 17076 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true I1119 12:01:32.119503 17076 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true I1119 12:01:32.119514 17076 authenticator.hpp:317] Authentication success I1119 12:01:32.119532 17076 
authenticatee.hpp:298] Authentication success I1119 12:01:32.119547 17076 master.cpp:1774] Successfully authenticated framework at scheduler(119)@127.0.0.1:52448 I1119 12:01:32.119604 17076
Re: Review Request 14669: launchTasks on list of offers
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14669/ --- (Updated Nov. 19, 2013, 10:11 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Changes --- Rebased to master. Bugs: MESOS-749 https://issues.apache.org/jira/browse/MESOS-749 Repository: mesos-git Description --- Running tasks on more than one offer belonging to a single slave can be useful in situations with multiple outstanding offers. This patch extends the usual launchTasks() to accept a vector of OfferIDs. The previous launchTasks() (accepting a single OfferID) has been kept for backward compatibility, but it now calls the new launchTasks() with a one-element list. This also applies to the JNI and Python interfaces, which accept both formats as well. Offers are verified to belong to the same slave and framework before resources are merged and used. Diffs (updated) - include/mesos/scheduler.hpp 161cc65 src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp 9869929 src/java/src/org/apache/mesos/MesosSchedulerDriver.java ed4b4a3 src/java/src/org/apache/mesos/SchedulerDriver.java 5b0ca39 src/master/master.hpp c86c1f1 src/master/master.cpp f65b344 src/messages/messages.proto 1f264d5 src/python/native/mesos_scheduler_driver_impl.cpp 059ed5d src/sched/sched.cpp 51f95bb src/tests/master_tests.cpp 37ee7a0 src/tests/resource_offers_tests.cpp 2864c9a Diff: https://reviews.apache.org/r/14669/diff/ Testing --- Three new tests have been added: LaunchCombinedOfferTest, LaunchAcrossSlavesTest and LaunchDuplicateOfferTest. These tests ensure that: 1) Multiple offers can be used to run a single task (requesting the sum of offer resources). 2) Offers cannot span multiple slaves. 3) No offer can appear more than once in the offer list. $ make check ... [ RUN ] MasterTest.LaunchCombinedOfferTest [ OK ] MasterTest.LaunchCombinedOfferTest (2010 ms) [ RUN ] MasterTest.LaunchAcrossSlavesTest [ OK ] MasterTest.LaunchAcrossSlavesTest (3 ms) [ RUN ] MasterTest.LaunchDuplicateOfferTest [ OK ] MasterTest.LaunchDuplicateOfferTest (3 ms) ... Thanks, Niklas Nielsen
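The checks described in review 14669 (same slave, same framework, no duplicate offers, then merged resources) can be sketched roughly as follows. This is an illustration only, not the code from the patch: the Offer struct, the mergeOffers() name and the scalar-only resource map are all hypothetical simplifications introduced here.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a Mesos offer; only scalar
// resources (e.g. "cpus", "mem") are modelled.
struct Offer
{
  std::string id;
  std::string slaveId;
  std::string frameworkId;
  std::map<std::string, double> resources;
};

// Validates that all offers share one slave and one framework and that no
// offer ID repeats, then sums their resources into `merged`. Returns false
// and sets `error` on the first violation; `merged` is left untouched then.
bool mergeOffers(
    const std::vector<Offer>& offers,
    std::map<std::string, double>* merged,
    std::string* error)
{
  if (offers.empty()) {
    *error = "No offers given";
    return false;
  }

  std::set<std::string> seen;
  std::map<std::string, double> total;

  for (const Offer& offer : offers) {
    // insert().second is false when the ID was already present.
    if (!seen.insert(offer.id).second) {
      *error = "Duplicate offer " + offer.id;
      return false;
    }
    if (offer.slaveId != offers.front().slaveId) {
      *error = "Offers span multiple slaves";
      return false;
    }
    if (offer.frameworkId != offers.front().frameworkId) {
      *error = "Offers belong to different frameworks";
      return false;
    }
    for (const auto& resource : offer.resources) {
      total[resource.first] += resource.second;
    }
  }

  *merged = total;
  return true;
}
```

Under these assumptions, a task could request up to the summed resources of two offers from the same slave, while a list that spans slaves or repeats an offer is rejected, mirroring the three test cases named in the review.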
Review Request 15684: Python CLI helper 'http.get' should not assume JSON.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15684/ --- Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and Vinod Kone. Repository: mesos-git Description --- See summary. Diffs - src/cli/mesos-cat bb1e19750083f9e2680a5a22bd3bd3f7b2bc8656 src/cli/mesos-ps aff8423040a4ba5ce7f41da73a4c70a4d76da93f src/cli/mesos-tail 33acee4f92a1fa0cbf65568537e24f822e083717 src/cli/python/mesos/http.py e65701bee92dcad2af4e871394df5c60a7150659 Diff: https://reviews.apache.org/r/15684/diff/ Testing --- Thanks, Benjamin Hindman
Re: Review Request 15684: Python CLI helper 'http.get' should not assume JSON.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15684/#review29144 --- Ship it! let the client check it even if the content is empty. - Du Li On Nov. 19, 2013, 10:22 p.m., Benjamin Hindman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15684/ --- (Updated Nov. 19, 2013, 10:22 p.m.) Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and Vinod Kone. Repository: mesos-git Description --- See summary. Diffs - src/cli/mesos-cat bb1e19750083f9e2680a5a22bd3bd3f7b2bc8656 src/cli/mesos-ps aff8423040a4ba5ce7f41da73a4c70a4d76da93f src/cli/mesos-tail 33acee4f92a1fa0cbf65568537e24f822e083717 src/cli/python/mesos/http.py e65701bee92dcad2af4e871394df5c60a7150659 Diff: https://reviews.apache.org/r/15684/diff/ Testing --- Thanks, Benjamin Hindman
Re: Review Request 15684: Python CLI helper 'http.get' should not assume JSON.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15684/#review29145 --- Ship it! Ship It! - Shingo Omura On Nov. 19, 2013, 10:22 p.m., Benjamin Hindman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15684/ --- (Updated Nov. 19, 2013, 10:22 p.m.) Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and Vinod Kone. Repository: mesos-git Description --- See summary. Diffs - src/cli/mesos-cat bb1e19750083f9e2680a5a22bd3bd3f7b2bc8656 src/cli/mesos-ps aff8423040a4ba5ce7f41da73a4c70a4d76da93f src/cli/mesos-tail 33acee4f92a1fa0cbf65568537e24f822e083717 src/cli/python/mesos/http.py e65701bee92dcad2af4e871394df5c60a7150659 Diff: https://reviews.apache.org/r/15684/diff/ Testing --- Thanks, Benjamin Hindman
Re: Review Request 15684: Python CLI helper 'http.get' should not assume JSON.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15684/#review29146 --- Ship it! Ship It! - Ben Mahler On Nov. 19, 2013, 10:22 p.m., Benjamin Hindman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15684/ --- (Updated Nov. 19, 2013, 10:22 p.m.) Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and Vinod Kone. Repository: mesos-git Description --- See summary. Diffs - src/cli/mesos-cat bb1e19750083f9e2680a5a22bd3bd3f7b2bc8656 src/cli/mesos-ps aff8423040a4ba5ce7f41da73a4c70a4d76da93f src/cli/mesos-tail 33acee4f92a1fa0cbf65568537e24f822e083717 src/cli/python/mesos/http.py e65701bee92dcad2af4e871394df5c60a7150659 Diff: https://reviews.apache.org/r/15684/diff/ Testing --- Thanks, Benjamin Hindman
Re: Review Request 15653: Adds systemLoad() convenience method to stout
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15653/ --- (Updated Nov. 19, 2013, 11:08 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Changes --- Returns vector instead of return parameter array. Repository: mesos-git Description --- This patch includes a wrapper to get system load averages in uptime(1) format. This is used by an upcoming patch which exposes these averages over master and slave stats.json endpoints. Diffs (updated) - 3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp f6bbf5e Diff: https://reviews.apache.org/r/15653/diff/ Testing --- make check and functional testing with endpoints. Thanks, Niklas Nielsen
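A minimal sketch of what a by-value load-average wrapper might look like, for readers following review 15653. It is not the patch itself: std::optional (C++17) is used here as a stand-in for stout's Try so the example compiles without stout, and the body assumes the getloadavg(3) call available on Linux and BSD-derived systems.

```cpp
#include <stdlib.h>  // getloadavg(); a BSD/glibc extension, not ISO C++.

#include <optional>
#include <vector>

// Returns the 1, 5 and 15 minute load averages (the figures shown by
// uptime(1)), or an empty optional if they could not be obtained.
// std::optional is an assumption standing in for stout's Try.
std::optional<std::vector<double>> systemLoad()
{
  double samples[3];

  // getloadavg() returns the number of samples written, or -1 on failure.
  if (getloadavg(samples, 3) != 3) {
    return std::nullopt;
  }

  return std::vector<double>(samples, samples + 3);
}
```

Returning the result by value keeps the success/failure signal and the data in one object, which is the design point raised in the review, as opposed to filling a caller-supplied array.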
Review Request 15691: Bug fix in Python CLI futures.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15691/ --- Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and Vinod Kone. Repository: mesos-git Description --- See summary. Diffs - src/cli/python/mesos/futures.py 9c36823c94ba26bbe5d17c52c055df0a361f9645 Diff: https://reviews.apache.org/r/15691/diff/ Testing --- Thanks, Benjamin Hindman
Re: Review Request 15691: Bug fix in Python CLI futures.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15691/#review29148 --- Ship it! Ship It! - Du Li On Nov. 19, 2013, 11:13 p.m., Benjamin Hindman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15691/ --- (Updated Nov. 19, 2013, 11:13 p.m.) Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and Vinod Kone. Repository: mesos-git Description --- See summary. Diffs - src/cli/python/mesos/futures.py 9c36823c94ba26bbe5d17c52c055df0a361f9645 Diff: https://reviews.apache.org/r/15691/diff/ Testing --- Thanks, Benjamin Hindman
Re: Review Request 15691: Bug fix in Python CLI futures.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15691/#review29149 --- Ship it! Ship It! - Ben Mahler On Nov. 19, 2013, 11:13 p.m., Benjamin Hindman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15691/ --- (Updated Nov. 19, 2013, 11:13 p.m.) Review request for mesos, Ben Mahler, Du Li, Shingo Omura, Niklas Nielsen, and Vinod Kone. Repository: mesos-git Description --- See summary. Diffs - src/cli/python/mesos/futures.py 9c36823c94ba26bbe5d17c52c055df0a361f9645 Diff: https://reviews.apache.org/r/15691/diff/ Testing --- Thanks, Benjamin Hindman
Re: Review Request 15653: Adds systemLoad() convenience method to stout
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15653/#review29151 --- 3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp https://reviews.apache.org/r/15653/#comment56284 Hey Nik, I see your gist here: https://gist.github.com/nqn/7493244 More interesting than the node-wide load average will be the total CPU time for the master. Can we expose the same CPU time information that we do in ProcessIsolator::usage instead of the load average? - Ben Mahler On Nov. 19, 2013, 11:08 p.m., Niklas Nielsen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15653/ --- (Updated Nov. 19, 2013, 11:08 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Vinod Kone. Repository: mesos-git Description --- This patch includes a wrapper to get system load averages in uptime(1) format. This is used by an upcoming patch which exposes these averages over master and slave stats.json endpoints. Diffs - 3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp f6bbf5e Diff: https://reviews.apache.org/r/15653/diff/ Testing --- make check and functional testing with endpoints. Thanks, Niklas Nielsen
Re: Review Request 14960: implementation of CLI mesos-status
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14960/ --- (Updated Nov. 19, 2013, 11:43 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Shingo Omura. Changes --- This commit implements the CLI command mesos-status, which reports three categories of hosts: (1) The hosts that are reported by the /master/state.json and responded to /slave(1)/health query; (2) Those that are reported by master but failed to respond to health query by timeout. Repository: mesos-git Description --- This commit implements the CLI command mesos-status, which reports three categories of hosts: (1) The hosts that are reported by the /master/state.json and responded to /slave(1)/health query; (2) Those that are reported by master but failed to respond to health query by timeout; (3) Those that are included in the var/mesos/deploy/slaves files but not reported by the master. Diffs (updated) - src/cli/mesos-status PRE-CREATION Diff: https://reviews.apache.org/r/14960/diff/ Testing --- has been tested on a local cluster of 12 servers. Thanks, Du Li
Re: Review Request 14960: implementation of CLI mesos-status
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14960/ --- (Updated Nov. 19, 2013, 11:47 p.m.) Review request for mesos, Benjamin Hindman, Ben Mahler, and Shingo Omura. Changes --- This commit implements the CLI command mesos-status, which reports three categories of hosts: (1) The hosts that are reported by the /master/state.json and responded to /slave(1)/health query; (2) Those that are reported by master but failed to respond to health query by timeout. This diff is rebased on latest code on master and has temporarily removed code for checking configuration file. Repository: mesos-git Description --- This commit implements the CLI command mesos-status, which reports three categories of hosts: (1) The hosts that are reported by the /master/state.json and responded to /slave(1)/health query; (2) Those that are reported by master but failed to respond to health query by timeout; (3) Those that are included in the var/mesos/deploy/slaves files but not reported by the master. Diffs (updated) - src/cli/mesos-status PRE-CREATION Diff: https://reviews.apache.org/r/14960/diff/ Testing --- has been tested on a local cluster of 12 servers. Thanks, Du Li
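The three host categories in the mesos-status description amount to simple set operations. The sketch below illustrates only that classification logic; the real command is part of the Python CLI, and the HostStatus struct, classifyHosts() name and the three input sets are hypothetical names chosen for this example.

```cpp
#include <set>
#include <string>

// Hypothetical result type: one set per category from the description.
struct HostStatus
{
  std::set<std::string> healthy;       // (1) reported and answered health query
  std::set<std::string> unresponsive;  // (2) reported but health query timed out
  std::set<std::string> unreported;    // (3) configured but unknown to master
};

// Classifies hosts given who the master reports, who answered the health
// query, and who is listed in the deploy configuration.
HostStatus classifyHosts(
    const std::set<std::string>& reportedByMaster,
    const std::set<std::string>& respondedToHealth,
    const std::set<std::string>& configured)
{
  HostStatus status;

  for (const std::string& host : reportedByMaster) {
    if (respondedToHealth.count(host) > 0) {
      status.healthy.insert(host);
    } else {
      status.unresponsive.insert(host);
    }
  }

  for (const std::string& host : configured) {
    if (reportedByMaster.count(host) == 0) {
      status.unreported.insert(host);
    }
  }

  return status;
}
```

Note that the updated diff temporarily drops the configuration-file check, so only the first two categories would apply to that revision.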
[jira] [Updated] (MESOS-818) Bump up the minimum number threads libprocess creates to accommodate new tests
[ https://issues.apache.org/jira/browse/MESOS-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-818: - Component/s: (was: general) libprocess Bump up the minimum number threads libprocess creates to accommodate new tests -- Key: MESOS-818 URL: https://issues.apache.org/jira/browse/MESOS-818 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Yan Xu Assignee: Yan Xu Labels: twitter Fix For: 0.16.0 Currently the minimum number of threads libprocess creates is 4, which starves some newly written tests in which more libprocess processes need to wait on latches than libprocess has threads. See: https://github.com/apache/mesos/blob/dd89ea359ec55fbc90b5718d9cdbf021f189c2fa/3rdparty/libprocess/src/process.cpp#L1367 Need to bump the minimum number to 8. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MESOS-818) Bump up the minimum number threads libprocess creates to accommodate new tests
[ https://issues.apache.org/jira/browse/MESOS-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-818: - Component/s: general Bump up the minimum number threads libprocess creates to accommodate new tests -- Key: MESOS-818 URL: https://issues.apache.org/jira/browse/MESOS-818 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Yan Xu Assignee: Yan Xu Labels: twitter Fix For: 0.16.0 Currently the minimum number of threads libprocess creates is 4, which starves some newly written tests in which more libprocess processes need to wait on latches than libprocess has threads. See: https://github.com/apache/mesos/blob/dd89ea359ec55fbc90b5718d9cdbf021f189c2fa/3rdparty/libprocess/src/process.cpp#L1367 Need to bump the minimum number to 8. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MESOS-818) Bump up the minimum number threads libprocess creates to accommodate new tests
[ https://issues.apache.org/jira/browse/MESOS-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-818: - Labels: twitter (was: ) Bump up the minimum number threads libprocess creates to accommodate new tests -- Key: MESOS-818 URL: https://issues.apache.org/jira/browse/MESOS-818 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Yan Xu Assignee: Yan Xu Labels: twitter Fix For: 0.16.0 Currently the minimum number of threads libprocess creates is 4, which starves some newly written tests in which more libprocess processes need to wait on latches than libprocess has threads. See: https://github.com/apache/mesos/blob/dd89ea359ec55fbc90b5718d9cdbf021f189c2fa/3rdparty/libprocess/src/process.cpp#L1367 Need to bump the minimum number to 8. -- This message was sent by Atlassian JIRA (v6.1#6144)
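To make the MESOS-818 proposal concrete, here is a small sketch of a "floor on worker threads" calculation. It assumes the thread count is chosen as the larger of a fixed minimum and the number of online CPUs; that is an assumption for illustration, the linked process.cpp line is not reproduced here, and threadCount() and defaultThreadCount() are hypothetical names.

```cpp
#include <unistd.h>  // sysconf()

#include <algorithm>

// Pure helper: never go below `minimum`, otherwise use one thread per CPU.
long threadCount(long cpus, long minimum)
{
  return std::max(minimum, cpus);
}

// Uses the number of online CPUs as reported by the OS. sysconf() returns
// -1 on error, in which case the minimum applies.
long defaultThreadCount(long minimum = 8)
{
  return threadCount(sysconf(_SC_NPROCESSORS_ONLN), minimum);
}
```

With a minimum of 4, a test that blocks five processes on latches at once on a 2-core build machine has no free thread left to release them; raising the floor to 8 gives such tests headroom without changing behaviour on larger machines.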
Review Request 15706: Fixed Group to retry when authentication failed due to retryable errors.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15706/ --- Review request for mesos, Benjamin Hindman, Ben Mahler, Ian Downes, Jie Yu, and Vinod Kone. Bugs: MESOS-814 https://issues.apache.org/jira/browse/MESOS-814 Repository: mesos-git Description --- See summary. Diffs - src/zookeeper/group.hpp 04068e357cec95457d1f24c166d0b60f86d997d2 src/zookeeper/group.cpp 12c781b29f4300ca8a29660adc3f1e55e03d5d04 Diff: https://reviews.apache.org/r/15706/diff/ Testing --- make check mesos-tests.sh --gtest_filter=GroupTest*:ZooKeeperTest*:ZooKeeperMasterContenderDetectorTest* with high iterations. This fix is for a problem not easy to expose through unit tests so no new tests were written. Thanks, Jiang Yan Xu
[jira] [Commented] (MESOS-814) Retry retryable authentication failures
[ https://issues.apache.org/jira/browse/MESOS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827152#comment-13827152 ] Yan Xu commented on MESOS-814: -- https://reviews.apache.org/r/15706 Retry retryable authentication failures --- Key: MESOS-814 URL: https://issues.apache.org/jira/browse/MESOS-814 Project: Mesos Issue Type: Improvement Reporter: Yan Xu Assignee: Yan Xu Currently Group puts all unsuccessful operations except authentication into a retry queue if the first attempt fails (and if the error indicates they are retryable). Authentication should be retried as well. See: https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp#L393 -- This message was sent by Atlassian JIRA (v6.1#6144)
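The idea behind MESOS-814, treating authentication like any other operation that may fail with a retryable error, can be illustrated with a generic retry helper. This is a sketch under stated assumptions, not the Group implementation: the Outcome enum and withRetry() are hypothetical, and the real code uses a retry queue rather than the simple loop shown here.

```cpp
#include <functional>

// Hypothetical classification of an operation's result.
enum class Outcome
{
  Success,
  RetryableError,  // e.g. a transient connection problem
  FatalError       // retrying cannot help
};

// Runs `operation` until it stops reporting a retryable error or
// `maxAttempts` is exhausted, and returns the last outcome. With
// maxAttempts <= 0 the operation is never run and FatalError is returned.
Outcome withRetry(const std::function<Outcome()>& operation, int maxAttempts)
{
  Outcome last = Outcome::FatalError;

  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    last = operation();
    if (last != Outcome::RetryableError) {
      return last;
    }
  }

  return last;
}
```

Passing the authentication step through the same helper as other operations is the change the ticket asks for in spirit: a transient failure then leads to another attempt instead of a permanent error.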
[jira] [Assigned] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky
[ https://issues.apache.org/jira/browse/MESOS-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler reassigned MESOS-822: - Assignee: Benjamin Mahler AllocatorTest/0.SchedulerFailover is flaky -- Key: MESOS-822 URL: https://issues.apache.org/jira/browse/MESOS-822 Project: Mesos Issue Type: Bug Reporter: Yan Xu Assignee: Benjamin Mahler Fix For: 0.16.0 slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) tests/allocator_tests.cpp:993: Failure Mock function called more times than expected - taking default action specified at: ./tests/mesos.hpp:412: Function call: resourcesUnused(@0x7f6f80025e58 201311191201-16777343-52448-17056-, @0x7f6f80025e38 201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object 00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00) Expected: to be called once Actual: called twice - over-saturated and active Full Log: [ RUN ] AllocatorTest/0.SchedulerFailover I1119 12:01:32.106143 19009 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.106276 19017 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.108185 18999 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.113991 17076 master.cpp:285] Master started on 127.0.0.1:52448 I1119 12:01:32.114038 17076 master.cpp:299] Master ID: 201311191201-16777343-52448-17056 I1119 12:01:32.114047 17076 master.cpp:302] Master only allowing authenticated frameworks to register! 
I1119 12:01:32.114109 17082 slave.cpp:112] Slave started on 127)@127.0.0.1:52448 I1119 12:01:32.114209 17082 slave.cpp:212] Slave resources: cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000] I1119 12:01:32.114393 17080 sched.cpp:207] New master detected at master@127.0.0.1:52448 I1119 12:01:32.114413 17080 sched.cpp:260] Authenticating with master master@127.0.0.1:52448 I1119 12:01:32.114461 17080 sched.cpp:229] Detecting new master I1119 12:01:32.114497 17080 authenticatee.hpp:124] Creating new client SASL connection I1119 12:01:32.118248 17082 state.cpp:33] Recovering state from '/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/meta' I1119 12:01:32.118343 17082 status_update_manager.cpp:180] Recovering status update manager I1119 12:01:32.118407 17082 slave.cpp:2743] Finished recovery I1119 12:01:32.118463 17082 slave.cpp:497] New master detected at master@127.0.0.1:52448 I1119 12:01:32.118517 17082 slave.cpp:524] Detecting new master I1119 12:01:32.118538 17082 status_update_manager.cpp:158] New master detected at master@127.0.0.1:52448 I1119 12:01:32.118906 17076 master.cpp:1734] Authenticating framework at scheduler(119)@127.0.0.1:52448 W1119 12:01:32.118986 17076 master.cpp:1235] Ignoring register slave message from localhost.localdomain since not elected yet I1119 12:01:32.119091 17076 master.cpp:85] No whitelist given. 
Advertising offers for all slaves I1119 12:01:32.119155 17076 authenticator.hpp:140] Creating new server SASL connection I1119 12:01:32.119243 17076 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@127.0.0.1:52448 I1119 12:01:32.119279 17076 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5 I1119 12:01:32.119293 17076 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I1119 12:01:32.119312 17076 master.cpp:744] The newly elected leader is master@127.0.0.1:52448 I1119 12:01:32.119321 17076 master.cpp:748] Elected as the leading master! I1119 12:01:32.119343 17076 authenticator.hpp:243] Received SASL authentication start I1119 12:01:32.119390 17076 authenticator.hpp:325] Authentication requires more steps I1119 12:01:32.119417 17076 authenticatee.hpp:258] Received SASL authentication step I1119 12:01:32.119447 17076 authenticator.hpp:271] Received SASL authentication step I1119 12:01:32.119463 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1119 12:01:32.119472 17076 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1119 12:01:32.119482 17076 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1119 12:01:32.119490 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1119 12:01:32.119498 17076 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since
Review Request 15707: Fixed a flaky test: AllocatorTest/SchedulerFailover.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15707/ --- Review request for mesos and Vinod Kone. Bugs: MESOS-822 https://issues.apache.org/jira/browse/MESOS-822 Repository: mesos-git Description --- See MESOS-822. The timing in CI was such that a subsequent offer was sent to the scheduler before it could fail over. Diffs - src/tests/allocator_tests.cpp 61ab235c148e7b380b0de148c9ca7bd9fa6563f2 Diff: https://reviews.apache.org/r/15707/diff/ Testing --- make check with 20,000 iterations Thanks, Ben Mahler
[jira] [Commented] (MESOS-822) AllocatorTest/0.SchedulerFailover is flaky
[ https://issues.apache.org/jira/browse/MESOS-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827169#comment-13827169 ] Benjamin Mahler commented on MESOS-822: --- https://reviews.apache.org/r/15707/ AllocatorTest/0.SchedulerFailover is flaky -- Key: MESOS-822 URL: https://issues.apache.org/jira/browse/MESOS-822 Project: Mesos Issue Type: Bug Reporter: Yan Xu Assignee: Benjamin Mahler Fix For: 0.16.0 slave 201311191201-16777343-52448-17056-0 (localhost.localdomain) tests/allocator_tests.cpp:993: Failure Mock function called more times than expected - taking default action specified at: ./tests/mesos.hpp:412: Function call: resourcesUnused(@0x7f6f80025e58 201311191201-16777343-52448-17056-, @0x7f6f80025e38 201311191201-16777343-52448-17056-0, @0x7f6f80025e00 { cpus(*):2, mem(*):768, disk(*):22668, ports(*):[31000-32000] }, @0x7f6f80025df0 16-byte object 00-00 00-00 00-00 00-00 30-10 03-80 6F-7F 00-00) Expected: to be called once Actual: called twice - over-saturated and active Full Log: [ RUN ] AllocatorTest/0.SchedulerFailover I1119 12:01:32.106143 19009 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.106276 19017 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.108185 18999 exec.cpp:84] Committing suicide by killing the process group I1119 12:01:32.113991 17076 master.cpp:285] Master started on 127.0.0.1:52448 I1119 12:01:32.114038 17076 master.cpp:299] Master ID: 201311191201-16777343-52448-17056 I1119 12:01:32.114047 17076 master.cpp:302] Master only allowing authenticated frameworks to register!
I1119 12:01:32.114109 17082 slave.cpp:112] Slave started on 127)@127.0.0.1:52448 I1119 12:01:32.114209 17082 slave.cpp:212] Slave resources: cpus(*):3; mem(*):1024; disk(*):22668; ports(*):[31000-32000] I1119 12:01:32.114393 17080 sched.cpp:207] New master detected at master@127.0.0.1:52448 I1119 12:01:32.114413 17080 sched.cpp:260] Authenticating with master master@127.0.0.1:52448 I1119 12:01:32.114461 17080 sched.cpp:229] Detecting new master I1119 12:01:32.114497 17080 authenticatee.hpp:124] Creating new client SASL connection I1119 12:01:32.118248 17082 state.cpp:33] Recovering state from '/tmp/AllocatorTest_0_SchedulerFailover_LsrJz0/meta' I1119 12:01:32.118343 17082 status_update_manager.cpp:180] Recovering status update manager I1119 12:01:32.118407 17082 slave.cpp:2743] Finished recovery I1119 12:01:32.118463 17082 slave.cpp:497] New master detected at master@127.0.0.1:52448 I1119 12:01:32.118517 17082 slave.cpp:524] Detecting new master I1119 12:01:32.118538 17082 status_update_manager.cpp:158] New master detected at master@127.0.0.1:52448 I1119 12:01:32.118906 17076 master.cpp:1734] Authenticating framework at scheduler(119)@127.0.0.1:52448 W1119 12:01:32.118986 17076 master.cpp:1235] Ignoring register slave message from localhost.localdomain since not elected yet I1119 12:01:32.119091 17076 master.cpp:85] No whitelist given. 
Advertising offers for all slaves I1119 12:01:32.119155 17076 authenticator.hpp:140] Creating new server SASL connection I1119 12:01:32.119243 17076 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@127.0.0.1:52448 I1119 12:01:32.119279 17076 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5 I1119 12:01:32.119293 17076 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5' I1119 12:01:32.119312 17076 master.cpp:744] The newly elected leader is master@127.0.0.1:52448 I1119 12:01:32.119321 17076 master.cpp:748] Elected as the leading master! I1119 12:01:32.119343 17076 authenticator.hpp:243] Received SASL authentication start I1119 12:01:32.119390 17076 authenticator.hpp:325] Authentication requires more steps I1119 12:01:32.119417 17076 authenticatee.hpp:258] Received SASL authentication step I1119 12:01:32.119447 17076 authenticator.hpp:271] Received SASL authentication step I1119 12:01:32.119463 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false I1119 12:01:32.119472 17076 auxprop.cpp:153] Looking up auxiliary property '*userPassword' I1119 12:01:32.119482 17076 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' I1119 12:01:32.119490 17076 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'localhost.localdomain' server FQDN: 'localhost.localdomain' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true I1119 12:01:32.119498 17076
Review Request 15708: Improved exit status printing in the CgroupsIsolator.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15708/ --- Review request for mesos and Vinod Kone. Repository: mesos-git Description --- See above. Diffs - src/slave/cgroups_isolator.cpp c769ae045783125013989b12f8aa61dfda687ce8 Diff: https://reviews.apache.org/r/15708/diff/ Testing --- make check Thanks, Ben Mahler
[jira] [Created] (MESOS-823) ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky
Yan Xu created MESOS-823: Summary: ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky Key: MESOS-823 URL: https://issues.apache.org/jira/browse/MESOS-823 Project: Mesos Issue Type: Bug Components: test Reporter: Yan Xu Fix For: 0.16.0 This was never captured on faster build servers... -- This message was sent by Atlassian JIRA (v6.1#6144)
Review Request 15710: Fixed a bug in ZooKeeperMasterContenderDetectorTest that caused the local timeout in Group not getting triggered.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15710/ --- Review request for mesos and Ben Mahler. Bugs: MESOS-823 https://issues.apache.org/jira/browse/MESOS-823 Repository: mesos-git Description --- See summary. Diffs - src/tests/master_contender_detector_tests.cpp 5e4237454133edc155e74ffa04aec24ccd04c1b4 Diff: https://reviews.apache.org/r/15710/diff/ Testing --- make check 100 iterations Thanks, Jiang Yan Xu
[jira] [Assigned] (MESOS-823) ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky
[ https://issues.apache.org/jira/browse/MESOS-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-823: Assignee: Yan Xu ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky -- Key: MESOS-823 URL: https://issues.apache.org/jira/browse/MESOS-823 Project: Mesos Issue Type: Bug Components: test Reporter: Yan Xu Assignee: Yan Xu Fix For: 0.16.0 This was never captured on faster build servers...
[jira] [Commented] (MESOS-823) ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky
[ https://issues.apache.org/jira/browse/MESOS-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827209#comment-13827209 ] Yan Xu commented on MESOS-823: -- https://reviews.apache.org/r/15710/ ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork is flaky -- Key: MESOS-823 URL: https://issues.apache.org/jira/browse/MESOS-823 Project: Mesos Issue Type: Bug Components: test Reporter: Yan Xu Fix For: 0.16.0 This was never captured on faster build servers...
Re: Review Request 15710: Fixed a bug in ZooKeeperMasterContenderDetectorTest that caused the local timeout in Group not getting triggered.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15710/#review29156 --- src/tests/master_contender_detector_tests.cpp https://reviews.apache.org/r/15710/#comment56297 Looks good. We should consider creating a testing abstraction to make sure that our tests do not run forever, for example: DO_FOR (Seconds(10)) { ... } - Ben Mahler On Nov. 20, 2013, 1:20 a.m., Jiang Yan Xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15710/ --- (Updated Nov. 20, 2013, 1:20 a.m.) Review request for mesos and Ben Mahler. Bugs: MESOS-823 https://issues.apache.org/jira/browse/MESOS-823 Repository: mesos-git Description --- See summary. Diffs - src/tests/master_contender_detector_tests.cpp 5e4237454133edc155e74ffa04aec24ccd04c1b4 Diff: https://reviews.apache.org/r/15710/diff/ Testing --- make check 100 iterations Thanks, Jiang Yan Xu
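For readers unfamiliar with the idea, the DO_FOR abstraction suggested in the review above could be sketched roughly as follows. This is only an illustration under stated assumptions: the macro body, the `doForDeadline` variable, and the `waitFor` helper are hypothetical and are not taken from Mesos or stout, and `std::chrono` durations stand in for stout's `Seconds`.

```cpp
#include <chrono>
#include <functional>

// Hypothetical macro (not part of Mesos or stout): repeats the block that
// follows until `duration` has elapsed on a monotonic clock.
#define DO_FOR(duration)                                        \
  for (const auto doForDeadline =                               \
           std::chrono::steady_clock::now() + (duration);       \
       std::chrono::steady_clock::now() < doForDeadline;)

// Hypothetical helper built on DO_FOR: polls `condition` for at most
// `timeout` and returns true if it held before the deadline passed.
template <typename Duration>
bool waitFor(const std::function<bool()>& condition, Duration timeout)
{
  DO_FOR (timeout) {
    if (condition()) {
      return true;
    }
  }

  // The deadline passed without the condition becoming true.
  return false;
}
```

A test could then write something like `waitFor(condition, std::chrono::seconds(10))` and fail explicitly on a false result instead of hanging. Note that this sketch busy-waits; a real helper would probably sleep or yield between checks.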
[jira] [Created] (MESOS-824) export running config via http+json
David Robinson created MESOS-824: Summary: export running config via http+json Key: MESOS-824 URL: https://issues.apache.org/jira/browse/MESOS-824 Project: Mesos Issue Type: Improvement Reporter: David Robinson Priority: Minor Currently there's no way of knowing whether a slave is actually checkpointing (except for grepping through logs, which isn't ideal). The --checkpoint flag on the command line can't be used to detect this since checkpointing could be enabled on the slave but not in the framework. Because of this we cannot detect whether slave recovery is actually enabled and therefore can't tell whether it's safe to restart a slave. Please export the running config, preferably via a json endpoint.