[jira] [Commented] (MESOS-3801) Flaky test on Ubuntu Wily: ReservationTest.DropReserveTooLarge
[ https://issues.apache.org/jira/browse/MESOS-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972254#comment-14972254 ] Anand Mazumdar commented on MESOS-3801: --- [~neilc] Can you add the verbose logs too? > Flaky test on Ubuntu Wily: ReservationTest.DropReserveTooLarge > -- > > Key: MESOS-3801 > URL: https://issues.apache.org/jira/browse/MESOS-3801 > Project: Mesos > Issue Type: Bug > Environment: Linux vagrant-ubuntu-wily-64 4.2.0-16-generic #19-Ubuntu > SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Neil Conway >Priority: Minor > Labels: flaky-test, mesosphere > > {noformat} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from ReservationTest > [ RUN ] ReservationTest.DropReserveTooLarge > /mesos/src/tests/reservation_tests.cpp:449: Failure > Failed to wait 15secs for offers > /mesos/src/tests/reservation_tests.cpp:439: Failure > Actual function call count doesn't match EXPECT_CALL(sched, > resourceOffers(&driver, _))... > Expected: to be called once >Actual: never called - unsatisfied and active > /mesos/src/tests/reservation_tests.cpp:421: Failure > Actual function call count doesn't match EXPECT_CALL(allocator, addSlave(_, > _, _, _, _))... > Expected: to be called once >Actual: never called - unsatisfied and active > [ FAILED ] ReservationTest.DropReserveTooLarge (15302 ms) > [--] 1 test from ReservationTest (15303 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15308 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] ReservationTest.DropReserveTooLarge > 1 FAILED TEST > {noformat} > Repro'd via "mesos-tests --gtest_filter=ReservationTest.DropReserveTooLarge > --gtest_repeat=100". ~4 runs out of 100 resulted in the error. Note that test > runtime varied pretty widely: most test runs completed in < 500ms, but many > (1/3?) of runs took 5000ms or longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
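For the verbose logs requested above, one option is to re-run the filtered test with glog verbosity raised; a sketch assuming the standard {{mesos-tests.sh}} wrapper in the build directory:
{noformat}
# Re-run only the flaky case with verbose output; GLOG_v raises the
# glog/VLOG level and --verbose enables test process logging.
GLOG_v=2 ./bin/mesos-tests.sh \
  --verbose \
  --gtest_filter=ReservationTest.DropReserveTooLarge \
  --gtest_repeat=100 \
  --gtest_break_on_failure
{noformat}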
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972219#comment-14972219 ] Klaus Ma commented on MESOS-3765: - So the "granularity" is cluster-level and accessed (CRUD) only by the operator. In the allocator, resources would be assigned by granularity instead of all of a slave's resources at once. And I think the granularity should have a minimum value, or allocation will be slow :). > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > the presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
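To illustrate the mechanism under discussion, a minimal sketch of carving a granularity-sized chunk out of an agent's remaining resources; the flag value and helper are hypothetical, not the actual allocator API:
{code}
#include <mesos/resources.hpp>

using mesos::Resources;

// Hypothetical granularity, e.g. parsed from a new allocator flag.
static const Resources granularity =
  Resources::parse("cpus:1;mem:512").get();

// Offer one granularity-sized chunk per allocation cycle rather than
// the entire remaining agent resources; fall back to the remainder
// when the agent cannot cover a full chunk.
Resources chunk(const Resources& remaining)
{
  return remaining.contains(granularity) ? granularity : remaining;
}
{code}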
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972146#comment-14972146 ] Vinod Kone commented on MESOS-1739: --- Great to hear. Yea. Will be happy to shepherd. > Allow slave reconfiguration on restart > -- > > Key: MESOS-1739 > URL: https://issues.apache.org/jira/browse/MESOS-1739 > Project: Mesos > Issue Type: Epic >Reporter: Patrick Reilly >Assignee: Greg Mann > Labels: external-volumes, mesosphere, myriad > > Make it so that either via a slave restart or an out-of-process "reconfigure" > ping, the attributes and resources of a slave can be updated to be a superset > of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972123#comment-14972123 ] Greg Mann edited comment on MESOS-3506 at 10/23/15 11:50 PM: - Thanks [~haosd...@gmail.com]! I was using a different CentOS6 image than usual, and it turns out it had some extra stuff installed by default. You're right, I confirmed on another bare image that those are not installed. was (Author: greggomann): Thanks [~haosdent]! I was using a different CentOS6 image than usual, and it turns out it had some extra stuff installed by default. You're right, I confirmed on another bare image that those are not installed. > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3801) Flaky test on Ubuntu Wily: ReservationTest.DropReserveTooLarge
Neil Conway created MESOS-3801: -- Summary: Flaky test on Ubuntu Wily: ReservationTest.DropReserveTooLarge Key: MESOS-3801 URL: https://issues.apache.org/jira/browse/MESOS-3801 Project: Mesos Issue Type: Bug Environment: Linux vagrant-ubuntu-wily-64 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Reporter: Neil Conway Priority: Minor {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from ReservationTest [ RUN ] ReservationTest.DropReserveTooLarge /mesos/src/tests/reservation_tests.cpp:449: Failure Failed to wait 15secs for offers /mesos/src/tests/reservation_tests.cpp:439: Failure Actual function call count doesn't match EXPECT_CALL(sched, resourceOffers(&driver, _))... Expected: to be called once Actual: never called - unsatisfied and active /mesos/src/tests/reservation_tests.cpp:421: Failure Actual function call count doesn't match EXPECT_CALL(allocator, addSlave(_, _, _, _, _))... Expected: to be called once Actual: never called - unsatisfied and active [ FAILED ] ReservationTest.DropReserveTooLarge (15302 ms) [--] 1 test from ReservationTest (15303 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15308 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] ReservationTest.DropReserveTooLarge 1 FAILED TEST {noformat} Repro'd via "mesos-tests --gtest_filter=ReservationTest.DropReserveTooLarge --gtest_repeat=100". ~4 runs out of 100 resulted in the error. Note that test runtime varied pretty widely: most test runs completed in < 500ms, but many (1/3?) of runs took 5000ms or longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972124#comment-14972124 ] Greg Mann commented on MESOS-3506: -- Well it turns out that `sudo yum update -y nss` will take care of the issue (go figure). I'll adjust the review accordingly. > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972123#comment-14972123 ] Greg Mann commented on MESOS-3506: -- Thanks [~haosdent]! I was using a different CentOS6 image than usual, and it turns out it had some extra stuff installed by default. You're right, I confirmed on another bare image that those are not installed. > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972043#comment-14972043 ] Greg Mann commented on MESOS-1739: -- I'd like to have a go at getting this thing pushed through. [~vinodkone], are you still interested in shepherding? I've read through the existing patch and reviews; I can try to come up with a solution to the repeated re-registration problem outlined above. Once I have an idea in mind, would you like me to explain my plan in a small design doc or just here via comments? > Allow slave reconfiguration on restart > -- > > Key: MESOS-1739 > URL: https://issues.apache.org/jira/browse/MESOS-1739 > Project: Mesos > Issue Type: Epic >Reporter: Patrick Reilly >Assignee: Greg Mann > Labels: external-volumes, mesosphere, myriad > > Make it so that either via a slave restart or an out-of-process "reconfigure" > ping, the attributes and resources of a slave can be updated to be a superset > of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-1739: Assignee: Greg Mann > Allow slave reconfiguration on restart > -- > > Key: MESOS-1739 > URL: https://issues.apache.org/jira/browse/MESOS-1739 > Project: Mesos > Issue Type: Epic >Reporter: Patrick Reilly >Assignee: Greg Mann > Labels: external-volumes, mesosphere, myriad > > Make it so that either via a slave restart or an out-of-process "reconfigure" > ping, the attributes and resources of a slave can be updated to be a superset > of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3800) Containerizer attempts to create Linux launcher by default
[ https://issues.apache.org/jira/browse/MESOS-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3800: - Labels: Mesosphere (was: ) > Containerizer attempts to create Linux launcher by default > --- > > Key: MESOS-3800 > URL: https://issues.apache.org/jira/browse/MESOS-3800 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Artem Harutyunyan > Labels: Mesosphere > > Mesos containerizer attempts to create a Linux launcher by default without > verifying whether the necessary prerequisites (such as availability of > cgroups) are met. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3800) Containerizer attempts to create Linux launcher by default
Artem Harutyunyan created MESOS-3800: Summary: Containerizer attempts to create Linux launcher by default Key: MESOS-3800 URL: https://issues.apache.org/jira/browse/MESOS-3800 Project: Mesos Issue Type: Bug Reporter: Artem Harutyunyan Assignee: Artem Harutyunyan Mesos containerizer attempts to create a Linux launcher by default without verifying whether the necessary prerequisites (such as availability of cgroups) are met. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3799) Compilation warning with Ubuntu wily: auto_ptr is deprecated
[ https://issues.apache.org/jira/browse/MESOS-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-3799: --- Labels: mesosphere (was: ) > Compilation warning with Ubuntu wily: auto_ptr is deprecated > > > Key: MESOS-3799 > URL: https://issues.apache.org/jira/browse/MESOS-3799 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > Variants of this message are printed many times during compilation (Wily on > AMD64): > {noformat} > CXX libprocess_la-pid.lo > CXX libprocess_la-poll_socket.lo > CXX libprocess_la-profiler.lo > In file included from > /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp:23:0, > from > /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/stringify.hpp:26, > from > /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:59, > from > /mesos/3rdparty/libprocess/include/process/address.hpp:34, > from /mesos/3rdparty/libprocess/include/process/pid.hpp:26, > from /mesos/3rdparty/libprocess/src/pid.cpp:28: > 3rdparty/boost-1.53.0/boost/get_pointer.hpp:27:40: warning: ‘template > class std::auto_ptr’ is deprecated [-Wdeprecated-declarations] > template T * get_pointer(std::auto_ptr const& p) > ^ > In file included from /usr/include/c++/5/memory:81:0, > from > 3rdparty/boost-1.53.0/boost/functional/hash/extensions.hpp:32, > from > 3rdparty/boost-1.53.0/boost/functional/hash/hash.hpp:529, > from 3rdparty/boost-1.53.0/boost/functional/hash.hpp:6, > from /mesos/3rdparty/libprocess/include/process/pid.hpp:24, > from /mesos/3rdparty/libprocess/src/pid.cpp:28: > /usr/include/c++/5/bits/unique_ptr.h:49:28: note: declared here >template class auto_ptr; > ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3799) Compilation warning with Ubuntu wily: auto_ptr is deprecated
Neil Conway created MESOS-3799: -- Summary: Compilation warning with Ubuntu wily: auto_ptr is deprecated Key: MESOS-3799 URL: https://issues.apache.org/jira/browse/MESOS-3799 Project: Mesos Issue Type: Bug Reporter: Neil Conway Priority: Minor Variants of this message are printed many times during compilation (Wily on AMD64): {noformat} CXX libprocess_la-pid.lo CXX libprocess_la-poll_socket.lo CXX libprocess_la-profiler.lo In file included from /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp:23:0, from /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/stringify.hpp:26, from /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:59, from /mesos/3rdparty/libprocess/include/process/address.hpp:34, from /mesos/3rdparty/libprocess/include/process/pid.hpp:26, from /mesos/3rdparty/libprocess/src/pid.cpp:28: 3rdparty/boost-1.53.0/boost/get_pointer.hpp:27:40: warning: ‘template class std::auto_ptr’ is deprecated [-Wdeprecated-declarations] template T * get_pointer(std::auto_ptr const& p) ^ In file included from /usr/include/c++/5/memory:81:0, from 3rdparty/boost-1.53.0/boost/functional/hash/extensions.hpp:32, from 3rdparty/boost-1.53.0/boost/functional/hash/hash.hpp:529, from 3rdparty/boost-1.53.0/boost/functional/hash.hpp:6, from /mesos/3rdparty/libprocess/include/process/pid.hpp:24, from /mesos/3rdparty/libprocess/src/pid.cpp:28: /usr/include/c++/5/bits/unique_ptr.h:49:28: note: declared here template class auto_ptr; ^ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971791#comment-14971791 ] Anand Mazumdar commented on MESOS-3766: --- I can take this up. > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > Attachments: master.log.zip, slave.log.zip > > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-26
[jira] [Assigned] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar reassigned MESOS-3766: - Assignee: Anand Mazumdar (was: Niklas Quarfot Nielsen) > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Anand Mazumdar > Attachments: master.log.zip, slave.log.zip > > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 
12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971789#comment-14971789 ] Niklas Quarfot Nielsen commented on MESOS-3766: --- Thanks [~anandmazumdar]! [~matth...@mesosphere.io] - I haven't been able to repro yet. How many slaves were you running? Is it mesos-local? Can you repro easily (and maybe enable verbose logging)? [~anandmazumdar] - do you have time to take this one on? > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > Attachments: master.log.zip, slave.log.zip > > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > 
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task >
[jira] [Commented] (MESOS-191) Add support for multiple disk resources
[ https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971720#comment-14971720 ] David Greenberg commented on MESOS-191: --- Hey [~anindya.sinha], that proposal looks very similar to what we've been discussing. The key difference is that it also allows for isolated spindles to be used as scratch/GC-able storage, which could be advantageous for some ephemeral tasks that spill to disk, but also adds more complexity to the implementation. I'm going to add that use case to the other doc; I think that it could become its own project once multiple disks are available. > Add support for multiple disk resources > --- > > Key: MESOS-191 > URL: https://issues.apache.org/jira/browse/MESOS-191 > Project: Mesos > Issue Type: Story >Reporter: Vinod Kone > Labels: mesosphere, persistent-volumes > > It would be nice to schedule mesos tasks with fine-grained disk scheduling. > The idea is that a slave with multiple spindles would specify spindle-specific > config. Mesos would then include this info in its resource offers to > frameworks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3798) io::write(fd, const string&) api writes junk sometimes
[ https://issues.apache.org/jira/browse/MESOS-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971718#comment-14971718 ] Jojy Varghese commented on MESOS-3798: -- Preliminary investigation shows that the junk characters are written after the os::nonblock(fd) call in the write function. > io::write(fd, const string&) api writes junk sometimes > -- > > Key: MESOS-3798 > URL: https://issues.apache.org/jira/browse/MESOS-3798 > Project: Mesos > Issue Type: Bug > Components: libprocess > Environment: osx >Reporter: Jojy Varghese >Assignee: Jojy Varghese > > This was noticed during the registry client test (please see MESOS-3773). A brief > summary: > 1. open a file with flags "O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC" and > mode "S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH", > 2. Call write(fd, string). > This causes junk to be written every once in a while to the beginning of the > file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
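A minimal repro sketch of the sequence described, assuming the stout {{os::open}} and libprocess {{io::write}} APIs (error handling elided; the path and payload are placeholders):
{code}
#include <fcntl.h>
#include <sys/stat.h>

#include <string>

#include <process/io.hpp>

#include <stout/os.hpp>
#include <stout/try.hpp>

int main()
{
  // Open the file exactly as described in the report.
  Try<int> fd = os::open(
      "/tmp/blob",
      O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC,
      S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

  // io::write makes the fd non-blocking before writing; the junk
  // bytes reportedly appear after that point.
  process::io::write(fd.get(), std::string(1024, 'x')).await();

  return 0;
}
{code}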
[jira] [Created] (MESOS-3798) io::write(fd, const string&) api writes junk sometimes
Jojy Varghese created MESOS-3798: Summary: io::write(fd, const string&) api writes junk sometimes Key: MESOS-3798 URL: https://issues.apache.org/jira/browse/MESOS-3798 Project: Mesos Issue Type: Bug Components: libprocess Environment: osx Reporter: Jojy Varghese Assignee: Jojy Varghese This was noticed during the registry client test (please see MESOS-3773). A brief summary: 1. open a file with flags "O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC" and mode "S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH", 2. Call write(fd, string). This causes junk to be written every once in a while to the beginning of the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow
[ https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-3775: --- Labels: mesosphere tech-debt (was: mesosphere) > MasterAllocatorTest.SlaveLost is slow > - > > Key: MESOS-3775 > URL: https://issues.apache.org/jira/browse/MESOS-3775 > Project: Mesos > Issue Type: Bug > Components: technical debt, test >Reporter: Alexander Rukletsov >Priority: Minor > Labels: mesosphere, tech-debt > > The {{MasterAllocatorTest.SlaveLost}} takes more than {{5s}} to complete. A > brief look into the code hints that the stopped agent does not quit > immediately (and hence its resources are not released by the allocator) > because [it waits for the executor to > terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717]. > The {{5s}} timeout comes from the {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant. > Possible solutions: > * Do not wait until the stopped agent quits (can be flaky, needs deeper > analysis). > * Decrease the agent's {{executor_shutdown_grace_period}} flag. > * Terminate the executor faster (this may require some refactoring since the > executor driver is created in the {{TestContainerizer}} and we do not have > direct access to it). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
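Of the options listed, lowering the agent flag is the least invasive; a sketch of what that could look like inside a test body, assuming the standard {{MesosTest}} fixture helpers:
{code}
// Shrink the shutdown grace period so the stopped agent's executor is
// shut down quickly instead of waiting out the ~5s default from
// EXECUTOR_SHUTDOWN_GRACE_PERIOD.
slave::Flags flags = CreateSlaveFlags();
flags.executor_shutdown_grace_period = Milliseconds(100);

// Start the agent under test with `flags` as usual.
{code}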
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971413#comment-14971413 ] haosdent commented on MESOS-3506: - I tested this in a CentOS 6 Docker image; these don't seem to be installed by default. {noformat} [root@af15e2315ea4 /]# wget bash: wget: command not found [root@af15e2315ea4 /]# tar bash: tar: command not found [root@af15e2315ea4 /]# which bash: which: command not found [root@af15e2315ea4 /]# cat /etc/issue CentOS release 6.7 (Final) Kernel \r on an \m {noformat} > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
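If so, the Getting Started instructions for a bare image would presumably need to install those tools explicitly; a sketch of the command, assuming stock CentOS packages:
{noformat}
sudo yum install -y tar wget which
{noformat}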
[jira] [Commented] (MESOS-3480) Refactor Executor struct in Slave to handle HTTP based executors
[ https://issues.apache.org/jira/browse/MESOS-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971385#comment-14971385 ] Anand Mazumdar commented on MESOS-3480: --- {code} commit e1b0e125723dd6f144aa733961c490c1f0e1ef17 Author: Anand Mazumdar Date: Thu Oct 22 23:13:51 2015 -0700 Added HttpConnection to the Executor struct in the Agent. This lays an initial part of the groundwork needed to support executors using the HTTP API in the Agent. Review: https://reviews.apache.org/r/38874 {code} {code} commit 02c7d93ceefce19743b0e043ead62fb02a160dbd Author: Anand Mazumdar Date: Thu Oct 22 18:25:55 2015 -0700 Added output operator for Executor struct in agent. Review: https://reviews.apache.org/r/39569 {code} > Refactor Executor struct in Slave to handle HTTP based executors > > > Key: MESOS-3480 > URL: https://issues.apache.org/jira/browse/MESOS-3480 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > Fix For: 0.26.0 > > > Currently, the {{struct Executor}} in slave only supports executors connected > via message passing (driver). We should refactor it to add support for HTTP > based Executors similar to what was done for the Scheduler API {{struct > Framework}} in {{src/master/master.hpp}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
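For context on the first commit, a trimmed sketch of what such an {{HttpConnection}} can look like, modeled on the HTTP-framework pattern in {{src/master/master.hpp}}; the details here are illustrative and may differ from the committed agent code:
{code}
#include <process/http.hpp>

#include "common/http.hpp"  // For ContentType and serialize().

// Connection handle kept per executor, mirroring the HTTP framework
// pattern in the master: a pipe writer for the open response stream
// plus the negotiated content type.
struct HttpConnection
{
  HttpConnection(
      const process::http::Pipe::Writer& _writer,
      ContentType _contentType)
    : writer(_writer),
      contentType(_contentType) {}

  // Serializes an event and pushes it onto the chunked response.
  template <typename Message>
  bool send(const Message& message)
  {
    return writer.write(serialize(contentType, message));
  }

  bool close()
  {
    return writer.close();
  }

  process::http::Pipe::Writer writer;
  ContentType contentType;
};
{code}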
[jira] [Commented] (MESOS-3786) Backticks are not mentioned in Mesos C++ Style Guide
[ https://issues.apache.org/jira/browse/MESOS-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971374#comment-14971374 ] Joseph Wu commented on MESOS-3786: -- This was definitely intentional for the maintenance comments. > Backticks are not mentioned in Mesos C++ Style Guide > > > Key: MESOS-3786 > URL: https://issues.apache.org/jira/browse/MESOS-3786 > Project: Mesos > Issue Type: Documentation >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Minor > Labels: documentation, mesosphere > > As far as I can tell, current practice is to quote code excerpts and object > names with backticks when writing comments. For example: > {code} > // You know, `sadPanda` seems extra sad lately. > std::string sadPanda; > sadPanda = " :'( "; > {code} > However, I don't see this documented in our C++ style guide at all. It should > be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3718) Implement Quota support in allocator
[ https://issues.apache.org/jira/browse/MESOS-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-3718: --- Sprint: Mesosphere Sprint 21 > Implement Quota support in allocator > > > Key: MESOS-3718 > URL: https://issues.apache.org/jira/browse/MESOS-3718 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > The built-in Hierarchical DRF allocator should support Quota. This includes > (but not limited to): adding, updating, removing and satisfying quota; > avoiding both overcomitting resources and handing them to non-quota'ed roles > in presence of master failover. > A [design doc for Quota support in > Allocator|https://issues.apache.org/jira/browse/MESOS-2937] provides an > overview of a feature set required to be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3716) Update Allocator interface to support quota
[ https://issues.apache.org/jira/browse/MESOS-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971331#comment-14971331 ] Alexander Rukletsov commented on MESOS-3716: https://reviews.apache.org/r/38218/ > Update Allocator interface to support quota > --- > > Key: MESOS-3716 > URL: https://issues.apache.org/jira/browse/MESOS-3716 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > An allocator should be notified when a quota is being set/updated or removed. > Also, to support master failover in the presence of quota, the allocator should be > notified about re-registering agents and allocations towards quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3164) Introduce QuotaInfo message
[ https://issues.apache.org/jira/browse/MESOS-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971329#comment-14971329 ] Alexander Rukletsov commented on MESOS-3164: https://reviews.apache.org/r/39317/ > Introduce QuotaInfo message > --- > > Key: MESOS-3164 > URL: https://issues.apache.org/jira/browse/MESOS-3164 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Alexander Rukletsov >Assignee: Joerg Schad > Labels: mesosphere > > A {{QuotaInfo}} protobuf message is the internal representation for quota-related > information (e.g. for persisting quota). The protobuf message should be > extendable for future needs and allow for easy aggregation across roles and > operator principals. It may also be used to pass quota information to > allocators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3716) Update Allocator interface to support quota
[ https://issues.apache.org/jira/browse/MESOS-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-3716: --- Sprint: Mesosphere Sprint 21 > Update Allocator interface to support quota > --- > > Key: MESOS-3716 > URL: https://issues.apache.org/jira/browse/MESOS-3716 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > An allocator should be notified when a quota is being set/updated or removed. > Also, to support master failover in the presence of quota, the allocator should be > notified about re-registering agents and allocations towards quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971282#comment-14971282 ] Alexander Rukletsov commented on MESOS-3765: My answer is: it depends. I would like to give an operator the ability to choose what is better for their cluster. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > the presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3574) Support replacing ZooKeeper with replicated log
[ https://issues.apache.org/jira/browse/MESOS-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971266#comment-14971266 ] Yong Tang commented on MESOS-3574: -- Created MESOS-3797 to capture the implementation of replacing Zookeeper with Consul. In the short term, a lot of users would like to remove the dependency on Zookeeper (by replacing it with either etcd or Consul). > Support replacing ZooKeeper with replicated log > --- > > Key: MESOS-3574 > URL: https://issues.apache.org/jira/browse/MESOS-3574 > Project: Mesos > Issue Type: Improvement > Components: leader election, replicated log >Reporter: Neil Conway > Labels: mesosphere > > It would be useful to support using the replicated log without also requiring > ZooKeeper to be running. This would simplify the process of > configuring/operating a high-availability configuration of Mesos. > At least three things would need to be done: > 1. Abstract away the stuff we use Zk for into an interface that can be > implemented (e.g., by etcd, consul, rep-log, or Zk). This might be done > already as part of [MESOS-1806] > 2. Enhance the replicated log to be able to do its own leader election + > failure detection (to decide when the current master is down). > 3. Validate replicated log performance to ensure it is adequate (per Joris, > likely needs some significant work) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3797) Support replacing Zookeeper with Consul
[ https://issues.apache.org/jira/browse/MESOS-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971258#comment-14971258 ] Yong Tang commented on MESOS-3797: -- Replacing Zookeeper with Consul could be part of MESOS-3574, and a short-term solution before the implementation of (no-dependency) leader election in Mesos. > Support replacing Zookeeper with Consul > --- > > Key: MESOS-3797 > URL: https://issues.apache.org/jira/browse/MESOS-3797 > Project: Mesos > Issue Type: Improvement > Components: leader election >Reporter: Yong Tang > > Currently Mesos only supports Zookeeper for leader election. While Zookeeper > has been widely used, it is not actively developed, and the configuration of > Zookeeper is often cumbersome or difficult. > There is already an ongoing MESOS-1806 which would replace Zookeeper with > etcd. It would be great if Mesos could support replacing Zookeeper with > Consul for its ease of deployment. > While MESOS-3574 proposed that Mesos do its own leader election and failure > detection, replacing Zookeeper with Consul as a short-term solution will > really benefit a lot of existing Mesos users who want to avoid the > dependency on Zookeeper deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3797) Support replacing Zookeeper with Consul
[ https://issues.apache.org/jira/browse/MESOS-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971248#comment-14971248 ] Yong Tang commented on MESOS-3797: -- The implementation of replacing Zookeeper with etcd could help the implementation of replacing Zookeeper with Consul. > Support replacing Zookeeper with Consul > --- > > Key: MESOS-3797 > URL: https://issues.apache.org/jira/browse/MESOS-3797 > Project: Mesos > Issue Type: Improvement > Components: leader election >Reporter: Yong Tang > > Currently Mesos only supports Zookeeper for leader election. While Zookeeper > has been widely used, it is not actively developed, and the configuration of > Zookeeper is often cumbersome or difficult. > There is already an ongoing MESOS-1806 which would replace Zookeeper with > etcd. It would be great if Mesos could support replacing Zookeeper with > Consul for its ease of deployment. > While MESOS-3574 proposed that Mesos do its own leader election and failure > detection, replacing Zookeeper with Consul as a short-term solution will > really benefit a lot of existing Mesos users who want to avoid the > dependency on Zookeeper deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3797) Support replacing Zookeeper with Consul
Yong Tang created MESOS-3797: Summary: Support replacing Zookeeper with Consul Key: MESOS-3797 URL: https://issues.apache.org/jira/browse/MESOS-3797 Project: Mesos Issue Type: Improvement Components: leader election Reporter: Yong Tang Currently Mesos only supports Zookeeper for leader election. While Zookeeper has been widely used, it is not actively developed, and the configuration of Zookeeper is often cumbersome or difficult. There is already an ongoing MESOS-1806 which would replace Zookeeper with etcd. It would be great if Mesos could support replacing Zookeeper with Consul for its ease of deployment. While MESOS-3574 proposed that Mesos do its own leader election and failure detection, replacing Zookeeper with Consul as a short-term solution will really benefit a lot of existing Mesos users who want to avoid the dependency on Zookeeper deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
[ https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3773: --- Story Points: 3 > RegistryClientTest.SimpleGetBlob is flaky > - > > Key: MESOS-3773 > URL: https://issues.apache.org/jira/browse/MESOS-3773 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Joseph Wu >Assignee: Jojy Varghese > Labels: mesosphere > > {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was > encountered on OSX. > {code:title=Repro} > bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" > --gtest_repeat=10 --gtest_break_on_failure > {code} > {code:title=Example Failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure > Value of: blobResponse > Actual: "2015-10-20 20:58:59.579393024+00:00" > Expected: blob.get() > Which is: > "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 > \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 > 20:58:59.579393024+00:00" > *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are > using GNU date *** > PC: @0x103144ddc testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** > @ 0x7fff8c58af1a _sigtramp > @ 0x7fff8386e187 malloc > @0x1031445b7 testing::internal::AssertHelper::operator=() > @0x1030d32e0 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1030d3562 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1031ac8f3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103192f87 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031533f5 testing::Test::Run() > @0x10315493b testing::TestInfo::Run() > @0x1031555f7 testing::TestCase::Run() > @0x103163df3 testing::internal::UnitTestImpl::RunAllTests() > @0x1031af8c3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103195397 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031639f2 testing::UnitTest::Run() > @0x1025abd41 RUN_ALL_TESTS() > @0x1025a8089 main > @ 0x7fff86b155c9 start > {code} > {code:title=Less common failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure > (socket).failure(): Failed accept: connection error: > error::lib(0):func(0):reason(0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with --executor_environment_variables
[ https://issues.apache.org/jira/browse/MESOS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3751: --- Shepherd: Timothy Chen Story Points: 2 > MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with > --executor_environment_variables > --- > > Key: MESOS-3751 > URL: https://issues.apache.org/jira/browse/MESOS-3751 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 0.24.1, 0.25.0 >Reporter: Cody Maloney >Assignee: Gilbert Song > Labels: mesosphere, newbie > > When using --executor_environment_variables, and having > MESOS_NATIVE_JAVA_LIBRARY in the environment of mesos-slave, the mesos > containerizer does not set MESOS_NATIVE_JAVA_LIBRARY itself. > Relevant code: > https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281 > It checks whether the variable is in the mesos-slave's own environment (via os::getenv), > rather than checking whether it is set in the configured environment variable set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
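A sketch of the distinction the description draws, with hypothetical names (`environment` stands in for the map built from --executor_environment_variables); the actual containerizer code may differ:
{code}
#include <map>
#include <string>

#include <stout/option.hpp>
#include <stout/os.hpp>

// `environment` is the configured executor environment (illustrative).
void addNativeLibrary(std::map<std::string, std::string>& environment)
{
  // Only fall back to the agent's own environment when the operator
  // has not already provided the variable in the configured set.
  if (environment.count("MESOS_NATIVE_JAVA_LIBRARY") == 0) {
    Option<std::string> value = os::getenv("MESOS_NATIVE_JAVA_LIBRARY");
    if (value.isSome()) {
      environment["MESOS_NATIVE_JAVA_LIBRARY"] = value.get();
    }
  }
}
{code}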
[jira] [Updated] (MESOS-3796) Mesos Master and Agent http api should support configurable CORS headers
[ https://issues.apache.org/jira/browse/MESOS-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Schroeder updated MESOS-3796: - Description: There are several places where it would be useful to access the mesos master api (http port 5050) or agent api (http port 5051) via a javascript client. This is inhibited by the fact that the http headers such as Access-Control-Allow-Origin are not passed at all by mesos. A stop-gap is to write a small proxy which passes requests to/from mesos while adding the header, but that is suboptimal for several reasons. Allowing the option to configure said headers, or just to enable them explicitly would be very useful. Some projects such as mesos-ui[1] have an issue open about this[2]. [1] http://capgemini.github.io/devops/mesos-ui/ [2] https://github.com/Capgemini/mesos-ui/issues/57 was: There are several places where it would be useful to access the mesos master api (http port 5050) or agent api (http port 5051) via a javascript client. This is inhibited by the fact that the http headers such as Access-Control-Allow-Origin are not passed at all by mesos. Allowing the option to configure said headers, or just to enable them explicitly would be very useful. Some projects such as mesos-ui[1] have an issue open about this[2]. [1] http://capgemini.github.io/devops/mesos-ui/ [2] https://github.com/Capgemini/mesos-ui/issues/57 > Mesos Master and Agent http api should support configurable CORS headers > > > Key: MESOS-3796 > URL: https://issues.apache.org/jira/browse/MESOS-3796 > Project: Mesos > Issue Type: Improvement > Components: HTTP API >Affects Versions: 0.25.0 >Reporter: Jeffrey Schroeder >Priority: Minor > > There are several places where it would be useful to access the mesos master > api (http port 5050) or agent api (http port 5051) via a javascript client. > This is inhibited by the fact that the http headers such as > Access-Control-Allow-Origin are not passed at all by mesos. A stop-gap is to > write a small proxy which passes requests to/from mesos while adding the > header, but that is suboptimal for several reasons. > Allowing the option to configure said headers, or just to enable them > explicitly would be very useful. Some projects such as mesos-ui[1] have an > issue open about this[2]. > [1] http://capgemini.github.io/devops/mesos-ui/ > [2] https://github.com/Capgemini/mesos-ui/issues/57 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3796) Mesos Master and Agent http api should support configurable CORS headers
Jeffrey Schroeder created MESOS-3796: Summary: Mesos Master and Agent http api should support configurable CORS headers Key: MESOS-3796 URL: https://issues.apache.org/jira/browse/MESOS-3796 Project: Mesos Issue Type: Improvement Components: HTTP API Affects Versions: 0.25.0 Reporter: Jeffrey Schroeder Priority: Minor There are several places where it would be useful to access the mesos master api (http port 5050) or agent api (http port 5051) via a javascript client. This is inhibited by the fact that the http headers such as Access-Control-Allow-Origin are not passed at all by mesos. Allowing the option to configure said headers, or just to enable them explicitly would be very useful. Some projects such as mesos-ui[1] have an issue open about this[2]. [1] http://capgemini.github.io/devops/mesos-ui/ [2] https://github.com/Capgemini/mesos-ui/issues/57 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
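As an illustration of what the fix might look like inside a libprocess endpoint handler; the handler name and the hard-coded "*" origin are placeholders (the real value would come from a new, configurable flag):
{code}
#include <process/http.hpp>

// Sketch: attach a CORS header to an endpoint response before
// returning it.
process::http::Response handler(const process::http::Request& request)
{
  process::http::OK response("{\"status\": \"ok\"}");
  response.headers["Access-Control-Allow-Origin"] = "*";
  return response;
}
{code}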
[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine
[ https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971183#comment-14971183 ] haosdent commented on MESOS-3793: - Please add --launcher=posix or mount the cgroup filesystem as rw when launching the docker container. http://search-hadoop.com/m/0Vlr6zfCev1S7gRF1 > Cannot start mesos local on a Debian GNU/Linux 8 docker machine > --- > > Key: MESOS-3793 > URL: https://issues.apache.org/jira/browse/MESOS-3793 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: Debian GNU/Linux 8 docker machine >Reporter: Matthias Veit >Assignee: Jojy Varghese > Labels: mesosphere > > We updated the mesos version to 0.25.0 in our Marathon docker image, which > runs our integration tests. > We use mesos local for those tests. This fails with this message: > {noformat} > root@a06e4b4eb776:/marathon# mesos local > I1022 18:42:26.852485 136 leveldb.cpp:176] Opened db in 6.103258ms > I1022 18:42:26.853302 136 leveldb.cpp:183] Compacted db in 765740ns > I1022 18:42:26.853343 136 leveldb.cpp:198] Created db iterator in 9001ns > I1022 18:42:26.853355 136 leveldb.cpp:204] Seeked to beginning of db in > 1287ns > I1022 18:42:26.853366 136 leveldb.cpp:273] Iterated through 0 keys in the > db in ns > I1022 18:42:26.853406 136 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1022 18:42:26.853775 141 recover.cpp:449] Starting replica recovery > I1022 18:42:26.853862 141 recover.cpp:475] Replica is in EMPTY status > I1022 18:42:26.854751 138 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I1022 18:42:26.854856 140 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1022 18:42:26.855002 140 recover.cpp:566] Updating replica status to > STARTING > I1022 18:42:26.855655 138 master.cpp:376] Master > a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on > 172.17.0.14:5050 > I1022 18:42:26.855680 138 master.cpp:378] Flags at startup: > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" > --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs" > I1022 18:42:26.855790 138 master.cpp:425] Master allowing unauthenticated > frameworks to register > I1022 18:42:26.855803 138 master.cpp:430] Master allowing unauthenticated > slaves to register > I1022 18:42:26.855815 138 master.cpp:467] Using default 'crammd5' > authenticator > W1022 18:42:26.855829 138 authenticator.cpp:505] No credentials provided, > authentication requests will be refused > I1022 18:42:26.855840 138 authenticator.cpp:512] Initializing server SASL > I1022 18:42:26.856442 136 containerizer.cpp:143] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I1022 18:42:26.856943 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.888185ms > I1022 18:42:26.856987 140 
replica.cpp:323] Persisted replica status to > STARTING > I1022 18:42:26.857115 140 recover.cpp:475] Replica is in STARTING status > I1022 18:42:26.857270 140 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I1022 18:42:26.857312 140 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1022 18:42:26.857368 140 recover.cpp:566] Updating replica status to VOTING > I1022 18:42:26.857781 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 371121ns > I1022 18:42:26.857841 140 replica.cpp:323] Persisted replica status to > VOTING > I1022 18:42:26.857895 140 recover.cpp:580] Successfully joined the Paxos > group > I1022 18:42:26.857928 140 recover.cpp:464] Recover process terminated > I1022 18:42:26.862455 137 master.cpp:1603] The newly elected leader is > master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8 > I1022 18:42:26.862498 137 master.cpp:1616] Elected as the leading master! > I1022 18:42:26.862511 137 master.cpp:1376] Recovering from registrar > I1022 18:42:26.862560 137 registrar.cpp:309] Recovering registrar > Failed to create a
[jira] [Commented] (MESOS-3583) Introduce sessions in HTTP Scheduler API Subscribed Responses
[ https://issues.apache.org/jira/browse/MESOS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971187#comment-14971187 ] Marco Massenzio commented on MESOS-3583: Following from our conversation, I don't think we should consider doing this. Adding session management introduces state that we will then need to manage in the event of failover; we already have failover management in Mesos, and I don't really think that adding sessions would help any real-life use case. We discussed the issue of badly implemented frameworks, but that's a problem best solved via better documentation and education of the community. > Introduce sessions in HTTP Scheduler API Subscribed Responses > - > > Key: MESOS-3583 > URL: https://issues.apache.org/jira/browse/MESOS-3583 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar > Labels: mesosphere, tech-debt > > Currently, the HTTP Scheduler API has no concept of Sessions aka > {{SessionID}} or a {{TokenID}}. This is useful in some failure scenarios. As > of now, if a framework fails over and then subscribes again with the same > {{FrameworkID}} with the {{force}} option set, the Mesos master would > subscribe it. > If the previous instance of the framework/scheduler tries to send a Call, > e.g. {{Call::KILL}} with the same previous {{FrameworkID}} set, it would > still be accepted by the master, leading to erroneously killing a task. > This is possible because we currently do not have a way of distinguishing > connections. It used to work in the previous driver implementation due to the > master also performing a {{UPID}} check to verify that they matched and only > then allowing the call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3587) Framework failover when framework is 'active' does not trigger allocation.
[ https://issues.apache.org/jira/browse/MESOS-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio reassigned MESOS-3587: -- Assignee: (was: Marco Massenzio) > Framework failover when framework is 'active' does not trigger allocation. > -- > > Key: MESOS-3587 > URL: https://issues.apache.org/jira/browse/MESOS-3587 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Benjamin Mahler >Priority: Minor > Labels: mesosphere > > FWICT, this is just a consequence of some technical debt in the master code. > When an active framework fails over, we do not go through the > deactivation->activation code paths, and so: > (1) The framework's filters in the allocator remain after the failover. > (2) The failed over framework does not receive an immediate allocation (it > has to wait for the next allocation interval). > If the framework had disconnected first, then the failover goes through the > deactivation->activation code paths. > This also means that some tests take longer to run than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3235: --- Shepherd: Till Toenshoff > FetcherCacheHttpTest.HttpCachedSerialized and > FetcherCacheHttpTest.HttpCachedConcurrent are flaky > - > > Key: MESOS-3235 > URL: https://issues.apache.org/jira/browse/MESOS-3235 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Joseph Wu >Assignee: Bernd Mathiske > Labels: mesosphere > > On OSX, {{make clean && make -j8 V=0 check}}: > {code} > [--] 3 tests from FetcherCacheHttpTest > [ RUN ] FetcherCacheHttpTest.HttpCachedSerialized > HTTP/1.1 200 OK > Date: Fri, 07 Aug 2015 17:23:05 GMT > Content-Length: 30 > I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0 > E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave > 20150807-102305-139395082-52338-52313-S0 > E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Registered executor on 10.0.79.8 > Starting task 0 > Forked command at 54363 > sh -c './mesos-fetcher-test-cmd 0' > E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Command exited with status 0 (pid: 54363) > E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0 > E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave > 20150807-102305-139395082-52338-52313-S0 > E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Registered executor on 10.0.79.8 > Starting task 1 > Forked command at 54411 > sh -c './mesos-fetcher-test-cmd 1' > E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Command exited with status 0 (pid: 54411) > E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > ../../src/tests/fetcher_cache_tests.cpp:860: Failure > Failed to wait 15secs for awaitFinished(task.get()) > *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are > using GNU date *** > [ FAILED ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms) > [ RUN ] FetcherCacheHttpTest.HttpCachedConcurrent > PC: @0x113723618 process::Owned<>::get() > *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: *** > @ 0x7fff8fcacf1a _sigtramp > @ 0x7f9bc3109710 (unknown) > @0x1136f07e2 mesos::internal::slave::Fetcher::fetch() > @0x113862f9d > mesos::internal::slave::MesosContainerizerProcess::fetch() > @0x1138f1b5d > _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_ > @0x1138f18cf > 
_ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_ > @0x1143768cf std::__1::function<>::operator()() > @0x11435ca7f process::ProcessBase::visit() > @0x1143ed6fe process::DispatchEvent::visit() > @0x11271 process::ProcessBase::serve() > @0x114343b4e process::ProcessManager::resume() > @0x1143431ca process::internal::schedule() > @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_ > @ 0x7fff95090268 _pthread_body > @ 0x7fff950901e5 _pthread_start > @ 0x7fff9508e41d thread_start > Failed to synchronize with slave (it's probably exited) > make[3]: *** [check-local] Segmentation fault: 11 > make[2]: *** [check-am] Error 2 > make[1]: *** [check] Error 2 > make: *** [check-recursive] Error 1 > {code} > This was encountered just once out of 3+ {{make check}}s. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine
[ https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio reassigned MESOS-3793: -- Assignee: Jojy Varghese > Cannot start mesos local on a Debian GNU/Linux 8 docker machine > --- > > Key: MESOS-3793 > URL: https://issues.apache.org/jira/browse/MESOS-3793 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: Debian GNU/Linux 8 docker machine >Reporter: Matthias Veit >Assignee: Jojy Varghese > Labels: mesosphere > > We updated the mesos version to 0.25.0 in our Marathon docker image, that > runs our integration tests. > We use mesos local for those tests. This fails with this message: > {noformat} > root@a06e4b4eb776:/marathon# mesos local > I1022 18:42:26.852485 136 leveldb.cpp:176] Opened db in 6.103258ms > I1022 18:42:26.853302 136 leveldb.cpp:183] Compacted db in 765740ns > I1022 18:42:26.853343 136 leveldb.cpp:198] Created db iterator in 9001ns > I1022 18:42:26.853355 136 leveldb.cpp:204] Seeked to beginning of db in > 1287ns > I1022 18:42:26.853366 136 leveldb.cpp:273] Iterated through 0 keys in the > db in ns > I1022 18:42:26.853406 136 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1022 18:42:26.853775 141 recover.cpp:449] Starting replica recovery > I1022 18:42:26.853862 141 recover.cpp:475] Replica is in EMPTY status > I1022 18:42:26.854751 138 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I1022 18:42:26.854856 140 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1022 18:42:26.855002 140 recover.cpp:566] Updating replica status to > STARTING > I1022 18:42:26.855655 138 master.cpp:376] Master > a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on > 172.17.0.14:5050 > I1022 18:42:26.855680 138 master.cpp:378] Flags at startup: > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" > --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs" > I1022 18:42:26.855790 138 master.cpp:425] Master allowing unauthenticated > frameworks to register > I1022 18:42:26.855803 138 master.cpp:430] Master allowing unauthenticated > slaves to register > I1022 18:42:26.855815 138 master.cpp:467] Using default 'crammd5' > authenticator > W1022 18:42:26.855829 138 authenticator.cpp:505] No credentials provided, > authentication requests will be refused > I1022 18:42:26.855840 138 authenticator.cpp:512] Initializing server SASL > I1022 18:42:26.856442 136 containerizer.cpp:143] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I1022 18:42:26.856943 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.888185ms > I1022 18:42:26.856987 140 replica.cpp:323] Persisted replica status to > STARTING > I1022 18:42:26.857115 140 recover.cpp:475] Replica is in STARTING status > I1022 18:42:26.857270 
140 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I1022 18:42:26.857312 140 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1022 18:42:26.857368 140 recover.cpp:566] Updating replica status to VOTING > I1022 18:42:26.857781 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 371121ns > I1022 18:42:26.857841 140 replica.cpp:323] Persisted replica status to > VOTING > I1022 18:42:26.857895 140 recover.cpp:580] Successfully joined the Paxos > group > I1022 18:42:26.857928 140 recover.cpp:464] Recover process terminated > I1022 18:42:26.862455 137 master.cpp:1603] The newly elected leader is > master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8 > I1022 18:42:26.862498 137 master.cpp:1616] Elected as the leading master! > I1022 18:42:26.862511 137 master.cpp:1376] Recovering from registrar > I1022 18:42:26.862560 137 registrar.cpp:309] Recovering registrar > Failed to create a containerizer: Could not create MesosContainerizer: Failed > to create launcher: Failed to create Linux launcher: Failed to mount cgr
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971089#comment-14971089 ] Alexander Rukletsov commented on MESOS-3338: {{.reserved()}} indeed includes dynamically reserved resources. Would you agree that having such non-trivial math for unused resources on an agent is not optimal? I would suggest we revisit the types of resources we store to simplify the math. How about [Total [Reserved] [Offered [Allocated [Used]]]] (Reserved may overlap with offered, allocated and used)? > Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section: > {code} > // Check that the Master counts the reservation as a used resource. > { > Future<Response> response = > process::http::get(master.get(), "state.json"); > AWAIT_READY(response); > Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body); > ASSERT_SOME(parse); > Result<JSON::Number> cpus = > parse.get().find<JSON::Number>("slaves[0].used_resources.cpus"); > ASSERT_SOME_EQ(JSON::Number(1), cpus); > } > {code} > and got > {noformat} > ../../../src/tests/reservation_tests.cpp:168: Failure > Value of: (cpus).get() > Actual: 0 > Expected: JSON::Number(1) > Which is: 1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
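For illustration, a minimal C++ sketch of the nesting suggested in the comment above, assuming Mesos's {{Resources}} type with its set-like arithmetic. The struct and its field names are hypothetical, not the master's actual bookkeeping types; the point is only that the proposed decomposition makes "unused" simple arithmetic:

{code}
// Hypothetical per-agent bookkeeping following the proposed nesting:
// Total contains Offered, which contains Allocated, which contains Used.
// Reserved is tracked separately because it may overlap the other sets.
struct AgentResources
{
  Resources total;      // Everything the agent registered with.
  Resources reserved;   // Statically + dynamically reserved.
  Resources offered;    // Currently offered to frameworks.
  Resources allocated;  // Accepted by frameworks (subset of offered).
  Resources used;       // Consumed by running tasks (subset of allocated).

  // Unused resources then need no special casing for reservations.
  Resources unused() const { return total - offered; }
};
{code}

The exact overlap rules are the open question raised in the comment; the sketch just pins down one possible reading.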
[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971038#comment-14971038 ] Benjamin Bannier commented on MESOS-3581: - RRs: - https://reviews.apache.org/r/39590/ - https://reviews.apache.org/r/39591/ - https://reviews.apache.org/r/39592/ > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971016#comment-14971016 ] Klaus Ma edited comment on MESOS-3765 at 10/23/15 2:09 PM: --- Thanks for explaining the current behaviour; it matches my understanding :). According to the description of this ticket, we consider assigning all of a slave's resources to a single framework unfair, right? So which case is fair? Re-using the example: two frameworks {{f1}} and {{f2}}, one agent with only 1 CPU: 1. 0.5 to {{f1}} and 0.5 to {{f2}} (both weights are 1: {{total * weight / sum of weights}}) 2. 1 to {{f1}} and 0 to {{f2}} 3. or others? IMO, #1 is fair to both frameworks; but if {{f1}}/{{f2}} requires 1 CPU to launch a task, it needs a way for the allocator to adjust the fairness, so {{requestResources()}} will help. Another option is to allow wasted resources: keep offering 0.5 to {{f1}} and 0.5 to {{f2}}. was (Author: klaus1982): Thanks for explaining current behaviour, it match my understanding :). According to the description of this ticket, we consider assigning all resources of slave to framework is unfair, right? So which case is fairness by re-using the example? Two framework {{f1}} and {{f2}}, one agent with only 1 CPU: 1. 0.5 to {{f1}} and 0.5 to {{f2}} (both weight are 1: {{total * weight/ sum of weight}}) 2. 1 to {{f1}} and 0 to {{f2}} 3. or others? IMO, #1 is fair to both framework; but if {{f1}}/{{f2}} acquired 1 CPU to launch task, it need a way for allocator to adjust the fairness, so {{requestResources()}} will help. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
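For illustration, a toy C++ program (not allocator code) that works through the {{total * weight / sum of weights}} math from the comment above for the two-framework, 1-CPU example:

{code}
#include <iostream>
#include <vector>

int main()
{
  const double totalCpus = 1.0;                   // One agent with 1 CPU.
  const std::vector<double> weights = {1.0, 1.0}; // f1 and f2, equal weight.

  double sum = 0.0;
  for (double weight : weights) {
    sum += weight;
  }

  for (size_t i = 0; i < weights.size(); i++) {
    // share_i = total * weight_i / sum of weights.
    std::cout << "f" << (i + 1) << " fair share: "
              << totalCpus * weights[i] / sum << " cpus" << std::endl;
  }

  // Prints 0.5 cpus for each framework, i.e. option #1 above.
  return 0;
}
{code}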
[jira] [Issue Comment Deleted] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Klaus Ma updated MESOS-3765: Comment: was deleted (was: Thanks for explaining current behaviour, it match my understanding :). According to the description of this ticket, we consider assigning all resources of slave to framework is unfair, right? So which case is fairness by re-using the example? Two framework {{f1}} and {{f2}}, one agent with only 1 CPU: 1. 0.5 to {{f1}} and 0.5 to {{f2}} (both weight are 1: {{total * weight/ sum of weight}}) 2. 1 to {{f1}} and 0 to {{f2}} 3. or others? IMO, #1 is fair to both framework; but if {{f1}}/{{f2}} acquired 1 CPU to launch task, it need a way for allocator to adjust the fairness, so {{requestResources()}} will help.) > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971016#comment-14971016 ] Klaus Ma commented on MESOS-3765: - Thanks for explaining the current behaviour; it matches my understanding :). According to the description of this ticket, we consider assigning all of a slave's resources to a single framework unfair, right? So which case is fair? Re-using the example: two frameworks {{f1}} and {{f2}}, one agent with only 1 CPU: 1. 0.5 to {{f1}} and 0.5 to {{f2}} (both weights are 1: {{total * weight / sum of weights}}) 2. 1 to {{f1}} and 0 to {{f2}} 3. or others? IMO, #1 is fair to both frameworks; but if {{f1}}/{{f2}} requires 1 CPU to launch a task, it needs a way for the allocator to adjust the fairness, so {{requestResources()}} will help. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970921#comment-14970921 ] Benjamin Bannier commented on MESOS-3581: - After soliciting feedback on the [mailing list|http://www.mail-archive.com/dev@mesos.apache.org/msg33488.html] there was some consensus that updating the source files was preferable to a workaround using e.g. {{INPUT_FILTER}}. I will propose a patch implementing the changed license headers next. > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
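For illustration, the comment-style change under discussion. With doxygen's defaults, a {{/**}} block is treated as a special documentation block and extracted, while a plain {{/*}} block is ignored; the banners below are abbreviated stand-ins for the real license text:

{code}
/**
 * Licensed ...   <-- Javadoc-style block: doxygen extracts this.
 */

/*
 * Licensed ...   <-- Plain comment block: doxygen ignores this.
 */
{code}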
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970804#comment-14970804 ] Alexander Rukletsov commented on MESOS-3338: I'll have a look, thanks [~gyliu]! I doubt {{slave->totalResources.reserved()}} includes dynamic reservations, but I have to check it. > Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section: > {code} > // Check that the Master counts the reservation as a used resource. > { > Future<Response> response = > process::http::get(master.get(), "state.json"); > AWAIT_READY(response); > Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body); > ASSERT_SOME(parse); > Result<JSON::Number> cpus = > parse.get().find<JSON::Number>("slaves[0].used_resources.cpus"); > ASSERT_SOME_EQ(JSON::Number(1), cpus); > } > {code} > and got > {noformat} > ../../../src/tests/reservation_tests.cpp:168: Failure > Value of: (cpus).get() > Actual: 0 > Expected: JSON::Number(1) > Which is: 1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2936) Create a design document for Quota support in Master
[ https://issues.apache.org/jira/browse/MESOS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970798#comment-14970798 ] Alexander Rukletsov commented on MESOS-2936: We are in total agreement here. Right now we plan to reject quota requests for non-existent roles. Also, endpoints related to quota *should not* modify roles or role weights. Regarding dynamic roles: I'll definitely have a look, thanks for pinging me. > Create a design document for Quota support in Master > > > Key: MESOS-2936 > URL: https://issues.apache.org/jira/browse/MESOS-2936 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > Create a design document for the Quota feature support in Mesos Master > (excluding allocator) to be shared with the Mesos community. > Design Doc: > https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970790#comment-14970790 ] Alexander Rukletsov commented on MESOS-3765: This should work, but it changes the current behaviour a bit. Let me reuse your example to demonstrate how it works right now. Two frameworks {{f1}} and {{f2}}, one agent with only 1 CPU. Possible scenarios: * {{f1}} (or {{f2}}) receives 1 CPU, but accepts 0.5 CPU, effectively returning 0.5 CPU to the free pool. {{f2}} is offered 0.5 CPU. * {{f1}} (or {{f2}}) is greedy and accepts 1 CPU, leaving {{f2}} to starve. If a framework is a good citizen, it will accept only as many resources as it needs. At the same time, if a framework is a good citizen, it won't lie about its granularity. Hence I don't really see how having frameworks report their preferred offer size can help resolve allocation unfairness (which is what this ticket describes). I do not dispute that having frameworks' preferred offer size allows for smarter allocation decisions and more efficient bin-packing algorithms, but this is out of scope for this ticket. My intuition is that *there is no way to improve fairness by giving frameworks more tuning mechanisms*. Having said that, I think tuning mechanisms like {{requestResources}} can be extremely useful. Also, I think it can be difficult to deduce a default granularity in the presence of multiple frameworks and varying agents. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
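For illustration, a rough C++ sketch of the operator-set granularity idea this ticket proposes, assuming Mesos's {{Resources}} type; {{nextOffer}} is a hypothetical helper, not part of the built-in allocator:

{code}
// Offer at most one `granularity` chunk of an agent's available resources
// per allocation cycle, instead of everything that remains.
Resources nextOffer(const Resources& available, const Resources& granularity)
{
  // If less than one full chunk remains, offer the remainder so that no
  // unofferable fragment is left stranded on the agent.
  if (!available.contains(granularity)) {
    return available;
  }

  return granularity;
}
{code}

This sketch also shows why a minimum granularity matters, as noted earlier in the thread: the smaller the chunk, the more allocation cycles are needed to hand out a large agent.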
[jira] [Comment Edited] (MESOS-3792) flags.acls in /state.json response is not the flag value passed to Mesos master
[ https://issues.apache.org/jira/browse/MESOS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970753#comment-14970753 ] Jian Qiu edited comment on MESOS-3792 at 10/23/15 10:02 AM: Many (maybe all) values in flags have the same issue. I tried --firewall_rules and queried /state, which returned something as below: {code} "firewall_rules":"disabled_endpoints { paths: \"\/files\/browse\" paths: \"\/slave(0)\/stats.json\"}" {code} I think the reason is that the flag values returned by /state are stringified in protobuf text format rather than JSON format. was (Author: qiujian): many values (maybe all) in flags has the same issue, I tried --firewall_rules and query /state which return something as below {code} "firewall_rules":"disabled_endpoints {\n paths: \"\/files\/browse\"\n paths: \"\/slave(0)\/stats.json\"\n}\n" {code} I think the reason is that the value in flag returned by /state is stringified in protobuf format rather than json format? > flags.acls in /state.json response is not the flag value passed to Mesos > master > --- > > Key: MESOS-3792 > URL: https://issues.apache.org/jira/browse/MESOS-3792 > Project: Mesos > Issue Type: Bug >Reporter: James Fisher > > Steps to reproduce: Start Mesos master with the `--acls` flag set to the > following value: > {code} > { "run_tasks": [ { "principals": { "values": ["foo", "bar"] }, "users": { > "values": ["alice"] } } ] } > {code} > Then make a request to {{http://mesosmaster:5050/state.json}} and extract the > value for key `flags.acls` from the JSON body of the response. > Expected behavior: the value is the same JSON string passed on the > command-line. > Actual behavior: the value is this string in some unknown syntax: > {code} > run_tasks { > principals { > values: "foo" > values: "bar" > } > users { > values: "alice" > } > } > {code} > I don't know what this is, but it's not an ACL expression according to the > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3792) flags.acls in /state.json response is not the flag value passed to Mesos master
[ https://issues.apache.org/jira/browse/MESOS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970753#comment-14970753 ] Jian Qiu commented on MESOS-3792: - Many (maybe all) values in flags have the same issue. I tried --firewall_rules and queried /state, which returned something as below: {code} "firewall_rules":"disabled_endpoints {\n paths: \"\/files\/browse\"\n paths: \"\/slave(0)\/stats.json\"\n}\n" {code} I think the reason is that the flag values returned by /state are stringified in protobuf text format rather than JSON format. > flags.acls in /state.json response is not the flag value passed to Mesos > master > --- > > Key: MESOS-3792 > URL: https://issues.apache.org/jira/browse/MESOS-3792 > Project: Mesos > Issue Type: Bug >Reporter: James Fisher > > Steps to reproduce: Start Mesos master with the `--acls` flag set to the > following value: > {code} > { "run_tasks": [ { "principals": { "values": ["foo", "bar"] }, "users": { > "values": ["alice"] } } ] } > {code} > Then make a request to {{http://mesosmaster:5050/state.json}} and extract the > value for key `flags.acls` from the JSON body of the response. > Expected behavior: the value is the same JSON string passed on the > command-line. > Actual behavior: the value is this string in some unknown syntax: > {code} > run_tasks { > principals { > values: "foo" > values: "bar" > } > users { > values: "alice" > } > } > {code} > I don't know what this is, but it's not an ACL expression according to the > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
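For illustration, a small C++ sketch of the suspected difference, assuming a protobuf version that ships {{json_util}}; the two helpers are hypothetical, not Mesos code:

{code}
#include <string>

#include <google/protobuf/message.h>
#include <google/protobuf/text_format.h>
#include <google/protobuf/util/json_util.h>

// Renders a message the way the flag values in /state appear to be
// rendered today: protobuf's text format, e.g. "run_tasks {\n ... }".
std::string asTextFormat(const google::protobuf::Message& message)
{
  std::string out;
  google::protobuf::TextFormat::PrintToString(message, &out);
  return out;
}

// Renders the same message as the JSON string the reporter expected.
std::string asJson(const google::protobuf::Message& message)
{
  std::string out;
  google::protobuf::util::MessageToJsonString(message, &out);
  return out;
}
{code}

If the flags rendering used something like {{asJson}} instead of the text format, {{flags.acls}} would round-trip as the JSON the operator passed on the command line.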
[jira] [Created] (MESOS-3795) process::io::write takes parameter as void* which could be const
Benjamin Bannier created MESOS-3795: --- Summary: process::io::write takes parameter as void* which could be const Key: MESOS-3795 URL: https://issues.apache.org/jira/browse/MESOS-3795 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Benjamin Bannier In libprocess we have {code} Future<size_t> write(int fd, void* data, size_t size); {code} which expects a non-{{const}} {{void*}} for its {{data}} parameter. Under the covers {{data}} appears to be handled as a {{const}} (like one would expect from the signature of its inspiration, {{::write}}). This function is not used too often, but since it expects a non-{{const}} value for {{data}}, implicit conversions from pointer-to-{{const}} types are ruled out; instead callers seem to cast manually to {{void*}} -- often with C-style casts. We should sync this method's signature with that of {{::write}}. In addition to following the expected semantics of {{::write}}, having this work without casts for any pointer value {{data}} would make it easier to interface this with character literals, or raw data pointers from STL containers (e.g. {{Container::data}}). It would probably also indirectly eliminate the temptation to use C-casts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
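For illustration, a sketch of the friction described above, assuming the {{Future<size_t>}} signature quoted in the ticket; this is not a patch:

{code}
#include <process/io.hpp>

void example(int fd)
{
  const char* data = "hello";

  // Today: Future<size_t> write(int fd, void* data, size_t size);
  // A const buffer needs its constness cast away first.
  process::io::write(fd, (void*) data, 5);

  // Proposed: Future<size_t> write(int fd, const void* data, size_t size);
  // The same call would then compile without any cast:
  //   process::io::write(fd, data, 5);
}
{code}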
[jira] [Commented] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
[ https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970586#comment-14970586 ] Liqiang Lin commented on MESOS-3747: [~nnielsen] which tests depend on using a user name that may not exist on the test machine in {{paths.cpp:createExecutorDirectory}}? Can you point me to them? My proposed fixes: 1) Return an Error in {{paths.cpp:createExecutorDirectory}} when {{chown}} fails. 2) Validate whether "CommandInfo.user" or "FrameworkInfo.user" exists if the "--switch_user" flag is set to true. If validation fails, return a 'user does not exist' failure reason to the framework. [~vinodkone] What are your suggestions? > HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string > - > > Key: MESOS-3747 > URL: https://issues.apache.org/jira/browse/MESOS-3747 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0, 0.24.1, 0.25.0 >Reporter: Ben Whitehead >Assignee: Liqiang Lin >Priority: Blocker > > When using libmesos a framework can set its user to {{""}} (empty string) to > inherit the user the agent process is running as; this behavior now results > in a {{TASK_FAILED}}. > Full messages and relevant agent logs below. > The error returned to the framework tells me nothing about the user not > existing on the agent host; instead it tells me the container died due to OOM. > {code:title=FrameworkInfo} > call { > type: SUBSCRIBE > subscribe: { > frameworkInfo: { > user: "", > name: "testing" > } > } > } > {code} > {code:title=TaskInfo} > call { > framework_id { value: "20151015-125949-16777343-5050-20146-" }, > type: ACCEPT, > accept { > offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }], > operations { > type: LAUNCH, > launch { > task_infos [ > { > name: "task-1", > task_id: { value: "task-1" }, > agent_id: { value: > "20151015-125949-16777343-5050-20146-S0" }, > resources [ > { name: "cpus", type: SCALAR, scalar: { value: > 0.1 }, role: "*" }, > { name: "mem", type: SCALAR, scalar: { value: > 64.0 }, role: "*" }, > { name: "disk", type: SCALAR, scalar: { value: > 0.0 }, role: "*" }, > ], > command: { > environment { > variables [ > { name: "SLEEP_SECONDS" value: "15" } > ] > }, > value: "env | sort && sleep $SLEEP_SECONDS" > } > } > ] > } > } > } > } > {code} > {code:title=Update Status} > event: { > type: UPDATE, > update: { > status: { > task_id: { value: "task-1" }, > state: TASK_FAILED, > message: "Container destroyed while preparing isolators", > agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, > timestamp: 1.444939217401241E9, > executor_id: { value: "task-1" }, > source: SOURCE_AGENT, > reason: REASON_MEMORY_LIMIT, > uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" > } > } > } > {code} > {code:title=agent logs} > I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for > framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- > I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for > framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- > W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory > '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b': > Failed to get user information for '': Success > I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of > framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources > cpus(*):0.1;
mem(*):32 in work directory > '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b' > I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for > executor task-1 of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6- >
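For illustration, a minimal C++ sketch of proposed fix (2) above: validating that the requested user exists before launching. {{validateUser}} is a hypothetical helper using stout's {{Try}}/{{Error}}, not the actual slave code:

{code}
#include <pwd.h>

#include <string>

#include <stout/error.hpp>
#include <stout/nothing.hpp>
#include <stout/try.hpp>

Try<Nothing> validateUser(const std::string& user)
{
  // An empty user means "inherit the user the agent is running as".
  if (user.empty()) {
    return Nothing();
  }

  // Reject users that do not exist on this machine, so the framework
  // gets a meaningful reason instead of a misleading container failure.
  if (::getpwnam(user.c_str()) == nullptr) {
    return Error("User '" + user + "' does not exist");
  }

  return Nothing();
}
{code}

With a check like this at subscription or launch time, the framework above would see a 'user does not exist' error rather than {{REASON_MEMORY_LIMIT}}.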