[jira] [Commented] (MESOS-3801) Flaky test on Ubuntu Wily: ReservationTest.DropReserveTooLarge
[ https://issues.apache.org/jira/browse/MESOS-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972254#comment-14972254 ] Anand Mazumdar commented on MESOS-3801: --- [~neilc] Can you add the verbose logs too? > Flaky test on Ubuntu Wily: ReservationTest.DropReserveTooLarge > -- > > Key: MESOS-3801 > URL: https://issues.apache.org/jira/browse/MESOS-3801 > Project: Mesos > Issue Type: Bug > Environment: Linux vagrant-ubuntu-wily-64 4.2.0-16-generic #19-Ubuntu > SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Neil Conway >Priority: Minor > Labels: flaky-test, mesosphere > > {noformat} > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from ReservationTest > [ RUN ] ReservationTest.DropReserveTooLarge > /mesos/src/tests/reservation_tests.cpp:449: Failure > Failed to wait 15secs for offers > /mesos/src/tests/reservation_tests.cpp:439: Failure > Actual function call count doesn't match EXPECT_CALL(sched, > resourceOffers(&driver, _))... > Expected: to be called once >Actual: never called - unsatisfied and active > /mesos/src/tests/reservation_tests.cpp:421: Failure > Actual function call count doesn't match EXPECT_CALL(allocator, addSlave(_, > _, _, _, _))... > Expected: to be called once >Actual: never called - unsatisfied and active > [ FAILED ] ReservationTest.DropReserveTooLarge (15302 ms) > [--] 1 test from ReservationTest (15303 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (15308 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] ReservationTest.DropReserveTooLarge > 1 FAILED TEST > {noformat} > Repro'd via "mesos-tests --gtest_filter=ReservationTest.DropReserveTooLarge > --gtest_repeat=100". ~4 runs out of 100 resulted in the error. Note that test > runtime varied pretty widely: most test runs completed in < 500ms, but many > (1/3?) of runs took 5000ms or longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
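For the verbose logs requested above, one option is to re-run the filtered test with glog verbosity raised; a sketch assuming the standard {{mesos-tests.sh}} wrapper in the build directory:
{noformat}
# Re-run only the flaky case with verbose output; GLOG_v raises the
# glog/VLOG level and --verbose enables test process logging.
GLOG_v=2 ./bin/mesos-tests.sh \
  --verbose \
  --gtest_filter=ReservationTest.DropReserveTooLarge \
  --gtest_repeat=100 \
  --gtest_break_on_failure
{noformat}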
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972219#comment-14972219 ] Klaus Ma commented on MESOS-3765: - So the "granularity" is cluster-level and accessed (CRUD) only by the operator. In the allocator, resources would be assigned by granularity instead of all of a slave's resources at once. And I think the granularity should have a minimum value, or allocation will be slow :). > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > the presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
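To illustrate the mechanism under discussion, a minimal sketch of carving a granularity-sized chunk out of an agent's remaining resources; the flag value and helper are hypothetical, not the actual allocator API:
{code}
#include <mesos/resources.hpp>

using mesos::Resources;

// Hypothetical granularity, e.g. parsed from a new allocator flag.
static const Resources granularity =
  Resources::parse("cpus:1;mem:512").get();

// Offer one granularity-sized chunk per allocation cycle rather than
// the entire remaining agent resources; fall back to the remainder
// when the agent cannot cover a full chunk.
Resources chunk(const Resources& remaining)
{
  return remaining.contains(granularity) ? granularity : remaining;
}
{code}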
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972146#comment-14972146 ] Vinod Kone commented on MESOS-1739: --- Great to hear. Yea. Will be happy to shepherd. > Allow slave reconfiguration on restart > -- > > Key: MESOS-1739 > URL: https://issues.apache.org/jira/browse/MESOS-1739 > Project: Mesos > Issue Type: Epic >Reporter: Patrick Reilly >Assignee: Greg Mann > Labels: external-volumes, mesosphere, myriad > > Make it so that either via a slave restart or an out-of-process "reconfigure" > ping, the attributes and resources of a slave can be updated to be a superset > of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972123#comment-14972123 ] Greg Mann edited comment on MESOS-3506 at 10/23/15 11:50 PM: - Thanks [~haosd...@gmail.com]! I was using a different CentOS6 image than usual, and it turns out it had some extra stuff installed by default. You're right, I confirmed on another bare image that those are not installed. was (Author: greggomann): Thanks [~haosdent]! I was using a different CentOS6 image than usual, and it turns out it had some extra stuff installed by default. You're right, I confirmed on another bare image that those are not installed. > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3801) Flaky test on Ubuntu Wily: ReservationTest.DropReserveTooLarge
Neil Conway created MESOS-3801: -- Summary: Flaky test on Ubuntu Wily: ReservationTest.DropReserveTooLarge Key: MESOS-3801 URL: https://issues.apache.org/jira/browse/MESOS-3801 Project: Mesos Issue Type: Bug Environment: Linux vagrant-ubuntu-wily-64 4.2.0-16-generic #19-Ubuntu SMP Thu Oct 8 15:35:06 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Reporter: Neil Conway Priority: Minor {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from ReservationTest [ RUN ] ReservationTest.DropReserveTooLarge /mesos/src/tests/reservation_tests.cpp:449: Failure Failed to wait 15secs for offers /mesos/src/tests/reservation_tests.cpp:439: Failure Actual function call count doesn't match EXPECT_CALL(sched, resourceOffers(&driver, _))... Expected: to be called once Actual: never called - unsatisfied and active /mesos/src/tests/reservation_tests.cpp:421: Failure Actual function call count doesn't match EXPECT_CALL(allocator, addSlave(_, _, _, _, _))... Expected: to be called once Actual: never called - unsatisfied and active [ FAILED ] ReservationTest.DropReserveTooLarge (15302 ms) [--] 1 test from ReservationTest (15303 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (15308 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] ReservationTest.DropReserveTooLarge 1 FAILED TEST {noformat} Repro'd via "mesos-tests --gtest_filter=ReservationTest.DropReserveTooLarge --gtest_repeat=100". ~4 runs out of 100 resulted in the error. Note that test runtime varied pretty widely: most test runs completed in < 500ms, but many (1/3?) of runs took 5000ms or longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972124#comment-14972124 ] Greg Mann commented on MESOS-3506: -- Well it turns out that `sudo yum update -y nss` will take care of the issue (go figure). I'll adjust the review accordingly. > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972123#comment-14972123 ] Greg Mann commented on MESOS-3506: -- Thanks [~haosdent]! I was using a different CentOS6 image than usual, and it turns out it had some extra stuff installed by default. You're right, I confirmed on another bare image that those are not installed. > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972043#comment-14972043 ] Greg Mann commented on MESOS-1739: -- I'd like to have a go at getting this thing pushed through. [~vinodkone], are you still interested in shepherding? I've read through the existing patch and reviews; I can try to come up with a solution to the repeated re-registration problem outlined above. Once I have an idea in mind, would you like me to explain my plan in a small design doc or just here via comments? > Allow slave reconfiguration on restart > -- > > Key: MESOS-1739 > URL: https://issues.apache.org/jira/browse/MESOS-1739 > Project: Mesos > Issue Type: Epic >Reporter: Patrick Reilly >Assignee: Greg Mann > Labels: external-volumes, mesosphere, myriad > > Make it so that either via a slave restart or an out-of-process "reconfigure" > ping, the attributes and resources of a slave can be updated to be a superset > of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-1739: Assignee: Greg Mann > Allow slave reconfiguration on restart > -- > > Key: MESOS-1739 > URL: https://issues.apache.org/jira/browse/MESOS-1739 > Project: Mesos > Issue Type: Epic >Reporter: Patrick Reilly >Assignee: Greg Mann > Labels: external-volumes, mesosphere, myriad > > Make it so that either via a slave restart or an out-of-process "reconfigure" > ping, the attributes and resources of a slave can be updated to be a superset > of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3800) Containerizer attempts to create Linux launcher by default
[ https://issues.apache.org/jira/browse/MESOS-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3800: - Labels: Mesosphere (was: ) > Containerizer attempts to create Linux launcher by default > --- > > Key: MESOS-3800 > URL: https://issues.apache.org/jira/browse/MESOS-3800 > Project: Mesos > Issue Type: Bug >Reporter: Artem Harutyunyan >Assignee: Artem Harutyunyan > Labels: Mesosphere > > Mesos containerizer attempts to create a Linux launcher by default without > verifying whether the necessary prerequisites (such as availability of > cgroups) are met. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3800) Containerizer attempts to create Linux launcher by default
Artem Harutyunyan created MESOS-3800: Summary: Containerizer attempts to create Linux launcher by default Key: MESOS-3800 URL: https://issues.apache.org/jira/browse/MESOS-3800 Project: Mesos Issue Type: Bug Reporter: Artem Harutyunyan Assignee: Artem Harutyunyan Mesos containerizer attempts to create a Linux launcher by default without verifying whether the necessary prerequisites (such as availability of cgroups) are met. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3799) Compilation warning with Ubuntu wily: auto_ptr is deprecated
[ https://issues.apache.org/jira/browse/MESOS-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-3799: --- Labels: mesosphere (was: ) > Compilation warning with Ubuntu wily: auto_ptr is deprecated > > > Key: MESOS-3799 > URL: https://issues.apache.org/jira/browse/MESOS-3799 > Project: Mesos > Issue Type: Bug >Reporter: Neil Conway >Priority: Minor > Labels: mesosphere > > Variants of this message are printed many times during compilation (Wily on > AMD64): > {noformat} > CXX libprocess_la-pid.lo > CXX libprocess_la-poll_socket.lo > CXX libprocess_la-profiler.lo > In file included from > /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp:23:0, > from > /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/stringify.hpp:26, > from > /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:59, > from > /mesos/3rdparty/libprocess/include/process/address.hpp:34, > from /mesos/3rdparty/libprocess/include/process/pid.hpp:26, > from /mesos/3rdparty/libprocess/src/pid.cpp:28: > 3rdparty/boost-1.53.0/boost/get_pointer.hpp:27:40: warning: ‘template > class std::auto_ptr’ is deprecated [-Wdeprecated-declarations] > template T * get_pointer(std::auto_ptr const& p) > ^ > In file included from /usr/include/c++/5/memory:81:0, > from > 3rdparty/boost-1.53.0/boost/functional/hash/extensions.hpp:32, > from > 3rdparty/boost-1.53.0/boost/functional/hash/hash.hpp:529, > from 3rdparty/boost-1.53.0/boost/functional/hash.hpp:6, > from /mesos/3rdparty/libprocess/include/process/pid.hpp:24, > from /mesos/3rdparty/libprocess/src/pid.cpp:28: > /usr/include/c++/5/bits/unique_ptr.h:49:28: note: declared here >template class auto_ptr; > ^ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3799) Compilation warning with Ubuntu wily: auto_ptr is deprecated
Neil Conway created MESOS-3799: -- Summary: Compilation warning with Ubuntu wily: auto_ptr is deprecated Key: MESOS-3799 URL: https://issues.apache.org/jira/browse/MESOS-3799 Project: Mesos Issue Type: Bug Reporter: Neil Conway Priority: Minor Variants of this message are printed many times during compilation (Wily on AMD64): {noformat} CXX libprocess_la-pid.lo CXX libprocess_la-poll_socket.lo CXX libprocess_la-profiler.lo In file included from /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp:23:0, from /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/stringify.hpp:26, from /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:59, from /mesos/3rdparty/libprocess/include/process/address.hpp:34, from /mesos/3rdparty/libprocess/include/process/pid.hpp:26, from /mesos/3rdparty/libprocess/src/pid.cpp:28: 3rdparty/boost-1.53.0/boost/get_pointer.hpp:27:40: warning: ‘template class std::auto_ptr’ is deprecated [-Wdeprecated-declarations] template T * get_pointer(std::auto_ptr const& p) ^ In file included from /usr/include/c++/5/memory:81:0, from 3rdparty/boost-1.53.0/boost/functional/hash/extensions.hpp:32, from 3rdparty/boost-1.53.0/boost/functional/hash/hash.hpp:529, from 3rdparty/boost-1.53.0/boost/functional/hash.hpp:6, from /mesos/3rdparty/libprocess/include/process/pid.hpp:24, from /mesos/3rdparty/libprocess/src/pid.cpp:28: /usr/include/c++/5/bits/unique_ptr.h:49:28: note: declared here template class auto_ptr; ^ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971791#comment-14971791 ] Anand Mazumdar commented on MESOS-3766: --- I can take this up. > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > Attachments: master.log.zip, slave.log.zip > > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-26
[jira] [Assigned] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar reassigned MESOS-3766: - Assignee: Anand Mazumdar (was: Niklas Quarfot Nielsen) > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Anand Mazumdar > Attachments: master.log.zip, slave.log.zip > > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 
12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I
[jira] [Commented] (MESOS-3766) Can not kill task in Status STAGING
[ https://issues.apache.org/jira/browse/MESOS-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971789#comment-14971789 ] Niklas Quarfot Nielsen commented on MESOS-3766: --- Thanks [~anandmazumdar]! [~matth...@mesosphere.io] - I haven't been able to repro yet. How many slaves were you running? Is it mesos-local? Can you repro easily (and maybe enable verbose logging)? [~anandmazumdar] - do you have time to take this one on? > Can not kill task in Status STAGING > --- > > Key: MESOS-3766 > URL: https://issues.apache.org/jira/browse/MESOS-3766 > Project: Mesos > Issue Type: Bug > Components: general >Affects Versions: 0.25.0 > Environment: OSX >Reporter: Matthias Veit >Assignee: Niklas Quarfot Nielsen > Attachments: master.log.zip, slave.log.zip > > > I have created a simple Marathon Application with instance count 100 (100 > tasks) with a simple sleep command. Before all tasks were running, I killed > all tasks. This operation was successful, except 2 tasks. These 2 tasks are > in state STAGING (according to the mesos UI). Marathon tries to kill those > tasks every 5 seconds (for over an hour now) - unsuccessfully. > I picked one task and grepped the slave log: > {noformat} > I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- with resour > I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80 > I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container > '5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr > I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing > executor's forked pid 37096 to > '/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks > I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000 > I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor > 'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame > I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task > 
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task > app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework > 80ba2050-bf0f-4472-a2f7-2636c4f7b8c8- > I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task >
[jira] [Commented] (MESOS-191) Add support for multiple disk resources
[ https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971720#comment-14971720 ] David Greenberg commented on MESOS-191: --- Hey [~anindya.sinha], that proposal looks very similar to what we've been discussing. The key difference is that it also allows for isolated spindles to be used as scratch/GC-able storage, which could be advantageous for some ephemeral tasks that spill to disk, but also adds more complexity to the implementation. I'm going to add that use case to the other doc; I think that it could become its own project once multiple disks are available. > Add support for multiple disk resources > --- > > Key: MESOS-191 > URL: https://issues.apache.org/jira/browse/MESOS-191 > Project: Mesos > Issue Type: Story >Reporter: Vinod Kone > Labels: mesosphere, persistent-volumes > > It would be nice to schedule mesos tasks with fine-grained disk scheduling. > The idea is that a slave with multiple spindles would specify spindle-specific > config. Mesos would then include this info in its resource offers to > frameworks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3798) io::write(fd, const string&) api writes junk sometimes
[ https://issues.apache.org/jira/browse/MESOS-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971718#comment-14971718 ] Jojy Varghese commented on MESOS-3798: -- Preliminary investigation shows that the junk characters are written after the os::nonblock(fd) call in the write function. > io::write(fd, const string&) api writes junk sometimes > -- > > Key: MESOS-3798 > URL: https://issues.apache.org/jira/browse/MESOS-3798 > Project: Mesos > Issue Type: Bug > Components: libprocess > Environment: osx >Reporter: Jojy Varghese >Assignee: Jojy Varghese > > This was noticed during the registry client test (please see MESOS-3773). A brief > summary: > 1. open a file with flags "O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC" and > mode "S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH", > 2. Call write(fd, string). > This causes junk to be written every once in a while to the beginning of the > file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
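A minimal repro sketch of the sequence described, assuming the stout {{os::open}} and libprocess {{io::write}} APIs (error handling elided; the path and payload are placeholders):
{code}
#include <fcntl.h>
#include <sys/stat.h>

#include <string>

#include <process/io.hpp>

#include <stout/os.hpp>
#include <stout/try.hpp>

int main()
{
  // Open the file exactly as described in the report.
  Try<int> fd = os::open(
      "/tmp/blob",
      O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC,
      S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

  // io::write makes the fd non-blocking before writing; the junk
  // bytes reportedly appear after that point.
  process::io::write(fd.get(), std::string(1024, 'x')).await();

  return 0;
}
{code}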
[jira] [Created] (MESOS-3798) io::write(fd, const string&) api writes junk sometimes
Jojy Varghese created MESOS-3798: Summary: io::write(fd, const string&) api writes junk sometimes Key: MESOS-3798 URL: https://issues.apache.org/jira/browse/MESOS-3798 Project: Mesos Issue Type: Bug Components: libprocess Environment: osx Reporter: Jojy Varghese Assignee: Jojy Varghese This was noticed during the registry client test (please see MESOS-3773). A brief summary: 1. open a file with flags "O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC" and mode "S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH", 2. Call write(fd, string). This causes junk to be written every once in a while to the beginning of the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow
[ https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-3775: --- Labels: mesosphere tech-debt (was: mesosphere) > MasterAllocatorTest.SlaveLost is slow > - > > Key: MESOS-3775 > URL: https://issues.apache.org/jira/browse/MESOS-3775 > Project: Mesos > Issue Type: Bug > Components: technical debt, test >Reporter: Alexander Rukletsov >Priority: Minor > Labels: mesosphere, tech-debt > > The {{MasterAllocatorTest.SlaveLost}} takes more than {{5s}} to complete. A > brief look into the code hints that the stopped agent does not quit > immediately (and hence its resources are not released by the allocator) > because [it waits for the executor to > terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717]. > The {{5s}} timeout comes from the {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant. > Possible solutions: > * Do not wait until the stopped agent quits (can be flaky, needs deeper > analysis). > * Decrease the agent's {{executor_shutdown_grace_period}} flag. > * Terminate the executor faster (this may require some refactoring since the > executor driver is created in the {{TestContainerizer}} and we do not have > direct access to it). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
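Of the options listed, lowering the agent flag is the least invasive; a sketch of what that could look like inside a test body, assuming the standard {{MesosTest}} fixture helpers:
{code}
// Shrink the shutdown grace period so the stopped agent's executor is
// shut down quickly instead of waiting out the ~5s default from
// EXECUTOR_SHUTDOWN_GRACE_PERIOD.
slave::Flags flags = CreateSlaveFlags();
flags.executor_shutdown_grace_period = Milliseconds(100);

// Start the agent under test with `flags` as usual.
{code}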
[jira] [Commented] (MESOS-3506) Build instructions for CentOS 6.6 should include `sudo yum update`
[ https://issues.apache.org/jira/browse/MESOS-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971413#comment-14971413 ] haosdent commented on MESOS-3506: - I tested this in a CentOS 6 Docker image; these don't seem to be installed by default. {noformat} [root@af15e2315ea4 /]# wget bash: wget: command not found [root@af15e2315ea4 /]# tar bash: tar: command not found [root@af15e2315ea4 /]# which bash: which: command not found [root@af15e2315ea4 /]# cat /etc/issue CentOS release 6.7 (Final) Kernel \r on an \m {noformat} > Build instructions for CentOS 6.6 should include `sudo yum update` > -- > > Key: MESOS-3506 > URL: https://issues.apache.org/jira/browse/MESOS-3506 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 >Reporter: Greg Mann >Assignee: Greg Mann > Labels: documentation, mesosphere > > Neglecting to run {{sudo yum update}} on CentOS 6.6 currently causes the > build to break when building {{mesos-0.25.0.jar}}. The build instructions for > this platform on the Getting Started page should be changed accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
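If so, the Getting Started instructions for a bare image would presumably need to install those tools explicitly; a sketch of the command, assuming stock CentOS packages:
{noformat}
sudo yum install -y tar wget which
{noformat}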
[jira] [Commented] (MESOS-3480) Refactor Executor struct in Slave to handle HTTP based executors
[ https://issues.apache.org/jira/browse/MESOS-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971385#comment-14971385 ] Anand Mazumdar commented on MESOS-3480: --- {code} commit e1b0e125723dd6f144aa733961c490c1f0e1ef17 Author: Anand Mazumdar Date: Thu Oct 22 23:13:51 2015 -0700 Added HttpConnection to the Executor struct in the Agent. This lays an initial part of the groundwork needed to support executors using the HTTP API in the Agent. Review: https://reviews.apache.org/r/38874 {code} {code} commit 02c7d93ceefce19743b0e043ead62fb02a160dbd Author: Anand Mazumdar Date: Thu Oct 22 18:25:55 2015 -0700 Added output operator for Executor struct in agent. Review: https://reviews.apache.org/r/39569 {code} > Refactor Executor struct in Slave to handle HTTP based executors > > > Key: MESOS-3480 > URL: https://issues.apache.org/jira/browse/MESOS-3480 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere > Fix For: 0.26.0 > > > Currently, the {{struct Executor}} in slave only supports executors connected > via message passing (driver). We should refactor it to add support for HTTP > based Executors similar to what was done for the Scheduler API {{struct > Framework}} in {{src/master/master.hpp}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
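For context on the first commit, a trimmed sketch of what such an {{HttpConnection}} can look like, modeled on the HTTP-framework pattern in {{src/master/master.hpp}}; the details here are illustrative and may differ from the committed agent code:
{code}
#include <process/http.hpp>

#include "common/http.hpp"  // For ContentType and serialize().

// Connection handle kept per executor, mirroring the HTTP framework
// pattern in the master: a pipe writer for the open response stream
// plus the negotiated content type.
struct HttpConnection
{
  HttpConnection(
      const process::http::Pipe::Writer& _writer,
      ContentType _contentType)
    : writer(_writer),
      contentType(_contentType) {}

  // Serializes an event and pushes it onto the chunked response.
  template <typename Message>
  bool send(const Message& message)
  {
    return writer.write(serialize(contentType, message));
  }

  bool close()
  {
    return writer.close();
  }

  process::http::Pipe::Writer writer;
  ContentType contentType;
};
{code}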
[jira] [Commented] (MESOS-3786) Backticks are not mentioned in Mesos C++ Style Guide
[ https://issues.apache.org/jira/browse/MESOS-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971374#comment-14971374 ] Joseph Wu commented on MESOS-3786: -- This was definitely intentional for the maintenance comments. > Backticks are not mentioned in Mesos C++ Style Guide > > > Key: MESOS-3786 > URL: https://issues.apache.org/jira/browse/MESOS-3786 > Project: Mesos > Issue Type: Documentation >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Minor > Labels: documentation, mesosphere > > As far as I can tell, current practice is to quote code excerpts and object > names with backticks when writing comments. For example: > {code} > // You know, `sadPanda` seems extra sad lately. > std::string sadPanda; > sadPanda = " :'( "; > {code} > However, I don't see this documented in our C++ style guide at all. It should > be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3718) Implement Quota support in allocator
[ https://issues.apache.org/jira/browse/MESOS-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-3718: --- Sprint: Mesosphere Sprint 21 > Implement Quota support in allocator > > > Key: MESOS-3718 > URL: https://issues.apache.org/jira/browse/MESOS-3718 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > The built-in Hierarchical DRF allocator should support Quota. This includes > (but not limited to): adding, updating, removing and satisfying quota; > avoiding both overcomitting resources and handing them to non-quota'ed roles > in presence of master failover. > A [design doc for Quota support in > Allocator|https://issues.apache.org/jira/browse/MESOS-2937] provides an > overview of a feature set required to be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3716) Update Allocator interface to support quota
[ https://issues.apache.org/jira/browse/MESOS-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971331#comment-14971331 ] Alexander Rukletsov commented on MESOS-3716: https://reviews.apache.org/r/38218/ > Update Allocator interface to support quota > --- > > Key: MESOS-3716 > URL: https://issues.apache.org/jira/browse/MESOS-3716 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > An allocator should be notified when a quota is being set/updated or removed. > Also, to support master failover in the presence of quota, the allocator should be > notified about re-registering agents and allocations towards quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3164) Introduce QuotaInfo message
[ https://issues.apache.org/jira/browse/MESOS-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971329#comment-14971329 ] Alexander Rukletsov commented on MESOS-3164: https://reviews.apache.org/r/39317/ > Introduce QuotaInfo message > --- > > Key: MESOS-3164 > URL: https://issues.apache.org/jira/browse/MESOS-3164 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Alexander Rukletsov >Assignee: Joerg Schad > Labels: mesosphere > > A {{QuotaInfo}} protobuf message is the internal representation for quota-related > information (e.g. for persisting quota). The protobuf message should be > extendable for future needs and allow for easy aggregation across roles and > operator principals. It may also be used to pass quota information to > allocators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3716) Update Allocator interface to support quota
[ https://issues.apache.org/jira/browse/MESOS-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-3716: --- Sprint: Mesosphere Sprint 21 > Update Allocator interface to support quota > --- > > Key: MESOS-3716 > URL: https://issues.apache.org/jira/browse/MESOS-3716 > Project: Mesos > Issue Type: Bug > Components: allocation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > An allocator should be notified when a quota is being set/updated or removed. > Also, to support master failover in the presence of quota, the allocator should be > notified about re-registering agents and allocations towards quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971282#comment-14971282 ] Alexander Rukletsov commented on MESOS-3765: My answer is: it depends. I would like to give an operator the ability to choose what is better for their cluster. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > the presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators to explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3574) Support replacing ZooKeeper with replicated log
[ https://issues.apache.org/jira/browse/MESOS-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971266#comment-14971266 ] Yong Tang commented on MESOS-3574: -- Created MESOS-3797 to capture the implementation of replacing Zookeeper with Consul. In the short term, a lot of users would like to remove the dependency on Zookeeper (by replacing it with either etcd or Consul). > Support replacing ZooKeeper with replicated log > --- > > Key: MESOS-3574 > URL: https://issues.apache.org/jira/browse/MESOS-3574 > Project: Mesos > Issue Type: Improvement > Components: leader election, replicated log >Reporter: Neil Conway > Labels: mesosphere > > It would be useful to support using the replicated log without also requiring > ZooKeeper to be running. This would simplify the process of > configuring/operating a high-availability configuration of Mesos. > At least three things would need to be done: > 1. Abstract away the stuff we use Zk for into an interface that can be > implemented (e.g., by etcd, consul, rep-log, or Zk). This might be done > already as part of [MESOS-1806] > 2. Enhance the replicated log to be able to do its own leader election + > failure detection (to decide when the current master is down). > 3. Validate replicated log performance to ensure it is adequate (per Joris, > likely needs some significant work) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3797) Support replacing Zookeeper with Consul
[ https://issues.apache.org/jira/browse/MESOS-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971258#comment-14971258 ] Yong Tang commented on MESOS-3797: -- Replacing Zookeeper with Consul could be part of MESOS-3574, and a short-term solution before the implementation of (no-dependency) leader election in Mesos. > Support replacing Zookeeper with Consul > --- > > Key: MESOS-3797 > URL: https://issues.apache.org/jira/browse/MESOS-3797 > Project: Mesos > Issue Type: Improvement > Components: leader election >Reporter: Yong Tang > > Currently Mesos only supports Zookeeper for leader election. While Zookeeper > has been widely used, it is not actively developed, and the configuration of > Zookeeper is often cumbersome or difficult. > There is already an ongoing MESOS-1806 which would replace Zookeeper with > etcd. It would be great if Mesos could support replacing Zookeeper with > Consul for its ease of deployment. > While MESOS-3574 proposed that Mesos do its own leader election and failure > detection, replacing Zookeeper with Consul as a short-term solution will > really benefit a lot of existing Mesos users who want to avoid the > dependency on Zookeeper deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3797) Support replacing Zookeeper with Consul
[ https://issues.apache.org/jira/browse/MESOS-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971248#comment-14971248 ] Yong Tang commented on MESOS-3797: -- The implementation of replacing Zookeeper with etcd could help the implementation of replacing Zookeeper with Consul. > Support replacing Zookeeper with Consul > --- > > Key: MESOS-3797 > URL: https://issues.apache.org/jira/browse/MESOS-3797 > Project: Mesos > Issue Type: Improvement > Components: leader election >Reporter: Yong Tang > > Currently Mesos only supports Zookeeper for leader election. While Zookeeper > has been widely used, it is not actively developed, and the configuration of > Zookeeper is often cumbersome or difficult. > There is already an ongoing MESOS-1806 which would replace Zookeeper with > etcd. It would be great if Mesos could support replacing Zookeeper with > Consul for its ease of deployment. > While MESOS-3574 proposed that Mesos do its own leader election and failure > detection, replacing Zookeeper with Consul as a short-term solution will > really benefit a lot of existing Mesos users who want to avoid the > dependency on Zookeeper deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3797) Support replacing Zookeeper with Consul
Yong Tang created MESOS-3797: Summary: Support replacing Zookeeper with Consul Key: MESOS-3797 URL: https://issues.apache.org/jira/browse/MESOS-3797 Project: Mesos Issue Type: Improvement Components: leader election Reporter: Yong Tang Currently Mesos only supports Zookeeper for leader election. While Zookeeper has been widely used, it is not actively developed, and the configuration of Zookeeper is often cumbersome or difficult. There is already an ongoing MESOS-1806 which would replace Zookeeper with etcd. It would be great if Mesos could support replacing Zookeeper with Consul for its ease of deployment. While MESOS-3574 proposed that Mesos do its own leader election and failure detection, replacing Zookeeper with Consul as a short-term solution will really benefit a lot of existing Mesos users who want to avoid the dependency on Zookeeper deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3773) RegistryClientTest.SimpleGetBlob is flaky
[ https://issues.apache.org/jira/browse/MESOS-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3773: --- Story Points: 3 > RegistryClientTest.SimpleGetBlob is flaky > - > > Key: MESOS-3773 > URL: https://issues.apache.org/jira/browse/MESOS-3773 > Project: Mesos > Issue Type: Bug > Components: test >Reporter: Joseph Wu >Assignee: Jojy Varghese > Labels: mesosphere > > {{RegistryClientTest.SimpleGetBlob}} fails about 1/5 times. This was > encountered on OSX. > {code:title=Repro} > bin/mesos-tests.sh --gtest_filter="*RegistryClientTest.SimpleGetBlob*" > --gtest_repeat=10 --gtest_break_on_failure > {code} > {code:title=Example Failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:946: Failure > Value of: blobResponse > Actual: "2015-10-20 20:58:59.579393024+00:00" > Expected: blob.get() > Which is: > "\x15\x3\x3\00(P~\xCA&\xC6<\x4\x16\xE\xB2\xFF\b1a\xB9Z{\xE0\x80\xDA`\xBCt\x5R\x81x6\xF8 > \x8B{\xA8\xA9\x4\xAB\xB6" "E\xE6\xDE\xCF\xD9*\xCC!\xC2\x15" "2015-10-20 > 20:58:59.579393024+00:00" > *** Aborted at 1445374739 (unix time) try "date -d @1445374739" if you are > using GNU date *** > PC: @0x103144ddc testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 49008 (TID 0x7fff73ca3300) stack trace: *** > @ 0x7fff8c58af1a _sigtramp > @ 0x7fff8386e187 malloc > @0x1031445b7 testing::internal::AssertHelper::operator=() > @0x1030d32e0 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1030d3562 > mesos::internal::tests::RegistryClientTest_SimpleGetBlob_Test::TestBody() > @0x1031ac8f3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103192f87 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031533f5 testing::Test::Run() > @0x10315493b testing::TestInfo::Run() > @0x1031555f7 testing::TestCase::Run() > @0x103163df3 testing::internal::UnitTestImpl::RunAllTests() > @0x1031af8c3 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @0x103195397 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1031639f2 testing::UnitTest::Run() > @0x1025abd41 RUN_ALL_TESTS() > @0x1025a8089 main > @ 0x7fff86b155c9 start > {code} > {code:title=Less common failure} > [ RUN ] RegistryClientTest.SimpleGetBlob > ../../src/tests/containerizer/provisioner_docker_tests.cpp:926: Failure > (socket).failure(): Failed accept: connection error: > error::lib(0):func(0):reason(0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with --executor_environment_variables
[ https://issues.apache.org/jira/browse/MESOS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3751: --- Shepherd: Timothy Chen Story Points: 2 > MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with > --executor_environment_variables > --- > > Key: MESOS-3751 > URL: https://issues.apache.org/jira/browse/MESOS-3751 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 0.24.1, 0.25.0 >Reporter: Cody Maloney >Assignee: Gilbert Song > Labels: mesosphere, newbie > > When using --executor_environment_variables, and having > MESOS_NATIVE_JAVA_LIBRARY in the environment of mesos-slave, the mesos > containerizer does not set MESOS_NATIVE_JAVA_LIBRARY itself. > Relevant code: > https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281 > It checks whether the variable is in the mesos-slave's own environment (via os::getenv), > rather than checking whether it is set in the configured environment variable set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
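A sketch of the distinction the description draws, with hypothetical names (`environment` stands in for the map built from --executor_environment_variables); the actual containerizer code may differ:
{code}
#include <map>
#include <string>

#include <stout/option.hpp>
#include <stout/os.hpp>

// `environment` is the configured executor environment (illustrative).
void addNativeLibrary(std::map<std::string, std::string>& environment)
{
  // Only fall back to the agent's own environment when the operator
  // has not already provided the variable in the configured set.
  if (environment.count("MESOS_NATIVE_JAVA_LIBRARY") == 0) {
    Option<std::string> value = os::getenv("MESOS_NATIVE_JAVA_LIBRARY");
    if (value.isSome()) {
      environment["MESOS_NATIVE_JAVA_LIBRARY"] = value.get();
    }
  }
}
{code}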
[jira] [Updated] (MESOS-3796) Mesos Master and Agent http api should support configurable CORS headers
[ https://issues.apache.org/jira/browse/MESOS-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Schroeder updated MESOS-3796: - Description: There are several places where it would be useful to access the mesos master api (http port 5050) or agent api (http port 5051) via a javascript client. This is inhibited by the fact that the http headers such as Access-Control-Allow-Origin are not passed at all by mesos. A stop-gap is to write a small proxy which passes requests to/from mesos while adding the header, but that is suboptimal for several reasons. Allowing the option to configure said headers, or just to enable them explicitly would be very useful. Some projects such as mesos-ui[1] have an issue open about this[2]. [1] http://capgemini.github.io/devops/mesos-ui/ [2] https://github.com/Capgemini/mesos-ui/issues/57 was: There are several places where it would be useful to access the mesos master api (http port 5050) or agent api (http port 5051) via a javascript client. This is inhibited by the fact that the http headers such as Access-Control-Allow-Origin are not passed at all by mesos. Allowing the option to configure said headers, or just to enable them explicitly would be very useful. Some projects such as mesos-ui[1] have an issue open about this[2]. [1] http://capgemini.github.io/devops/mesos-ui/ [2] https://github.com/Capgemini/mesos-ui/issues/57 > Mesos Master and Agent http api should support configurable CORS headers > > > Key: MESOS-3796 > URL: https://issues.apache.org/jira/browse/MESOS-3796 > Project: Mesos > Issue Type: Improvement > Components: HTTP API >Affects Versions: 0.25.0 >Reporter: Jeffrey Schroeder >Priority: Minor > > There are several places where it would be useful to access the mesos master > api (http port 5050) or agent api (http port 5051) via a javascript client. > This is inhibited by the fact that the http headers such as > Access-Control-Allow-Origin are not passed at all by mesos. A stop-gap is to > write a small proxy which passes requests to/from mesos while adding the > header, but that is suboptimal for several reasons. > Allowing the option to configure said headers, or just to enable them > explicitly would be very useful. Some projects such as mesos-ui[1] have an > issue open about this[2]. > [1] http://capgemini.github.io/devops/mesos-ui/ > [2] https://github.com/Capgemini/mesos-ui/issues/57 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3796) Mesos Master and Agent http api should support configurable CORS headers
Jeffrey Schroeder created MESOS-3796: Summary: Mesos Master and Agent http api should support configurable CORS headers Key: MESOS-3796 URL: https://issues.apache.org/jira/browse/MESOS-3796 Project: Mesos Issue Type: Improvement Components: HTTP API Affects Versions: 0.25.0 Reporter: Jeffrey Schroeder Priority: Minor There are several places where it would be useful to access the mesos master api (http port 5050) or agent api (http port 5051) via a javascript client. This is inhibited by the fact that the http headers such as Access-Control-Allow-Origin are not passed at all by mesos. Allowing the option to configure said headers, or just to enable them explicitly would be very useful. Some projects such as mesos-ui[1] have an issue open about this[2]. [1] http://capgemini.github.io/devops/mesos-ui/ [2] https://github.com/Capgemini/mesos-ui/issues/57 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
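As an illustration of what the fix might look like inside a libprocess endpoint handler; the handler name and the hard-coded "*" origin are placeholders (the real value would come from a new, configurable flag):
{code}
#include <process/http.hpp>

// Sketch: attach a CORS header to an endpoint response before
// returning it.
process::http::Response handler(const process::http::Request& request)
{
  process::http::OK response("{\"status\": \"ok\"}");
  response.headers["Access-Control-Allow-Origin"] = "*";
  return response;
}
{code}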
[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine
[ https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971183#comment-14971183 ] haosdent commented on MESOS-3793: - Please add --launcher=posix or mount the cgroup filesystem as rw when launching the docker container. http://search-hadoop.com/m/0Vlr6zfCev1S7gRF1 > Cannot start mesos local on a Debian GNU/Linux 8 docker machine > --- > > Key: MESOS-3793 > URL: https://issues.apache.org/jira/browse/MESOS-3793 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: Debian GNU/Linux 8 docker machine >Reporter: Matthias Veit >Assignee: Jojy Varghese > Labels: mesosphere > > We updated the mesos version to 0.25.0 in our Marathon docker image, which > runs our integration tests. > We use mesos local for those tests. This fails with this message: > {noformat} > root@a06e4b4eb776:/marathon# mesos local > I1022 18:42:26.852485 136 leveldb.cpp:176] Opened db in 6.103258ms > I1022 18:42:26.853302 136 leveldb.cpp:183] Compacted db in 765740ns > I1022 18:42:26.853343 136 leveldb.cpp:198] Created db iterator in 9001ns > I1022 18:42:26.853355 136 leveldb.cpp:204] Seeked to beginning of db in > 1287ns > I1022 18:42:26.853366 136 leveldb.cpp:273] Iterated through 0 keys in the > db in ns > I1022 18:42:26.853406 136 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1022 18:42:26.853775 141 recover.cpp:449] Starting replica recovery > I1022 18:42:26.853862 141 recover.cpp:475] Replica is in EMPTY status > I1022 18:42:26.854751 138 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I1022 18:42:26.854856 140 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1022 18:42:26.855002 140 recover.cpp:566] Updating replica status to > STARTING > I1022 18:42:26.855655 138 master.cpp:376] Master > a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on > 172.17.0.14:5050 > I1022 18:42:26.855680 138 master.cpp:378] Flags at startup: > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" > --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs" > I1022 18:42:26.855790 138 master.cpp:425] Master allowing unauthenticated > frameworks to register > I1022 18:42:26.855803 138 master.cpp:430] Master allowing unauthenticated > slaves to register > I1022 18:42:26.855815 138 master.cpp:467] Using default 'crammd5' > authenticator > W1022 18:42:26.855829 138 authenticator.cpp:505] No credentials provided, > authentication requests will be refused > I1022 18:42:26.855840 138 authenticator.cpp:512] Initializing server SASL > I1022 18:42:26.856442 136 containerizer.cpp:143] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I1022 18:42:26.856943 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.888185ms > I1022 18:42:26.856987 140 
replica.cpp:323] Persisted replica status to > STARTING > I1022 18:42:26.857115 140 recover.cpp:475] Replica is in STARTING status > I1022 18:42:26.857270 140 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I1022 18:42:26.857312 140 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1022 18:42:26.857368 140 recover.cpp:566] Updating replica status to VOTING > I1022 18:42:26.857781 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 371121ns > I1022 18:42:26.857841 140 replica.cpp:323] Persisted replica status to > VOTING > I1022 18:42:26.857895 140 recover.cpp:580] Successfully joined the Paxos > group > I1022 18:42:26.857928 140 recover.cpp:464] Recover process terminated > I1022 18:42:26.862455 137 master.cpp:1603] The newly elected leader is > master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8 > I1022 18:42:26.862498 137 master.cpp:1616] Elected as the leading master! > I1022 18:42:26.862511 137 master.cpp:1376] Recovering from registrar > I1022 18:42:26.862560 137 registrar.cpp:309] Recovering registrar > Failed to create a
[jira] [Commented] (MESOS-3583) Introduce sessions in HTTP Scheduler API Subscribed Responses
[ https://issues.apache.org/jira/browse/MESOS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971187#comment-14971187 ] Marco Massenzio commented on MESOS-3583: Following from our conversation, I don't think we should consider doing this. Adding session management introduces state that we will then need to manage in the event of failover; we already have failover management in Mesos, and I don't really think that adding sessions would help any real-life use case. We discussed the issue of badly implemented frameworks, but that's a problem best solved via better documentation and education of the community. > Introduce sessions in HTTP Scheduler API Subscribed Responses > - > > Key: MESOS-3583 > URL: https://issues.apache.org/jira/browse/MESOS-3583 > Project: Mesos > Issue Type: Task >Reporter: Anand Mazumdar > Labels: mesosphere, tech-debt > > Currently, the HTTP Scheduler API has no concept of Sessions aka > {{SessionID}} or a {{TokenID}}. This is useful in some failure scenarios. As > of now, if a framework fails over and then subscribes again with the same > {{FrameworkID}} with the {{force}} option set, the Mesos master would > subscribe it. > If the previous instance of the framework/scheduler tries to send a Call, > e.g. {{Call::KILL}} with the same previous {{FrameworkID}} set, it would > still be accepted by the master, leading to erroneously killing a task. > This is possible because we currently do not have a way of distinguishing > connections. It used to work in the previous driver implementation due to the > master also performing a {{UPID}} check to verify that they matched and only > then allowing the call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3587) Framework failover when framework is 'active' does not trigger allocation.
[ https://issues.apache.org/jira/browse/MESOS-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio reassigned MESOS-3587: -- Assignee: (was: Marco Massenzio) > Framework failover when framework is 'active' does not trigger allocation. > -- > > Key: MESOS-3587 > URL: https://issues.apache.org/jira/browse/MESOS-3587 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Benjamin Mahler >Priority: Minor > Labels: mesosphere > > FWICT, this is just a consequence of some technical debt in the master code. > When an active framework fails over, we do not go through the > deactivation->activation code paths, and so: > (1) The framework's filters in the allocator remain after the failover. > (2) The failed over framework does not receive an immediate allocation (it > has to wait for the next allocation interval). > If the framework had disconnected first, then the failover goes through the > deactivation->activation code paths. > This also means that some tests take longer to run than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky
[ https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-3235: --- Shepherd: Till Toenshoff > FetcherCacheHttpTest.HttpCachedSerialized and > FetcherCacheHttpTest.HttpCachedConcurrent are flaky > - > > Key: MESOS-3235 > URL: https://issues.apache.org/jira/browse/MESOS-3235 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Joseph Wu >Assignee: Bernd Mathiske > Labels: mesosphere > > On OSX, {{make clean && make -j8 V=0 check}}: > {code} > [--] 3 tests from FetcherCacheHttpTest > [ RUN ] FetcherCacheHttpTest.HttpCachedSerialized > HTTP/1.1 200 OK > Date: Fri, 07 Aug 2015 17:23:05 GMT > Content-Length: 30 > I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0 > E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave > 20150807-102305-139395082-52338-52313-S0 > E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Registered executor on 10.0.79.8 > Starting task 0 > Forked command at 54363 > sh -c './mesos-fetcher-test-cmd 0' > E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Command exited with status 0 (pid: 54363) > E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0 > E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave > 20150807-102305-139395082-52338-52313-S0 > E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Registered executor on 10.0.79.8 > Starting task 1 > Forked command at 54411 > sh -c './mesos-fetcher-test-cmd 1' > E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > Command exited with status 0 (pid: 54411) > E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: > Socket is not connected [57] > ../../src/tests/fetcher_cache_tests.cpp:860: Failure > Failed to wait 15secs for awaitFinished(task.get()) > *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are > using GNU date *** > [ FAILED ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms) > [ RUN ] FetcherCacheHttpTest.HttpCachedConcurrent > PC: @0x113723618 process::Owned<>::get() > *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: *** > @ 0x7fff8fcacf1a _sigtramp > @ 0x7f9bc3109710 (unknown) > @0x1136f07e2 mesos::internal::slave::Fetcher::fetch() > @0x113862f9d > mesos::internal::slave::MesosContainerizerProcess::fetch() > @0x1138f1b5d > _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_ > @0x1138f18cf > 
_ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_ > @0x1143768cf std::__1::function<>::operator()() > @0x11435ca7f process::ProcessBase::visit() > @0x1143ed6fe process::DispatchEvent::visit() > @0x11271 process::ProcessBase::serve() > @0x114343b4e process::ProcessManager::resume() > @0x1143431ca process::internal::schedule() > @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_ > @ 0x7fff95090268 _pthread_body > @ 0x7fff950901e5 _pthread_start > @ 0x7fff9508e41d thread_start > Failed to synchronize with slave (it's probably exited) > make[3]: *** [check-local] Segmentation fault: 11 > make[2]: *** [check-am] Error 2 > make[1]: *** [check] Error 2 > make: *** [check-recursive] Error 1 > {code} > This was encountered just once out of 3+ {{make check}}s. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine
[ https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio reassigned MESOS-3793: -- Assignee: Jojy Varghese > Cannot start mesos local on a Debian GNU/Linux 8 docker machine > --- > > Key: MESOS-3793 > URL: https://issues.apache.org/jira/browse/MESOS-3793 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.25.0 > Environment: Debian GNU/Linux 8 docker machine >Reporter: Matthias Veit >Assignee: Jojy Varghese > Labels: mesosphere > > We updated the mesos version to 0.25.0 in our Marathon docker image, that > runs our integration tests. > We use mesos local for those tests. This fails with this message: > {noformat} > root@a06e4b4eb776:/marathon# mesos local > I1022 18:42:26.852485 136 leveldb.cpp:176] Opened db in 6.103258ms > I1022 18:42:26.853302 136 leveldb.cpp:183] Compacted db in 765740ns > I1022 18:42:26.853343 136 leveldb.cpp:198] Created db iterator in 9001ns > I1022 18:42:26.853355 136 leveldb.cpp:204] Seeked to beginning of db in > 1287ns > I1022 18:42:26.853366 136 leveldb.cpp:273] Iterated through 0 keys in the > db in ns > I1022 18:42:26.853406 136 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1022 18:42:26.853775 141 recover.cpp:449] Starting replica recovery > I1022 18:42:26.853862 141 recover.cpp:475] Replica is in EMPTY status > I1022 18:42:26.854751 138 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I1022 18:42:26.854856 140 recover.cpp:195] Received a recover response from > a replica in EMPTY status > I1022 18:42:26.855002 140 recover.cpp:566] Updating replica status to > STARTING > I1022 18:42:26.855655 138 master.cpp:376] Master > a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on > 172.17.0.14:5050 > I1022 18:42:26.855680 138 master.cpp:378] Flags at startup: > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" > --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs" > I1022 18:42:26.855790 138 master.cpp:425] Master allowing unauthenticated > frameworks to register > I1022 18:42:26.855803 138 master.cpp:430] Master allowing unauthenticated > slaves to register > I1022 18:42:26.855815 138 master.cpp:467] Using default 'crammd5' > authenticator > W1022 18:42:26.855829 138 authenticator.cpp:505] No credentials provided, > authentication requests will be refused > I1022 18:42:26.855840 138 authenticator.cpp:512] Initializing server SASL > I1022 18:42:26.856442 136 containerizer.cpp:143] Using isolation: > posix/cpu,posix/mem,filesystem/posix > I1022 18:42:26.856943 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 1.888185ms > I1022 18:42:26.856987 140 replica.cpp:323] Persisted replica status to > STARTING > I1022 18:42:26.857115 140 recover.cpp:475] Replica is in STARTING status > I1022 18:42:26.857270 
140 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I1022 18:42:26.857312 140 recover.cpp:195] Received a recover response from > a replica in STARTING status > I1022 18:42:26.857368 140 recover.cpp:566] Updating replica status to VOTING > I1022 18:42:26.857781 140 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 371121ns > I1022 18:42:26.857841 140 replica.cpp:323] Persisted replica status to > VOTING > I1022 18:42:26.857895 140 recover.cpp:580] Successfully joined the Paxos > group > I1022 18:42:26.857928 140 recover.cpp:464] Recover process terminated > I1022 18:42:26.862455 137 master.cpp:1603] The newly elected leader is > master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8 > I1022 18:42:26.862498 137 master.cpp:1616] Elected as the leading master! > I1022 18:42:26.862511 137 master.cpp:1376] Recovering from registrar > I1022 18:42:26.862560 137 registrar.cpp:309] Recovering registrar > Failed to create a containerizer: Could not create MesosContainerizer: Failed > to create launcher: Failed to create Linux launcher: Failed to mount cgr
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971089#comment-14971089 ] Alexander Rukletsov commented on MESOS-3338: {{.reserved()}} indeed includes dynamically reserved resources. Would you agree that having such non-trivial math for unused resources on an agent is not optimal? I would suggest we revisit the types of resources we store to simplify the math. How about [Total [Reserved] [Offered [Allocated [Used]]]] (Reserved may overlap with offered, allocated and used)? > Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section: > {code} > // Check that the Master counts the reservation as a used resource. > { > Future<Response> response = > process::http::get(master.get(), "state.json"); > AWAIT_READY(response); > Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body); > ASSERT_SOME(parse); > Result<JSON::Number> cpus = > parse.get().find<JSON::Number>("slaves[0].used_resources.cpus"); > ASSERT_SOME_EQ(JSON::Number(1), cpus); > } > {code} > and got > {noformat} > ../../../src/tests/reservation_tests.cpp:168: Failure > Value of: (cpus).get() > Actual: 0 > Expected: JSON::Number(1) > Which is: 1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
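For illustration, a minimal C++ sketch of the nesting suggested in the comment above, assuming Mesos's {{Resources}} type with its set-like arithmetic. The struct and its field names are hypothetical, not the master's actual bookkeeping types; the point is only that the proposed decomposition makes "unused" simple arithmetic:

{code}
// Hypothetical per-agent bookkeeping following the proposed nesting:
// Total contains Offered, which contains Allocated, which contains Used.
// Reserved is tracked separately because it may overlap the other sets.
struct AgentResources
{
  Resources total;      // Everything the agent registered with.
  Resources reserved;   // Statically + dynamically reserved.
  Resources offered;    // Currently offered to frameworks.
  Resources allocated;  // Accepted by frameworks (subset of offered).
  Resources used;       // Consumed by running tasks (subset of allocated).

  // Unused resources then need no special casing for reservations.
  Resources unused() const { return total - offered; }
};
{code}

The exact overlap rules are the open question raised in the comment; the sketch just pins down one possible reading.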
[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971038#comment-14971038 ] Benjamin Bannier commented on MESOS-3581: - RRs: - https://reviews.apache.org/r/39590/ - https://reviews.apache.org/r/39591/ - https://reviews.apache.org/r/39592/ > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971016#comment-14971016 ] Klaus Ma edited comment on MESOS-3765 at 10/23/15 2:09 PM: --- Thanks for explaining the current behaviour; it matches my understanding :). According to the description of this ticket, we consider assigning all of a slave's resources to a single framework unfair, right? So which case is fair? Re-using the example: two frameworks {{f1}} and {{f2}}, one agent with only 1 CPU: 1. 0.5 to {{f1}} and 0.5 to {{f2}} (both weights are 1: {{total * weight / sum of weights}}) 2. 1 to {{f1}} and 0 to {{f2}} 3. or others? IMO, #1 is fair to both frameworks; but if {{f1}}/{{f2}} requires 1 CPU to launch a task, it needs a way for the allocator to adjust the fairness, so {{requestResources()}} will help. Another option is to allow wasted resources: keep offering 0.5 to {{f1}} and 0.5 to {{f2}}. was (Author: klaus1982): Thanks for explaining current behaviour, it match my understanding :). According to the description of this ticket, we consider assigning all resources of slave to framework is unfair, right? So which case is fairness by re-using the example? Two framework {{f1}} and {{f2}}, one agent with only 1 CPU: 1. 0.5 to {{f1}} and 0.5 to {{f2}} (both weight are 1: {{total * weight/ sum of weight}}) 2. 1 to {{f1}} and 0 to {{f2}} 3. or others? IMO, #1 is fair to both framework; but if {{f1}}/{{f2}} acquired 1 CPU to launch task, it need a way for allocator to adjust the fairness, so {{requestResources()}} will help. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
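For illustration, a toy C++ program (not allocator code) that works through the {{total * weight / sum of weights}} math from the comment above for the two-framework, 1-CPU example:

{code}
#include <iostream>
#include <vector>

int main()
{
  const double totalCpus = 1.0;                   // One agent with 1 CPU.
  const std::vector<double> weights = {1.0, 1.0}; // f1 and f2, equal weight.

  double sum = 0.0;
  for (double weight : weights) {
    sum += weight;
  }

  for (size_t i = 0; i < weights.size(); i++) {
    // share_i = total * weight_i / sum of weights.
    std::cout << "f" << (i + 1) << " fair share: "
              << totalCpus * weights[i] / sum << " cpus" << std::endl;
  }

  // Prints 0.5 cpus for each framework, i.e. option #1 above.
  return 0;
}
{code}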
[jira] [Issue Comment Deleted] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Klaus Ma updated MESOS-3765: Comment: was deleted (was: Thanks for explaining current behaviour, it match my understanding :). According to the description of this ticket, we consider assigning all resources of slave to framework is unfair, right? So which case is fairness by re-using the example? Two framework {{f1}} and {{f2}}, one agent with only 1 CPU: 1. 0.5 to {{f1}} and 0.5 to {{f2}} (both weight are 1: {{total * weight/ sum of weight}}) 2. 1 to {{f1}} and 0 to {{f2}} 3. or others? IMO, #1 is fair to both framework; but if {{f1}}/{{f2}} acquired 1 CPU to launch task, it need a way for allocator to adjust the fairness, so {{requestResources()}} will help.) > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971016#comment-14971016 ] Klaus Ma commented on MESOS-3765: - Thanks for explaining the current behaviour; it matches my understanding :). According to the description of this ticket, we consider assigning all of a slave's resources to a single framework unfair, right? So which case is fair? Re-using the example: two frameworks {{f1}} and {{f2}}, one agent with only 1 CPU: 1. 0.5 to {{f1}} and 0.5 to {{f2}} (both weights are 1: {{total * weight / sum of weights}}) 2. 1 to {{f1}} and 0 to {{f2}} 3. or others? IMO, #1 is fair to both frameworks; but if {{f1}}/{{f2}} requires 1 CPU to launch a task, it needs a way for the allocator to adjust the fairness, so {{requestResources()}} will help. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.
[ https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970921#comment-14970921 ] Benjamin Bannier commented on MESOS-3581: - After soliciting feedback on the [mailing list|http://www.mail-archive.com/dev@mesos.apache.org/msg33488.html] there was some consensus that updating the source files was preferable to a workaround using e.g. {{INPUT_FILTER}}. I will propose a patch implementing the changed license headers next. > License headers show up all over doxygen documentation. > --- > > Key: MESOS-3581 > URL: https://issues.apache.org/jira/browse/MESOS-3581 > Project: Mesos > Issue Type: Documentation > Components: documentation >Affects Versions: 0.24.1 >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier >Priority: Minor > > Currently license headers are commented in something resembling Javadoc style, > {code} > /** > * Licensed ... > {code} > Since we use Javadoc-style comment blocks for doxygen documentation all > license headers appear in the generated documentation, potentially and likely > hiding the actual documentation. > Using {{/*}} to start the comment blocks would be enough to hide them from > doxygen, but would likely also result in a largish (though mostly > uninteresting) patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
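For illustration, the comment-style change under discussion. With doxygen's defaults, a {{/**}} block is treated as a special documentation block and extracted, while a plain {{/*}} block is ignored; the banners below are abbreviated stand-ins for the real license text:

{code}
/**
 * Licensed ...   <-- Javadoc-style block: doxygen extracts this.
 */

/*
 * Licensed ...   <-- Plain comment block: doxygen ignores this.
 */
{code}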
[jira] [Commented] (MESOS-3338) Dynamic reservations are not counted as used resources in the master
[ https://issues.apache.org/jira/browse/MESOS-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970804#comment-14970804 ] Alexander Rukletsov commented on MESOS-3338: I'll have a look, thanks [~gyliu]! I doubt {{slave->totalResources.reserved()}} includes dynamic reservations, but I have to check it. > Dynamic reservations are not counted as used resources in the master > > > Key: MESOS-3338 > URL: https://issues.apache.org/jira/browse/MESOS-3338 > Project: Mesos > Issue Type: Bug > Components: allocation, master >Reporter: Alexander Rukletsov >Assignee: Guangya Liu >Priority: Minor > Labels: mesosphere, persistent-volumes > > Dynamically reserved resources should be considered used or allocated and > hence reflected in Mesos bookkeeping structures and {{state.json}}. > I expanded the {{ReservationTest.ReserveThenUnreserve}} test with the > following section: > {code} > // Check that the Master counts the reservation as a used resource. > { > Future<Response> response = > process::http::get(master.get(), "state.json"); > AWAIT_READY(response); > Try<JSON::Object> parse = JSON::parse<JSON::Object>(response.get().body); > ASSERT_SOME(parse); > Result<JSON::Number> cpus = > parse.get().find<JSON::Number>("slaves[0].used_resources.cpus"); > ASSERT_SOME_EQ(JSON::Number(1), cpus); > } > {code} > and got > {noformat} > ../../../src/tests/reservation_tests.cpp:168: Failure > Value of: (cpus).get() > Actual: 0 > Expected: JSON::Number(1) > Which is: 1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2936) Create a design document for Quota support in Master
[ https://issues.apache.org/jira/browse/MESOS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970798#comment-14970798 ] Alexander Rukletsov commented on MESOS-2936: We are in total agreement here. Right now we plan to reject quota requests for non-existent roles. Also, endpoints related to quota *should not* modify roles or role weights. Regarding dynamic roles: I'll definitely have a look, thanks for pinging me. > Create a design document for Quota support in Master > > > Key: MESOS-2936 > URL: https://issues.apache.org/jira/browse/MESOS-2936 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: mesosphere > > Create a design document for the Quota feature support in Mesos Master > (excluding allocator) to be shared with the Mesos community. > Design Doc: > https://docs.google.com/document/d/16iRNmziasEjVOblYp5bbkeBZ7pnjNlaIzPQqMTHQ-9I/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)
[ https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970790#comment-14970790 ] Alexander Rukletsov commented on MESOS-3765: This should work, but it changes the current behaviour a bit. Let me reuse your example to demonstrate how it works right now. Two frameworks {{f1}} and {{f2}}, one agent with only 1 CPU. Possible scenarios: * {{f1}} (or {{f2}}) receives 1 CPU, but accepts 0.5 CPU, effectively returning 0.5 CPU to the free pool. {{f2}} is offered 0.5 CPU. * {{f1}} (or {{f2}}) is greedy and accepts 1 CPU, leaving {{f2}} to starve. If a framework is a good citizen, it will accept only as many resources as it needs. At the same time, if a framework is a good citizen, it won't lie about its granularity. Hence I don't really see how having frameworks report their preferred offer size can help resolve allocation unfairness (which is what this ticket describes). I do not dispute that having frameworks' preferred offer size allows for smarter allocation decisions and more efficient bin-packing algorithms, but this is out of scope for this ticket. My intuition is that *there is no way to improve fairness by giving frameworks more tuning mechanisms*. Having said that, I think tuning mechanisms like {{requestResources}} can be extremely useful. Also, I think it can be difficult to deduce a default granularity in the presence of multiple frameworks and varying agents. > Make offer size adjustable (granularity) > > > Key: MESOS-3765 > URL: https://issues.apache.org/jira/browse/MESOS-3765 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Alexander Rukletsov > > The built-in allocator performs "coarse-grained" allocation, meaning that it > always allocates the entire remaining agent resources to a single framework. > This may heavily impact allocation fairness in some cases, for example in > presence of numerous greedy frameworks and a small number of powerful agents. > A possible solution would be to allow operators explicitly specify > granularity via allocator flags. While this can be tricky for non-standard > resources, it's pretty straightforward for {{cpus}} and {{mem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
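For illustration, a rough C++ sketch of the operator-set granularity idea this ticket proposes, assuming Mesos's {{Resources}} type; {{nextOffer}} is a hypothetical helper, not part of the built-in allocator:

{code}
// Offer at most one `granularity` chunk of an agent's available resources
// per allocation cycle, instead of everything that remains.
Resources nextOffer(const Resources& available, const Resources& granularity)
{
  // If less than one full chunk remains, offer the remainder so that no
  // unofferable fragment is left stranded on the agent.
  if (!available.contains(granularity)) {
    return available;
  }

  return granularity;
}
{code}

This sketch also shows why a minimum granularity matters, as noted earlier in the thread: the smaller the chunk, the more allocation cycles are needed to hand out a large agent.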
[jira] [Comment Edited] (MESOS-3792) flags.acls in /state.json response is not the flag value passed to Mesos master
[ https://issues.apache.org/jira/browse/MESOS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970753#comment-14970753 ] Jian Qiu edited comment on MESOS-3792 at 10/23/15 10:02 AM: Many (maybe all) values in flags have the same issue. I tried --firewall_rules and queried /state, which returned something as below: {code} "firewall_rules":"disabled_endpoints { paths: \"\/files\/browse\" paths: \"\/slave(0)\/stats.json\"}" {code} I think the reason is that the flag values returned by /state are stringified in protobuf text format rather than JSON format. was (Author: qiujian): many values (maybe all) in flags has the same issue, I tried --firewall_rules and query /state which return something as below {code} "firewall_rules":"disabled_endpoints {\n paths: \"\/files\/browse\"\n paths: \"\/slave(0)\/stats.json\"\n}\n" {code} I think the reason is that the value in flag returned by /state is stringified in protobuf format rather than json format? > flags.acls in /state.json response is not the flag value passed to Mesos > master > --- > > Key: MESOS-3792 > URL: https://issues.apache.org/jira/browse/MESOS-3792 > Project: Mesos > Issue Type: Bug >Reporter: James Fisher > > Steps to reproduce: Start Mesos master with the `--acls` flag set to the > following value: > {code} > { "run_tasks": [ { "principals": { "values": ["foo", "bar"] }, "users": { > "values": ["alice"] } } ] } > {code} > Then make a request to {{http://mesosmaster:5050/state.json}} and extract the > value for key `flags.acls` from the JSON body of the response. > Expected behavior: the value is the same JSON string passed on the > command-line. > Actual behavior: the value is this string in some unknown syntax: > {code} > run_tasks { > principals { > values: "foo" > values: "bar" > } > users { > values: "alice" > } > } > {code} > I don't know what this is, but it's not an ACL expression according to the > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3792) flags.acls in /state.json response is not the flag value passed to Mesos master
[ https://issues.apache.org/jira/browse/MESOS-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970753#comment-14970753 ] Jian Qiu commented on MESOS-3792: - Many (maybe all) values in flags have the same issue. I tried --firewall_rules and queried /state, which returned something as below: {code} "firewall_rules":"disabled_endpoints {\n paths: \"\/files\/browse\"\n paths: \"\/slave(0)\/stats.json\"\n}\n" {code} I think the reason is that the flag values returned by /state are stringified in protobuf text format rather than JSON format. > flags.acls in /state.json response is not the flag value passed to Mesos > master > --- > > Key: MESOS-3792 > URL: https://issues.apache.org/jira/browse/MESOS-3792 > Project: Mesos > Issue Type: Bug >Reporter: James Fisher > > Steps to reproduce: Start Mesos master with the `--acls` flag set to the > following value: > {code} > { "run_tasks": [ { "principals": { "values": ["foo", "bar"] }, "users": { > "values": ["alice"] } } ] } > {code} > Then make a request to {{http://mesosmaster:5050/state.json}} and extract the > value for key `flags.acls` from the JSON body of the response. > Expected behavior: the value is the same JSON string passed on the > command-line. > Actual behavior: the value is this string in some unknown syntax: > {code} > run_tasks { > principals { > values: "foo" > values: "bar" > } > users { > values: "alice" > } > } > {code} > I don't know what this is, but it's not an ACL expression according to the > documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
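For illustration, a small C++ sketch of the suspected difference, assuming a protobuf version that ships {{json_util}}; the two helpers are hypothetical, not Mesos code:

{code}
#include <string>

#include <google/protobuf/message.h>
#include <google/protobuf/text_format.h>
#include <google/protobuf/util/json_util.h>

// Renders a message the way the flag values in /state appear to be
// rendered today: protobuf's text format, e.g. "run_tasks {\n ... }".
std::string asTextFormat(const google::protobuf::Message& message)
{
  std::string out;
  google::protobuf::TextFormat::PrintToString(message, &out);
  return out;
}

// Renders the same message as the JSON string the reporter expected.
std::string asJson(const google::protobuf::Message& message)
{
  std::string out;
  google::protobuf::util::MessageToJsonString(message, &out);
  return out;
}
{code}

If the flags rendering used something like {{asJson}} instead of the text format, {{flags.acls}} would round-trip as the JSON the operator passed on the command line.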
[jira] [Created] (MESOS-3795) process::io::write takes parameter as void* which could be const
Benjamin Bannier created MESOS-3795: --- Summary: process::io::write takes parameter as void* which could be const Key: MESOS-3795 URL: https://issues.apache.org/jira/browse/MESOS-3795 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Benjamin Bannier In libprocess we have {code} Future<size_t> write(int fd, void* data, size_t size); {code} which expects a non-{{const}} {{void*}} for its {{data}} parameter. Under the covers {{data}} appears to be handled as a {{const}} (like one would expect from the signature of its inspiration, {{::write}}). This function is not used too often, but since it expects a non-{{const}} value for {{data}}, implicit conversions from pointer-to-{{const}} types are ruled out; instead callers seem to cast manually to {{void*}} -- often with C-style casts. We should sync this method's signature with that of {{::write}}. In addition to following the expected semantics of {{::write}}, having this work without casts for any pointer value {{data}} would make it easier to interface this with character literals, or raw data pointers from STL containers (e.g. {{Container::data}}). It would probably also indirectly eliminate the temptation to use C-casts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
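For illustration, a sketch of the friction described above, assuming the {{Future<size_t>}} signature quoted in the ticket; this is not a patch:

{code}
#include <process/io.hpp>

void example(int fd)
{
  const char* data = "hello";

  // Today: Future<size_t> write(int fd, void* data, size_t size);
  // A const buffer needs its constness cast away first.
  process::io::write(fd, (void*) data, 5);

  // Proposed: Future<size_t> write(int fd, const void* data, size_t size);
  // The same call would then compile without any cast:
  //   process::io::write(fd, data, 5);
}
{code}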
[jira] [Commented] (MESOS-3747) HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string
[ https://issues.apache.org/jira/browse/MESOS-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970586#comment-14970586 ] Liqiang Lin commented on MESOS-3747: [~nnielsen] which tests depend on using a user name that may not exist on the test machine in {{paths.cpp:createExecutorDirectory}}? Can you point me to them? My proposed fixes: 1) Return an Error in {{paths.cpp:createExecutorDirectory}} when {{chown}} fails. 2) Validate whether "CommandInfo.user" or "FrameworkInfo.user" exists if the "--switch_user" flag is set to true. If validation fails, return a 'user does not exist' failure reason to the framework. [~vinodkone] What are your suggestions? > HTTP Scheduler API no longer allows FrameworkInfo.user to be empty string > - > > Key: MESOS-3747 > URL: https://issues.apache.org/jira/browse/MESOS-3747 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0, 0.24.1, 0.25.0 >Reporter: Ben Whitehead >Assignee: Liqiang Lin >Priority: Blocker > > When using libmesos a framework can set its user to {{""}} (empty string) to > inherit the user the agent process is running as; this behavior now results > in a {{TASK_FAILED}}. > Full messages and relevant agent logs below. > The error returned to the framework tells me nothing about the user not > existing on the agent host; instead it tells me the container died due to OOM. > {code:title=FrameworkInfo} > call { > type: SUBSCRIBE > subscribe: { > frameworkInfo: { > user: "", > name: "testing" > } > } > } > {code} > {code:title=TaskInfo} > call { > framework_id { value: "20151015-125949-16777343-5050-20146-" }, > type: ACCEPT, > accept { > offer_ids: [{ value: "20151015-125949-16777343-5050-20146-O0" }], > operations { > type: LAUNCH, > launch { > task_infos [ > { > name: "task-1", > task_id: { value: "task-1" }, > agent_id: { value: > "20151015-125949-16777343-5050-20146-S0" }, > resources [ > { name: "cpus", type: SCALAR, scalar: { value: > 0.1 }, role: "*" }, > { name: "mem", type: SCALAR, scalar: { value: > 64.0 }, role: "*" }, > { name: "disk", type: SCALAR, scalar: { value: > 0.0 }, role: "*" }, > ], > command: { > environment { > variables [ > { name: "SLEEP_SECONDS" value: "15" } > ] > }, > value: "env | sort && sleep $SLEEP_SECONDS" > } > } > ] > } > } > } > } > {code} > {code:title=Update Status} > event: { > type: UPDATE, > update: { > status: { > task_id: { value: "task-1" }, > state: TASK_FAILED, > message: "Container destroyed while preparing isolators", > agent_id: { value: "20151015-125949-16777343-5050-20146-S0" }, > timestamp: 1.444939217401241E9, > executor_id: { value: "task-1" }, > source: SOURCE_AGENT, > reason: REASON_MEMORY_LIMIT, > uuid: "\237g()L\026EQ\222\301\261\265\\\221\224|" > } > } > } > {code} > {code:title=agent logs} > I1015 13:15:34.260592 19639 slave.cpp:1270] Got assigned task task-1 for > framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- > I1015 13:15:34.260921 19639 slave.cpp:1386] Launching task task-1 for > framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- > W1015 13:15:34.262243 19639 paths.cpp:423] Failed to chown executor directory > '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b': > Failed to get user information for '': Success > I1015 13:15:34.262444 19639 slave.cpp:4852] Launching executor task-1 of > framework e4de5b96-41cc-4713-af44-7cffbdd63ba6- with resources > cpus(*):0.1;
mem(*):32 in work directory > '/home/ben.whitehead/opt/mesos/work/slave/work_dir/slaves/e4de5b96-41cc-4713-af44-7cffbdd63ba6-S0/frameworks/e4de5b96-41cc-4713-af44-7cffbdd63ba6-/executors/task-1/runs/3958ff84-8dd9-4c3c-995d-5aba5250541b' > I1015 13:15:34.262581 19639 slave.cpp:1604] Queuing task 'task-1' for > executor task-1 of framework 'e4de5b96-41cc-4713-af44-7cffbdd63ba6- >
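For illustration, a minimal C++ sketch of proposed fix (2) above: validating that the requested user exists before launching. {{validateUser}} is a hypothetical helper using stout's {{Try}}/{{Error}}, not the actual slave code:

{code}
#include <pwd.h>

#include <string>

#include <stout/error.hpp>
#include <stout/nothing.hpp>
#include <stout/try.hpp>

Try<Nothing> validateUser(const std::string& user)
{
  // An empty user means "inherit the user the agent is running as".
  if (user.empty()) {
    return Nothing();
  }

  // Reject users that do not exist on this machine, so the framework
  // gets a meaningful reason instead of a misleading container failure.
  if (::getpwnam(user.c_str()) == nullptr) {
    return Error("User '" + user + "' does not exist");
  }

  return Nothing();
}
{code}

With a check like this at subscription or launch time, the framework above would see a 'user does not exist' error rather than {{REASON_MEMORY_LIMIT}}.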