[jira] [Updated] (MESOS-3891) Add a helper function to the Agent to check available resources before launching a task.

2015-12-10 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-3891:
---

There is a ticket, MESOS-2647, handling the same issue for usage slack.

> Add a helper function to the Agent to check available resources before 
> launching a task. 
> -
>
> Key: MESOS-3891
> URL: https://issues.apache.org/jira/browse/MESOS-3891
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Guangya Liu
>  Labels: mesosphere
>
> Launching a task using revocable resources should be funnelled through an 
> accounting system:
> * If a task is launched using revocable resources, the resources must not be 
> in use when launching the task.  If they are in use, then the task should 
> fail to start.
> * If a task is launched using reserved resources, the resources must be made 
> available.  This means potentially evicting tasks which are using revocable 
> resources.
> Both cases could be implemented by adding a check in Slave::runTask, like a 
> new helper method:
> {noformat}
> class Slave {
>   ...
>   // Checks if the given resources are available (i.e. not utilized)
>   // for starting a task.  If not, the task should either fail to
>   // start or result in the eviction of revocable resources.
>   virtual process::Future checkAvailableResources(
>   const Resources& resources);
>   ...
> }
> {noformat}
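
The two cases above can be sketched as a small accounting check. This is a minimal, hypothetical model and not the Mesos implementation: resources are reduced to a single scalar, and `Accounting`, `Decision`, and `checkAvailable` are illustrative names that do not exist in Mesos.

```cpp
#include <cassert>

// Hypothetical outcome of the availability check.
enum class Decision { Launch, Fail, EvictRevocable };

// Hypothetical, simplified accounting for one scalar resource (e.g. CPUs).
struct Accounting {
  double total;          // Total capacity on the agent.
  double usedRegular;    // In use by tasks running on non-revocable resources.
  double usedRevocable;  // In use by tasks running on revocable resources.

  double idle() const { return total - usedRegular - usedRevocable; }
};

// Decides what should happen when a task asks for `requested` units.
Decision checkAvailable(const Accounting& a, double requested, bool revocable)
{
  assert(requested >= 0.0);

  if (requested <= a.idle()) {
    return Decision::Launch;
  }

  if (revocable) {
    // Revocable resources must not be in use when the task is launched.
    return Decision::Fail;
  }

  // Reserved resources must be made available, potentially by evicting
  // tasks that are using revocable resources.
  if (requested <= a.idle() + a.usedRevocable) {
    return Decision::EvictRevocable;
  }

  return Decision::Fail;
}
```

In a real agent the check would presumably be asynchronous (hence the `process::Future` return type in the proposal) and would operate on `Resources` rather than a scalar.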



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.

2015-12-10 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050358#comment-15050358
 ] 

Benjamin Bannier commented on MESOS-4106:
-

Late to the party as this already went in.

Just {{sleep}}ing here to have the message out is a very weak guarantee (it 
does not guarantee that the message was actually sent). What one should 
probably do instead to make this robust is block until a state change in 
{{executor}} happens (with a timeout), e.g., observe change of state of 
{{taskID}} via querying the {{executor}}.
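
A minimal sketch of that idea, assuming the caller can supply a predicate that queries the executor for the task's state. `waitUntil` is a hypothetical helper written for illustration; it is not a libprocess or Mesos API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `predicate` until it returns true or `timeout` elapses.
// Returns true if the predicate held before the deadline, false otherwise.
bool waitUntil(
    const std::function<bool()>& predicate,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval = std::chrono::milliseconds(10))
{
  assert(interval.count() > 0);

  const auto deadline = std::chrono::steady_clock::now() + timeout;

  while (true) {
    if (predicate()) {
      return true;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(interval);
  }
}
```

Unlike a fixed sleep, the caller learns whether the expected state change was actually observed, and returns as soon as it is.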

> The health checker may fail to inform the executor to kill an unhealthy task 
> after max_consecutive_failures.
> 
>
> Key: MESOS-4106
> URL: https://issues.apache.org/jira/browse/MESOS-4106
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.20.0, 0.20.1, 0.21.1, 0.21.2, 0.22.1, 0.22.2, 0.23.0, 
> 0.23.1, 0.24.0, 0.24.1, 0.25.0
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Blocker
> Fix For: 0.27.0
>
>
> This was reported by [~tan] experimenting with health checks. Many tasks were 
> launched with the following health check, taken from the container 
> stdout/stderr:
> {code}
> Launching health check process: /usr/local/libexec/mesos/mesos-health-check 
> --executor=(1)@127.0.0.1:39629 
> --health_check_json={"command":{"shell":true,"value":"false"},"consecutive_failures":1,"delay_seconds":0.0,"grace_period_seconds":1.0,"interval_seconds":1.0,"timeout_seconds":1.0}
>  --task_id=sleepy-2
> {code}
> This should have led to all tasks getting killed due to 
> {{\-\-consecutive_failures}} being set; however, only some tasks get killed, 
> while others remain running.
> It turns out that the health check binary does a {{send}} and promptly exits. 
> Unfortunately, this may lead to a message drop since libprocess may not have 
> sent this message over the socket by the time the process exits.
> We work around this in the command executor with a manual sleep, which has 
> been around since the svn days. See 
> [here|https://github.com/apache/mesos/blob/0.14.0/src/launcher/executor.cpp#L288-L290].





[jira] [Comment Edited] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.

2015-12-10 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050358#comment-15050358
 ] 

Benjamin Bannier edited comment on MESOS-4106 at 12/10/15 8:50 AM:
---

Late to the party as this already went in.

Just sleeping here to have the message out is a very weak guarantee (it does 
not guarantee that the message was actually sent). What one should probably do 
instead to make this robust is block until a state change in {{executor}} 
happens (with a timeout), e.g., observe change of state of {{taskID}} via 
querying the {{executor}}.


was (Author: bbannier):
Late to the party as this already went in.

Just {{sleep}}ing here to have the message out is a very weak guarantee (it 
does not guarantee that the message was actually sent). What one should 
probably do instead to make this robust is block until a state change in 
{{executor}} happens (with a timeout), e.g., observe change of state of 
{{taskID}} via querying the {{executor}}.






[jira] [Commented] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.

2015-12-10 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050374#comment-15050374
 ] 

Timothy Chen commented on MESOS-4106:
-

Thanks for fixing my bug!






[jira] [Created] (MESOS-4112) Clean up libprocess gtest macros

2015-12-10 Thread Michael Park (JIRA)
Michael Park created MESOS-4112:
---

 Summary: Clean up libprocess gtest macros
 Key: MESOS-4112
 URL: https://issues.apache.org/jira/browse/MESOS-4112
 Project: Mesos
  Issue Type: Task
  Components: libprocess, test
Reporter: Michael Park


This ticket is regarding the libprocess gtest helpers in 
{{3rdparty/libprocess/include/process/gtest.hpp}}.

The pattern in this file seems to be a set of macros:

* {{AWAIT_ASSERT__FOR}}
* {{AWAIT_ASSERT_}} -- default of 15 seconds
* {{AWAIT_\_FOR}} -- alias for {{AWAIT_ASSERT__FOR}}
* {{AWAIT_}} -- alias for {{AWAIT_ASSERT_}}
* {{AWAIT_EXPECT__FOR}}
* {{AWAIT_EXPECT_}} -- default of 15 seconds

(1) {{AWAIT_EQ_FOR}} should be added for completeness.

(2) In {{gtest}}, we've got {{EXPECT_EQ}} as well as the {{bool}}-specific 
versions: {{EXPECT_TRUE}} and {{EXPECT_FALSE}}.

We should adopt this pattern in these helpers as well. Keeping the pattern 
above in mind, the following are missing:

* {{AWAIT_ASSERT_TRUE_FOR}}
* {{AWAIT_ASSERT_TRUE}}
* {{AWAIT_ASSERT_FALSE_FOR}}
* {{AWAIT_ASSERT_FALSE}}
* {{AWAIT_EXPECT_TRUE_FOR}}
* {{AWAIT_EXPECT_FALSE_FOR}}

(3) There are HTTP response related macros at the bottom of the file, e.g. 
{{AWAIT_EXPECT_RESPONSE_STATUS_EQ}}, however these are missing their {{ASSERT}} 
counterparts.

(4) The reason for (3) is presumably that we reach for {{EXPECT}} over 
{{ASSERT}} in general due to the test-suite-crashing behavior of {{ASSERT}}. If 
this is the case, it would be worthwhile considering whether macros such as 
{{AWAIT_READY}} should alias {{AWAIT_EXPECT_READY}} rather than 
{{AWAIT_ASSERT_READY}}.
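
The layering described above (an explicit-timeout `_FOR` form, a default-timeout form defined in terms of it, and `TRUE`/`FALSE` variants) can be sketched as follows. This is only an illustration of the naming pattern: the `SKETCH_` macros and `sketchAwait` are hypothetical, they poll a plain boolean condition rather than a libprocess future, and they return a `bool` instead of raising a gtest failure.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `ready` until it holds or `timeout` elapses.
inline bool sketchAwait(
    const std::function<bool()>& ready,
    std::chrono::milliseconds timeout)
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!ready()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return true;
}

// Explicit-timeout form, mirroring the `_FOR` suffix.
#define SKETCH_AWAIT_TRUE_FOR(condition, timeout) \
  sketchAwait([&]() { return static_cast<bool>(condition); }, (timeout))

// Default-timeout form, defined in terms of the `_FOR` form
// (15 seconds, matching the default mentioned in the ticket).
#define SKETCH_AWAIT_TRUE(condition) \
  SKETCH_AWAIT_TRUE_FOR(condition, std::chrono::seconds(15))

// The FALSE variants negate the condition and reuse the TRUE forms.
#define SKETCH_AWAIT_FALSE_FOR(condition, timeout) \
  SKETCH_AWAIT_TRUE_FOR(!(condition), timeout)

#define SKETCH_AWAIT_FALSE(condition) \
  SKETCH_AWAIT_FALSE_FOR(condition, std::chrono::seconds(15))
```

Defining every default-timeout macro through its `_FOR` counterpart keeps the timeout in one place, which is the main point of the pattern.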





[jira] [Created] (MESOS-4113) Docker Executor should not set container IP during bridged mode

2015-12-10 Thread Sargun Dhillon (JIRA)
Sargun Dhillon created MESOS-4113:
-

 Summary: Docker Executor should not set container IP during 
bridged mode
 Key: MESOS-4113
 URL: https://issues.apache.org/jira/browse/MESOS-4113
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.25.0
Reporter: Sargun Dhillon
Priority: Minor


The docker executor currently sets the IP address of the container into 
ContainerStatus.NetworkInfo.IPAddresses. This isn't a good thing, because 
during bridged mode execution that IP address is useless, since it's behind 
the Docker NAT. I would like a flag that disables filling the IP address in 
and allows it to fall back to the agent IP.
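
A rough sketch of the requested behavior, under stated assumptions: `NetworkMode`, `advertisedIp`, and the `reportBridgedContainerIp` switch are hypothetical names used only for illustration and are not existing Mesos flags or APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical network modes for a Docker container.
enum class NetworkMode { Host, Bridge };

// Returns the IP address to advertise for a task. In bridged mode the
// container IP sits behind the Docker NAT, so unless the (hypothetical)
// switch asks for it, the agent IP is advertised instead.
std::string advertisedIp(
    NetworkMode mode,
    const std::string& containerIp,
    const std::string& agentIp,
    bool reportBridgedContainerIp)
{
  assert(!agentIp.empty());

  if (mode == NetworkMode::Bridge && !reportBridgedContainerIp) {
    return agentIp;
  }

  // Fall back to the agent IP whenever no container IP is known.
  return containerIp.empty() ? agentIp : containerIp;
}
```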





[jira] [Created] (MESOS-4114) Add field VIP to message Port

2015-12-10 Thread Sargun Dhillon (JIRA)
Sargun Dhillon created MESOS-4114:
-

 Summary: Add field VIP to message Port
 Key: MESOS-4114
 URL: https://issues.apache.org/jira/browse/MESOS-4114
 Project: Mesos
  Issue Type: Wish
Reporter: Sargun Dhillon
Priority: Trivial


We would like to extend the Mesos protocol buffer 'Port' to include an optional 
string named "VIP" - to map it to a well-known virtual IP, or virtual 
hostname for discovery purposes.





[jira] [Assigned] (MESOS-4114) Add field VIP to message Port

2015-12-10 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan reassigned MESOS-4114:


Assignee: Avinash Sridharan

> Add field VIP to message Port
> -
>
> Key: MESOS-4114
> URL: https://issues.apache.org/jira/browse/MESOS-4114
> Project: Mesos
>  Issue Type: Wish
>Reporter: Sargun Dhillon
>Assignee: Avinash Sridharan
>Priority: Trivial
>  Labels: mesosphere
>
> We would like to extend the Mesos protocol buffer 'Port' to include an 
> optional string named "VIP" - to map it to a well-known virtual IP, or 
> virtual hostname for discovery purposes.





[jira] [Updated] (MESOS-4114) Add field VIP to message Port

2015-12-10 Thread Sargun Dhillon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sargun Dhillon updated MESOS-4114:
--
Description: 
We would like to extend the Mesos protocol buffer 'Port' to include an optional 
string named "VIP" - to map it to a well-known virtual IP, or virtual 
hostname for discovery purposes.

We also want this field exposed in DiscoveryInfo in state.json.

  was:We would like to extend the Mesos protocol buffer 'Port' to include an 
optional string string named "VIP" - to map it to a well known virtual IP, or 
virtual hostname for discovery purposes.


> Add field VIP to message Port
> -
>
> Key: MESOS-4114
> URL: https://issues.apache.org/jira/browse/MESOS-4114
> Project: Mesos
>  Issue Type: Wish
>Reporter: Sargun Dhillon
>Assignee: Avinash Sridharan
>Priority: Trivial
>  Labels: mesosphere
>
> We would like to extend the Mesos protocol buffer 'Port' to include an 
> optional string named "VIP" - to map it to a well-known virtual IP, or 
> virtual hostname for discovery purposes.
> We also want this field exposed in DiscoveryInfo in state.json.





[jira] [Commented] (MESOS-4056) Respond with `MethodNotAllowed` if a request uses an unsupported method

2015-12-10 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051177#comment-15051177
 ] 

Joris Van Remoortere commented on MESOS-4056:
-

Alex will file a JIRA to discuss cleaning up the error message construction 
such that the constructor generates them. This is less error prone and allows 
us to control the format of the message in 1 place.
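
As a sketch of what "the constructor generates the message" could look like: the type below is hypothetical (`MethodNotAllowedSketch` is not the libprocess response class), and the message wording is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical response type whose constructor builds both the `Allow`
// header value and a uniformly formatted body, so call sites cannot
// drift apart in wording.
struct MethodNotAllowedSketch {
  int status = 405;
  std::string allow;  // Value for the HTTP `Allow` header.
  std::string body;   // Standardized error text.

  MethodNotAllowedSketch(
      const std::vector<std::string>& allowedMethods,
      const std::string& receivedMethod)
  {
    assert(!allowedMethods.empty());

    for (std::size_t i = 0; i < allowedMethods.size(); ++i) {
      if (i > 0) {
        allow += ", ";
      }
      allow += allowedMethods[i];
    }

    body = "Expecting one of { " + allow + " }, but received '" +
           receivedMethod + "'";
  }
};
```

Call sites then only state which methods are supported, and the format of the message lives in a single place.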

> Respond with `MethodNotAllowed` if a request uses an unsupported method
> ---
>
> Key: MESOS-4056
> URL: https://issues.apache.org/jira/browse/MESOS-4056
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
>
> We are inconsistent right now in how we respond to endpoint requests with 
> unsupported methods: both {{MethodNotAllowed}} and {{BadRequest}} are used. 
> We are also not consistent in the error message we include in the body.
> This ticket proposes using {{MethodNotAllowed}} with standardized message text.





[jira] [Updated] (MESOS-4056) Respond with `MethodNotAllowed` if a request uses an unsupported method

2015-12-10 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4056:

Sprint: Mesosphere Sprint 24






[jira] [Commented] (MESOS-3841) Master HTTP API support to get the leader

2015-12-10 Thread Cosmin Lehene (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051208#comment-15051208
 ] 

Cosmin Lehene commented on MESOS-3841:
--

[~qiujian] that would solve the immediate ask of this issue. I wonder though, 
if the master API needs to do more (e.g. get all masters, including 
non-leaders), whether it shouldn't also be prepended with something like 
{{api/v1}} so that it could be evolved properly later on.


> Master HTTP API support to get the leader
> -
>
> Key: MESOS-3841
> URL: https://issues.apache.org/jira/browse/MESOS-3841
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Cosmin Lehene
>Assignee: Jian Qiu
>
> There's currently no good way to query the current master ensemble leader.
> Some workarounds are to get the leader (and parse it from leader@ip) from 
> {{/state.json}} or to grep it from {{master/redirect}}.
> The scheduler API does an HTTP redirect, but that requires an HTTP POST 
> coming from a framework as well:
> {{POST /api/v1/scheduler HTTP/1.1}}
> There should be a lightweight API call to get the current master.
> This could be part of a more granular representation (REST) of the current 
> state.json.
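
For illustration, the "parse it from leader@ip" workaround might look like the sketch below. It assumes the leader is reported as a string of the form `<id>@<host>:<port>` (for example `master@172.17.21.0:52024`); `parseLeader` is a hypothetical helper, and IPv6 literals are not handled.

```cpp
#include <cassert>
#include <string>

// Splits a PID-style string such as "master@172.17.21.0:52024" into host
// and port. Returns false, leaving the outputs untouched, if the input
// does not have the form <id>@<host>:<port>.
bool parseLeader(const std::string& pid, std::string* host, int* port)
{
  assert(host != nullptr && port != nullptr);

  const std::string::size_type at = pid.find('@');
  const std::string::size_type colon = pid.rfind(':');

  if (at == std::string::npos || colon == std::string::npos ||
      colon <= at + 1 || colon + 1 >= pid.size()) {
    return false;
  }

  // Parse the port by hand so that malformed input never throws.
  int value = 0;
  for (std::string::size_type i = colon + 1; i < pid.size(); ++i) {
    const char c = pid[i];
    if (c < '0' || c > '9') {
      return false;
    }
    value = value * 10 + (c - '0');
    if (value > 65535) {
      return false;
    }
  }

  *host = pid.substr(at + 1, colon - at - 1);
  *port = value;
  return true;
}
```

Having every client carry a parser like this is the kind of overhead a dedicated lightweight endpoint would remove.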





[jira] [Commented] (MESOS-4071) Master crash during framework teardown ( Check failed: total.resources.contains(slaveId))

2015-12-10 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051216#comment-15051216
 ] 

Joris Van Remoortere commented on MESOS-4071:
-

That is the right idea; however, I want to see a test that does this with a 
much higher repetition count.
IIUC if this math is repeated enough, the delta will grow beyond the epsilon 
and void the strategy of "NEAR" checks.
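
The concern can be illustrated with plain `double` arithmetic. This is a generic sketch, not the Mesos resource math: `accumulatedDrift` is a hypothetical helper that adds a step repeatedly and reports how far the running sum ends up from the directly computed product.

```cpp
#include <cassert>
#include <cmath>

// Adds `step` to an accumulator `n` times and returns the absolute
// difference between the running sum and `step * n` computed directly.
// For steps that are not exactly representable in binary (such as 0.1),
// each addition can round, and the rounding errors accumulate.
double accumulatedDrift(double step, long n)
{
  assert(n >= 0);

  double sum = 0.0;
  for (long i = 0; i < n; ++i) {
    sum += step;
  }

  return std::fabs(sum - step * static_cast<double>(n));
}
```

A test along these lines would compare the drift after a handful of operations with the drift after millions of them against whatever epsilon the "NEAR" checks use; a fixed epsilon that is comfortable for the former is not necessarily safe for the latter.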

> Master crash during framework teardown ( Check failed: 
> total.resources.contains(slaveId))
> -
>
> Key: MESOS-4071
> URL: https://issues.apache.org/jira/browse/MESOS-4071
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.25.0
>Reporter: Mandeep Chadha
>
> Stack Trace :
> NOTE : Replaced IP address with XX.XX.XX.XX 
> {code}
> I1204 10:31:03.391127 2588810 master.cpp:5564] Processing TEARDOWN call for 
> framework 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 
> (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST) at 
> scheduler-c8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237
> I1204 10:31:03.391177 2588810 master.cpp:5576] Removing framework 
> 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 
> (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST)) at 
> schedulerc8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237
> I1204 10:31:03.391337 2588805 hierarchical.hpp:605] Deactivated framework 
> 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014
> F1204 10:31:03.395500 2588810 sorter.cpp:233] Check failed: 
> total.resources.contains(slaveId)
> *** Check failure stack trace: ***
> @ 0x7f2b3dda53d8  google::LogMessage::Fail()
> @ 0x7f2b3dda5327  google::LogMessage::SendToLog()
> @ 0x7f2b3dda4d38  google::LogMessage::Flush()
> @ 0x7f2b3dda7a6c  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f2b3d3351a1  
> mesos::internal::master::allocator::DRFSorter::remove()
> @ 0x7f2b3d0b8c29  
> mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
> @ 0x7f2b3d0ca823 
> _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDES6_EEvRKNS_3PIDIT_EEMSA_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESJ_
> @ 0x7f2b3d0dc8dc  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_11FrameworkIDESA_EEvRKNS0_3PIDIT_EEMSE_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f2b3dd2cc35  std::function<>::operator()()
> @ 0x7f2b3dd15ae5  process::ProcessBase::visit()
> @ 0x7f2b3dd188e2  process::DispatchEvent::visit()
> @   0x472366  process::ProcessBase::serve()
> @ 0x7f2b3dd1203f  process::ProcessManager::resume()
> @ 0x7f2b3dd061b2  process::internal::schedule()
> @ 0x7f2b3dd63efd  
> _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7f2b3dd63e4d  std::_Bind_simple<>::operator()()
> @ 0x7f2b3dd63de6  std::thread::_Impl<>::_M_run()
> @   0x318c2b6470  (unknown)
> @   0x318b2079d1  (unknown)
> @   0x318aae8b5d  (unknown)
> @  (nil)  (unknown)
> Aborted (core dumped)
> {code}





[jira] [Updated] (MESOS-3861) Authenticate quota requests

2015-12-10 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-3861:

Shepherd: Joris Van Remoortere  (was: Till Toenshoff)

> Authenticate quota requests
> ---
>
> Key: MESOS-3861
> URL: https://issues.apache.org/jira/browse/MESOS-3861
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere, security
>
> Quota requests need to be authenticated.
> This ticket will authenticate quota requests using credentials provided by 
> the {{Authorization}} field of the HTTP request. This is similar to how 
> authentication is implemented in {{Master::Http}}.





[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.

2015-12-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051302#comment-15051302
 ] 

haosdent commented on MESOS-4024:
-

Still no idea. The health check logs are located in the sandbox and are not 
displayed in the Jenkins test log. But we could first change 
consecutiveFailures to 2 and timeoutSeconds to 2 to reduce the test time.

> HealthCheckTest.CheckCommandTimeout is flaky.
> -
>
> Key: MESOS-4024
> URL: https://issues.apache.org/jira/browse/MESOS-4024
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: flaky-test
> Attachments: HealthCheckTest_CheckCommandTimeout.log
>
>
> {noformat: title=Failed Run}
> [ RUN  ] HealthCheckTest.CheckCommandTimeout
> I1201 13:03:15.211911 30288 leveldb.cpp:174] Opened db in 126.548747ms
> I1201 13:03:15.254041 30288 leveldb.cpp:181] Compacted db in 42.053948ms
> I1201 13:03:15.254226 30288 leveldb.cpp:196] Created db iterator in 25588ns
> I1201 13:03:15.254281 30288 leveldb.cpp:202] Seeked to beginning of db in 
> 3231ns
> I1201 13:03:15.254294 30288 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 256ns
> I1201 13:03:15.254348 30288 replica.cpp:778] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1201 13:03:15.255162 30311 recover.cpp:447] Starting replica recovery
> I1201 13:03:15.255502 30311 recover.cpp:473] Replica is in EMPTY status
> I1201 13:03:15.257158 30311 replica.cpp:674] Replica in EMPTY status received 
> a broadcasted recover request from (1898)@172.17.21.0:52024
> I1201 13:03:15.258224 30318 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I1201 13:03:15.259735 30310 recover.cpp:564] Updating replica status to 
> STARTING
> I1201 13:03:15.265080 30322 master.cpp:365] Master 
> dd5bff66-362f-4efc-963a-54756b2edcce (fa812f474cf4) started on 
> 172.17.21.0:52024
> I1201 13:03:15.265121 30322 master.cpp:367] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/IaRntP/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.27.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/IaRntP/master" --zk_session_timeout="10secs"
> I1201 13:03:15.265487 30322 master.cpp:412] Master only allowing 
> authenticated frameworks to register
> I1201 13:03:15.265504 30322 master.cpp:417] Master only allowing 
> authenticated slaves to register
> I1201 13:03:15.265513 30322 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/IaRntP/credentials'
> I1201 13:03:15.265842 30322 master.cpp:456] Using default 'crammd5' 
> authenticator
> I1201 13:03:15.266006 30322 master.cpp:493] Authorization enabled
> I1201 13:03:15.266464 30308 hierarchical.cpp:162] Initialized hierarchical 
> allocator process
> I1201 13:03:15.267225 30321 whitelist_watcher.cpp:77] No whitelist given
> I1201 13:03:15.268847 30322 master.cpp:1637] The newly elected leader is 
> master@172.17.21.0:52024 with id dd5bff66-362f-4efc-963a-54756b2edcce
> I1201 13:03:15.268887 30322 master.cpp:1650] Elected as the leading master!
> I1201 13:03:15.268905 30322 master.cpp:1395] Recovering from registrar
> I1201 13:03:15.270830 30322 registrar.cpp:307] Recovering registrar
> I1201 13:03:15.291272 30318 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 31.410668ms
> I1201 13:03:15.291363 30318 replica.cpp:321] Persisted replica status to 
> STARTING
> I1201 13:03:15.291733 30318 recover.cpp:473] Replica is in STARTING status
> I1201 13:03:15.293392 30318 replica.cpp:674] Replica in STARTING status 
> received a broadcasted recover request from (1900)@172.17.21.0:52024
> I1201 13:03:15.294251 30307 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I1201 13:03:15.294756 30307 recover.cpp:564] Updating replica status to VOTING
> I1201 13:03:15.338260 30307 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 43.256127ms
> I1201 13:03:15.338348 30307 replica.cpp:321] Persisted replica status to 
> VOTING
> I1201 13:03:15.338601 30307 recover.cpp:578] Successfully joined the Paxos 
> group
> I1201 13:03:15.338803 30307 recover.cpp:4

[jira] [Updated] (MESOS-4114) Add field VIP to message Port

2015-12-10 Thread Sargun Dhillon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sargun Dhillon updated MESOS-4114:
--
Description: 
We would like to extend the Mesos protocol buffer 'Port' to include an optional 
repeated string named "VIP" - to map it to a well known virtual IP, or virtual 
hostname for discovery purposes.

We also want this field exposed in DiscoveryInfo in state.json.

  was:
We would like to extend the Mesos protocol buffer 'Port' to include an optional 
string string named "VIP" - to map it to a well known virtual IP, or virtual 
hostname for discovery purposes.

We also want this field exposed in DiscoveryInfo in state.json.


> Add field VIP to message Port
> -
>
> Key: MESOS-4114
> URL: https://issues.apache.org/jira/browse/MESOS-4114
> Project: Mesos
>  Issue Type: Wish
>Reporter: Sargun Dhillon
>Assignee: Avinash Sridharan
>Priority: Trivial
>  Labels: mesosphere
>
> We would like to extend the Mesos protocol buffer 'Port' to include an 
> optional repeated string named "VIP" - to map it to a well known virtual IP, 
> or virtual hostname for discovery purposes.
> We also want this field exposed in DiscoveryInfo in state.json.





[jira] [Created] (MESOS-4115) Fix possible race conditions in registry client tests.

2015-12-10 Thread Jojy Varghese (JIRA)
Jojy Varghese created MESOS-4115:


 Summary: Fix possible race conditions in registry client tests.
 Key: MESOS-4115
 URL: https://issues.apache.org/jira/browse/MESOS-4115
 Project: Mesos
  Issue Type: Improvement
 Environment: linux
Reporter: Jojy Varghese
Assignee: Jojy Varghese


RegistryClient tests show flakiness which manifests as socket timeouts or 
unexpected buffer showing up in the blobs. Investigate them for possible race 
conditions.





[jira] [Commented] (MESOS-4024) HealthCheckTest.CheckCommandTimeout is flaky.

2015-12-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051350#comment-15051350
 ] 

haosdent commented on MESOS-4024:
-

Could I change 
{noformat}
  Try<MesosContainerizer*> containerizer =
    MesosContainerizer::create(flags, false, &fetcher);
{noformat}
to run in local mode in HealthCheckTest.CheckCommandTimeout?
{noformat}
  Try<MesosContainerizer*> containerizer =
    MesosContainerizer::create(flags, true, &fetcher);
{noformat}

That way it could print the log to stdout.


[jira] [Commented] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.

2015-12-10 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051371#comment-15051371
 ] 

Benjamin Mahler commented on MESOS-4106:


I'm not sure we should say sleeping provides a "very weak guarantee"; there is 
indeed *no guarantee* with a sleep that the message is sent.

The approach you've suggested with querying with a timeout still provides no 
form of guarantee, unless you are going to wait indefinitely or use the timeout 
mentioned to trigger a retry rather than an exit (what did you intend to happen 
after the timeout?). This approach is guaranteeing application-level delivery, 
and we generally just use an "acknowledgement" message with retries to do this, 
rather than a separate query.

However, since the executor resides on the same machine, and executor failover 
is not supported, we're unlikely to bother implementing acknowledgements with 
retries here. We only need to wait for the data to be sent on the socket (this 
gives a "weak guarantee": e.g. if there are no socket errors (note that both 
ends of the socket are within the same machine), and the executor remains up, 
the message will eventually be processed by the executor). MESOS-4111 discusses 
the general issue of being able to exit after ensuring that messages are 
processed in libprocess.

In the case of the long-standing command executor sleep, we needed to handle 
agent failure. So we are already using acknowledgements there, and can use them 
to {{stop()}} cleanly.
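
For illustration, the "acknowledgement with retries" pattern mentioned above can be sketched as follows. This is a minimal, self-contained Python sketch, not libprocess or Mesos code: `LossyChannel`, `send_with_ack` and the drop behaviour are hypothetical names invented for this example.

```python
import itertools


class LossyChannel:
    """Hypothetical in-memory channel that drops the first `drops` messages."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def send(self, message):
        # Returns True when the receiver acknowledged the message.
        if self.drops > 0:
            self.drops -= 1
            return False  # Message was lost before reaching the receiver.
        self.delivered.append(message)
        return True


def send_with_ack(channel, message, max_attempts=None):
    """Resend `message` until it is acknowledged; return the attempt count.

    With max_attempts=None the sender retries indefinitely. A bounded number
    of attempts followed by an exit gives no delivery guarantee.
    """
    if max_attempts is None:
        attempts = itertools.count(1)
    else:
        attempts = range(1, max_attempts + 1)
    for attempt in attempts:
        if channel.send(message):
            return attempt
    raise TimeoutError("no acknowledgement after %d attempts" % max_attempts)
```

The sender only exits once `send_with_ack` returns, so exiting implies the receiver has seen the message. A real implementation would also need the receiver to tolerate duplicates, since an acknowledgement can be lost after a successful delivery.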

> The health checker may fail to inform the executor to kill an unhealthy task 
> after max_consecutive_failures.
> 
>
> Key: MESOS-4106
> URL: https://issues.apache.org/jira/browse/MESOS-4106
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.20.0, 0.20.1, 0.21.1, 0.21.2, 0.22.1, 0.22.2, 0.23.0, 
> 0.23.1, 0.24.0, 0.24.1, 0.25.0
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Blocker
> Fix For: 0.27.0
>
>
> This was reported by [~tan] experimenting with health checks. Many tasks were 
> launched with the following health check, taken from the container 
> stdout/stderr:
> {code}
> Launching health check process: /usr/local/libexec/mesos/mesos-health-check 
> --executor=(1)@127.0.0.1:39629 
> --health_check_json={"command":{"shell":true,"value":"false"},"consecutive_failures":1,"delay_seconds":0.0,"grace_period_seconds":1.0,"interval_seconds":1.0,"timeout_seconds":1.0}
>  --task_id=sleepy-2
> {code}
> This should have led to all tasks getting killed due to 
> {{\-\-consecutive_failures}} being set. However, only some tasks get killed, 
> while others remain running.
> It turns out that the health check binary does a {{send}} and promptly exits. 
> Unfortunately, this may lead to a message drop since libprocess may not have 
> sent this message over the socket by the time the process exits.
> We work around this in the command executor with a manual sleep, which has 
> been around since the svn days. See 
> [here|https://github.com/apache/mesos/blob/0.14.0/src/launcher/executor.cpp#L288-L290].





[jira] [Commented] (MESOS-3843) Audit `src/CMakelists.txt` to make sure we're compiling everything we need to build the agent binary.

2015-12-10 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051388#comment-15051388
 ] 

Joris Van Remoortere commented on MESOS-3843:
-

{code}
commit eab2a77fe8b412a3a4784b496b16ed76f1a530dd
Author: Diana Arroyo 
Date:   Thu Dec 10 09:54:52 2015 -0800

CMake: Updated LFLAG for dl library to defined label.

Review: https://reviews.apache.org/r/41185

commit 386bcbc2f4546623b77fa1f111e8c25b5159c889
Author: Diana Arroyo 
Date:   Thu Dec 10 09:54:35 2015 -0800

CMake: Added LFLAGs need for linux cmake build.

Review: https://reviews.apache.org/r/41096
{code}

> Audit `src/CMakelists.txt` to make sure we're compiling everything we need to 
> build the agent binary.
> -
>
> Key: MESOS-3843
> URL: https://issues.apache.org/jira/browse/MESOS-3843
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Diana Arroyo
>
> `src/CMakeLists.txt` has fallen into some state of disrepair. There are some 
> source files that seem to be missing (e.g., the `src/launcher/` and 
> `src/linux/` directories), so the first step is to audit the source file to 
> make sure everything we need is there. Likely this will mean looking at the 
> corresponding `src/Makefile.am` to see what's missing.
> Once we understand the limitations of the current build, we can fan out more 
> tickets or proceed to generating the agent binary, as well as the master.





[jira] [Created] (MESOS-4116) Add tests for quotas + empty roles (no registered frameworks)

2015-12-10 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4116:
--

 Summary: Add tests for quotas + empty roles (no registered 
frameworks)
 Key: MESOS-4116
 URL: https://issues.apache.org/jira/browse/MESOS-4116
 Project: Mesos
  Issue Type: Task
  Components: allocation
Reporter: Neil Conway
Assignee: Neil Conway








[jira] [Commented] (MESOS-4116) Add tests for quotas + empty roles (no registered frameworks)

2015-12-10 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051432#comment-15051432
 ] 

Joris Van Remoortere commented on MESOS-4116:
-

https://reviews.apache.org/r/41215

> Add tests for quotas + empty roles (no registered frameworks)
> -
>
> Key: MESOS-4116
> URL: https://issues.apache.org/jira/browse/MESOS-4116
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, quota, test
> Fix For: 0.27.0
>
>






[jira] [Updated] (MESOS-4116) Add tests for quotas + empty roles (no registered frameworks)

2015-12-10 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4116:

  Sprint: Mesosphere Sprint 24
Story Points: 2

> Add tests for quotas + empty roles (no registered frameworks)
> -
>
> Key: MESOS-4116
> URL: https://issues.apache.org/jira/browse/MESOS-4116
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, quota, test
> Fix For: 0.27.0
>
>






[jira] [Created] (MESOS-4117) Add check for pending libprocess events during test teardown

2015-12-10 Thread Neil Conway (JIRA)
Neil Conway created MESOS-4117:
--

 Summary: Add check for pending libprocess events during test 
teardown
 Key: MESOS-4117
 URL: https://issues.apache.org/jira/browse/MESOS-4117
 Project: Mesos
  Issue Type: Improvement
  Components: test
Reporter: Neil Conway


If calling {{Clock::settle()}} during test teardown would _not_ be a no-op 
(i.e., if there are any pending libprocess events in-flight), that seems like a 
likely test bug. We should consider adding a {{CHECK}} for this during test 
shutdown.
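
A rough sketch of the proposed teardown check, written in Python for brevity. `FakeClock` and `check_settled_on_teardown` are hypothetical names for this example and do not model the real `Clock::settle()` API; the point is only that teardown fails loudly when events are still queued.

```python
from collections import deque


class FakeClock:
    """Hypothetical stand-in for a clock that owns a queue of pending events."""

    def __init__(self):
        self.pending = deque()

    def enqueue(self, event):
        self.pending.append(event)

    def settle(self):
        """Run all queued events and return how many were processed."""
        processed = 0
        while self.pending:
            self.pending.popleft()()
            processed += 1
        return processed


def check_settled_on_teardown(clock):
    """Fail if settling during teardown would not be a no-op."""
    pending = len(clock.pending)
    if pending:
        raise AssertionError(
            "%d event(s) still pending at test teardown" % pending)
```

A test fixture would call `check_settled_on_teardown` as the last step of teardown, so a test that leaves events in flight is reported as a failure instead of passing silently.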





[jira] [Updated] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.

2015-12-10 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4106:
--
Fix Version/s: (was: 0.27.0)
   0.26.0

> The health checker may fail to inform the executor to kill an unhealthy task 
> after max_consecutive_failures.
> 
>
> Key: MESOS-4106
> URL: https://issues.apache.org/jira/browse/MESOS-4106
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.20.0, 0.20.1, 0.21.1, 0.21.2, 0.22.1, 0.22.2, 0.23.0, 
> 0.23.1, 0.24.0, 0.24.1, 0.25.0
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Blocker
> Fix For: 0.26.0
>
>
> This was reported by [~tan] experimenting with health checks. Many tasks were 
> launched with the following health check, taken from the container 
> stdout/stderr:
> {code}
> Launching health check process: /usr/local/libexec/mesos/mesos-health-check 
> --executor=(1)@127.0.0.1:39629 
> --health_check_json={"command":{"shell":true,"value":"false"},"consecutive_failures":1,"delay_seconds":0.0,"grace_period_seconds":1.0,"interval_seconds":1.0,"timeout_seconds":1.0}
>  --task_id=sleepy-2
> {code}
> This should have led to all tasks getting killed due to 
> {{\-\-consecutive_failures}} being set. However, only some tasks get killed, 
> while others remain running.
> It turns out that the health check binary does a {{send}} and promptly exits. 
> Unfortunately, this may lead to a message drop since libprocess may not have 
> sent this message over the socket by the time the process exits.
> We work around this in the command executor with a manual sleep, which has 
> been around since the svn days. See 
> [here|https://github.com/apache/mesos/blob/0.14.0/src/launcher/executor.cpp#L288-L290].





[jira] [Updated] (MESOS-4015) Expose task / executor health in master & slave state.json

2015-12-10 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4015:
--
Fix Version/s: (was: 0.27.0)
   0.26.0

> Expose task / executor health in master & slave state.json
> --
>
> Key: MESOS-4015
> URL: https://issues.apache.org/jira/browse/MESOS-4015
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.25.0
>Reporter: Sargun Dhillon
>Assignee: Artem Harutyunyan
>Priority: Trivial
>  Labels: mesosphere
> Fix For: 0.26.0
>
>
> Right now, if I specify a healthcheck for a task, the only way to get to it 
> is via the Task Status updates that come to the framework. Unfortunately, 
> this information isn't exposed in the state.json either in the slave or 
> master. It'd be ideal to have that information to enable tools like Mesos-DNS 
> to be health-aware.





[jira] [Created] (MESOS-4118) Update Getting Started for Mac OS X El Capitan

2015-12-10 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-4118:
--

 Summary: Update Getting Started for Mac OS X El Capitan
 Key: MESOS-4118
 URL: https://issues.apache.org/jira/browse/MESOS-4118
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
 Environment: Mac OS X
Reporter: Kevin Klues
Assignee: Kevin Klues
Priority: Minor


This ticket pertains to the Getting Started guide on the Apache Mesos website.

The current instructions for installing on Mac OS X only cover Yosemite.  
Running after an upgrade to El Capitan requires a trivial (but important) step 
which is non-obvious: you have to rerun 'xcode-select --install' after you 
complete the upgrade.

Let's change the heading for installing on Mac OS X to say:
Mac OS X Yosemite & El Capitan

and then add a comment at the bottom of the section to point out that a rerun 
of 'xcode-select --install' is necessary after an upgrade from Yosemite to El 
Capitan.






[jira] [Updated] (MESOS-4118) Update Getting Started for Mac OS X El Capitan

2015-12-10 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-4118:
---
Description: 
This ticket pertains to the Getting Started guide on the Apache Mesos website.

The current instructions for installing on Mac OS X only cover Yosemite.  The 
instructions to build for El Capitan are identical except in the case of 
upgrading from Yosemite to El Capitan.  Building after an upgrade requires a 
trivial (but important) step which is non-obvious: you have to rerun 
'xcode-select --install' after you complete the upgrade.

Let's change the heading for installing on Mac OS X to say:
Mac OS X Yosemite & El Capitan

and then add a comment at the bottom of the section to point out that a rerun 
of 'xcode-select --install' is necessary after an upgrade from Yosemite to El 
Capitan.


  was:
This ticket pertains to the Getting Started guide on the apache mesos website

The current instructions for installing on Mac OS X only include instructions 
for Yosemite.  To run after an upgrade to El Capitan requires a trivial (but 
important) step which is non-obvious -- you have to rerun 'xcode-select 
--install' after you complete the upgrade.

Let's change the heading for installing on Mac OS X to say:
Mac OS X Yosemite & El Capitan

and then add a comment at the bottom of the section to point out that a rerun 
of 'xcode-select --install' is necessary after an upgrade from Yosemite to El 
Capitan.



> Update Getting Started for Mac OS X El Capitan
> --
>
> Key: MESOS-4118
> URL: https://issues.apache.org/jira/browse/MESOS-4118
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
> Environment: Mac OS X
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Minor
>  Labels: documentation, mesosphere
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> This ticket pertains to the Getting Started guide on the Apache Mesos website.
> The current instructions for installing on Mac OS X only cover Yosemite.  The 
> instructions to build for El Capitan are identical except in the case of 
> upgrading from Yosemite to El Capitan.  Building after an upgrade requires a 
> trivial (but important) step which is non-obvious: you have to rerun 
> 'xcode-select --install' after you complete the upgrade.
> Let's change the heading for installing on Mac OS X to say:
> Mac OS X Yosemite & El Capitan
> and then add a comment at the bottom of the section to point out that a rerun 
> of 'xcode-select --install' is necessary after an upgrade from Yosemite to El 
> Capitan.





[jira] [Commented] (MESOS-4090) Create a light-weight, executor only mesos egg

2015-12-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051713#comment-15051713
 ] 

Vinod Kone commented on MESOS-4090:
---

One thing I had in mind was that we could do something similar to generate a 
stripped down version of the egg.

Also, not sure how this approach plays with our previous split of mesos egg 
into native, interface etc eggs.

[~tillt] [~wickman] [~thomasr] Comments on the above review/approach?

> Create a light-weight, executor only mesos egg
> --
>
> Key: MESOS-4090
> URL: https://issues.apache.org/jira/browse/MESOS-4090
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>
> Currently, when running tasks in docker containers, if the executor uses the 
> mesos.native python library, the execution environment inside the container 
> (OS, native libs, etc) must match the execution environment outside the 
> container fairly closely in order to load the mesos.so library.
> The solution here can be to introduce a much lighter weight python egg, 
> mesos.executor, which only includes code (and dependencies) needed to create 
> and run a MesosExecutorDriver.  Executors can then use this native library 
> instead of mesos.native.





[jira] [Commented] (MESOS-4090) Create a light-weight, executor only mesos egg

2015-12-10 Thread Thomas Rampelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051736#comment-15051736
 ] 

Thomas Rampelberg commented on MESOS-4090:
--

I like the idea quite a bit. Leaving `native` as a catch-all doesn't make much 
sense though. I'd recommend either putting a couple of different libs into 
`native` (e.g., executor, scheduler, ...) or splitting them out into totally 
separate python packages named the way they're used (again, executor, 
scheduler, ...). As the number of python packages is starting to get a little 
ridiculous, my preference would be to put the libs into native.

> Create a light-weight, executor only mesos egg
> --
>
> Key: MESOS-4090
> URL: https://issues.apache.org/jira/browse/MESOS-4090
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>
> Currently, when running tasks in docker containers, if the executor uses the 
> mesos.native python library, the execution environment inside the container 
> (OS, native libs, etc) must match the execution environment outside the 
> container fairly closely in order to load the mesos.so library.
> The solution here can be to introduce a much lighter weight python egg, 
> mesos.executor, which only includes code (and dependencies) needed to create 
> and run a MesosExecutorDriver.  Executors can then use this native library 
> instead of mesos.native.





[jira] [Commented] (MESOS-2154) Port CFS quota support to Docker Containerizer

2015-12-10 Thread Steve Niemitz (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051749#comment-15051749
 ] 

Steve Niemitz commented on MESOS-2154:
--

The issue with only setting the CFS quota on the command line is that the quota 
can change over time as tasks join or leave the container.  Correctly supporting 
CFS requires directly updating the cgroup to handle this case, the same way 
cpu.shares is handled.

We've been running a version of Mesos with a custom CFS patch [1] applied for ~6 
months now. I had a review open a long time ago that I gave up on trying to get 
upstream, but I think I'm going to revisit it now.

[1] https://reviews.apache.org/r/33174/diff/2/
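
To illustrate why the quota has to be recomputed whenever the set of tasks changes, here is a minimal Python sketch. It is not the patch from [1]; the function name and all three constants (a 100 ms period, a 1 ms quota floor, 1024 shares per CPU) are assumptions chosen for the example.

```python
CFS_PERIOD_US = 100000    # Assumed CFS period: 100 ms.
MIN_CFS_QUOTA_US = 1000   # Assumed lower bound for the quota: 1 ms.
SHARES_PER_CPU = 1024     # Assumed cpu.shares granted per CPU.


def cgroup_cpu_limits(task_cpus):
    """Derive cgroup CPU settings from the CPUs of all tasks in a container."""
    total = sum(task_cpus)
    return {
        "cpu.shares": int(SHARES_PER_CPU * total),
        "cpu.cfs_period_us": CFS_PERIOD_US,
        # The quota scales with the total CPUs, so it must be rewritten
        # every time a task is added to or removed from the container.
        "cpu.cfs_quota_us": max(int(CFS_PERIOD_US * total), MIN_CFS_QUOTA_US),
    }
```

With one 0.5 CPU task the quota is half a period; adding a 1.5 CPU task raises it to two periods. A value passed once on the docker command line cannot follow that change.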

> Port CFS quota support to Docker Containerizer
> --
>
> Key: MESOS-2154
> URL: https://issues.apache.org/jira/browse/MESOS-2154
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker, isolation
>Affects Versions: 0.21.0
> Environment: Linux (Ubuntu 14.04.1)
>Reporter: Andrew Ortman
>Assignee: haosdent
>Priority: Minor
>
> Port the CFS quota support the Mesos Containerizer has to the Docker 
> Containerizer. Whenever the --cgroup_enable_cfs flag is set, the Docker 
> Containerizer should update the cfs_period_us and cfs_quota_us values to 
> allow hard CPU capping on the container. 
> The current workaround is to pass those values as LXC configuration parameters.





[jira] [Created] (MESOS-4119) Add support for enabling --3way to apply-reviews.py.

2015-12-10 Thread Artem Harutyunyan (JIRA)
Artem Harutyunyan created MESOS-4119:


 Summary: Add support for enabling --3way to apply-reviews.py.
 Key: MESOS-4119
 URL: https://issues.apache.org/jira/browse/MESOS-4119
 Project: Mesos
  Issue Type: Task
Reporter: Artem Harutyunyan








[jira] [Comment Edited] (MESOS-4090) Create a light-weight, executor only mesos egg

2015-12-10 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051713#comment-15051713
 ] 

Vinod Kone edited comment on MESOS-4090 at 12/10/15 10:25 PM:
--

One thing I had in mind was that we could do something similar to generate a 
stripped down version of the scheduler egg.

Also, not sure how this approach plays with our previous split of mesos egg 
into native, interface etc eggs.

[~tillt] [~wickman] [~thomasr] Comments on the above review/approach?


was (Author: vinodkone):
One thing I had in mind was that we could do something similar to generate a 
stripped down version of the egg.

Also, not sure how this approach plays with our previous split of mesos egg 
into native, interface etc eggs.

[~tillt] [~wickman] [~thomasr] Comments on the above review/approach?

> Create a light-weight, executor only mesos egg
> --
>
> Key: MESOS-4090
> URL: https://issues.apache.org/jira/browse/MESOS-4090
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>
> Currently, when running tasks in docker containers, if the executor uses the 
> mesos.native python library, the execution environment inside the container 
> (OS, native libs, etc) must match the execution environment outside the 
> container fairly closely in order to load the mesos.so library.
> The solution here can be to introduce a much lighter weight python egg, 
> mesos.executor, which only includes code (and dependencies) needed to create 
> and run a MesosExecutorDriver.  Executors can then use this native library 
> instead of mesos.native.





[jira] [Commented] (MESOS-4090) Create a light-weight, executor only mesos egg

2015-12-10 Thread Steve Niemitz (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051766#comment-15051766
 ] 

Steve Niemitz commented on MESOS-4090:
--

I like splitting them into multiple packages, since it allows greater isolation 
between the two different libraries.  It's pretty unlikely (imo) that someone 
would ever want to use both the scheduler and executor driver in the same 
package, and splitting them also makes the executor driver much smaller.

My plan for this that I spoke with Vinod about offline was to keep the native 
package around, but deprecate it, and then replace it with a scheduler package 
for the next release.

> Create a light-weight, executor only mesos egg
> --
>
> Key: MESOS-4090
> URL: https://issues.apache.org/jira/browse/MESOS-4090
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>
> Currently, when running tasks in docker containers, if the executor uses the 
> mesos.native python library, the execution environment inside the container 
> (OS, native libs, etc) must match the execution environment outside the 
> container fairly closely in order to load the mesos.so library.
> The solution here can be to introduce a much lighter weight python egg, 
> mesos.executor, which only includes code (and dependencies) needed to create 
> and run a MesosExecutorDriver.  Executors can then use this native library 
> instead of mesos.native.





[jira] [Commented] (MESOS-4090) Create a light-weight, executor only mesos egg

2015-12-10 Thread Thomas Rampelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051789#comment-15051789
 ] 

Thomas Rampelberg commented on MESOS-4090:
--

sgtm :thumbsup:

> Create a light-weight, executor only mesos egg
> --
>
> Key: MESOS-4090
> URL: https://issues.apache.org/jira/browse/MESOS-4090
> Project: Mesos
>  Issue Type: Improvement
>  Components: build
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>
> Currently, when running tasks in docker containers, if the executor uses the 
> mesos.native python library, the execution environment inside the container 
> (OS, native libs, etc) must match the execution environment outside the 
> container fairly closely in order to load the mesos.so library.
> The solution here can be to introduce a much lighter weight python egg, 
> mesos.executor, which only includes code (and dependencies) needed to create 
> and run a MesosExecutorDriver.  Executors can then use this native library 
> instead of mesos.native.





[jira] [Created] (MESOS-4120) Make DiscoveryInfo dynamically updatable

2015-12-10 Thread Sargun Dhillon (JIRA)
Sargun Dhillon created MESOS-4120:
-

 Summary: Make DiscoveryInfo dynamically updatable
 Key: MESOS-4120
 URL: https://issues.apache.org/jira/browse/MESOS-4120
 Project: Mesos
  Issue Type: Improvement
Reporter: Sargun Dhillon
Priority: Critical


K8s tasks can dynamically update what they expose for discovery by the 
cluster. Unfortunately, all DiscoveryInfo in the cluster is fixed at the time 
of task start and immutable afterwards. 

We would like to enable DiscoveryInfo to be dynamically updatable, so that 
executors can change what they're advertising based on their internal state, 
versus requiring DiscoveryInfo to be known prior to starting the tasks. 





[jira] [Comment Edited] (MESOS-4120) Make DiscoveryInfo dynamically updatable

2015-12-10 Thread James DeFelice (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051817#comment-15051817
 ] 

James DeFelice edited comment on MESOS-4120 at 12/10/15 11:10 PM:
--

From a K8s integration perspective it's preferable that some sidecar framework 
component could update the task's DiscoveryInfo vs. putting all of that 
responsibility on the executor.


was (Author: jdef):
From a K8s integration perspective it's preferable that some sidecar framework 
component could update the a task's DiscoveryInfo vs. putting all of that 
responsibility on the executor.

> Make DiscoveryInfo dynamically updatable
> 
>
> Key: MESOS-4120
> URL: https://issues.apache.org/jira/browse/MESOS-4120
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Critical
>  Labels: mesosphere
>
> K8s tasks can dynamically update what they expose for discovery by the 
> cluster. Unfortunately, all DiscoveryInfo in the cluster is fixed at the time 
> of task start and immutable afterwards. 
> We would like to enable DiscoveryInfo to be dynamically updatable, so that 
> executors can change what they're advertising based on their internal state, 
> versus requiring DiscoveryInfo to be known prior to starting the tasks. 





[jira] [Commented] (MESOS-4120) Make DiscoveryInfo dynamically updatable

2015-12-10 Thread James DeFelice (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051817#comment-15051817
 ] 

James DeFelice commented on MESOS-4120:
---

From a K8s integration perspective it's preferable that some sidecar framework 
component could update the task's DiscoveryInfo vs. putting all of that 
responsibility on the executor.

> Make DiscoveryInfo dynamically updatable
> 
>
> Key: MESOS-4120
> URL: https://issues.apache.org/jira/browse/MESOS-4120
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Critical
>  Labels: mesosphere
>
> K8s tasks can dynamically update what they expose for discovery by the 
> cluster. Unfortunately, all DiscoveryInfo in the cluster is fixed at the time 
> of task start and immutable afterwards. 
> We would like to enable DiscoveryInfo to be dynamically updatable, so that 
> executors can change what they're advertising based on their internal state, 
> versus requiring DiscoveryInfo to be known prior to starting the tasks. 





[jira] [Created] (MESOS-4121) initialize crypto independently of SSL

2015-12-10 Thread James Peach (JIRA)
James Peach created MESOS-4121:
--

 Summary: initialize crypto independently of SSL
 Key: MESOS-4121
 URL: https://issues.apache.org/jira/browse/MESOS-4121
 Project: Mesos
  Issue Type: Improvement
Reporter: James Peach


I am writing a module that does some crypto and I'd like to be able to use the 
OpenSSL crypto APIs that are linked into Mesos without necessarily having to 
enable SSL at the same time.





[jira] [Created] (MESOS-4122) slave should ignore attribute changes

2015-12-10 Thread James Peach (JIRA)
James Peach created MESOS-4122:
--

 Summary: slave should ignore attribute changes
 Key: MESOS-4122
 URL: https://issues.apache.org/jira/browse/MESOS-4122
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Reporter: James Peach
Priority: Minor


{{mesos-slave}} should ignore changes in attributes when it checks for 
incompatible {{SlaveInfo}} changes.

This is a trivial change and I'm going to carry this patch internally. Let's 
have a discussion on what this means semantically. It is not clear to me 
whether it is a generally correct change.
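
To make the proposal concrete, here is a small Python sketch of a compatibility check that skips attributes. It is not the actual patch: the dict-based `SlaveInfo` stand-in, its field names, and the `compatible` helper are all hypothetical.

```python
def compatible(old_info, new_info, ignored_fields=("attributes",)):
    """Return True if two SlaveInfo-like dicts match outside `ignored_fields`."""
    keys = (set(old_info) | set(new_info)) - set(ignored_fields)
    return all(old_info.get(key) == new_info.get(key) for key in keys)
```

With the default `ignored_fields`, changing only the attributes keeps the two infos compatible, while a change to any other field (for example the hostname) is still rejected. Passing `ignored_fields=()` gives a strict comparison in which any difference counts as incompatible.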





[jira] [Created] (MESOS-4123) Enable agent/master know resource type is USAGE_SLACK for QoS Controller related resources

2015-12-10 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4123:
--

 Summary: Enable agent/master know resource type is USAGE_SLACK for 
QoS Controller related resources
 Key: MESOS-4123
 URL: https://issues.apache.org/jira/browse/MESOS-4123
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu


The master and agent now have endpoints that report all revocable resources, 
but the only revocable resources that currently exist come from the QoS 
controller.

Those resources are currently used only for display purposes, but we may need 
these APIs in the future. For example, MESOS-2647 needs to calculate whether 
there are enough usage_slack revocable resources before launching a task.

So I think that we need to update the helper functions
{code}_resources_revocable_total{code}
{code}_resources_revocable_used{code}
{code}_resources_revocable_percent{code}
to only get usage_slack revocable resources.





[jira] [Commented] (MESOS-4123) Enable agent/master know resource type is USAGE_SLACK for QoS Controller related resources

2015-12-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052036#comment-15052036
 ] 

Guangya Liu commented on MESOS-4123:


To preserve compatibility, I think that we can add some new helper functions 
to get usage_slack resources in both the master and the agent.

> Enable agent/master know resource type is USAGE_SLACK for QoS Controller 
> related resources
> --
>
> Key: MESOS-4123
> URL: https://issues.apache.org/jira/browse/MESOS-4123
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>
> The master and agent now have endpoints to get all revocable resources, but 
> the current revocable resources are only for the QoS controller.
> Those resources are currently used only for display purposes, but we may need 
> these APIs in the future. For example, MESOS-2647 needs to calculate whether 
> there are enough usage_slack revocable resources before launching a task.
> So I think that we need to update the helper functions
> {code}_resources_revocable_total{code}
> {code}_resources_revocable_used{code}
> {code}_resources_revocable_percent{code}
> to return only usage_slack revocable resources.





[jira] [Created] (MESOS-4124) Enable agent/master know resource type is ALLOCATION_SLACK

2015-12-10 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4124:
--

 Summary: Enable agent/master know resource type is ALLOCATION_SLACK
 Key: MESOS-4124
 URL: https://issues.apache.org/jira/browse/MESOS-4124
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


Add helper functions in both the master and the slave to get the allocation 
slack resources. These helper functions can be used for slave checks and for 
master/slave endpoint checks.





[jira] [Created] (MESOS-4125) Use 'git rev-parse --git-dir' in bootstrap instead of simply '.git'

2015-12-10 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-4125:
--

 Summary: Use 'git rev-parse --git-dir' in bootstrap instead of 
simply '.git'
 Key: MESOS-4125
 URL: https://issues.apache.org/jira/browse/MESOS-4125
 Project: Mesos
  Issue Type: Improvement
  Components: build
 Environment: All systems
Reporter: Kevin Klues
Assignee: Kevin Klues
Priority: Minor


This issue relates to the 'bootstrap' file in the top level directory of the 
mesos tree.

When building from git, bootstrap will (among other things) install pre-commit 
and post-rewrite hooks into the .git/hooks directory of the mesos tree.  
However, the current implementation always assumes that .git exists in the same 
directory as the bootstrap file.  This may not always be true.

Most notably, it is not true if the mesos tree is included as a submodule 
inside another project. When included as a submodule, .git is no longer a 
directory, but rather a file whose text contains a pointer back to the actual 
location of the .git folder inside the containing project.  To get at this 
directory, we need to run 'git rev-parse --git-dir' instead of simply assuming 
that the local .git is the proper directory.
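
A minimal sketch of that lookup, assuming a POSIX shell. The function name 
hooks_dir is hypothetical and is not taken from the actual bootstrap script. 
It asks git for the repository's git directory and makes the result absolute, 
since 'git rev-parse --git-dir' may print a relative path such as '.git'.

```shell
# Hypothetical helper (not part of the real bootstrap script): print the
# hooks directory of the repository containing directory $1 (default: ".").
hooks_dir() {
  dir=${1:-.}
  # Ask git where the git directory is; fail if $dir is not in a repository.
  gitdir=$(cd "$dir" && git rev-parse --git-dir 2>/dev/null) || return 1
  # The answer may be relative to $dir (e.g. ".git"); make it absolute.
  case $gitdir in
    /*) ;;
    *) gitdir=$(cd "$dir" && cd "$gitdir" && pwd) || return 1 ;;
  esac
  printf '%s\n' "$gitdir/hooks"
}
```

Called as 'hooks_dir /path/to/mesos', this should print the hooks directory 
both when .git is a directory and when it is a submodule pointer file.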





[jira] [Commented] (MESOS-4025) SlaveRecoveryTest/0.GCExecutor is flaky.

2015-12-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052125#comment-15052125
 ] 

haosdent commented on MESOS-4025:
-

Thank you very much for your help [~nfnt] [~greggomann]. Let me check whether 
ROOT_DOCKER_DockerHealthStatusChange breaks SlaveRecoveryTest.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> 
>
> Key: MESOS-4025
> URL: https://issues.apache.org/jira/browse/MESOS-4025
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Till Toenshoff
>Assignee: Jan Schlicht
>  Labels: flaky, flaky-test, test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based 
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN  ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave 
> dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave':
>  Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are 
> using GNU date ***
> PC: @  0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; 
> stack trace: ***
> @ 0x7f1be92b80b7 os::Linux::chained_handler()
> @ 0x7f1be92bc219 JVM_handle_linux_signal
> @ 0x7f1bf7bbc340 (unknown)
> @  0x1443e9a testing::UnitTest::AddTestPartResult()
> @  0x1438b99 testing::internal::AssertHelper::operator=()
> @   0xf0b3bb 
> mesos::internal::tests::ContainerizerTest<>::TearDown()
> @  0x1461882 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145c6f8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x143de4a testing::Test::Run()
> @  0x143e584 testing::TestInfo::Run()
> @  0x143ebca testing::TestCase::Run()
> @  0x1445312 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14624a7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x145d26e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14440ae testing::UnitTest::Run()
> @   0xd15cd4 RUN_ALL_TESTS()
> @   0xd158c1 main
> @ 0x7f1bf7808ec5 (unknown)
> @   0x913009 (unknown)
> {noformat}
> My Vagrantfile generator:
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
>     vb.memory = ${VAGRANT_MEM}
>     vb.cpus = ${VAGRANT_CPUS}
>     vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
>     vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
>     vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
>     vb.memory = ${VAGRANT_MEM}
>     vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
>     sudo apt-get update
>     sudo apt-get -y install openjdk-7-jdk autoconf libtool
>     sudo apt-get -y install build-essential python-dev python-boto  \
>       libcurl4-nss-dev libsasl2-dev maven \
>       libapr1-dev libsvn-dev libssl-dev libevent-dev
>     sudo apt-get -y install git
>     sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem occurs frequently in my tests: I'd say in more than 10% but 
> fewer than 50% of runs.





[jira] [Commented] (MESOS-2203) Old Centos 6.5 kernels/headers not sufficient for building Mesos

2015-12-10 Thread fhyfufangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052163#comment-15052163
 ] 

fhyfufangyu commented on MESOS-2203:


Yes, this works.

> Old Centos 6.5 kernels/headers not sufficient for building Mesos
> 
>
> Key: MESOS-2203
> URL: https://issues.apache.org/jira/browse/MESOS-2203
> Project: Mesos
>  Issue Type: Documentation
>Affects Versions: 0.21.0
> Environment: E.g. Centos 6.5 with kernel 2.6.32-279.14.1 
>Reporter: Hans van den Bogert
>Priority: Minor
>
> Old kernels are not sufficient for building Mesos:
> bq. 
> Error:
> bq. libtool: compile:  g++ -DPACKAGE_NAME=\"mesos\" 
> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" 
> "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\" 
> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 
> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 
> -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
> -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 
> -DHAVE_PYTHON=\"2.6\" -DMESOS_HAS_PYTHON=1 -I. -I../../src -Wall -Werror 
> -DLIBDIR=\"/var/scratch/vdbogert/lib\" 
> -DPKGLIBEXECDIR=\"/var/scratch/vdbogert/libexec/mesos\" 
> -DPKGDATADIR=\"/var/scratch/vdbogert/share/mesos\" -I../../include 
> -I../../3rdparty/libprocess/include 
> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
> -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
> -I../3rdparty/zookeeper-3.4.5/src/c/generated 
> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
> -I/var/scratch/vdbogert//include/subversion-1 -I/usr/include/apr-1 
> -I/usr/include/apr-1.0 -pthread -g1 -O0 -Wno-unused-local-typedefs -std=c++11 
> -MT slave/containerizer/isolators/namespaces/libmesos_no_3rdparty_la-pid.lo 
> -MD -MP -MF 
> slave/containerizer/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
>  -c ../../src/slave/containerizer/isolators/namespaces/pid.cpp  -fPIC -DPIC 
> -o 
> slave/containerizer/isolators/namespaces/.libs/libmesos_no_3rdparty_la-pid.o
> In file included from /usr/include/sys/syscall.h:32:0,
>  from ../../src/linux/ns.hpp:26,
>  from 
> ../../src/slave/containerizer/isolators/namespaces/pid.cpp:31:
> ../../src/linux/ns.hpp: In function 'Try ns::setns(const string&, 
> const string&)':
> ../../src/linux/ns.hpp:167:23: error: '__NR_setns' was not declared in this 
> scope
>int ret = ::syscall(SYS_setns, fd.get(), nstype.get());
>^
> Perhaps this should be stated on http://mesos.apache.org/gettingstarted/ 
> because, taking myself as an example, this cost me a lot of time to pinpoint.
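
One way to detect the problem up front, sketched under the assumption that a 
C preprocessor is available as cc. The function name defines_setns is 
hypothetical. It scans a preprocessor macro dump on standard input for the 
__NR_setns definition that the error above reports as missing.

```shell
# Hypothetical helper: succeed if the macro dump on stdin defines __NR_setns.
defines_setns() {
  grep -q '^#define __NR_setns[[:space:]]'
}

# Possible usage, assuming `cc` is a C compiler driver that accepts -E -dM:
#   printf '#include <sys/syscall.h>\n' | cc -E -dM - | defines_setns \
#     || echo "kernel headers lack setns; newer headers are needed" >&2
```

Taking the dump on standard input keeps the check independent of where the 
kernel headers are installed.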


