[jira] [Commented] (MESOS-6183) mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images

2016-09-16 Thread yongyu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498298#comment-15498298
 ] 

yongyu commented on MESOS-6183:
---

I am launching Docker containers.
When the Docker daemon is started with "--insecure-registry :5000", launching 
them works fine.
But when I use the Mesos containerizer to launch a Docker image, it cannot use 
the private registry.

> mesos's Unified Containerizer cannot set "--insecure-registry" when 
> provisioning images
> ---
>
> Key: MESOS-6183
> URL: https://issues.apache.org/jira/browse/MESOS-6183
> Project: Mesos
>  Issue Type: Bug
>Reporter: yongyu
>Priority: Minor
>
> mesos's Unified Containerizer cannot set "--insecure-registry" when 
> provisioning images



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498217#comment-15498217
 ] 

Aaron Wood commented on MESOS-6127:
---

Thanks, that would be great!
Looks like http-parser still doesn't support HTTP/2.

> Implement support for HTTP/2
> -
>
> Key: MESOS-6127
> URL: https://issues.apache.org/jira/browse/MESOS-6127
> Project: Mesos
>  Issue Type: Epic
>  Components: HTTP API, libprocess
>Reporter: Aaron Wood
>  Labels: performance
>
> HTTP/2 will allow us to take advantage of connection multiplexing, header 
> compression, streams, server push, etc. Add support for communication over 
> HTTP/2 between masters and agents, framework endpoints, etc.
> Should we support HTTP/2 without TLS? The spec allows for this but most major 
> browser vendors, libraries, and implementations aren't supporting it unless 
> TLS is used. If we do require TLS, what can be done to reduce the performance 
> hit of the TLS handshake? Might need to change more code to make sure that we 
> are taking advantage of connection sharing so that we can (ideally) only ever 
> have a one-time TLS handshake per shared connection.





[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early

2016-09-16 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497951#comment-15497951
 ] 

Greg Mann commented on MESOS-6180:
--

Thanks for the patch to address the mount leak [~jieyu]! 
(https://reviews.apache.org/r/51963/)

I ran {{sudo MESOS_VERBOSE=1 GLOG_v=2 GTEST_REPEAT=-1 GTEST_BREAK_ON_FAILURE=1 
GTEST_FILTER="*MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespace*" 
bin/mesos-tests.sh}} and stressed my machine with {{stress -c N -i N -m N -d 
1}}, where {{N}} is number of cores, and I was able to reproduce a couple of 
these offer future timeout failures after a few tens of repetitions. I attached 
logs above as {{flaky-containerizer-pid-namespace-forward.txt}} and 
{{flaky-containerizer-pid-namespace-backward.txt}}.
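
The reproduction setup described above could be scripted roughly as follows. This is only a sketch: the helper names {{build_test_env}} and {{build_stress_command}} are hypothetical, and the values are copied from the commands quoted in this comment. It only builds the environment and the argument list; it does not launch anything.

```python
import os


def build_test_env(test_filter):
    """Return a copy of the current environment with the gtest/Mesos
    variables from the quoted command set (hypothetical helper)."""
    env = dict(os.environ)
    env.update({
        "MESOS_VERBOSE": "1",
        "GLOG_v": "2",
        "GTEST_REPEAT": "-1",            # repeat until interrupted
        "GTEST_BREAK_ON_FAILURE": "1",   # stop at the first failure
        "GTEST_FILTER": test_filter,
    })
    return env


def build_stress_command(cores=None):
    """Return the `stress` argument list, with N defaulting to the
    number of cores on this machine (hypothetical helper)."""
    n = str(cores if cores is not None else (os.cpu_count() or 1))
    return ["stress", "-c", n, "-i", n, "-m", n, "-d", "1"]
```

The resulting values could then be handed to something like {{subprocess.Popen}} to start the stress load and the test loop side by side.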

We can see the master beginning agent registration, but we never see the line 
{{Registered agent ...}} from {{Master::_registerSlave()}}, which indicates 
that registration is complete and the registered message has been sent to the 
agent:
{code}
I0917 01:35:17.184216   480 master.cpp:4886] Registering agent at 
slave(11)@172.31.1.104:57341 (ip-172-31-1-104.us-west-2.compute.internal) with 
id fa7a42d0-5d0c-4799-b19f-2a85b43039f3-S0
I0917 01:35:17.184232   474 process.cpp:2707] Resuming 
__reaper__(1)@172.31.1.104:57341 at 2016-09-17 01:35:17.184222976+00:00
I0917 01:35:17.184377   474 process.cpp:2707] Resuming 
registrar(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.184371968+00:00
I0917 01:35:17.184554   474 registrar.cpp:464] Applied 1 operations in 79217ns; 
attempting to update the registry
I0917 01:35:17.184953   474 process.cpp:2697] Spawned process 
__latch__(141)@172.31.1.104:57341
I0917 01:35:17.184990   485 process.cpp:2707] Resuming 
log-storage(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.184982016+00:00
I0917 01:35:17.185561   485 process.cpp:2707] Resuming 
log-writer(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.185552896+00:00
I0917 01:35:17.185609   485 log.cpp:577] Attempting to append 434 bytes to the 
log
I0917 01:35:17.185804   485 process.cpp:2707] Resuming 
log-coordinator(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.185797888+00:00
I0917 01:35:17.185863   485 coordinator.cpp:348] Coordinator attempting to 
write APPEND action at position 3
I0917 01:35:17.185998   485 process.cpp:2697] Spawned process 
log-write(29)@172.31.1.104:57341
I0917 01:35:17.186030   475 process.cpp:2707] Resuming 
log-write(29)@172.31.1.104:57341 at 2016-09-17 01:35:17.186021888+00:00
I0917 01:35:17.186189   475 process.cpp:2707] Resuming 
log-network(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.186182912+00:00
I0917 01:35:17.186275   475 process.cpp:2707] Resuming 
log-write(29)@172.31.1.104:57341 at 2016-09-17 01:35:17.186267904+00:00
I0917 01:35:17.186424   475 process.cpp:2707] Resuming 
log-network(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.186416896+00:00
I0917 01:35:17.186575   475 process.cpp:2697] Spawned process 
__req_res__(55)@172.31.1.104:57341
I0917 01:35:17.186724   475 process.cpp:2707] Resuming 
log-write(29)@172.31.1.104:57341 at 2016-09-17 01:35:17.186717952+00:00
I0917 01:35:17.186609   485 process.cpp:2707] Resuming 
__req_res__(55)@172.31.1.104:57341 at 2016-09-17 01:35:17.186601984+00:00
I0917 01:35:17.186898   485 process.cpp:2707] Resuming 
log-replica(6)@172.31.1.104:57341 at 2016-09-17 01:35:17.186892032+00:00
I0917 01:35:17.186962   485 replica.cpp:537] Replica received write request for 
position 3 from __req_res__(55)@172.31.1.104:57341
I0917 01:35:17.185014   471 process.cpp:2707] Resuming 
__gc__@172.31.1.104:57341 at 2016-09-17 01:35:17.185008896+00:00
I0917 01:35:17.185036   480 process.cpp:2707] Resuming 
__latch__(141)@172.31.1.104:57341 at 2016-09-17 01:35:17.185029120+00:00
I0917 01:35:17.196358   482 process.cpp:2707] Resuming 
slave(11)@172.31.1.104:57341 at 2016-09-17 01:35:17.196335104+00:00
I0917 01:35:17.196900   482 slave.cpp:1471] Will retry registration in 
25.224033ms if necessary
I0917 01:35:17.197029   482 process.cpp:2707] Resuming 
master@172.31.1.104:57341 at 2016-09-17 01:35:17.197024000+00:00
I0917 01:35:17.197157   482 master.cpp:4874] Ignoring register agent message 
from slave(11)@172.31.1.104:57341 (ip-172-31-1-104.us-west-2.compute.internal) 
as admission is already in progress
I0917 01:35:17.224309   482 process.cpp:2707] Resuming 
slave(11)@172.31.1.104:57341 at 2016-09-17 01:35:17.224284928+00:00
I0917 01:35:17.224845   482 slave.cpp:1471] Will retry registration in 
63.510932ms if necessary
I0917 01:35:17.224900   475 process.cpp:2707] Resuming 
master@172.31.1.104:57341 at 2016-09-17 01:35:17.224888064+00:00
I0917 01:35:17.225109   475 master.cpp:4874] Ignoring register agent message 
from slave(11)@172.31.1.104:57341 (ip-172-31-1-104.us-west-2.compute.internal) 
as admission is already in progress
{code}
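
To check an attached log for the same pattern (registration started, retries ignored, no {{Registered agent}} line), a small script along these lines may help. It is a sketch under assumptions: the function names are made up for illustration, and the line-prefix pattern is inferred from the excerpt above rather than taken from any glog specification. Wrapped continuation lines, as they appear in this email, are rejoined before matching.

```python
import re

# A log entry starts with a severity letter, MMDD, and a timestamp,
# as in the excerpt above (assumed format).
ENTRY_START = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}")
REGISTERING = re.compile(r"Registering agent at (\S+)")


def join_wrapped(lines):
    """Merge continuation lines into the log entry they belong to."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        elif entries[-1].endswith(" "):
            entries[-1] += line
        else:
            entries[-1] += " " + line
    return entries


def registration_summary(lines):
    """Report which agents began registering, whether any
    'Registered agent' line appeared, and how many retries were ignored."""
    started = set()
    completed = False
    ignored = 0
    for entry in join_wrapped(lines):
        match = REGISTERING.search(entry)
        if match:
            started.add(match.group(1))
        if "Registered agent" in entry:
            completed = True
        if "Ignoring register agent message" in entry:
            ignored += 1
    return {"started": sorted(started), "completed": completed,
            "ignored_retries": ignored}
```

A result with a non-empty {{started}} list, {{completed}} false, and a growing {{ignored_retries}} count would match the failure described in this comment.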

> Several tests are flaky, with futures timing out early
> ---

[jira] [Updated] (MESOS-6180) Several tests are flaky, with futures timing out early

2016-09-16 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6180:
-
Attachment: flaky-containerizer-pid-namespace-forward.txt
flaky-containerizer-pid-namespace-backward.txt

> Several tests are flaky, with futures timing out early
> --
>
> Key: MESOS-6180
> URL: https://issues.apache.org/jira/browse/MESOS-6180
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Greg Mann
>Assignee: haosdent
>  Labels: mesosphere, tests
> Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, 
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log, 
> flaky-containerizer-pid-namespace-backward.txt, 
> flaky-containerizer-pid-namespace-forward.txt
>
>
> Following the merging of a large patch chain, it was noticed on our internal 
> CI that several tests had become flaky, with a similar pattern in the 
> failures: the tests fail early when a future times out. Often, this occurs 
> when a test cluster is being spun up and one of the offer futures times out. 
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple 
> of these.





[jira] [Commented] (MESOS-5821) Clean up the billions of compiler warnings on MSVC

2016-09-16 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497843#comment-15497843
 ] 

Joseph Wu commented on MESOS-5821:
--

These two brought the ASF Windows CI's warnings from
{code}
4478 Warning(s)
{code}
to 
{code}
3671 Warning(s)
{code}

Admittedly, this is a "dirty" build (so not all the warnings show up), but good 
progress nonetheless:
https://builds.apache.org/job/Mesos-Windows/523/
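
For a sense of scale, the drop between the two counts quoted above can be worked out as follows. This is a throwaway calculation; {{warning_reduction}} is a name made up for illustration.

```python
def warning_reduction(before, after):
    """Return (warnings removed, percentage reduction)."""
    removed = before - after
    percent = 100.0 * removed / before
    return removed, percent


removed, percent = warning_reduction(4478, 3671)
print(f"{removed} fewer warnings ({percent:.1f}% reduction)")
# prints "807 fewer warnings (18.0% reduction)"
```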

> Clean up the billions of compiler warnings on MSVC
> --
>
> Key: MESOS-5821
> URL: https://issues.apache.org/jira/browse/MESOS-5821
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Alex Clemmer
>Assignee: Daniel Pravat
>  Labels: mesosphere, slave
>
> Clean builds of Mesos on Windows will result in approximately {{5800 
> Warning(s)}} or more.





[jira] [Comment Edited] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497790#comment-15497790
 ] 

Anand Mazumdar edited comment on MESOS-6127 at 9/17/16 12:18 AM:
-

I worked on partially adding HTTP2 support at this year's MesosCon NA 
hackathon.

You would still need a library that can understand/parse HTTP2, e.g., 
libnghttp2, as you had pointed out earlier. The one we eventually choose 
should ideally be weighed in the design doc. Currently, Mesos uses the 
[http-parser|https://github.com/nodejs/http-parser] library, which did not 
understand HTTP2 the last time I looked at it.

I would be happy to shepherd the work for putting the design document in place 
for sharing it with the community.


was (Author: anandmazumdar):
I tried to work on trying to add HTTP2 support partially at this years MesosCon 
NA hackathon.

You would still need a library that can understand/parse HTTP2 e.g., libnghttp2 
etc. as you had pointed out earlier. The one which we would eventually choose 
should be ideally weighted upon in the design doc. Currently, Mesos uses the 
[http-parser|https://github.com/nodejs/http-parser] library which does not 
understand HTTP2 the last time I had a look at it.

I would be happy to shepherd the work for putting the design document in place 
for sharing it with the community.



[jira] [Commented] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497790#comment-15497790
 ] 

Anand Mazumdar commented on MESOS-6127:
---

I tried to partially add HTTP2 support at this year's MesosCon NA hackathon.

You would still need a library that can understand/parse HTTP2, e.g., 
libnghttp2, as you had pointed out earlier. The one we eventually choose 
should ideally be weighed in the design doc. Currently, Mesos uses the 
[http-parser|https://github.com/nodejs/http-parser] library, which did not 
understand HTTP2 the last time I looked at it.

I would be happy to shepherd the work for putting the design document in place 
for sharing it with the community.



[jira] [Comment Edited] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497790#comment-15497790
 ] 

Anand Mazumdar edited comment on MESOS-6127 at 9/17/16 12:15 AM:
-

I tried to partially add HTTP2 support at this year's MesosCon NA hackathon.

You would still need a library that can understand/parse HTTP2, e.g., 
libnghttp2, as you had pointed out earlier. The one we eventually choose 
should ideally be weighed in the design doc. Currently, Mesos uses the 
[http-parser|https://github.com/nodejs/http-parser] library, which did not 
understand HTTP2 the last time I looked at it.

I would be happy to shepherd the work for putting the design document in place 
for sharing it with the community.


was (Author: anandmazumdar):
I tried to work on trying to add HTTP2 support partially at this years MesosCon 
NA hackathon.

You would still need a library that can understand/parse HTTP2 e.g., libnghttp2 
etc. as you had pointed out earlier. The one which we would eventually choose 
should be ideally be weighted upon in the design doc. Currently, Mesos uses the 
[http-parser|https://github.com/nodejs/http-parser] library which does not 
understand HTTP2 the last time I had a look at it.

I would be happy to shepherd the work for putting the design document in place 
for sharing it with the community.



[jira] [Comment Edited] (MESOS-5821) Clean up the billions of compiler warnings on MSVC

2016-09-16 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497785#comment-15497785
 ] 

Joseph Wu edited comment on MESOS-5821 at 9/17/16 12:10 AM:


{code}
commit 862da54368841adf23be83e0eddd050b20733948
Author: Daniel Pravat 
Date:   Fri Sep 16 17:03:32 2016 -0700

Windows: Disabled some deprecated function warnings.

Visual Studio emits warnings for using deprecated functions in CRT
and the use of insecure functions in CRT. This commit suppresses the
warning generation temporarily.

Review: https://reviews.apache.org/r/51860/
{code}
{code}
commit 45fea210711c7a2e3c0e2cdacd5ca29d07453888
Author: Daniel Pravat 
Date:   Fri Sep 16 17:06:16 2016 -0700

Windows: Removed macro redefinition.

The `__STRINGIZE` macro is already defined in Visual Studio headers.

Review: https://reviews.apache.org/r/51861/
{code}


was (Author: kaysoky):
{code}
commit 862da54368841adf23be83e0eddd050b20733948
Author: Daniel Pravat 
Date:   Fri Sep 16 17:03:32 2016 -0700

Windows: Disabled some deprecated function warnings.

Visual Studio emits warnings for using deprecated functions in CRT
and the use of insecure functions in CRT. This commit supresses the
warning generation temporarily.

Review: https://reviews.apache.org/r/51860/
{code}
{code}
commit 45fea210711c7a2e3c0e2cdacd5ca29d07453888
Author: Daniel Pravat 
Date:   Fri Sep 16 17:06:16 2016 -0700

Windows: Removed macro redefinition.

The `__STRINGIZE` macro is already defined in Visual Studio headers.

Review: https://reviews.apache.org/r/51861/
[code}



[jira] [Commented] (MESOS-5821) Clean up the billions of compiler warnings on MSVC

2016-09-16 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497785#comment-15497785
 ] 

Joseph Wu commented on MESOS-5821:
--

{code}
commit 862da54368841adf23be83e0eddd050b20733948
Author: Daniel Pravat 
Date:   Fri Sep 16 17:03:32 2016 -0700

Windows: Disabled some deprecated function warnings.

Visual Studio emits warnings for using deprecated functions in CRT
and the use of insecure functions in CRT. This commit suppresses the
warning generation temporarily.

Review: https://reviews.apache.org/r/51860/
{code}
{code}
commit 45fea210711c7a2e3c0e2cdacd5ca29d07453888
Author: Daniel Pravat 
Date:   Fri Sep 16 17:06:16 2016 -0700

Windows: Removed macro redefinition.

The `__STRINGIZE` macro is already defined in Visual Studio headers.

Review: https://reviews.apache.org/r/51861/
{code}



[jira] [Commented] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497649#comment-15497649
 ] 

Aaron Wood commented on MESOS-6127:
---

Sounds good. If you think we should start with HTTP/2 and leave gRPC for later, 
what are your thoughts on using libnghttp2_asio vs. implementing support 
directly in what exists in http.cpp and http.hpp? There seems to be a lot of 
custom implementation there.



[jira] [Comment Edited] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497621#comment-15497621
 ] 

Aaron Wood edited comment on MESOS-6127 at 9/16/16 10:53 PM:
-

Also, should these changes go upstream https://github.com/3rdparty/libprocess 
and/or directly in Mesos?


was (Author: aaronjwood):
Also, should these changes go upstream https://github.com/3rdparty/libprocess 
and/or directly in Mesos?

[~vinodkone] if you think we should start with HTTP/2 and leave gRPC for later 
what are your thoughts on using libnghttp2_asio vs. implementing support 
directly into what exists in http.cpp and http.hpp?



[jira] [Comment Edited] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497621#comment-15497621
 ] 

Aaron Wood edited comment on MESOS-6127 at 9/16/16 10:53 PM:
-

Also, should these changes go upstream https://github.com/3rdparty/libprocess 
and/or directly in Mesos?

[~vinodkone] if you think we should start with HTTP/2 and leave gRPC for later 
what are your thoughts on using libnghttp2_asio vs. implementing support 
directly into what exists in http.cpp and http.hpp?


was (Author: aaronjwood):
Also, should these changes go upstream https://github.com/3rdparty/libprocess 
and/or directly in Mesos?



[jira] [Commented] (MESOS-6181) The logic for BadACLNoPrincipal and BadACLDropCreateAndDestroy is not correct

2016-09-16 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497632#comment-15497632
 ] 

Guangya Liu commented on MESOS-6181:


 cc [~greggomann] 

> The logic for BadACLNoPrincipal and BadACLDropCreateAndDestroy is not correct
> -
>
> Key: MESOS-6181
> URL: https://issues.apache.org/jira/browse/MESOS-6181
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> Two issues for those two test cases:
> 1) There is no need to add `{}` in the test case; adding the `{}` causes 
> the driver to decline a nonexistent offer.
> 2) If destroying the volume fails, we should get the last offer to make sure 
> that it also contains the volume resource.





[jira] [Commented] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497625#comment-15497625
 ] 

Vinod Kone commented on MESOS-6127:
---

Directly to Mesos. The 3rdparty repo is not up-to-date.



[jira] [Commented] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497624#comment-15497624
 ] 

Vinod Kone commented on MESOS-6127:
---

There is no standard format, but you can look at a few examples: 
https://github.com/apache/mesos/blob/6f970a7badacf16953ebbc2c72c6ae7eb5e662e2/docs/design-docs.md



[jira] [Commented] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497621#comment-15497621
 ] 

Aaron Wood commented on MESOS-6127:
---

Also, should these changes go upstream https://github.com/3rdparty/libprocess 
and/or directly in Mesos?



[jira] [Commented] (MESOS-6127) Implement support for HTTP/2

2016-09-16 Thread Aaron Wood (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497593#comment-15497593
 ] 

Aaron Wood commented on MESOS-6127:
---

Yes, definitely. Is there a defined design document process in place for 
Apache/Mesos?

> Implement suppport for HTTP/2
> -
>
> Key: MESOS-6127
> URL: https://issues.apache.org/jira/browse/MESOS-6127
> Project: Mesos
>  Issue Type: Epic
>  Components: HTTP API, libprocess
>Reporter: Aaron Wood
>  Labels: performance
>
> HTTP/2 will allow us to take advantage of connection multiplexing, header 
> compression, streams, server push, etc. Add support for communication over 
> HTTP/2 between masters and agents, framework endpoints, etc.
> Should we support HTTP/2 without TLS? The spec allows for this but most major 
> browser vendors, libraries, and implementations aren't supporting it unless 
> TLS is used. If we do require TLS, what can be done to reduce the performance 
> hit of the TLS handshake? Might need to change more code to make sure that we 
> are taking advantage of connection sharing so that we can (ideally) only ever 
> have a one-time TLS handshake per shared connection.





[jira] [Created] (MESOS-6189) Add a virtual method to Isolator to indicate if it supports nesting.

2016-09-16 Thread Jie Yu (JIRA)
Jie Yu created MESOS-6189:
-

 Summary: Add a virtual method to Isolator to indicate if it 
supports nesting.
 Key: MESOS-6189
 URL: https://issues.apache.org/jira/browse/MESOS-6189
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu








[jira] [Commented] (MESOS-6127) Implement suppport for HTTP/2

2016-09-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497438#comment-15497438
 ] 

Vinod Kone commented on MESOS-6127:
---

It is not trivial to update libprocess (the internal communication library that 
Mesos uses) to speak gRPC. It might be relatively easier to add support for 
HTTP/2 into libprocess. Note that libprocess already supports SSL/TLS when used 
in conjunction with libevent. Either way, this needs an extensive design doc. 
[~aaronjwood] Is this something you are interested in working on?

> Implement suppport for HTTP/2
> -
>
> Key: MESOS-6127
> URL: https://issues.apache.org/jira/browse/MESOS-6127
> Project: Mesos
>  Issue Type: Epic
>  Components: HTTP API, libprocess
>Reporter: Aaron Wood
>  Labels: performance
>
> HTTP/2 will allow us to take advantage of connection multiplexing, header 
> compression, streams, server push, etc. Add support for communication over 
> HTTP/2 between masters and agents, framework endpoints, etc.
> Should we support HTTP/2 without TLS? The spec allows for this but most major 
> browser vendors, libraries, and implementations aren't supporting it unless 
> TLS is used. If we do require TLS, what can be done to reduce the performance 
> hit of the TLS handshake? Might need to change more code to make sure that we 
> are taking advantage of connection sharing so that we can (ideally) only ever 
> have a one-time TLS handshake per shared connection.





[jira] [Updated] (MESOS-6183) mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images

2016-09-16 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6183:
-
   Flags:   (was: Important)
Priority: Minor  (was: Major)

The Mesos containerizer doesn't have an option/flag for insecure registries.  

What task are you launching?

> mesos's Unified Containerizer cannot set "--insecure-registry" when 
> provisioning images
> ---
>
> Key: MESOS-6183
> URL: https://issues.apache.org/jira/browse/MESOS-6183
> Project: Mesos
>  Issue Type: Bug
>Reporter: yongyu
>Priority: Minor
>
> mesos's Unified Containerizer cannot set "--insecure-registry" when 
> provisioning images





[jira] [Created] (MESOS-6188) Make the `gpu/nvidia` isolator nesting aware

2016-09-16 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6188:
--

 Summary: Make the `gpu/nvidia` isolator nesting aware
 Key: MESOS-6188
 URL: https://issues.apache.org/jira/browse/MESOS-6188
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues
Assignee: Kevin Klues
 Fix For: 1.1.0








[jira] [Created] (MESOS-6187) "double free or corruption" with Java 8

2016-09-16 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6187:
--

 Summary: "double free or corruption" with Java 8
 Key: MESOS-6187
 URL: https://issues.apache.org/jira/browse/MESOS-6187
 Project: Mesos
  Issue Type: Bug
 Environment: Linux archlinux.vagrant.vm 4.7.2-1-ARCH #1 SMP PREEMPT 
Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux
Reporter: Neil Conway


Observed while running the unit tests on a recent Arch Linux. Haven't reproduced 
it yet; it took ~17 iterations of the entire test suite (~2 hours of runtime) 
before it occurred.

{{noformat}}
% ./src/mesos-tests --gtest_repeat=30 >>& ~/test_log.txt
*** Error in `/usr/lib/jvm/java-8-openjdk/bin/java': double free or corruption 
(fasttop): 0x7f102c00c760 ***
=== Backtrace: =
/usr/lib/libc.so.6(+0x70c4b)[0x7f1097f25c4b]
/usr/lib/libc.so.6(+0x76fe6)[0x7f1097f2bfe6]
/usr/lib/libc.so.6(+0x777de)[0x7f1097f2c7de]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2NS1_9_DeferredIDTcl4bindadsrSt8functionIFvT0_T1_EEclcvSF__Efp1_fp2_RKNS1_3PIDIT_EEMSJ_FvSC_SD_ET2_T3_EUliiE_E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE+0x31)[0x7f1035f838b1]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2NS1_9_DeferredIDTcl4bindadsrSt8functionIFvT0_T1_EEclcvSF__Efp1_fp2_RKNS1_3PIDIT_EEMSJ_FvSC_SD_ET2_T3_EUliiE_E10_M_managerERSt9_Any_dataRKST_St18_Manager_operation+0x9e)[0x7f1035f70598]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_baseD1Ev+0x33)[0x7f10357576f7]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt8functionIFviiEED1Ev+0x18)[0x7f1035f2e4b6]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt10_Head_baseILm0ESt8functionIFviiEELb0EED1Ev+0x18)[0x7f1035f2f5ce]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt11_Tuple_implILm0EJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2D1Ev+0x18)[0x7f1035f2f5ea]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt5tupleIJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2D1Ev+0x18)[0x7f1035f2f606]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES3_St12_PlaceholderILi1EES7_ILi2D1Ev+0x1c)[0x7f1035f2f626]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_base13_Base_managerISt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES5_St12_PlaceholderILi1EES9_ILi2E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE+0x29)[0x7f1035f83c31]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_base13_Base_managerISt5_BindIFSt7_Mem_fnIMSt8functionIFviiEEKFviiEES5_St12_PlaceholderILi1EES9_ILi2E10_M_managerERSt9_Any_dataRKSF_St18_Manager_operation+0x9e)[0x7f1035f70857]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt14_Function_baseD1Ev+0x33)[0x7f10357576f7]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZNSt8functionIFviiEED1Ev+0x18)[0x7f1035f2e4b6]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZN2os8internal15configureSignalEPKSt8functionIFviiEE+0x3b)[0x7f1035eebbe7]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZN5mesos8internal5slave5Slave10initializeEv+0x44af)[0x7f1035ef15c5]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(_ZN7process14ProcessManager6resumeEPNS_11ProcessBaseE+0x283)[0x7f1036b4552f]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(+0x4d86450)[0x7f1036b42450]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(+0x4d9450e)[0x7f1036b5050e]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(+0x4d944ab)[0x7f1036b504ab]
/home/vagrant/build-mesos-default-opts/src/.libs/libmesos-1.1.0.so(+0x4d9448a)[0x7f1036b5048a]
/usr/lib/libstdc++.so.6(+0xbb31f)[0x7f106eb1031f]
/usr/lib/libpthread.so.0(+0x7454)[0x7f1098881454]
/usr/lib/libc.so.6(clone+0x5f)[0x7f1097f9d7df]
=== Memory map: 
0040-00401000 r-xp  08:01 183769 
/usr/lib/jvm/java-8-openjdk/jre/bin/java
0060-00601000 rw-p  08:01 183769 
/usr/lib/jvm/java-8-openjdk/jre/bin/java
022b-022d1000 rw-p  00:00 0  [heap]
8320-8860 rw-p  00:00 0
8860-d660 ---p  00:00 0
d660-d900 rw-p  00:00 0
d900-1 ---p  00:00 0
1-1000a rw-p  00:00 0
1000a-14000 ---p  00:00 0
7f100400-7f1004026000 rw-p  00:00 0
7f1004026000-7f100800 ---p  00:00 0
7f100800-7f10080210

[jira] [Created] (MESOS-6186) Make the generic `cgroups` isolator nesting aware

2016-09-16 Thread Kevin Klues (JIRA)
Kevin Klues created MESOS-6186:
--

 Summary: Make the generic `cgroups` isolator nesting aware
 Key: MESOS-6186
 URL: https://issues.apache.org/jira/browse/MESOS-6186
 Project: Mesos
  Issue Type: Task
Reporter: Kevin Klues
Assignee: Kevin Klues
 Fix For: 1.1.0








[jira] [Commented] (MESOS-6145) Isolator namespaces/pid is leaking mounts

2016-09-16 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15497064#comment-15497064
 ] 

Jie Yu commented on MESOS-6145:
---

Simplified the isolator:
https://reviews.apache.org/r/51963/

The bind mounts in the pid namespace isolator turn out to be
unnecessary, as the linux launcher will use the freezer to kill all tasks
anyway. They make the isolator unnecessarily complex and introduce a mount
leak bug (MESOS-6145). This patch removes all the unnecessary bind
mounts, making the isolator extremely simple.

> Isolator namespaces/pid is leaking mounts
> -
>
> Key: MESOS-6145
> URL: https://issues.apache.org/jira/browse/MESOS-6145
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, isolation, security
>Reporter: Stephan Erb
>Assignee: Jie Yu
>
> As the operator of a Mesos cluster, I would like every container/executor to 
> run in a single PID namespace, so that a task cannot see what else is running 
> on the same host.
> The existing {{namespaces/pid}} isolator seems to provide this feature. 
> However, it seems like it is leaking files. I have exactly one task running 
> currently, but there are still left overs from earlier invocations
> {code}
> vagrant@aurora:~/aurora$ ls -l /var/run/mesos/pidns/
> total 0
> -rw-r--r-- 1 root root 0 Aug 26 20:30 32b6e4c7-3d22-47ed-a350-9eb929daa241
> -rw-r--r-- 1 root root 0 Aug 26 20:30 7b812f00-4614-4016-a76c-ff78a175a1b0
> -rw-r--r-- 1 root root 0 Aug 26 20:24 d501829e-7cf8-40fb-a895-0ad3416da7dc
> -rw-r--r-- 1 root root 0 Aug 26 20:24 d56ca91f-eb72-426c-8bbb-f3239358a4ef
> -r--r--r-- 1 root root 0 Aug 26 20:35 fef9a109-de52-45f3-ae41-171de6495705
> {code}





[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early

2016-09-16 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496966#comment-15496966
 ] 

Vinod Kone commented on MESOS-6180:
---

Looking at `CGROUPS_ROOT_PidNamespaceForward`, the TASK_LOST is expected because 
the test doesn't wait for the TASK_RUNNING update before terminating the agent.

{quote}
  Future<Message> registerExecutorMessage =
    FUTURE_MESSAGE(Eq(RegisterExecutorMessage().GetTypeName()), _, _);

  driver.launchTasks(offers1.get()[0].id(), {task1});

  AWAIT_READY(registerExecutorMessage);

  Future<hashset<ContainerID>> containers = containerizer->containers();
  AWAIT_READY(containers);
  EXPECT_EQ(1u, containers.get().size());

  ContainerID containerId = *(containers.get().begin());

  // Stop the slave.
  slave.get()->terminate();

{quote}

> Several tests are flaky, with futures timing out early
> --
>
> Key: MESOS-6180
> URL: https://issues.apache.org/jira/browse/MESOS-6180
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Greg Mann
>Assignee: haosdent
>  Labels: mesosphere, tests
> Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, 
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log
>
>
> Following the merging of a large patch chain, it was noticed on our internal 
> CI that several tests had become flaky, with a similar pattern in the 
> failures: the tests fail early when a future times out. Often, this occurs 
> when a test cluster is being spun up and one of the offer futures times out. 
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple 
> of these.





[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early

2016-09-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496724#comment-15496724
 ] 

haosdent commented on MESOS-6180:
-

Yep, the order of the log you mentioned is correct as well.

Let's split it into stdout and stderr.
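
The split used here amounts to partitioning the CI log on the `W:` marker, which is what the `grep -v 'W:'` and `grep 'W:'` titles on the two blocks describe. A minimal C++ sketch of that partitioning follows; the function name `splitByMarker` is hypothetical and not part of Mesos or the CI tooling, and it assumes one log record per string.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: partitions log lines by whether they contain `marker`.
// `first` holds the lines without the marker (like `grep -v`),
// `second` holds the lines with it (like `grep`).
inline std::pair<std::vector<std::string>, std::vector<std::string>>
splitByMarker(const std::vector<std::string>& lines, const std::string& marker)
{
  std::pair<std::vector<std::string>, std::vector<std::string>> result;

  for (const std::string& line : lines) {
    if (line.find(marker) != std::string::npos) {
      result.second.push_back(line);
    } else {
      result.first.push_back(line);
    }
  }

  return result;
}
```

Order within each half is preserved, so timestamps in one stream can still be read top to bottom.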

{code:title=grep -v 'W:' (stdout)|borderStyle=solid}
[02:57:42] : [Step 10/10] [ RUN  ] 
SlaveRecoveryTest/0.ReconnectHTTPExecutor
[02:57:43] : [Step 10/10] Received SUBSCRIBED event
[02:57:43] : [Step 10/10] Subscribed executor on 
ip-172-30-2-23.mesosphere.io
[02:57:43] : [Step 10/10] Received LAUNCH event
[02:57:43] : [Step 10/10] Starting task c1ba3f0b-2f6a-46a1-b752-592394c6d726
[02:57:43] : [Step 10/10] 
/mnt/teamcity/work/4240ba9ddd0997c3/build/src/mesos-containerizer launch 
--command="{"shell":true,"value":"sleep 1000"}" --help="false" 
--unshare_namespace_mnt="false"
[02:57:43] : [Step 10/10] Forked command at 4653
[02:57:43] : [Step 10/10] Received ERROR event
[02:57:43] : [Step 10/10] Received ERROR event
[02:57:58] : [Step 10/10] ../../src/tests/slave_recovery_tests.cpp:510: 
Failure
[02:57:58] : [Step 10/10] Failed to wait 15secs for status
[02:57:58] : [Step 10/10] ../../src/tests/slave_recovery_tests.cpp:491: 
Failure
[02:57:58] : [Step 10/10] Actual function call count doesn't match 
EXPECT_CALL(sched, statusUpdate(_, _))...
[02:57:58] : [Step 10/10]  Expected: to be called at least once
[02:57:58] : [Step 10/10]Actual: never called - unsatisfied and 
active
[02:58:13] : [Step 10/10] ../../src/tests/cluster.cpp:560: Failure
[02:58:13] : [Step 10/10] Failed to wait 15secs for wait
[02:59:18] : [Step 10/10] [  FAILED  ] 
SlaveRecoveryTest/0.ReconnectHTTPExecutor, where TypeParam = 
mesos::internal::slave::MesosContainerizer (95963 ms)
{code}

{code:title=grep 'W:' (stdout - 
SlaveRecoveryTest/0.RecoverStatusUpdateManager)|borderStyle=solid}
[02:59:18]W: [Step 10/10] I0915 02:57:42.726838 24222 
hierarchical.cpp:1770] No inverse offers to send out!
[02:59:18]W: [Step 10/10] I0915 02:57:42.726851 24222 
hierarchical.cpp:1271] Performed allocation for 1 agents in 80513ns
[02:59:18]W: [Step 10/10] I0915 02:57:42.929819 24218 slave.cpp:3521] 
Cleaning up un-reregistered executors
[02:59:18]W: [Step 10/10] I0915 02:57:42.929872 24218 slave.cpp:5197] 
Finished recovery
[02:59:18]W: [Step 10/10] I0915 02:57:42.930137 24218 slave.cpp:5369] 
Querying resource estimator for oversubscribable resources
[02:59:18]W: [Step 10/10] I0915 02:57:42.930229 24220 slave.cpp:5383] 
Received oversubscribable resources  from the resource estimator
[02:59:18]W: [Step 10/10] I0915 02:57:42.930289 24220 slave.cpp:911] New 
master detected at master@172.30.2.23:32968
[02:59:18]W: [Step 10/10] I0915 02:57:42.930301 24220 slave.cpp:970] 
Authenticating with master master@172.30.2.23:32968
[02:59:18]W: [Step 10/10] I0915 02:57:42.930315 24220 slave.cpp:981] Using 
default CRAM-MD5 authenticatee
[02:59:18]W: [Step 10/10] I0915 02:57:42.930336 24217 
status_update_manager.cpp:177] Pausing sending status updates
[02:59:18]W: [Step 10/10] I0915 02:57:42.930364 24220 slave.cpp:943] 
Detecting new master
[02:59:18]W: [Step 10/10] I0915 02:57:42.930382 24217 
authenticatee.cpp:121] Creating new client SASL connection
[02:59:18]W: [Step 10/10] I0915 02:57:42.930631 24216 master.cpp:6234] 
Authenticating slave(353)@172.30.2.23:32968
[02:59:18]W: [Step 10/10] I0915 02:57:42.930697 24219 
authenticator.cpp:414] Starting authentication session for 
crammd5-authenticatee(755)@172.30.2.23:32968
[02:59:18]W: [Step 10/10] I0915 02:57:42.930804 24218 authenticator.cpp:98] 
Creating new server SASL connection
[02:59:18]W: [Step 10/10] I0915 02:57:42.930964 24218 
authenticatee.cpp:213] Received SASL authentication mechanisms: CRAM-MD5
[02:59:18]W: [Step 10/10] I0915 02:57:42.930977 24218 
authenticatee.cpp:239] Attempting to authenticate with mechanism 'CRAM-MD5'
[02:59:18]W: [Step 10/10] I0915 02:57:42.931010 24218 
authenticator.cpp:204] Received SASL authentication start
[02:59:18]W: [Step 10/10] I0915 02:57:42.931037 24218 
authenticator.cpp:326] Authentication requires more steps
[02:59:18]W: [Step 10/10] I0915 02:57:42.931064 24218 
authenticatee.cpp:259] Received SASL authentication step
[02:59:18]W: [Step 10/10] I0915 02:57:42.931098 24218 
authenticator.cpp:232] Received SASL authentication step
[02:59:18]W: [Step 10/10] I0915 02:57:42.931109 24218 auxprop.cpp:109] 
Request to lookup properties for user: 'test-principal' realm: 
'ip-172-30-2-23.mesosphere.io' server FQDN: 'ip-172-30-2-23.mesosphere.io' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false
[02:59:18]W: [Step 10/10] I0915 02:57:42.931114 24218 auxprop.cpp:181] 
Looking up auxiliary property '*userPassw

[jira] [Updated] (MESOS-4431) Sharing of persistent volumes via reference counting

2016-09-16 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4431:
--
Affects Version/s: (was: 0.25.0)

> Sharing of persistent volumes via reference counting
> 
>
> Key: MESOS-4431
> URL: https://issues.apache.org/jira/browse/MESOS-4431
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: persistent-volumes
> Fix For: 1.1.0
>
>
> Add capability for specific resources to be shared amongst tasks within or 
> across frameworks/roles. Enable this functionality for persistent volumes.





[jira] [Updated] (MESOS-4431) Sharing of persistent volumes via reference counting

2016-09-16 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4431:
--
Fix Version/s: 1.1.0

> Sharing of persistent volumes via reference counting
> 
>
> Key: MESOS-4431
> URL: https://issues.apache.org/jira/browse/MESOS-4431
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: persistent-volumes
> Fix For: 1.1.0
>
>
> Add capability for specific resources to be shared amongst tasks within or 
> across frameworks/roles. Enable this functionality for persistent volumes.





[jira] [Updated] (MESOS-4325) Offer shareable resources to frameworks only if it is opted in

2016-09-16 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4325:
--
Affects Version/s: (was: 0.25.0)

> Offer shareable resources to frameworks only if it is opted in
> --
>
> Key: MESOS-4325
> URL: https://issues.apache.org/jira/browse/MESOS-4325
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>  Labels: external-volumes, persistent-volumes
> Fix For: 1.1.0
>
>
> Added a new capability SHAREABLE_RESOURCES that frameworks need to opt in if 
> they are interested in receiving shared resources in their offers.





[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early

2016-09-16 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496662#comment-15496662
 ] 

Greg Mann commented on MESOS-6180:
--

Thanks for the patches, [~haosd...@gmail.com]!! I'll review and do some testing 
this morning.

Regarding the interleaving: for example, in the log posted in MESOS-6164 we 
find the line:
{code}
Checkpointing framework pid 
'scheduler-26d5bb2d-7233-4725-9755-169f84aee769@172.30.2.23:32968' to 
'/mnt/teamcity/temp/buildTmp/SlaveRecoveryTest_0_RecoverStatusUpdateManager_w0ToCt/meta/slaves/d22b6309-24c3-422f-a501-a672e7c3e046-S0/frameworks/d22b6309-24c3-422f-a501-a672e7c3e046-/framework.pid'
{code}
which indicates that this output can be attributed to 
{{SlaveRecoveryTest.RecoverStatusUpdateManager}}. I think 
{{SlaveRecoveryTest.ReconnectHTTPExecutor}} begins much later with the line: 
{{I0915 02:57:42.981866 24202 cluster.cpp:157] Creating default 'local' 
authorizer}}.

> Several tests are flaky, with futures timing out early
> --
>
> Key: MESOS-6180
> URL: https://issues.apache.org/jira/browse/MESOS-6180
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Greg Mann
>Assignee: haosdent
>  Labels: mesosphere, tests
> Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, 
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log
>
>
> Following the merging of a large patch chain, it was noticed on our internal 
> CI that several tests had become flaky, with a similar pattern in the 
> failures: the tests fail early when a future times out. Often, this occurs 
> when a test cluster is being spun up and one of the offer futures times out. 
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple 
> of these.





[jira] [Created] (MESOS-6185) Improve test coverage for shared persistent volumes.

2016-09-16 Thread Yan Xu (JIRA)
Yan Xu created MESOS-6185:
-

 Summary: Improve test coverage for shared persistent volumes.
 Key: MESOS-6185
 URL: https://issues.apache.org/jira/browse/MESOS-6185
 Project: Mesos
  Issue Type: Task
Affects Versions: 1.1.0
Reporter: Yan Xu
Assignee: Anindya Sinha


In addition to tests in MESOS-4431 we need to improve coverage on new code 
paths.





[jira] [Updated] (MESOS-4325) Offer shareable resources to frameworks only if it is opted in

2016-09-16 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4325:
--
Fix Version/s: 1.1.0

> Offer shareable resources to frameworks only if it is opted in
> --
>
> Key: MESOS-4325
> URL: https://issues.apache.org/jira/browse/MESOS-4325
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>  Labels: external-volumes, persistent-volumes
> Fix For: 1.1.0
>
>
> Added a new capability SHAREABLE_RESOURCES that frameworks need to opt in if 
> they are interested in receiving shared resources in their offers.





[jira] [Updated] (MESOS-4431) Support sharing of persistent volumes via shared resources.

2016-09-16 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4431:
--
Summary: Support sharing of persistent volumes via shared resources.  (was: 
Sharing of persistent volumes via reference counting)

> Support sharing of persistent volumes via shared resources.
> ---
>
> Key: MESOS-4431
> URL: https://issues.apache.org/jira/browse/MESOS-4431
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: persistent-volumes
> Fix For: 1.1.0
>
>
> Add capability for specific resources to be shared amongst tasks within or 
> across frameworks/roles. Enable this functionality for persistent volumes.





[jira] [Updated] (MESOS-4431) Sharing of persistent volumes via reference counting

2016-09-16 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4431:
--
Labels: persistent-volumes  (was: external-volumes persistent-volumes)

> Sharing of persistent volumes via reference counting
> 
>
> Key: MESOS-4431
> URL: https://issues.apache.org/jira/browse/MESOS-4431
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: persistent-volumes
> Fix For: 1.1.0
>
>
> Add capability for specific resources to be shared amongst tasks within or 
> across frameworks/roles. Enable this functionality for persistent volumes.





[jira] [Updated] (MESOS-4325) Offer shareable resources to frameworks only if it is opted in

2016-09-16 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4325:
--
Labels: persistent-volumes  (was: external-volumes persistent-volumes)

> Offer shareable resources to frameworks only if it is opted in
> --
>
> Key: MESOS-4325
> URL: https://issues.apache.org/jira/browse/MESOS-4325
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>  Labels: persistent-volumes
> Fix For: 1.1.0
>
>
> Added a new capability SHAREABLE_RESOURCES that frameworks need to opt in if 
> they are interested in receiving shared resources in their offers.





[jira] [Created] (MESOS-6184) Change health check to use childHooks to enter the namespaces of the container

2016-09-16 Thread haosdent (JIRA)
haosdent created MESOS-6184:
---

 Summary: Change health check to use childHooks to enter the 
namespaces of the container
 Key: MESOS-6184
 URL: https://issues.apache.org/jira/browse/MESOS-6184
 Project: Mesos
  Issue Type: Improvement
Reporter: haosdent
Assignee: haosdent


To perform health checks for tasks, we need to enter the corresponding 
namespaces of the container. For now, the health check uses a custom clone to 
implement this:
{code}
  return process::defaultClone([=]() -> int {
    if (taskPid.isSome()) {
      foreach (const string& ns, namespaces) {
        Try<Nothing> setns = ns::setns(taskPid.get(), ns);
        if (setns.isError()) {
          ...
        }
      }
    }
    return func();
  });
{code}

After the childHooks patches are merged, we can change the health check to use 
childHooks to call {{setns}} and make {{process::defaultClone}} private again.
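
As a rough illustration of the idea (not the libprocess API), the per-namespace work can be expressed as a list of hooks that run in order before the task function. The names `Hook`, `runWithHooks`, and `namespaceHooks` below are hypothetical, the `enter` callback is an injected stand-in for the real {{setns}} call, and stopping at the first failing hook is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical hook type: returns 0 on success, non-zero on failure.
using Hook = std::function<int()>;

// Runs every hook in order and then `func`. Stops at the first failing hook
// and returns its code, so `func` never runs in a half-prepared environment.
inline int runWithHooks(
    const std::vector<Hook>& hooks,
    const std::function<int()>& func)
{
  for (const Hook& hook : hooks) {
    const int rc = hook();
    if (rc != 0) {
      return rc;
    }
  }

  return func();
}

// Builds one hook per namespace name. `enter` stands in for the real
// namespace-entering call so the sketch stays runnable without privileges.
inline std::vector<Hook> namespaceHooks(
    const std::vector<std::string>& namespaces,
    const std::function<int(const std::string&)>& enter)
{
  std::vector<Hook> hooks;

  for (const std::string& ns : namespaces) {
    hooks.push_back([ns, enter]() { return enter(ns); });
  }

  return hooks;
}
```

With this shape the caller only supplies the hook list, and the clone function itself no longer needs to be customized.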





[jira] [Commented] (MESOS-6183) mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images

2016-09-16 Thread yongyu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496498#comment-15496498
 ] 

yongyu commented on MESOS-6183:
---

I get the error:
Subscribed with ID 'fd64c22d-cca5-4a65-b275-fef05fc63730-0920'
Submitted task 'test_mesos' to agent 'fd64c22d-cca5-4a65-b275-fef05fc63730-S10'
Received status update TASK_FAILED for task 'test_mesos'
  message: 'Failed to launch container: Failed to perform 'curl': curl: (35) 
SSL received a record that exceeded the maximum permissible length.
; Container destroyed while provisioning images'
  source: SOURCE_AGENT
  reason: REASON_CONTAINER_LAUNCH_FAILED

> mesos's Unified Containerizer cannot set "--insecure-registry" when 
> provisioning images
> ---
>
> Key: MESOS-6183
> URL: https://issues.apache.org/jira/browse/MESOS-6183
> Project: Mesos
>  Issue Type: Bug
>Reporter: yongyu
>
> mesos's Unified Containerizer cannot set "--insecure-registry" when 
> provisioning images





[jira] [Created] (MESOS-6183) mesos's Unified Containerizer cannot set "--insecure-registry" when provisioning images

2016-09-16 Thread yongyu (JIRA)
yongyu created MESOS-6183:
-

 Summary: mesos's Unified Containerizer cannot set 
"--insecure-registry" when provisioning images
 Key: MESOS-6183
 URL: https://issues.apache.org/jira/browse/MESOS-6183
 Project: Mesos
  Issue Type: Bug
Reporter: yongyu


mesos's Unified Containerizer cannot set "--insecure-registry" when 
provisioning images





[jira] [Commented] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files

2016-09-16 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496439#comment-15496439
 ] 

Benjamin Bannier commented on MESOS-6182:
-

That is exactly what I tried, but since even on supported platforms like 
ubuntu-14 we try to add non-existent files, making this case fatal requires 
fixing the fallout (for some reason the existing test suite seems to run 
successfully even though some files are missing from the rootfs :/).

Maybe [~gilbert] knows if we can throw out stuff we don't need; otherwise we'd 
need to wait for the proper solution for MESOS-6011.

> LinuxRootfs::create ignores failures from adding non-existing files
> ---
>
> Key: MESOS-6182
> URL: https://issues.apache.org/jira/browse/MESOS-6182
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> {{LinuxRootfs::create}} attempts to add a hardcoded list of files to the 
> created rootfs. However, if a file does not exist no failure is created, but 
> the file will be missing from the rootfs.
> This can then lead to failures in tests using the rootfs and relying on files 
> in it.
> We should make failures to compose the planned rootfs explicit so users of 
> this test code know what they can rely on.





[jira] [Commented] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files

2016-09-16 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496363#comment-15496363
 ] 

Alexander Rukletsov commented on MESOS-6182:


As a first thing, can we make {{LinuxRootfs::create()}} return an error in case 
{{os::realpath()}} returns {{None()}}, which indicates the file can't be found?
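The suggestion above can be sketched as follows. This is a minimal, standalone illustration and not the actual Mesos code: {{Resolved}}, {{resolvePath}} and {{validateFiles}} are hypothetical names, and {{std::filesystem::canonical}} stands in for {{os::realpath()}} so the example compiles without stout.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the three states of a stout-style Result<T>.
enum class State { Some, None, Error };

struct Resolved {
  State state;
  std::string value;  // Resolved path for Some, error message for Error.
};

// Resolves a path and distinguishes "not found" from other failures.
Resolved resolvePath(const std::string& path) {
  std::error_code ec;
  const fs::path real = fs::canonical(path, ec);
  if (!ec) {
    return {State::Some, real.string()};
  }
  if (ec == std::errc::no_such_file_or_directory) {
    return {State::None, ""};
  }
  return {State::Error, ec.message()};
}

// Returns an error message for the first file that cannot be added,
// or std::nullopt if every file in the list resolves.
std::optional<std::string> validateFiles(
    const std::vector<std::string>& files) {
  for (const std::string& file : files) {
    const Resolved resolved = resolvePath(file);
    switch (resolved.state) {
      case State::Some:
        break;
      case State::None:
        return "File '" + file + "' does not exist";
      case State::Error:
        return "Failed to resolve '" + file + "': " + resolved.value;
    }
  }
  return std::nullopt;
}
```

A caller would turn a returned message into the error result of the rootfs creation instead of silently skipping the file.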






[jira] [Commented] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files

2016-09-16 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496333#comment-15496333
 ] 

Benjamin Bannier commented on MESOS-6182:
-

Making failure to copy a file fatal in {{LinuxRootfs::create}} is technically 
not hard (just check all three states of the {{Result}} of {{realpath}}), but 
it looks like the existing list of files to copy already contains files which 
do not exist even on core supported distributions. On an ubuntu-14 image I get
{code}
for f in /bin/echo /usr/bin/bash /bin/ls /bin/sh /bin/sleep 
/lib/x86_64-linux-gnu /lib64/ld-linux-x86-64.so.2 /lib64/libc.so.6 
/lib64/libdl.so.2 /lib64/libtinfo.so.5 /lib64/libselinux.so.1 
/lib64/libpcre.so.1 /lib64/liblzma.so.5 /lib64/libpthread.so.0 
/lib64/libcap.so.2 /lib64/libacl.so.1 /lib64/libattr.so.1 /lib64/librt.so.1 
/etc/passwd; do ls -d $f; done
/bin/echo
ls: cannot access /usr/bin/bash: No such file or directory
/bin/ls
/bin/sh
/bin/sleep
/lib/x86_64-linux-gnu
/lib64/ld-linux-x86-64.so.2
ls: cannot access /lib64/libc.so.6: No such file or directory
ls: cannot access /lib64/libdl.so.2: No such file or directory
ls: cannot access /lib64/libtinfo.so.5: No such file or directory
ls: cannot access /lib64/libselinux.so.1: No such file or directory
ls: cannot access /lib64/libpcre.so.1: No such file or directory
ls: cannot access /lib64/liblzma.so.5: No such file or directory
ls: cannot access /lib64/libpthread.so.0: No such file or directory
ls: cannot access /lib64/libcap.so.2: No such file or directory
ls: cannot access /lib64/libacl.so.1: No such file or directory
ls: cannot access /lib64/libattr.so.1: No such file or directory
ls: cannot access /lib64/librt.so.1: No such file or directory
/etc/passwd
{code}

It looks like this problem requires a more general solution for MESOS-6011 
where we e.g., pick up executable paths from {{PATH}} and their dynamic 
dependencies from the ELF headers.
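As a rough sketch of the first half of that idea (looking up executables via {{PATH}} instead of hardcoding their locations), something like the following could work. {{findInPath}} is a hypothetical helper, not an existing Mesos or stout function, and it only checks that the candidate is a regular file; discovering dynamic dependencies from ELF headers is not covered here.

```cpp
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns the first "<dir>/<name>" that is a regular
// file, searching the colon-separated directories in pathEnv in order.
std::optional<std::string> findInPath(
    const std::string& name,
    const std::string& pathEnv) {
  std::istringstream dirs(pathEnv);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    if (dir.empty()) {
      continue;  // Skip empty entries such as "::".
    }
    const fs::path candidate = fs::path(dir) / name;
    std::error_code ec;
    // is_regular_file() follows symlinks and returns false on error.
    if (fs::is_regular_file(candidate, ec)) {
      return candidate.string();
    }
  }
  return std::nullopt;
}
```

With the ubuntu-14 listing above, a lookup for {{bash}} would be expected to find it in a directory other than {{/usr/bin}}, rather than failing on the hardcoded {{/usr/bin/bash}}.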






[jira] [Commented] (MESOS-6011) Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)

2016-09-16 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496280#comment-15496280
 ] 

Benjamin Bannier commented on MESOS-6011:
-

For dynamic dependencies it looks like {{stout/elf.hpp}} added the necessary 
tooling for us to discover dynamic library dependencies from ELF headers.

> Tests relying on LinuxRootfs::create fail with Fedora 24 (and probably others)
> --
>
> Key: MESOS-6011
> URL: https://issues.apache.org/jira/browse/MESOS-6011
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, tests
>Reporter: Jan Schlicht
>Assignee: Gilbert Song
>  Labels: test
>
> Tests like {{AppcProvisionerIntegrationTest.ROOT_SimpleLinuxImageTest}} and 
> {{ProvisionerDockerPullerTest.ROOT_LocalPullerSimpleCommand}} will fail with 
> Fedora 24 because {{LinuxRootFs::create}}, used in these tests, assumes that 
> the binaries provided by the rootfs link to certain versions of shared 
> libraries. Because Fedora 24 has newer versions of some of these libraries, 
> tests using the binaries will fail. E.g.
> {noformat}
> $ ldd /bin/sh
>   linux-vdso.so.1 (0x7ffc98bfb000)
>   libtinfo.so.6 => /lib64/libtinfo.so.6 (0x7fcd59df6000)
>   libdl.so.2 => /lib64/libdl.so.2 (0x7fcd59bf2000)
>   libc.so.6 => /lib64/libc.so.6 (0x7fcd5982f000)
>   /lib64/ld-linux-x86-64.so.2 (0x55fb8e6ea000)
> {noformat}
> but {{LinuxRootFs::create}} will try to copy {{/lib64/libtinfo.so.5}} into 
> the rootfs.





[jira] [Created] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files

2016-09-16 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-6182:
---

 Summary: LinuxRootfs::create ignores failures from adding 
non-existing files
 Key: MESOS-6182
 URL: https://issues.apache.org/jira/browse/MESOS-6182
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier


{{LinuxRootfs::create}} attempts to add a hardcoded list of files to the 
created rootfs. However, if a file does not exist no failure is created, but 
the file will be missing from the rootfs.

This can then lead to failures in tests using the rootfs and relying on files 
in it.

We should make failures to compose the planned rootfs explicit so users of this 
test code know what they can rely on.





[jira] [Updated] (MESOS-5951) Remove "strict registry" code

2016-09-16 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-5951:
---
Shepherd: Vinod Kone

> Remove "strict registry" code
> -
>
> Key: MESOS-5951
> URL: https://issues.apache.org/jira/browse/MESOS-5951
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Once {{PARTITION_AWARE}} frameworks are supported, we should eventually 
> remove the code that supports the "non-strict" semantics in the master. That 
> is:
> 1. The master will be "strict" in Mesos 1.1, in the sense that master 
> behavior will always reflect the content of the registry and will not change 
> depending on whether the master has failed over. The exception here is that 
> for non-PARTITION_AWARE frameworks, we will _only_ kill such tasks on a 
> reregistering agent if the master hasn't failed over in the meantime. i.e., 
> we'll remain backwards compatible with the previous "non-strict" semantics 
> that old frameworks might depend on.
> 2. The "strict" semantics will be less problematic, because the master will 
> no longer be killing tasks and shutting down agents.





[jira] [Commented] (MESOS-6182) LinuxRootfs::create ignores failures from adding non-existing files

2016-09-16 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496264#comment-15496264
 ] 

Benjamin Bannier commented on MESOS-6182:
-

Linking MESOS-6011 as an example of tests failing because 
{{LinuxRootfs::create}} does not create a deterministic rootfs.






[jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early

2016-09-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495932#comment-15495932
 ] 

haosdent commented on MESOS-6180:
-

[~greggomann] I used {{grep 'W:'}} and {{grep -v 'W:'}} to filter the 
stdout/stderr of MESOS-6164, MESOS-6165, and MESOS-6166. It looks like their 
logs do not overlap. Do you have any examples where they do overlap?

> Several tests are flaky, with futures timing out early
> --
>
> Key: MESOS-6180
> URL: https://issues.apache.org/jira/browse/MESOS-6180
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Greg Mann
>Assignee: haosdent
>  Labels: mesosphere, tests
> Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, 
> CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log
>
>
> Following the merging of a large patch chain, it was noticed on our internal 
> CI that several tests had become flaky, with a similar pattern in the 
> failures: the tests fail early when a future times out. Often, this occurs 
> when a test cluster is being spun up and one of the offer futures times out. 
> This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple 
> of these.





[jira] [Commented] (MESOS-6149) Checkpoint used subsystems for containers

2016-09-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495794#comment-15495794
 ] 

haosdent commented on MESOS-6149:
-

Currently, the Agent cleans up those orphaned isolator leftovers once the 
corresponding isolator is enabled again.

For example:

1. Start the Agent with {{--isolation=A,B}} and launch container {{x}}.
2. Restart the Agent with {{--isolation=A}} and destroy container {{x}}. Then 
{{x}} still has leftovers in {{B}} that need to be cleaned up.
3. Restart the Agent with {{--isolation=B}}. The Agent cleans up the leaked 
state for {{x}} during the recovery stage of {{B}}.
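The three steps above can be modeled with a small simulation. This is only a toy model under stated assumptions: {{Leftovers}}, {{launch}}, {{destroy}} and {{recover}} are hypothetical names, and the per-isolator state is reduced to a set of container IDs; it is not how the Agent stores isolator state.

```cpp
#include <map>
#include <set>
#include <string>

// Toy model: isolator name -> containers it still holds state for.
using Leftovers = std::map<std::string, std::set<std::string>>;

// Launching a container creates state in every enabled isolator.
void launch(
    Leftovers& state,
    const std::set<std::string>& enabled,
    const std::string& container) {
  for (const std::string& isolator : enabled) {
    state[isolator].insert(container);
  }
}

// Destroying a container only cleans up in the enabled isolators.
void destroy(
    Leftovers& state,
    const std::set<std::string>& enabled,
    const std::string& container) {
  for (const std::string& isolator : enabled) {
    auto it = state.find(isolator);
    if (it != state.end()) {
      it->second.erase(container);
    }
  }
}

// During recovery, each enabled isolator drops state for containers that
// are no longer alive. Returns the number of orphan entries cleaned up.
int recover(
    Leftovers& state,
    const std::set<std::string>& enabled,
    const std::set<std::string>& alive) {
  int cleaned = 0;
  for (const std::string& isolator : enabled) {
    auto entry = state.find(isolator);
    if (entry == state.end()) {
      continue;
    }
    std::set<std::string>& containers = entry->second;
    for (auto it = containers.begin(); it != containers.end();) {
      if (alive.count(*it) == 0) {
        it = containers.erase(it);
        ++cleaned;
      } else {
        ++it;
      }
    }
  }
  return cleaned;
}
```

In this model, the leftover for {{x}} in {{B}} survives step 2 and is only removed when {{recover}} runs with {{B}} enabled, which matches the behavior described in the steps.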

> Checkpoint used subsystems for containers
> -
>
> Key: MESOS-6149
> URL: https://issues.apache.org/jira/browse/MESOS-6149
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>
> In MESOS-6063, we have tracked recovered and prepared subsystems for 
> containers. To make it work better, we could checkpoint this information and 
> recover it after an Agent restart.





[jira] [Created] (MESOS-6181) The logic for BadACLNoPrincipal and BadACLDropCreateAndDestroy is not correct

2016-09-16 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-6181:
--

 Summary: The logic for BadACLNoPrincipal and 
BadACLDropCreateAndDestroy is not correct
 Key: MESOS-6181
 URL: https://issues.apache.org/jira/browse/MESOS-6181
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu


Two issues with these two test cases:

1) There is no need to add `{}` in the test case; adding the `{}` causes the 
driver to decline a non-existent offer.
2) If destroying the volume failed, we should get the last offer to make sure 
that it also contains the volume resource.





[jira] [Commented] (MESOS-5384) Improve error message for missing resources file

2016-09-16 Thread Kris Paprocki (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495703#comment-15495703
 ] 

Kris Paprocki commented on MESOS-5384:
---

Looking for a shepherd for this issue.

> Improve error message for missing resources file
> 
>
> Key: MESOS-5384
> URL: https://issues.apache.org/jira/browse/MESOS-5384
> Project: Mesos
>  Issue Type: Bug
>  Components: general
>Affects Versions: 0.28.1
> Environment: Centos 7
>Reporter: John Yost
>Assignee: Kris Paprocki
>Priority: Minor
>  Labels: easyfix, newbie
>
> Attempting to specify resources file via 
> --resources=/etc/mesos-slave/small-slave-config.json threw the following 
> error:
> Failed to determine slave resources: Bad value for resources, missing or 
> extra ':' in /etc/mesos-slave/small-slave-config.json
> I confirmed I had valid JSON: 
> [
>   {
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
>   "value": 0.5
> }
>   },
>   {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
>   "value": 512
> }
>   }
> ]
> In actuality, I misread the docs for my file pattern. Once I changed to 
> resources=file:///etc/mesos-slave/small-slave-config.json the mesos slave 
> started up fine. Just need a missing file check and corresponding error 
> message to fix this.
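One way the requested check could look is sketched below. It is an illustration only: {{resourcesFlagHint}} is a hypothetical helper and not part of the Mesos flag parsing code, and the heuristic (treating a value that starts with {{/}} as a probable file path) is an assumption made for this example.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns a human-readable hint when the value of
// --resources looks like a mistake, or std::nullopt otherwise.
std::optional<std::string> resourcesFlagHint(const std::string& value) {
  const std::string scheme = "file://";

  // Value uses the file:// form: report a missing file explicitly.
  if (value.compare(0, scheme.size(), scheme) == 0) {
    const std::string path = value.substr(scheme.size());
    std::error_code ec;
    if (!fs::exists(path, ec)) {
      return "Resources file '" + path + "' does not exist";
    }
    return std::nullopt;
  }

  // Assumption: a bare absolute path was probably meant as a file.
  if (!value.empty() && value[0] == '/') {
    return "'" + value + "' looks like a file path; did you mean 'file://" +
           value + "'?";
  }

  return std::nullopt;
}
```

For the value from the report, {{/etc/mesos-slave/small-slave-config.json}}, this would produce the "did you mean" hint instead of the confusing "missing or extra ':'" message.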





[jira] [Commented] (MESOS-5893) mesos-executor should adopt and reap orphan child processes

2016-09-16 Thread Stéphane Cottin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495690#comment-15495690
 ] 

Stéphane Cottin commented on MESOS-5893:


Tini is a (sub)reaper commonly used in docker containers, its source may help 
to implement a process reaper in the executor.

https://github.com/krallin/tini/blob/master/src/tini.c
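For reference, the core of what a subreaper does can be sketched in a few lines. This is a Linux-only illustration, not the executor's code and not a copy of Tini: {{becomeSubreaper}} and {{reapExited}} are hypothetical names, and signal propagation is left out.

```cpp
#include <cerrno>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>

// Marks the calling process as a child subreaper, so that orphaned
// descendants are reparented to it instead of to PID 1.
// Requires Linux 3.4 or later.
bool becomeSubreaper() {
  return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == 0;
}

// Reaps every child that has already exited without blocking.
// Returns the number of children reaped. Intended to be called,
// for example, after SIGCHLD has been received.
int reapExited() {
  int reaped = 0;
  while (true) {
    int status = 0;
    const pid_t pid = waitpid(-1, &status, WNOHANG);
    if (pid > 0) {
      ++reaped;  // One zombie collected; look for more.
      continue;
    }
    if (pid == -1 && errno == EINTR) {
      continue;  // Interrupted by a signal; retry.
    }
    break;  // No exited children left, or no children at all.
  }
  return reaped;
}
```

A process that combines the two would adopt re-parented descendants such as the old haproxy instances and collect them once they exit, so they do not linger as zombies.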

> mesos-executor should adopt and reap orphan child processes
> ---
>
> Key: MESOS-5893
> URL: https://issues.apache.org/jira/browse/MESOS-5893
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.1.0
> Environment: mesos compiled from git master ( 1.1.0 ) 
> {{../configure --enable-ssl --enable-libevent --prefix=/usr --enable-optimize 
> --enable-silent-rules --enable-xfs-disk-isolator}}
> isolators : 
> {{namespaces/pid,cgroups/cpu,cgroups/mem,filesystem/linux,docker/runtime,network/cni,docker/volume}}
>Reporter: Stéphane Cottin
>  Labels: containerizer
>
> The Mesos containerizer does not properly handle the death of child processes.
> Discovered using marathon-lb: each topology update forks another haproxy. The 
> old haproxy process should die properly after its last client connection is 
> terminated, but it turns into a zombie.
> {noformat}
>  7716 ?Ssl0:00  |   \_ mesos-executor 
> --launcher_dir=/usr/libexec/mesos --sandbox_directory=/mnt/mesos/sandbox 
> --user=root --working_directory=/marathon-lb 
> --rootfs=/mnt/mesos/provisioner/containers/3b381d5c-7490-4dcd-ab4b-81051226075a/backends/overlay/rootfses/a4beacac-2d7e-445b-80c8-a9b4e480c491
>  7813 ?Ss 0:00  |   |   \_ sh -c /marathon-lb/run sse 
> --marathon https://marathon:8443 --auth-credentials user:pass --group 
> 'external' --ssl-certs /certs --max-serv-port-ip-per-task 20050
>  7823 ?S  0:00  |   |   |   \_ /bin/bash /marathon-lb/run sse 
> --marathon https://marathon:8443 --auth-credentials user:pass --group 
> external --ssl-certs /certs --max-serv-port-ip-per-task 20050
>  7827 ?S  0:00  |   |   |   \_ /usr/bin/runsv 
> /marathon-lb/service/haproxy
>  7829 ?S  0:00  |   |   |   |   \_ /bin/bash ./run
>  8879 ?S  0:00  |   |   |   |   \_ sleep 0.5
>  7828 ?Sl 0:00  |   |   |   \_ python3 
> /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config 
> /marathon-lb/haproxy.cfg --ssl-certs /certs --command sv reload 
> /marathon-lb/service/haproxy --sse --marathon https://marathon:8443 
> --auth-credentials user:pass --group external --max-serv-port-ip-per-task 
> 20050
>  7906 ?Zs 0:00  |   |   \_ [haproxy] 
>  8628 ?Zs 0:00  |   |   \_ [haproxy] 
>  8722 ?Ss 0:00  |   |   \_ haproxy -p /tmp/haproxy.pid -f 
> /marathon-lb/haproxy.cfg -D -sf 144 52
> {noformat}
> update: mesos-executor should be registered as a subreaper ( 
> http://man7.org/linux/man-pages/man2/prctl.2.html ) and propagate signals. 
> code sample: https://github.com/krallin/tini/blob/master/src/tini.c





[jira] [Commented] (MESOS-6145) Isolator namespaces/pid is leaking mounts

2016-09-16 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495610#comment-15495610
 ] 

haosdent commented on MESOS-6145:
-

The recover method is also incorrect. Because we bind mount 
{{/var/empty/mesos}} to {{/var/run/mesos/pidns}},
{code}
  Try<list<string>> entries = os::ls(PID_NS_BIND_MOUNT_ROOT);
  if (entries.isError()) {
    return Failure("Failed to list existing containers in '" +
                   string(PID_NS_BIND_MOUNT_ROOT) + "': " + entries.error());
  }
{code}

cannot see the mount points under {{/var/run/mesos/pidns}}.
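One possible alternative to listing the directory, which is not proposed in this thread and is only an assumption here, would be to read the mount table instead. The sketch below parses text in the {{/proc/self/mountinfo}} format, where the fifth whitespace-separated field of each line is the mount point; {{mountPointsUnder}} is a hypothetical helper.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns all mount points strictly below `root`
// found in text formatted like /proc/self/mountinfo.
// Note: mountinfo escapes spaces in paths as "\040"; this sketch does
// not unescape them.
std::vector<std::string> mountPointsUnder(
    const std::string& mountinfo,
    const std::string& root) {
  std::vector<std::string> result;
  const std::string prefix = root + "/";

  std::istringstream lines(mountinfo);
  std::string line;
  while (std::getline(lines, line)) {
    // Fields: mount ID, parent ID, major:minor, root, mount point, ...
    std::istringstream fields(line);
    std::string field;
    std::string mountPoint;
    for (int i = 0; i < 5 && (fields >> field); ++i) {
      if (i == 4) {
        mountPoint = field;
      }
    }
    if (mountPoint.compare(0, prefix.size(), prefix) == 0) {
      result.push_back(mountPoint);
    }
  }
  return result;
}
```

Whether entries hidden by the bind mount still show up in the mount table may depend on the kernel and on the mount namespace of the reading process, so this would need to be verified before relying on it.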

> Isolator namespaces/pid is leaking mounts
> -
>
> Key: MESOS-6145
> URL: https://issues.apache.org/jira/browse/MESOS-6145
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, isolation, security
>Reporter: Stephan Erb
>Assignee: Jie Yu
>
> As the operator of a Mesos cluster, I would like every container/executor to 
> run in a single PID namespace, so that a task cannot see what else is running 
> on the same host.
> The existing {{namespaces/pid}} isolator seems to provide this feature. 
> However, it seems to be leaking files. I have exactly one task running 
> currently, but there are still leftovers from earlier invocations:
> {code}
> vagrant@aurora:~/aurora$ ls -l /var/run/mesos/pidns/
> total 0
> -rw-r--r-- 1 root root 0 Aug 26 20:30 32b6e4c7-3d22-47ed-a350-9eb929daa241
> -rw-r--r-- 1 root root 0 Aug 26 20:30 7b812f00-4614-4016-a76c-ff78a175a1b0
> -rw-r--r-- 1 root root 0 Aug 26 20:24 d501829e-7cf8-40fb-a895-0ad3416da7dc
> -rw-r--r-- 1 root root 0 Aug 26 20:24 d56ca91f-eb72-426c-8bbb-f3239358a4ef
> -r--r--r-- 1 root root 0 Aug 26 20:35 fef9a109-de52-45f3-ae41-171de6495705
> {code}


