[jira] [Created] (MESOS-3831) Document operator HTTP endpoints

2015-11-04 Thread Neil Conway (JIRA)
Neil Conway created MESOS-3831:
--

 Summary: Document operator HTTP endpoints
 Key: MESOS-3831
 URL: https://issues.apache.org/jira/browse/MESOS-3831
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Neil Conway
Priority: Minor


These are not exhaustively documented; they probably should be.

Some endpoints have docs: e.g., `/reserve` and `/unreserve` are described in 
the reservation doc page. But it would be good to have a single page that lists 
all the endpoints and their semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2455) Add operator endpoints to create/destroy persistent volumes.

2015-11-04 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-2455:
---
Shepherd: Michael Park  (was: Adam B)

> Add operator endpoints to create/destroy persistent volumes.
> 
>
> Key: MESOS-2455
> URL: https://issues.apache.org/jira/browse/MESOS-2455
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Neil Conway
>Priority: Critical
>  Labels: mesosphere, persistent-volumes
>
> Persistent volumes will not be released automatically.
> So we probably need an endpoint for operators to forcefully release 
> persistent volumes. We probably need to add principal to Persistence struct 
> and use ACLs to control who can release what.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2455) Add operator endpoints to create/destroy persistent volumes.

2015-11-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-2455:
-
Summary: Add operator endpoints to create/destroy persistent volumes.  
(was: Add operator endpoint to destroy persistent volumes.)

> Add operator endpoints to create/destroy persistent volumes.
> 
>
> Key: MESOS-2455
> URL: https://issues.apache.org/jira/browse/MESOS-2455
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Michael Park
>Priority: Critical
>  Labels: mesosphere, persistent-volumes
>
> Persistent volumes will not be released automatically.
> So we probably need an endpoint for operators to forcefully release 
> persistent volumes. We probably need to add principal to Persistence struct 
> and use ACLs to control who can release what.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2455) Add operator endpoints to create/destroy persistent volumes.

2015-11-04 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-2455:
--

Assignee: Neil Conway  (was: Michael Park)

> Add operator endpoints to create/destroy persistent volumes.
> 
>
> Key: MESOS-2455
> URL: https://issues.apache.org/jira/browse/MESOS-2455
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Neil Conway
>Priority: Critical
>  Labels: mesosphere, persistent-volumes
>
> Persistent volumes will not be released automatically.
> So we probably need an endpoint for operators to forcefully release 
> persistent volumes. We probably need to add principal to Persistence struct 
> and use ACLs to control who can release what.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2455) Add operator endpoints to create/destroy persistent volumes.

2015-11-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-2455:
-
Description: 
Persistent volumes will not be released automatically.

So we probably need an endpoint for operators to forcefully release persistent 
volumes. We probably need to add principal to Persistence struct and use ACLs 
to control who can release what.

Additionally, it would be useful to have an endpoint for operators to create 
persistent volumes.

  was:
Persistent volumes will not be released automatically.

So we probably need an endpoint for operators to forcefully release persistent 
volumes. We probably need to add principal to Persistence struct and use ACLs 
to control who can release what.


> Add operator endpoints to create/destroy persistent volumes.
> 
>
> Key: MESOS-2455
> URL: https://issues.apache.org/jira/browse/MESOS-2455
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Neil Conway
>Priority: Critical
>  Labels: mesosphere, persistent-volumes
>
> Persistent volumes will not be released automatically.
> So we probably need an endpoint for operators to forcefully release 
> persistent volumes. We probably need to add principal to Persistence struct 
> and use ACLs to control who can release what.
> Additionally, it would be useful to have an endpoint for operators to create 
> persistent volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3829) Error on gracefully shutdown task

2015-11-04 Thread Rafael Capucho (JIRA)
Rafael Capucho created MESOS-3829:
-

 Summary: Error on gracefully shutdown task
 Key: MESOS-3829
 URL: https://issues.apache.org/jira/browse/MESOS-3829
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.25.0
 Environment: Marathon: 0.12.0-RC1
Mesos: 0.25.0
Docker 1.9.0
Reporter: Rafael Capucho


Hello,

I'm suffering from the same error reported here[1]. I have configured my 
mesos-slave environment as in [2], setting DOCKER_STOP_TIMEOUT and 
EXECUTOR_SHUTDOWN_GRACE_PERIOD.

When I look at the sandbox stdout, I can see in the first line that the flag
--stop_timeout="30secs"
is properly configured, but when I click "Destroy App" in Marathon the stdout 
keeps repeatedly showing odd output[3] such as "Killing docker task Shutting 
down".

My code handles SIGTERM, but the handler is never reached.

Thank you.

[1] - 
https://groups.google.com/forum/?hl=en#!topic/marathon-framework/Oy0dN0Lron0
[2] - https://paste.ee/r/grRyS
[3] - https://paste.ee/r/SghOr




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.

2015-11-04 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990461#comment-14990461
 ] 

Michael Park commented on MESOS-2353:
-

[~vinodkone] Is Monday acceptable to you?

> Improve performance of the master's state.json endpoint for large clusters.
> ---
>
> Key: MESOS-2353
> URL: https://issues.apache.org/jira/browse/MESOS-2353
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>  Labels: newbie, scalability, twitter
>
> The master's state.json endpoint consistently takes a long time to compute 
> the JSON result, for large clusters:
> {noformat}
> $ time curl -s -o /dev/null localhost:5050/master/state.json
> Mon Jan 26 22:38:50 UTC 2015
> real  0m13.174s
> user  0m0.003s
> sys   0m0.022s
> {noformat}
> This can cause the master to get backlogged if there are many state.json 
> requests in flight.
> Looking at {{perf}} data, it seems most of the time is spent doing memory 
> allocation / de-allocation. This ticket will try to capture any low hanging 
> fruit to speed this up. Possibly we can leverage moves if they are not 
> already being used by the compiler.
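Purely as an illustration of the kind of low-hanging fruit mentioned above (a hypothetical helper, not the actual master code): move large temporaries into the result instead of copying them while rendering state.json.

{code}
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: build JSON fragments for a list of names, moving each
// fragment into the output vector instead of copying it.
std::vector<std::string> renderNames(const std::vector<std::string>& names)
{
  std::vector<std::string> out;
  out.reserve(names.size());               // Avoid repeated reallocation.

  for (const std::string& name : names) {
    std::string entry = "{\"name\":\"" + name + "\"}";
    out.push_back(std::move(entry));       // Move, don't copy, the fragment.
  }

  return out;
}
{code}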



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3293) Failing ROOT_ tests on CentOS 7.1 - LimitedCpuIsolatorTest

2015-11-04 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-3293:
--
Shepherd: Kapil Arya  (was: Vinod Kone)

> Failing ROOT_ tests on CentOS 7.1 - LimitedCpuIsolatorTest
> --
>
> Key: MESOS-3293
> URL: https://issues.apache.org/jira/browse/MESOS-3293
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker, test
>Affects Versions: 0.23.0, 0.24.0
> Environment: CentOS Linux release 7.1
> Linux 3.10.0
>Reporter: Marco Massenzio
>Assignee: Jian Qiu
>Priority: Blocker
>  Labels: flaky-test, tech-debt
> Attachments: 20150818-mesos-tests.log
>
>
> h2. LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> This is one of several ROOT failing tests: we want to track them 
> *individually* and for each of them decide whether to:
> * fix;
> * remove; OR
> * redesign.
> (full verbose logs attached)
> h2. Steps to Reproduce
> Completely cleaned the build, removed directory, clean pull from {{master}} 
> (SHA: {{fb93d93}}) - same results, 9 failed tests:
> {noformat}
> [==] 751 tests from 114 test cases ran. (231218 ms total)
> [  PASSED  ] 742 tests.
> [  FAILED  ] 9 tests, listed below:
> [  FAILED  ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [  FAILED  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where 
> TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess
> [  FAILED  ] ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem
> [  FAILED  ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs
>  9 FAILED TESTS
>   YOU HAVE 10 DISABLED TESTS
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3831) Document operator HTTP endpoints

2015-11-04 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3831:
---
Description: 
These are not exhaustively documented; they probably should be.

Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described 
in the reservation doc page. But it would be good to have a single page that 
lists all the endpoints and their semantics.

  was:
These are not exhaustively documented; they probably should be.

Some endpoints have docs: e.g., `/reserve` and `/unreserve` are described in 
the reservation doc page. But it would be good to have a single page that lists 
all the endpoints and their semantics.


> Document operator HTTP endpoints
> 
>
> Key: MESOS-3831
> URL: https://issues.apache.org/jira/browse/MESOS-3831
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Priority: Minor
>  Labels: documentation, mesosphere, newbie
>
> These are not exhaustively documented; they probably should be.
> Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described 
> in the reservation doc page. But it would be good to have a single page that 
> lists all the endpoints and their semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3830) Provide a means to do async data transfer with async back-pressure.

2015-11-04 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-3830:
--

 Summary: Provide a means to do async data transfer with async 
back-pressure.
 Key: MESOS-3830
 URL: https://issues.apache.org/jira/browse/MESOS-3830
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Benjamin Mahler


I had started thinking about this while implementing http::Pipe, and more 
recently when seeing the docker registry client streaming to a file and the 
fetcher cache refactoring work. This should still be seen as an active thought 
process, and this description will be edited along the way.

The overall idea here is to provide a composable abstraction to support 
asynchronously streaming data in libprocess with asynchronous back-pressure. 
The following characteristics are desired:

{noformat}
(1) Ability to express both finite and infinite streams of data.
(2) Asynchronous zero-copy data transfer.
(3) Asynchronous back-pressure.
(4) Support for composition (prefer this to polymorphism seen in other 
implementations):
   (a) Allow data to flow down through a "pipeline" of transformations.
   (b) Allow backpressure to flow back up the "pipeline".
   (c) Allow failures and closures to flow back up the "pipeline".
{noformat}

The existing 
[http::Pipe|https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/include/process/http.hpp#L172]
 is part of the way there (needs enhancements for zero-copy, async 
back-pressure, and composition) and can be pulled up to become process::Pipe.

Composition allows us to create a pipeline of data transfer:

{code}
// Stream the contents of the url to a file.
Future complete = http::download(url) | io::fileWriter(file);

// Signatures.
Pipe::Reader http::download(const Url&);
Pipe::Writer io::writer(const Path&);
{code}

In this example, composition occurs by connecting the read end of pipe1 (from 
http::download) with the write end of pipe2 (from io::fileWriter). This 
composition terminates the pipeline (i.e. A | B | C is not allowed here) and 
returns a future for the caller to detect completion or trigger a discard to 
close down the pipeline.

Data will now flow through the pipeline without copies and backpressure from 
the file writing will slow down the consumption of data from the tcp socket 
downloading the url contents.

Another form of composition occurs when we want to apply a transformation:

{code}
// Stream the contents of the url to a file.
Future complete = http::download(url) | zip | io::fileWriter(file);

// TODO: Figure out 'zip' type signature to enable composition.
{code}

Here (A | B) returns another Pipe::Reader because B is a transformation rather 
than a Pipe::Writer. This enables A | B | C where composing with C terminates 
the pipeline once again.
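As a deliberately simplified, synchronous sketch of the composition shape only (Reader, Writer, Transform, and the operator overloads below are hypothetical; the real design would be asynchronous and zero-copy):

{code}
#include <functional>
#include <string>

// Hypothetical toy types, for illustrating the A | B | C shape only.
struct Reader
{
  std::function<std::string()> read;              // Empty string signals end-of-stream.
};

struct Writer
{
  std::function<bool(const std::string&)> write;  // false => back-pressure/failure.
};

using Transform = std::function<std::string(const std::string&)>;

// Reader | Transform -> Reader: the pipeline stays open for further stages.
Reader operator|(Reader in, Transform f)
{
  return Reader{[=]() { return f(in.read()); }};
}

// Reader | Writer: terminates the pipeline and drives data through it,
// letting failures and back-pressure from the writer flow back up.
bool operator|(Reader in, Writer out)
{
  for (std::string chunk = in.read(); !chunk.empty(); chunk = in.read()) {
    if (!out.write(chunk)) {
      return false;
    }
  }
  return true;
}
{code}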

Related work that is interesting to examine:

NodeJS's Stream: https://nodejs.org/api/stream.html
Reactive Streams: http://www.reactive-streams.org/
Netty's Channels / Java NIO ByteBuffer: 
http://seeallhearall.blogspot.com/2012/05/netty-tutorial-part-1-introduction-to.html

We may also want typed Pipes to capture the process::Stream abstraction 
we've wanted in the past. process::Stream and process::Pipe discussed here 
share much of the same functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3832) Scheduler HTTP API does not redirect to leading master

2015-11-04 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3832:
--
Shepherd: Vinod Kone
Target Version/s: 0.26.0
  Labels: newbie  (was: )
 Description: 
The documentation for the Scheduler HTTP API says:

{quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
Redirect” will be received with the “Location” header pointing to the leading 
master.{quote}
While the redirect functionality has been implemented, it was not actually used 
in the handler for the HTTP api.

A probable fix could be:
- Check if the current master is the leading master.
- If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}

  was:
The documentation for the HTTP api says:

{quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
Redirect” will be received with the “Location” header pointing to the leading 
master.{quote}
While the redirect functionality has been implemented, it was not actually used 
in the handler for the HTTP api.

 Summary: Scheduler HTTP API does not redirect to leading master  
(was: HTTP API does not redirect to leading master)

> Scheduler HTTP API does not redirect to leading master
> --
>
> Key: MESOS-3832
> URL: https://issues.apache.org/jira/browse/MESOS-3832
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0, 0.24.1, 0.25.0
>Reporter: Dario Rexin
>Assignee: Dario Rexin
>  Labels: newbie
>
> The documentation for the Scheduler HTTP API says:
> {quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
> Redirect” will be received with the “Location” header pointing to the leading 
> master.{quote}
> While the redirect functionality has been implemented, it was not actually 
> used in the handler for the HTTP api.
> A probable fix could be:
> - Check if the current master is the leading master.
> - If not, invoke the existing {{redirect}} method in {{src/master/http.cpp}}
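A rough, self-contained sketch of the intended behaviour (the types and helper below are illustrative stand-ins, not the actual handler or {{redirect()}} in {{src/master/http.cpp}}):

{code}
#include <string>

// Toy stand-ins for process::http::Request/Response; illustrative only.
struct Request { std::string path; };
struct Response { int status; std::string location; };

// If this master is not the leader, answer with "307 Temporary Redirect" and
// a Location header pointing at the leading master (what the existing
// redirect() helper produces); otherwise fall through to normal handling.
Response handleSchedulerCall(
    const Request& request,
    bool isLeadingMaster,
    const std::string& leaderUrl)
{
  if (!isLeadingMaster) {
    return Response{307, leaderUrl + request.path};
  }

  return Response{202, ""};  // Placeholder for the real scheduler handling.
}
{code}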



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3829) Error on gracefully shutdown task

2015-11-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990959#comment-14990959
 ] 

haosdent commented on MESOS-3829:
-

DOCKER_STOP_TIMEOUT is used for 
http://docs.docker.com/engine/reference/commandline/stop/. According to the 
Docker documentation, you should receive SIGTERM before SIGKILL.
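For reference, a minimal sketch of the task-side handler that would be expected to run when {{docker stop}} delivers SIGTERM (note the signal goes to PID 1 inside the container, so a task wrapped by a shell may never see it):

{code}
#include <csignal>
#include <cstdio>

// Illustrative only: flag set by the SIGTERM handler, polled by the task loop.
volatile std::sig_atomic_t terminated = 0;

extern "C" void onSigterm(int) { terminated = 1; }

int main()
{
  std::signal(SIGTERM, onSigterm);

  while (!terminated) {
    // ... do work ...
  }

  std::puts("SIGTERM received, shutting down gracefully");
  return 0;
}
{code}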

> Error on gracefully shutdown task
> -
>
> Key: MESOS-3829
> URL: https://issues.apache.org/jira/browse/MESOS-3829
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
> Environment: Marathon: 0.12.0-RC1
> Mesos: 0.25.0
> Docker 1.9.0
>Reporter: Rafael Capucho
>
> Hello,
> I'm suffering from the same error reported here[1]. I have configured my 
> mesos-slave environment as in [2], setting DOCKER_STOP_TIMEOUT and 
> EXECUTOR_SHUTDOWN_GRACE_PERIOD.
> When I look at the sandbox stdout, I can see in the first line that the flag
> --stop_timeout="30secs"
> is properly configured, but when I click "Destroy App" in Marathon the stdout 
> keeps repeatedly showing odd output[3] such as "Killing docker task Shutting 
> down".
> My code handles SIGTERM, but the handler is never reached.
> Thank you.
> [1] - 
> https://groups.google.com/forum/?hl=en#!topic/marathon-framework/Oy0dN0Lron0
> [2] - https://paste.ee/r/grRyS
> [3] - https://paste.ee/r/SghOr



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3826) Add an optional unique identifier for resource reservations

2015-11-04 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990952#comment-14990952
 ] 

Guangya Liu commented on MESOS-3826:


Thanks [~neilc]. That's also what I noticed: it seems difficult to add an ID 
to a dynamic reservation, since dynamic reservations might be merged. [~mcypark], 
any comments? Thanks.

> Add an optional unique identifier for resource reservations
> ---
>
> Key: MESOS-3826
> URL: https://issues.apache.org/jira/browse/MESOS-3826
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Sargun Dhillon
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere
>
> Thanks to the resource reservation primitives, frameworks can reserve 
> resources. These reservations are per role, which means multiple frameworks 
> can share reservations. This can get very hairy, as multiple reservations can 
> occur on each agent. 
> It would be nice to be able to optionally, uniquely identify reservations by 
> ID, much like persistent volumes are today. This could be done by adding a 
> new protobuf field, such as Resource.ReservationInfo.id, that if set upon 
> reservation time, would come back when the reservation is advertised.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3833) /help endpoints do not work for nested paths

2015-11-04 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-3833:
-

 Summary: /help endpoints do not work for nested paths
 Key: MESOS-3833
 URL: https://issues.apache.org/jira/browse/MESOS-3833
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Reporter: Anand Mazumdar
Priority: Minor


Mesos displays the list of all supported endpoints starting at a given path 
prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.

It seems that the {{help}} functionality is broken for URLs having nested 
paths, e.g. {{master:5050/help/master/machine/down}}. The response returned is:
{quote}
Malformed URL, expecting '/help/id/name/'
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3829) Error on gracefully shutdown task

2015-11-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990960#comment-14990960
 ] 

haosdent commented on MESOS-3829:
-

Maybe because of this: https://github.com/docker/docker/pull/3240 ? I think it 
may be a Docker issue.

> Error on gracefully shutdown task
> -
>
> Key: MESOS-3829
> URL: https://issues.apache.org/jira/browse/MESOS-3829
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
> Environment: Marathon: 0.12.0-RC1
> Mesos: 0.25.0
> Docker 1.9.0
>Reporter: Rafael Capucho
>
> Hello,
> I'm suffering from the same error reported here[1]. I have configured my 
> mesos-slave environment as in [2], setting DOCKER_STOP_TIMEOUT and 
> EXECUTOR_SHUTDOWN_GRACE_PERIOD.
> When I look at the sandbox stdout, I can see in the first line that the flag
> --stop_timeout="30secs"
> is properly configured, but when I click "Destroy App" in Marathon the stdout 
> keeps repeatedly showing odd output[3] such as "Killing docker task Shutting 
> down".
> My code handles SIGTERM, but the handler is never reached.
> Thank you.
> [1] - 
> https://groups.google.com/forum/?hl=en#!topic/marathon-framework/Oy0dN0Lron0
> [2] - https://paste.ee/r/grRyS
> [3] - https://paste.ee/r/SghOr



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3832) HTTP API does not redirect to leading master

2015-11-04 Thread Dario Rexin (JIRA)
Dario Rexin created MESOS-3832:
--

 Summary: HTTP API does not redirect to leading master
 Key: MESOS-3832
 URL: https://issues.apache.org/jira/browse/MESOS-3832
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API
Affects Versions: 0.25.0, 0.24.1, 0.24.0
Reporter: Dario Rexin
Assignee: Dario Rexin


The documentation for the HTTP api says:

{quote}If requests are made to a non-leading master a “HTTP 307 Temporary 
Redirect” will be received with the “Location” header pointing to the leading 
master.{quote}
While the redirect functionality has been implemented, it was not actually used 
in the handler for the HTTP api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3826) Add an optional unique identifier for resource reservations

2015-11-04 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990783#comment-14990783
 ] 

Neil Conway commented on MESOS-3826:


There are some subtle issues here. Right now, reservations do not have 
identity. For example, suppose a slave has 8 CPUs and 8192 MB of RAM, and a 
framework makes two dynamic reservations for 2 CPUs and 2048 MB of RAM for role 
'foo'. The result is that 4 CPUs and 4096MB of RAM on that slave are reserved 
for 'foo': there are *not* two distinct reservations that might themselves be 
assigned an ID.

Offhand, my initial impression is that this ticket would not be a reasonable 
thing to implement (unless we redefine how reservations work).

> Add an optional unique identifier for resource reservations
> ---
>
> Key: MESOS-3826
> URL: https://issues.apache.org/jira/browse/MESOS-3826
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Sargun Dhillon
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere
>
> Thanks to the resource reservation primitives, frameworks can reserve 
> resources. These reservations are per role, which means multiple frameworks 
> can share reservations. This can get very hairy, as multiple reservations can 
> occur on each agent. 
> It would be nice to be able to optionally, uniquely identify reservations by 
> ID, much like persistent volumes are today. This could be done by adding a 
> new protobuf field, such as Resource.ReservationInfo.id, that if set upon 
> reservation time, would come back when the reservation is advertised.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths

2015-11-04 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991229#comment-14991229
 ] 

Guangya Liu commented on MESOS-3833:


[~bmahler] The /help/master endpoint works well and lists all master-related 
endpoints:
{code}
/master/api/v1/scheduler
/master/flags
/master/frameworks
/master/health
/master/machine/down
/master/machine/up
/master/maintenance/schedule
/master/maintenance/status
/master/observe
/master/redirect
/master/reserve
/master/roles
/master/roles.json
/master/slaves
/master/state
/master/state-summary
/master/state.json
/master/tasks
/master/tasks.json
/master/teardown
/master/unreserve
{code}

But when clicking the links for master endpoints whose paths have more than two 
segments, Mesos reports an error.

The reason is that the current 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/help.cpp#L120-L127
 logic can only handle three path segments (/help/master/) in an endpoint; for 
endpoints with four or more segments (/help/master/xxx/xxx), Mesos reports an 
error. Comments?
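A rough sketch of what handling deeper paths could look like (hypothetical helpers, not the actual help.cpp code): split the path into segments, keep the first segment after "help" as the id, and rejoin the rest as the endpoint name.

{code}
#include <string>
#include <vector>

// Hypothetical helper: split a path on '/' into non-empty segments.
std::vector<std::string> split(const std::string& path)
{
  std::vector<std::string> tokens;
  std::string token;
  for (char c : path) {
    if (c == '/') {
      if (!token.empty()) { tokens.push_back(token); token.clear(); }
    } else {
      token += c;
    }
  }
  if (!token.empty()) { tokens.push_back(token); }
  return tokens;
}

// Parse "/help/<id>/<name...>" where <name...> may itself contain '/',
// e.g. "/help/master/machine/down" -> id = "master", name = "machine/down".
bool parseHelpPath(const std::string& path, std::string* id, std::string* name)
{
  const std::vector<std::string> tokens = split(path);
  if (tokens.size() < 2 || tokens[0] != "help") {
    return false;
  }
  *id = tokens[1];
  name->clear();
  for (size_t i = 2; i < tokens.size(); ++i) {
    if (i > 2) { *name += "/"; }
    *name += tokens[i];
  }
  return true;
}
{code}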

> /help endpoints do not work for nested paths
> 
>
> Key: MESOS-3833
> URL: https://issues.apache.org/jira/browse/MESOS-3833
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Mesos displays the list of all supported endpoints starting at a given path 
> prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.
> It seems that the {{help}} functionality is broken for URL's having nested 
> paths e.g. {{master:5050/help/master/machine/down}}. The response returned is:
> {quote}
> Malformed URL, expecting '/help/id/name/'
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths

2015-11-04 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991240#comment-14991240
 ] 

Benjamin Mahler commented on MESOS-3833:


Let's fix Help::help to handle these.

> /help endpoints do not work for nested paths
> 
>
> Key: MESOS-3833
> URL: https://issues.apache.org/jira/browse/MESOS-3833
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Mesos displays the list of all supported endpoints starting at a given path 
> prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.
> It seems that the {{help}} functionality is broken for URL's having nested 
> paths e.g. {{master:5050/help/master/machine/down}}. The response returned is:
> {quote}
> Malformed URL, expecting '/help/id/name/'
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths

2015-11-04 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991037#comment-14991037
 ] 

Guangya Liu commented on MESOS-3833:


There are two solutions for this: the first is to avoid nested paths and 
flatten all nested paths into a single segment; the second is to update 
https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/help.cpp#L120-L127
 to support nested paths. The latter is harder to handle, since the current 
nested paths contain two segments and it is not clear whether we will have 
deeper nested paths in the future.

[~bmahler] any comments? Thanks.

> /help endpoints do not work for nested paths
> 
>
> Key: MESOS-3833
> URL: https://issues.apache.org/jira/browse/MESOS-3833
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Mesos displays the list of all supported endpoints starting at a given path 
> prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.
> It seems that the {{help}} functionality is broken for URL's having nested 
> paths e.g. {{master:5050/help/master/machine/down}}. The response returned is:
> {quote}
> Malformed URL, expecting '/help/id/name/'
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3833) /help endpoints do not work for nested paths

2015-11-04 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu reassigned MESOS-3833:
--

Assignee: Guangya Liu

> /help endpoints do not work for nested paths
> 
>
> Key: MESOS-3833
> URL: https://issues.apache.org/jira/browse/MESOS-3833
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Mesos displays the list of all supported endpoints starting at a given path 
> prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.
> It seems that the {{help}} functionality is broken for URL's having nested 
> paths e.g. {{master:5050/help/master/machine/down}}. The response returned is:
> {quote}
> Malformed URL, expecting '/help/id/name/'
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3834) slave upgrade framework checkpoint incompatibility

2015-11-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991116#comment-14991116
 ] 

James Peach commented on MESOS-3834:


I'm gonna take a crack at a patch for us that restores the compatibility check 
and also rewrites the framework checkpoint once it is recovered. If the latter 
is a terrible idea for some reason, I'd love to be educated about it ;)
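A self-contained sketch of the idea (toy types and names, not the actual slave recovery code):

{code}
#include <string>

// Toy stand-ins for the checkpointed framework state; names are hypothetical.
struct FrameworkInfoToy
{
  std::string id;                        // Empty when written by a 0.22 agent.
  bool has_id() const { return !id.empty(); }
};

struct FrameworkStateToy
{
  std::string id;                        // Framework id known from the checkpoint path.
  FrameworkInfoToy info;                 // Checkpointed FrameworkInfo.
};

// Instead of CHECK-failing on an old checkpoint that lacks the id, restore
// the compatibility fix-up and rewrite the checkpoint in the current format.
void recoverFrameworkToy(FrameworkStateToy& state)
{
  if (!state.info.has_id()) {
    state.info.id = state.id;            // Compatibility path for pre-0.23 checkpoints.
  }

  // rewriteCheckpoint(state);           // Hypothetical: persist the repaired info.
}
{code}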

> slave upgrade framework checkpoint incompatibility 
> ---
>
> Key: MESOS-3834
> URL: https://issues.apache.org/jira/browse/MESOS-3834
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.1
>Reporter: James Peach
>Assignee: James Peach
>
> We are upgrading from 0.22 to 0.25 and experienced the following crash in the 
> 0.24 slave:
> {code}
> F1104 05:20:49.162701  1153 slave.cpp:4175] Check failed: 
> frameworkInfo.has_id()
> *** Check failure stack trace: ***
> @ 0x7fef9c294650  google::LogMessage::Fail()
> @ 0x7fef9c29459f  google::LogMessage::SendToLog()
> @ 0x7fef9c293fb0  google::LogMessage::Flush()
> @ 0x7fef9c296ce4  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7fef9b9a5492  mesos::internal::slave::Slave::recoverFramework()
> @ 0x7fef9b9a3314  mesos::internal::slave::Slave::recover()
> @ 0x7fef9b9d069c  
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS4_5state5StateEES9_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESP_
> @ 0x7fef9ba039f4  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS8_5state5StateEESD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSK_FSI_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> {code}
> As near as I can tell, what happened was this:
> - 0.22 wrote {{framework.info}} without the FrameworkID
> - 0.23 had a compatibility check so it was ok with it
> - 0.24 removed the compatibility check in MESOS-2259
> - the framework checkpoint doesn't get rewritten during recovery so when the 
> 0.24 slave starts it reads the 0.22 version
> - 0.24 asserts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3834) slave upgrade framework checkpoint incompatibility

2015-11-04 Thread James Peach (JIRA)
James Peach created MESOS-3834:
--

 Summary: slave upgrade framework checkpoint incompatibility 
 Key: MESOS-3834
 URL: https://issues.apache.org/jira/browse/MESOS-3834
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.24.1
Reporter: James Peach
Assignee: James Peach


We are upgrading from 0.22 to 0.25 and experienced the following crash in the 
0.24 slave:

{code}
F1104 05:20:49.162701  1153 slave.cpp:4175] Check failed: frameworkInfo.has_id()
*** Check failure stack trace: ***
@ 0x7fef9c294650  google::LogMessage::Fail()
@ 0x7fef9c29459f  google::LogMessage::SendToLog()
@ 0x7fef9c293fb0  google::LogMessage::Flush()
@ 0x7fef9c296ce4  google::LogMessageFatal::~LogMessageFatal()
@ 0x7fef9b9a5492  mesos::internal::slave::Slave::recoverFramework()
@ 0x7fef9b9a3314  mesos::internal::slave::Slave::recover()
@ 0x7fef9b9d069c  
_ZZN7process8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS4_5state5StateEES9_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESP_
@ 0x7fef9ba039f4  
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave5SlaveERK6ResultINS8_5state5StateEESD_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSK_FSI_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
{code}

As near as I can tell, what happened was this:

- 0.22 wrote {{framework.info}} without the FrameworkID
- 0.23 had a compatibility check so it was ok with it
- 0.24 removed the compatibility check in MESOS-2259
- the framework checkpoint doesn't get rewritten during recovery so when the 
0.24 slave starts it reads the 0.22 version
- 0.24 asserts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths

2015-11-04 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991218#comment-14991218
 ] 

Benjamin Mahler commented on MESOS-3833:


[~gyliu] it doesn't look that difficult to update the help code in your link to 
support an arbitrary number of tokens. However, I'm a bit surprised that the 
code behaves this way. What is listed when you hit /help/master? Taking a quick 
glance at the code, it looks like the help code handles multi-token paths 
during the call to 
[Help::add|https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/src/help.cpp#L81],
 so it seems that 
[Help::help|https://github.com/apache/mesos/blob/0.25.0/3rdparty/libprocess/src/help.cpp#L109]
 should handle these as well.

> /help endpoints do not work for nested paths
> 
>
> Key: MESOS-3833
> URL: https://issues.apache.org/jira/browse/MESOS-3833
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Mesos displays the list of all supported endpoints starting at a given path 
> prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.
> It seems that the {{help}} functionality is broken for URL's having nested 
> paths e.g. {{master:5050/help/master/machine/down}}. The response returned is:
> {quote}
> Malformed URL, expecting '/help/id/name/'
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with --executor_environmnent_variables

2015-11-04 Thread Cody Maloney (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cody Maloney updated MESOS-3751:

Fix Version/s: 0.26.0

> MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with 
> --executor_environmnent_variables
> ---
>
> Key: MESOS-3751
> URL: https://issues.apache.org/jira/browse/MESOS-3751
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.24.1, 0.25.0
>Reporter: Cody Maloney
>Assignee: Gilbert Song
>  Labels: mesosphere, newbie
> Fix For: 0.26.0
>
>
> When using --executor_environment_variables, and having 
> MESOS_NATIVE_JAVA_LIBRARY in the environment of mesos-slave, the mesos 
> containerizer does not set MESOS_NATIVE_JAVA_LIBRARY itself.
> Relevant code: 
> https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281
> It sees that the variable is in the mesos-slave's environment (os::getenv), 
> rather than checking if it is set in the environment variable set.
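A minimal sketch of the fix direction (hypothetical helper, not the actual containerizer code): the check should be against the environment being assembled for the executor rather than against the agent's own process environment.

{code}
#include <map>
#include <string>

// Hypothetical helper: prefer a value already present in the configured
// executor environment; otherwise copy it from the agent's environment.
void maybeSetNativeJavaLibrary(
    std::map<std::string, std::string>& executorEnv,
    const std::map<std::string, std::string>& agentEnv)
{
  const std::string key = "MESOS_NATIVE_JAVA_LIBRARY";

  if (executorEnv.count(key) == 0 && agentEnv.count(key) > 0) {
    executorEnv[key] = agentEnv.at(key);
  }
}
{code}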



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2077) Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.

2015-11-04 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991266#comment-14991266
 ] 

Guangya Liu commented on MESOS-2077:


[~bmahler] can you please help shepherd this? Thanks!

> Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.
> -
>
> Key: MESOS-2077
> URL: https://issues.apache.org/jira/browse/MESOS-2077
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Benjamin Mahler
>Assignee: Guangya Liu
>  Labels: twitter
>
> For maintenance, sometimes operators will force the drain of a slave (via 
> SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary 
> (e.g. bad hardware).
> To eliminate alerting noise, we'd like to add a 'Reason' that expresses the 
> forced drain of the slave, so that these are not considered to be a generic 
> slave removal TASK_LOST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2647) Slave should validate tasks using oversubscribed resources

2015-11-04 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991270#comment-14991270
 ] 

Guangya Liu commented on MESOS-2647:


[~vi...@twitter.com] can you please help review? Thanks!

> Slave should validate tasks using oversubscribed resources
> --
>
> Key: MESOS-2647
> URL: https://issues.apache.org/jira/browse/MESOS-2647
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Guangya Liu
>  Labels: twitter
>
> The latest oversubscribed resource estimate might render a revocable task 
> launch invalid. Slave should check this and send TASK_LOST with appropriate 
> REASON.
> We need to add a new REASON for this (REASON_RESOURCE_OVERSUBSCRIBED?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3828) Strategy for Utilizing Docker 1.9 Multihost Networking

2015-11-04 Thread John Omernik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Omernik updated MESOS-3828:

Labels: Docker isolation network plugins  (was: Docker feaa isolation 
network plugins)

> Strategy for Utilizing Docker 1.9 Multihost Networking
> --
>
> Key: MESOS-3828
> URL: https://issues.apache.org/jira/browse/MESOS-3828
> Project: Mesos
>  Issue Type: Story
>  Components: isolation
>Affects Versions: 0.26.0
>Reporter: John Omernik
>  Labels: Docker, isolation, network, plugins
>
> This is a user story to discuss the strategy for Mesos in using the new 
> Docker 1.9 feature: Multihost Networking. 
> http://blog.docker.com/2015/11/docker-multi-host-networking-ga/
> Basically we should determine whether this is something we want to work with 
> from a standpoint of container isolation and, going forward, how we can best 
> integrate. 
> The space for networking in Mesos is growing fast, with IP per Container and 
> other networking modules being worked on. Projects like Project Calico offer 
> services from outside the Mesos community that plug nicely, or will plug 
> nicely, into Mesos. 
> So how about Multihost networking? Is it an option to work with? With Docker 
> being a first-class citizen of Mesos, this is something we should be considering. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3828) Strategy for Utilizing Docker 1.9 Multihost Networking

2015-11-04 Thread John Omernik (JIRA)
John Omernik created MESOS-3828:
---

 Summary: Strategy for Utilizing Docker 1.9 Multihost Networking
 Key: MESOS-3828
 URL: https://issues.apache.org/jira/browse/MESOS-3828
 Project: Mesos
  Issue Type: Story
  Components: isolation
Affects Versions: 0.26.0
Reporter: John Omernik


This is a user story to discuss the strategy for Mesos in using the new 
Docker 1.9 feature: Multihost Networking. 

http://blog.docker.com/2015/11/docker-multi-host-networking-ga/

Basically we should determine whether this is something we want to work with 
from a standpoint of container isolation and, going forward, how we can best 
integrate. 

The space for networking in Mesos is growing fast, with IP per Container and 
other networking modules being worked on. Projects like Project Calico offer 
services from outside the Mesos community that plug nicely, or will plug 
nicely, into Mesos. 

So how about Multihost networking? Is it an option to work with? With Docker 
being a first-class citizen of Mesos, this is something we should be considering. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.

2015-11-04 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989814#comment-14989814
 ] 

Vinod Kone commented on MESOS-2353:
---

[~mcypark] Do you have an ETA on when you would get the design and reviews out? 
This is causing issues for us in production, so we want to fix this ASAP.

> Improve performance of the master's state.json endpoint for large clusters.
> ---
>
> Key: MESOS-2353
> URL: https://issues.apache.org/jira/browse/MESOS-2353
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>  Labels: newbie, scalability, twitter
>
> The master's state.json endpoint consistently takes a long time to compute 
> the JSON result, for large clusters:
> {noformat}
> $ time curl -s -o /dev/null localhost:5050/master/state.json
> Mon Jan 26 22:38:50 UTC 2015
> real  0m13.174s
> user  0m0.003s
> sys   0m0.022s
> {noformat}
> This can cause the master to get backlogged if there are many state.json 
> requests in flight.
> Looking at {{perf}} data, it seems most of the time is spent doing memory 
> allocation / de-allocation. This ticket will try to capture any low hanging 
> fruit to speed this up. Possibly we can leverage moves if they are not 
> already being used by the compiler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3388) Add an interface to allow Slave Modules to checkpoint/restore state.

2015-11-04 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3388:
-
Labels: external-volumes mesosphere module  (was: )

> Add an interface to allow Slave Modules to checkpoint/restore state.
> 
>
> Key: MESOS-3388
> URL: https://issues.apache.org/jira/browse/MESOS-3388
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kapil Arya
>Assignee: Greg Mann
>  Labels: external-volumes, mesosphere, module
>
> * This is to restore module-specific in-memory data structures that might be 
> required by the modules to do cleanup on task exit, etc.
> * We need to define the interaction when an Agent is restarted with a 
> different set of modules.
> One open question is how does an Agent identify a certain module? One 
> possibility is to assign a UID to the module and pass it in during 
> `create()`?. The UID is used to assign a ckpt directory during ckpt/restore. 
> (Something like /tmp/mesos/...//modules/).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3817) Rename offers to outstanding offers

2015-11-04 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-3817:

Labels: newbie  (was: )

> Rename offers to outstanding offers
> ---
>
> Key: MESOS-3817
> URL: https://issues.apache.org/jira/browse/MESOS-3817
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: haosdent
>  Labels: newbie
>
> As discussed in http://search-hadoop.com/m/0Vlr6NFAux1DPmxp , we need to rename 
> offers to outstanding offers in the webui to avoid user confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3815) docker executor not works when SSL enable

2015-11-04 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-3815:

Summary: docker executor not works when SSL enable  (was: os environment 
variables not passing to docker-executor environment variables correctly)

> docker executor not works when SSL enable
> -
>
> Key: MESOS-3815
> URL: https://issues.apache.org/jira/browse/MESOS-3815
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3827) Improve compilation speed of GMock tests

2015-11-04 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-3827:
--

Assignee: Neil Conway

> Improve compilation speed of GMock tests
> 
>
> Key: MESOS-3827
> URL: https://issues.apache.org/jira/browse/MESOS-3827
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere, tech-debt, testing
>
> The GMock docs suggest that moving the definition of mock classes' 
> constructors and destructors to a separate compilation unit can improve 
> compile performance: 
> https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2079) IO.Write test is flaky on OS X 10.10.

2015-11-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989947#comment-14989947
 ] 

James Peach commented on MESOS-2079:


https://reviews.apache.org/r/39938/
https://reviews.apache.org/r/39940/
https://reviews.apache.org/r/39941/


> IO.Write test is flaky on OS X 10.10.
> -
>
> Key: MESOS-2079
> URL: https://issues.apache.org/jira/browse/MESOS-2079
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, technical debt, test
> Environment: OS X 10.10
> {noformat}
> $ clang++ --version
> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
> Target: x86_64-apple-darwin14.0.0
> Thread model: posix
> {noformat}
>Reporter: Benjamin Mahler
>Assignee: James Peach
>  Labels: flaky
>
> [~benjaminhindman]: If I recall correctly, this is related to MESOS-1658. 
> Unfortunately, we don't have a stacktrace for SIGPIPE currently:
> {noformat}
> [ RUN  ] IO.Write
> make[5]: *** [check-local] Broken pipe: 13
> {noformat}
> Running in gdb, seems to always occur here:
> {code}
> Program received signal SIGPIPE, Broken pipe.
> [Switching to process 56827 thread 0x60b]
> 0x7fff9a011132 in __psynch_cvwait ()
> (gdb) where
> #0  0x7fff9a011132 in __psynch_cvwait ()
> #1  0x7fff903e7ea0 in _pthread_cond_wait ()
> #2  0x00010062f27c in Gate::arrive (this=0x101908a10, old=14780) at 
> gate.hpp:82
> #3  0x000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
> #4  0x7fff903e72fc in _pthread_body ()
> #5  0x7fff903e7279 in _pthread_start ()
> #6  0x7fff903e54b1 in thread_start ()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3815) docker executor not works when SSL enable

2015-11-04 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-3815:

Description: Because the docker executor does not pass SSL-related environment 
variables, mesos-docker-executor does not work normally when SSL is enabled. More 
details can be found at http://search-hadoop.com/m/0Vlr6DsslDSvVs72
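A rough sketch of the fix direction (hypothetical helper, not the actual docker containerizer code), assuming the relevant SSL configuration arrives via SSL_-prefixed environment variables:

{code}
#include <map>
#include <string>

// Hypothetical helper: copy SSL_* variables from the agent's environment into
// the environment passed to mesos-docker-executor, keeping explicit overrides.
std::map<std::string, std::string> withSSLEnvironment(
    std::map<std::string, std::string> executorEnv,
    const std::map<std::string, std::string>& agentEnv)
{
  for (const auto& entry : agentEnv) {
    if (entry.first.compare(0, 4, "SSL_") == 0) {
      executorEnv.insert(entry);  // insert() keeps any value already present.
    }
  }
  return executorEnv;
}
{code}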

> docker executor not works when SSL enable
> -
>
> Key: MESOS-3815
> URL: https://issues.apache.org/jira/browse/MESOS-3815
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>
> Because the docker executor does not pass SSL-related environment variables, 
> mesos-docker-executor does not work normally when SSL is enabled. More details 
> can be found at http://search-hadoop.com/m/0Vlr6DsslDSvVs72



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2079) IO.Write test is flaky on OS X 10.10.

2015-11-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989947#comment-14989947
 ] 

James Peach edited comment on MESOS-2079 at 11/4/15 5:19 PM:
-

These patches globally ignore {{SIGPIPE}} during libprocess initialization, 
document {{SIGPIPE}} behavior a bit more, and remove various signal 
manipulations that were formerly necessary for disabling {{SIGPIPE}} delivery.

https://reviews.apache.org/r/39938/
https://reviews.apache.org/r/39940/
https://reviews.apache.org/r/39941/
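A minimal sketch of the first part (illustrative only, not the actual patch): ignore {{SIGPIPE}} process-wide during initialization so writes to closed pipes or sockets fail with EPIPE instead of killing the process.

{code}
#include <signal.h>
#include <stdio.h>

int main()
{
  // Ignore SIGPIPE once, early in initialization.
  if (signal(SIGPIPE, SIG_IGN) == SIG_ERR) {
    perror("signal(SIGPIPE, SIG_IGN)");
    return 1;
  }

  // ... rest of initialization ...
  return 0;
}
{code}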



was (Author: jamespeach):
https://reviews.apache.org/r/39938/
https://reviews.apache.org/r/39940/
https://reviews.apache.org/r/39941/


> IO.Write test is flaky on OS X 10.10.
> -
>
> Key: MESOS-2079
> URL: https://issues.apache.org/jira/browse/MESOS-2079
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, technical debt, test
> Environment: OS X 10.10
> {noformat}
> $ clang++ --version
> Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)
> Target: x86_64-apple-darwin14.0.0
> Thread model: posix
> {noformat}
>Reporter: Benjamin Mahler
>Assignee: James Peach
>  Labels: flaky
>
> [~benjaminhindman]: If I recall correctly, this is related to MESOS-1658. 
> Unfortunately, we don't have a stacktrace for SIGPIPE currently:
> {noformat}
> [ RUN  ] IO.Write
> make[5]: *** [check-local] Broken pipe: 13
> {noformat}
> Running in gdb, seems to always occur here:
> {code}
> Program received signal SIGPIPE, Broken pipe.
> [Switching to process 56827 thread 0x60b]
> 0x7fff9a011132 in __psynch_cvwait ()
> (gdb) where
> #0  0x7fff9a011132 in __psynch_cvwait ()
> #1  0x7fff903e7ea0 in _pthread_cond_wait ()
> #2  0x00010062f27c in Gate::arrive (this=0x101908a10, old=14780) at 
> gate.hpp:82
> #3  0x000100600888 in process::schedule (arg=0x0) at src/process.cpp:1373
> #4  0x7fff903e72fc in _pthread_body ()
> #5  0x7fff903e7279 in _pthread_start ()
> #6  0x7fff903e54b1 in thread_start ()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3388) Add an interface to allow Slave Modules to checkpoint/restore state.

2015-11-04 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989964#comment-14989964
 ] 

Greg Mann commented on MESOS-3388:
--

Regarding Agent restart, I'm trying to decide if it makes sense for us to 
garbage collect the checkpointed state of undetected modules on Agent startup. 
On the one hand, it's good to leave the Agent in a clean state whenever we can. 
On the other, it's possible that a user may restart the Agent multiple times 
with different modules present, and it could be useful for them to have old 
checkpointed module data hanging around. If our long-term vision is that Agent 
restart should be a seldom-used operator action, then perhaps garbage 
collecting old module checkpoint data isn't such a big deal. If we imagine 
Agents being restarted frequently in order to accomplish different 
Attribute/Resource/Module configurations, then cleanup would be wise.

Regarding module UIDs, how will we maintain association of a given module with 
its ID through an Agent failover or restart? i.e., if we assign a module a UID, 
checkpoint some state, and then restart the Agent, how do we know what that 
module's UID was? Perhaps we could use a hash on the module name?
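A tiny sketch of the name-hash idea (hypothetical, not an agreed design; FNV-1a is used here only because it is stable across runs and compilers):

{code}
#include <cstdint>
#include <sstream>
#include <string>

// FNV-1a over the module name, giving a stable identifier across restarts.
uint64_t fnv1a(const std::string& s)
{
  uint64_t hash = 0xcbf29ce484222325ULL;   // FNV offset basis.
  for (unsigned char c : s) {
    hash ^= c;
    hash *= 0x100000001b3ULL;              // FNV prime.
  }
  return hash;
}

// Hypothetical: derive a checkpoint directory for a module from its name, so
// the same module maps back to the same directory after an agent restart.
std::string moduleCheckpointDir(
    const std::string& metaDir,
    const std::string& moduleName)
{
  std::ostringstream out;
  out << metaDir << "/modules/" << fnv1a(moduleName);
  return out.str();
}
{code}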

> Add an interface to allow Slave Modules to checkpoint/restore state.
> 
>
> Key: MESOS-3388
> URL: https://issues.apache.org/jira/browse/MESOS-3388
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kapil Arya
>Assignee: Greg Mann
>
> * This is to restore module-specific in-memory data structures that might be 
> required by the modules to do cleanup on task exit, etc.
> * We need to define the interaction when an Agent is restarted with a 
> different set of modules.
> One open question is how does an Agent identify a certain module? One 
> possibility is to assign a UID to the module and pass it in during 
> `create()`?. The UID is used to assign a ckpt directory during ckpt/restore. 
> (Something like /tmp/mesos/...//modules/).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3817) Rename offers to outstanding offers

2015-11-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989991#comment-14989991
 ] 

haosdent commented on MESOS-3817:
-

I think renaming "Offers" to "Outstanding Offers" in offers.html and index.html 
should be enough.

> Rename offers to outstanding offers
> ---
>
> Key: MESOS-3817
> URL: https://issues.apache.org/jira/browse/MESOS-3817
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: haosdent
>  Labels: newbie
>
> As discussed in http://search-hadoop.com/m/0Vlr6NFAux1DPmxp , we need to rename 
> offers to outstanding offers in the webui to avoid user confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.

2015-11-04 Thread Felix Bechstein (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989200#comment-14989200
 ] 

Felix Bechstein commented on MESOS-2353:


We are experiencing severe issues too. The master is using most of its CPU 
cycles for answering the master/state.json and metrics/snapshot requests. It 
takes up to 30s to fetch the state.

We are seeing the master become slow at sending offers because of that.
We noticed that restarting the leader to force re-election makes the problem go 
away for some time.

> Improve performance of the master's state.json endpoint for large clusters.
> ---
>
> Key: MESOS-2353
> URL: https://issues.apache.org/jira/browse/MESOS-2353
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>  Labels: newbie, scalability, twitter
>
> The master's state.json endpoint consistently takes a long time to compute 
> the JSON result, for large clusters:
> {noformat}
> $ time curl -s -o /dev/null localhost:5050/master/state.json
> Mon Jan 26 22:38:50 UTC 2015
> real  0m13.174s
> user  0m0.003s
> sys   0m0.022s
> {noformat}
> This can cause the master to get backlogged if there are many state.json 
> requests in flight.
> Looking at {{perf}} data, it seems most of the time is spent doing memory 
> allocation / de-allocation. This ticket will try to capture any low hanging 
> fruit to speed this up. Possibly we can leverage moves if they are not 
> already being used by the compiler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3815) docker executor not works when SSL enable

2015-11-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984317#comment-14984317
 ] 

haosdent edited comment on MESOS-3815 at 11/4/15 6:04 PM:
--

Patch: 
https://reviews.apache.org/r/39944/
https://reviews.apache.org/r/39945/


was (Author: haosd...@gmail.com):
Patch: https://reviews.apache.org/r/39837/

> docker executor not works when SSL enable
> -
>
> Key: MESOS-3815
> URL: https://issues.apache.org/jira/browse/MESOS-3815
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>
> Because the docker executor does not pass SSL-related environment variables, 
> mesos-docker-executor cannot work normally when SSL is enabled. More details 
> can be found in http://search-hadoop.com/m/0Vlr6DsslDSvVs72
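A minimal sketch of the idea behind such a fix, assuming the agent exposes the
SSL configuration via SSL_-prefixed environment variables (the helper below is
hypothetical and not the actual patch):

{code}
#include <map>
#include <string>

extern char** environ;

// Hypothetical sketch: collect SSL-related environment variables from the
// agent's environment so they can be forwarded to the executor's environment.
std::map<std::string, std::string> sslEnvironment()
{
  std::map<std::string, std::string> env;

  for (char** entry = environ; *entry != nullptr; ++entry) {
    const std::string var(*entry); // Each entry has the form "NAME=value".
    if (var.compare(0, 4, "SSL_") == 0) {
      const size_t eq = var.find('=');
      env[var.substr(0, eq)] = var.substr(eq + 1);
    }
  }

  return env;
}
{code}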



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3827) Improve compilation speed of GMock tests

2015-11-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990064#comment-14990064
 ] 

James Peach commented on MESOS-3827:


Did you measure this? I tried it and it didn't make much difference for me :-/

> Improve compilation speed of GMock tests
> 
>
> Key: MESOS-3827
> URL: https://issues.apache.org/jira/browse/MESOS-3827
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere, tech-debt, testing
>
> The GMock docs suggest that moving the definition of mock classes' 
> constructors and destructors to a separate compilation unit can improve 
> compile performance: 
> https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster
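For reference, the cookbook technique is to declare the mock's constructor and
destructor in the header but define them in a single .cpp file, so their
(expensive) bodies are compiled only once. A minimal sketch with a made-up
Reader interface:

{code}
// mock_reader.hpp
#include <string>
#include <gmock/gmock.h>

class Reader
{
public:
  virtual ~Reader() {}
  virtual std::string read(int bytes) = 0;
};

class MockReader : public Reader
{
public:
  MockReader();           // Declared here...
  virtual ~MockReader();  // ...but defined only in mock_reader.cpp.

  MOCK_METHOD1(read, std::string(int));
};

// mock_reader.cpp -- the one translation unit that compiles the bodies.
MockReader::MockReader() {}
MockReader::~MockReader() {}
{code}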



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3827) Improve compilation speed of GMock tests

2015-11-04 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3827:
---
Shepherd: Joris Van Remoortere

> Improve compilation speed of GMock tests
> 
>
> Key: MESOS-3827
> URL: https://issues.apache.org/jira/browse/MESOS-3827
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere, tech-debt, testing
>
> The GMock docs suggest that moving the definition of mock classes' 
> constructors and destructors to a separate compilation unit can improve 
> compile performance: 
> https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3827) Improve compilation speed of GMock tests

2015-11-04 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990099#comment-14990099
 ] 

James Peach commented on MESOS-3827:


Yup. I'd agree it is worth it even if it made no difference :)

> Improve compilation speed of GMock tests
> 
>
> Key: MESOS-3827
> URL: https://issues.apache.org/jira/browse/MESOS-3827
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere, tech-debt, testing
>
> The GMock docs suggest that moving the definition of mock classes' 
> constructors and destructors to a separate compilation unit can improve 
> compile performance: 
> https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3827) Improve compilation speed of GMock tests

2015-11-04 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990130#comment-14990130
 ] 

Neil Conway commented on MESOS-3827:


One irritant is that we can't use this technique with TestAllocator (because it 
is templatized, so its definition needs to be in a header), which is a fairly 
expensive thing to compile: just compiling the things that use it takes ~175 
seconds of CPU time (versus ~1300 for the whole test suite). Not sure if there 
is any easy fix for this, though.
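One general C++ mitigation for header-only templates (not necessarily
applicable to TestAllocator) is explicit instantiation: if the set of template
arguments used by the tests is known up front, an `extern template` declaration
in the header stops every translation unit from re-instantiating the class. A
generic sketch:

{code}
// expensive.hpp: a class template whose member definitions live in the header.
template <typename T>
class Expensive
{
public:
  T value() const { return T(); }
};

// C++11: tell every including translation unit not to instantiate
// Expensive<int> itself...
extern template class Expensive<int>;

// expensive.cpp: ...and compile that instantiation in exactly one place.
template class Expensive<int>;
{code}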

> Improve compilation speed of GMock tests
> 
>
> Key: MESOS-3827
> URL: https://issues.apache.org/jira/browse/MESOS-3827
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere, tech-debt, testing
>
> The GMock docs suggest that moving the definition of mock classes' 
> constructors and destructors to a separate compilation unit can improve 
> compile performance: 
> https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3827) Improve compilation speed of GMock tests

2015-11-04 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990136#comment-14990136
 ] 

Neil Conway commented on MESOS-3827:


https://reviews.apache.org/r/39946/
https://reviews.apache.org/r/39947/


> Improve compilation speed of GMock tests
> 
>
> Key: MESOS-3827
> URL: https://issues.apache.org/jira/browse/MESOS-3827
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere, tech-debt, testing
>
> The GMock docs suggest that moving the definition of mock classes' 
> constructors and destructors to a separate compilation unit can improve 
> compile performance: 
> https://code.google.com/p/googlemock/wiki/V1_7_CookBook#Making_the_Compilation_Faster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2015-11-04 Thread Peter Kolloch (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989299#comment-14989299
 ] 

Peter Kolloch commented on MESOS-3793:
--

The last log line (Failed to locate systemd runtime directory: 
/run/systemd/system) looks as if Mesos depends on systemd. Is that correct and 
expected?

> Cannot start mesos local on a Debian GNU/Linux 8 docker machine
> ---
>
> Key: MESOS-3793
> URL: https://issues.apache.org/jira/browse/MESOS-3793
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
> Environment: Debian GNU/Linux 8 docker machine
>Reporter: Matthias Veit
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> We updated the mesos version to 0.25.0 in our Marathon docker image, that 
> runs our integration tests.
> We use mesos local for those tests. This fails with this message:
> {noformat}
> root@a06e4b4eb776:/marathon# mesos local
> I1022 18:42:26.852485   136 leveldb.cpp:176] Opened db in 6.103258ms
> I1022 18:42:26.853302   136 leveldb.cpp:183] Compacted db in 765740ns
> I1022 18:42:26.853343   136 leveldb.cpp:198] Created db iterator in 9001ns
> I1022 18:42:26.853355   136 leveldb.cpp:204] Seeked to beginning of db in 
> 1287ns
> I1022 18:42:26.853366   136 leveldb.cpp:273] Iterated through 0 keys in the 
> db in ns
> I1022 18:42:26.853406   136 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1022 18:42:26.853775   141 recover.cpp:449] Starting replica recovery
> I1022 18:42:26.853862   141 recover.cpp:475] Replica is in EMPTY status
> I1022 18:42:26.854751   138 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I1022 18:42:26.854856   140 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1022 18:42:26.855002   140 recover.cpp:566] Updating replica status to 
> STARTING
> I1022 18:42:26.855655   138 master.cpp:376] Master 
> a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 
> 172.17.0.14:5050
> I1022 18:42:26.855680   138 master.cpp:378] Flags at startup: 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" 
> --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs"
> I1022 18:42:26.855790   138 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I1022 18:42:26.855803   138 master.cpp:430] Master allowing unauthenticated 
> slaves to register
> I1022 18:42:26.855815   138 master.cpp:467] Using default 'crammd5' 
> authenticator
> W1022 18:42:26.855829   138 authenticator.cpp:505] No credentials provided, 
> authentication requests will be refused
> I1022 18:42:26.855840   138 authenticator.cpp:512] Initializing server SASL
> I1022 18:42:26.856442   136 containerizer.cpp:143] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I1022 18:42:26.856943   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.888185ms
> I1022 18:42:26.856987   140 replica.cpp:323] Persisted replica status to 
> STARTING
> I1022 18:42:26.857115   140 recover.cpp:475] Replica is in STARTING status
> I1022 18:42:26.857270   140 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I1022 18:42:26.857312   140 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1022 18:42:26.857368   140 recover.cpp:566] Updating replica status to VOTING
> I1022 18:42:26.857781   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 371121ns
> I1022 18:42:26.857841   140 replica.cpp:323] Persisted replica status to 
> VOTING
> I1022 18:42:26.857895   140 recover.cpp:580] Successfully joined the Paxos 
> group
> I1022 18:42:26.857928   140 recover.cpp:464] Recover process terminated
> I1022 18:42:26.862455   137 master.cpp:1603] The newly elected leader is 
> master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8
> I1022 18:42:26.862498   137 master.cpp:1616] Elected as the leading master!
> I1022 18:42:26.862511   137 master.cpp:1376] Recovering from registrar
> I1022 18:42:26.862560   137 registrar.cpp:309] Recovering 

[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2015-11-04 Thread Peter Kolloch (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989308#comment-14989308
 ] 

Peter Kolloch commented on MESOS-3793:
--

[~karlkfi] Is it correct that you encountered this problem too? Did you find a 
workaround?

> Cannot start mesos local on a Debian GNU/Linux 8 docker machine
> ---
>
> Key: MESOS-3793
> URL: https://issues.apache.org/jira/browse/MESOS-3793
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
> Environment: Debian GNU/Linux 8 docker machine
>Reporter: Matthias Veit
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> We updated the mesos version to 0.25.0 in our Marathon docker image, that 
> runs our integration tests.
> We use mesos local for those tests. This fails with this message:
> {noformat}
> root@a06e4b4eb776:/marathon# mesos local
> I1022 18:42:26.852485   136 leveldb.cpp:176] Opened db in 6.103258ms
> I1022 18:42:26.853302   136 leveldb.cpp:183] Compacted db in 765740ns
> I1022 18:42:26.853343   136 leveldb.cpp:198] Created db iterator in 9001ns
> I1022 18:42:26.853355   136 leveldb.cpp:204] Seeked to beginning of db in 
> 1287ns
> I1022 18:42:26.853366   136 leveldb.cpp:273] Iterated through 0 keys in the 
> db in ns
> I1022 18:42:26.853406   136 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1022 18:42:26.853775   141 recover.cpp:449] Starting replica recovery
> I1022 18:42:26.853862   141 recover.cpp:475] Replica is in EMPTY status
> I1022 18:42:26.854751   138 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I1022 18:42:26.854856   140 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1022 18:42:26.855002   140 recover.cpp:566] Updating replica status to 
> STARTING
> I1022 18:42:26.855655   138 master.cpp:376] Master 
> a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 
> 172.17.0.14:5050
> I1022 18:42:26.855680   138 master.cpp:378] Flags at startup: 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" 
> --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs"
> I1022 18:42:26.855790   138 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I1022 18:42:26.855803   138 master.cpp:430] Master allowing unauthenticated 
> slaves to register
> I1022 18:42:26.855815   138 master.cpp:467] Using default 'crammd5' 
> authenticator
> W1022 18:42:26.855829   138 authenticator.cpp:505] No credentials provided, 
> authentication requests will be refused
> I1022 18:42:26.855840   138 authenticator.cpp:512] Initializing server SASL
> I1022 18:42:26.856442   136 containerizer.cpp:143] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I1022 18:42:26.856943   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.888185ms
> I1022 18:42:26.856987   140 replica.cpp:323] Persisted replica status to 
> STARTING
> I1022 18:42:26.857115   140 recover.cpp:475] Replica is in STARTING status
> I1022 18:42:26.857270   140 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I1022 18:42:26.857312   140 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1022 18:42:26.857368   140 recover.cpp:566] Updating replica status to VOTING
> I1022 18:42:26.857781   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 371121ns
> I1022 18:42:26.857841   140 replica.cpp:323] Persisted replica status to 
> VOTING
> I1022 18:42:26.857895   140 recover.cpp:580] Successfully joined the Paxos 
> group
> I1022 18:42:26.857928   140 recover.cpp:464] Recover process terminated
> I1022 18:42:26.862455   137 master.cpp:1603] The newly elected leader is 
> master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8
> I1022 18:42:26.862498   137 master.cpp:1616] Elected as the leading master!
> I1022 18:42:26.862511   137 master.cpp:1376] Recovering from registrar
> I1022 18:42:26.862560   137 registrar.cpp:309] Recovering registrar
> Failed to create a containerizer: Could not create 

[jira] [Commented] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2015-11-04 Thread Peter Kolloch (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989332#comment-14989332
 ] 

Peter Kolloch commented on MESOS-3793:
--

I found this related CHANGELOG entry 
(https://github.com/apache/mesos/blob/master/CHANGELOG#L109):

{code}
  * [MESOS-3425] - Modify LinuxLauncher to support Systemd.
{code}

Maybe MESOS-3425 introduced a hard dependency on systemd utilities? MESOS-1159 
might be about fixing that, but I am not sure.

> Cannot start mesos local on a Debian GNU/Linux 8 docker machine
> ---
>
> Key: MESOS-3793
> URL: https://issues.apache.org/jira/browse/MESOS-3793
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
> Environment: Debian GNU/Linux 8 docker machine
>Reporter: Matthias Veit
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> We updated the mesos version to 0.25.0 in our Marathon docker image, that 
> runs our integration tests.
> We use mesos local for those tests. This fails with this message:
> {noformat}
> root@a06e4b4eb776:/marathon# mesos local
> I1022 18:42:26.852485   136 leveldb.cpp:176] Opened db in 6.103258ms
> I1022 18:42:26.853302   136 leveldb.cpp:183] Compacted db in 765740ns
> I1022 18:42:26.853343   136 leveldb.cpp:198] Created db iterator in 9001ns
> I1022 18:42:26.853355   136 leveldb.cpp:204] Seeked to beginning of db in 
> 1287ns
> I1022 18:42:26.853366   136 leveldb.cpp:273] Iterated through 0 keys in the 
> db in ns
> I1022 18:42:26.853406   136 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1022 18:42:26.853775   141 recover.cpp:449] Starting replica recovery
> I1022 18:42:26.853862   141 recover.cpp:475] Replica is in EMPTY status
> I1022 18:42:26.854751   138 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I1022 18:42:26.854856   140 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1022 18:42:26.855002   140 recover.cpp:566] Updating replica status to 
> STARTING
> I1022 18:42:26.855655   138 master.cpp:376] Master 
> a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 
> 172.17.0.14:5050
> I1022 18:42:26.855680   138 master.cpp:378] Flags at startup: 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" 
> --work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs"
> I1022 18:42:26.855790   138 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I1022 18:42:26.855803   138 master.cpp:430] Master allowing unauthenticated 
> slaves to register
> I1022 18:42:26.855815   138 master.cpp:467] Using default 'crammd5' 
> authenticator
> W1022 18:42:26.855829   138 authenticator.cpp:505] No credentials provided, 
> authentication requests will be refused
> I1022 18:42:26.855840   138 authenticator.cpp:512] Initializing server SASL
> I1022 18:42:26.856442   136 containerizer.cpp:143] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix
> I1022 18:42:26.856943   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.888185ms
> I1022 18:42:26.856987   140 replica.cpp:323] Persisted replica status to 
> STARTING
> I1022 18:42:26.857115   140 recover.cpp:475] Replica is in STARTING status
> I1022 18:42:26.857270   140 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I1022 18:42:26.857312   140 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1022 18:42:26.857368   140 recover.cpp:566] Updating replica status to VOTING
> I1022 18:42:26.857781   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 371121ns
> I1022 18:42:26.857841   140 replica.cpp:323] Persisted replica status to 
> VOTING
> I1022 18:42:26.857895   140 recover.cpp:580] Successfully joined the Paxos 
> group
> I1022 18:42:26.857928   140 recover.cpp:464] Recover process terminated
> I1022 18:42:26.862455   137 master.cpp:1603] The newly elected leader is 
> master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8
> I1022 18:42:26.862498   137 master.cpp:1616] Elected as the leading