[jira] [Commented] (MESOS-2449) Support group of tasks (Pod) constructs and API in Mesos

2015-03-30 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386819#comment-14386819
 ] 

Timothy St. Clair commented on MESOS-2449:
--

It would be ideal in this use case to handle the network + OVS abstraction 
first, as it's crucial to the pods.  

 Support group of tasks (Pod) constructs and API in Mesos
 

 Key: MESOS-2449
 URL: https://issues.apache.org/jira/browse/MESOS-2449
 Project: Mesos
  Issue Type: Epic
Reporter: Timothy Chen

 There is a common need among different frameworks that want to start a 
 group of tasks that are either dependent on or co-located with each other.
 Although a framework can schedule individual tasks within the same offer and 
 slave id, it doesn't have a way to describe dependencies, failure policies 
 (if one of the tasks fails), network setup, group container information, 
 etc.
 This epic is meant to start the discussion around the requirements folks 
 need, and see where we can lead this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2571) Expose Memory Pressure in MemeIsolator

2015-03-30 Thread Chi Zhang (JIRA)
Chi Zhang created MESOS-2571:


 Summary: Expose Memory Pressure in MemeIsolator
 Key: MESOS-2571
 URL: https://issues.apache.org/jira/browse/MESOS-2571
 Project: Mesos
  Issue Type: Improvement
Reporter: Chi Zhang
Assignee: Chi Zhang








[jira] [Commented] (MESOS-2571) Expose Memory Pressure in MemeIsolator

2015-03-30 Thread Chi Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387090#comment-14387090
 ] 

Chi Zhang commented on MESOS-2571:
--

https://reviews.apache.org/r/30546

 Expose Memory Pressure in MemeIsolator
 --

 Key: MESOS-2571
 URL: https://issues.apache.org/jira/browse/MESOS-2571
 Project: Mesos
  Issue Type: Improvement
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter







[jira] [Commented] (MESOS-1790) Add chown option to CommandInfo.URI

2015-03-30 Thread Jim Klucar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386946#comment-14386946
 ] 

Jim Klucar commented on MESOS-1790:
---

Forcing the Mesos slave to be run as root to get this working is probably a 
non-starter for many users. I'm going to add a skip chown option and see what 
people think.

 Add chown option to CommandInfo.URI
 -

 Key: MESOS-1790
 URL: https://issues.apache.org/jira/browse/MESOS-1790
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
  Labels: mesosphere, newbie

 Mesos fetcher always chown()s the extracted executor URIs as the executor 
 user but sometimes this is not desirable, e.g., setuid bit gets lost during 
 chown() if slave/fetcher is running as root. 
 It would be nice to give frameworks the ability to skip the chown.





[jira] [Created] (MESOS-2572) Add memory statistics tests.

2015-03-30 Thread Chi Zhang (JIRA)
Chi Zhang created MESOS-2572:


 Summary: Add memory statistics tests.
 Key: MESOS-2572
 URL: https://issues.apache.org/jira/browse/MESOS-2572
 Project: Mesos
  Issue Type: Task
Reporter: Chi Zhang
Assignee: Chi Zhang








[jira] [Commented] (MESOS-2570) webuiUrl doesn't get updated when a framework re-registers

2015-03-30 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386999#comment-14386999
 ] 

Benjamin Mahler commented on MESOS-2570:


Looks like a duplicate of MESOS-703 (no support for FrameworkInfo updates).

 webuiUrl doesn't get updated when a framework re-registers
 --

 Key: MESOS-2570
 URL: https://issues.apache.org/jira/browse/MESOS-2570
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Robert Stupp
Priority: Minor

 The webuiUrl attribute doesn't get updated when a framework re-registers.
 I tried to set the webuiUrl for example here: 
 https://github.com/mesosphere/cassandra-mesos/blob/rewrite/cassandra-framework/src/main/java/io/mesosphere/mesos/frameworks/cassandra/Main.java#L165
 After the first startup, the correct URL is linked in the Mesos webUI. But when 
 the scheduler is stopped, the webuiUrl field is changed, and the framework is 
 restarted, the old webuiUrl is still shown.





[jira] [Updated] (MESOS-2404) Add an example framework to test persistent volumes.

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2404:
--
  Sprint: Twitter Mesos Q1 Sprint 6
Assignee: Jie Yu
Story Points: 3

 Add an example framework to test persistent volumes.
 

 Key: MESOS-2404
 URL: https://issues.apache.org/jira/browse/MESOS-2404
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu

 This serves two purposes:
 1) testing the new persistence feature
 2) serving as an example for others to use the new feature





[jira] [Updated] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2353:
--
Sprint: Twitter Mesos Q1 Sprint 5, Twitter Mesos Q1 Sprint 6  (was: Twitter 
Mesos Q1 Sprint 5)

 Improve performance of the master's state.json endpoint for large clusters.
 ---

 Key: MESOS-2353
 URL: https://issues.apache.org/jira/browse/MESOS-2353
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
  Labels: newbie, scalability, twitter

 The master's state.json endpoint consistently takes a long time to compute 
 the JSON result, for large clusters:
 {noformat}
 $ time curl -s -o /dev/null localhost:5050/master/state.json
 Mon Jan 26 22:38:50 UTC 2015
 real  0m13.174s
 user  0m0.003s
 sys   0m0.022s
 {noformat}
 This can cause the master to get backlogged if there are many state.json 
 requests in flight.
 Looking at {{perf}} data, it seems most of the time is spent doing memory 
 allocation / de-allocation. This ticket will try to capture any low hanging 
 fruit to speed this up. Possibly we can leverage moves if they are not 
 already being used by the compiler.





[jira] [Updated] (MESOS-2571) Expose Memory Pressure in MemIsolator

2015-03-30 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-2571:
--
Summary: Expose Memory Pressure in MemIsolator  (was: Expose Memory 
Pressure in MemeIsolator)

 Expose Memory Pressure in MemIsolator
 -

 Key: MESOS-2571
 URL: https://issues.apache.org/jira/browse/MESOS-2571
 Project: Mesos
  Issue Type: Improvement
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter







[jira] [Updated] (MESOS-2461) Slave should provide details on processes running in its cgroups

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2461:
--
Story Points: 1

 Slave should provide details on processes running in its cgroups
 

 Key: MESOS-2461
 URL: https://issues.apache.org/jira/browse/MESOS-2461
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.21.1
Reporter: Ian Downes
Assignee: Jie Yu
Priority: Minor
  Labels: twitter

 The slave can optionally be put into its own cgroups for a list of 
 subsystems, e.g., for monitoring of memory and cpu. See the slave flag: 
 --slave_subsystems
 It currently refuses to start if there are any processes in its cgroups - 
 this could be another slave or some subprocess started by a previous slave - 
 and only logs the pids of those processes.
 Improve this to log details about the processes: suggest at least the process 
 command, uid running it, and perhaps its start time.





[jira] [Updated] (MESOS-2461) Slave should provide details on processes running in its cgroups

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2461:
--
Assignee: Jie Yu

 Slave should provide details on processes running in its cgroups
 

 Key: MESOS-2461
 URL: https://issues.apache.org/jira/browse/MESOS-2461
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.21.1
Reporter: Ian Downes
Assignee: Jie Yu
Priority: Minor
  Labels: twitter

 The slave can optionally be put into its own cgroups for a list of 
 subsystems, e.g., for monitoring of memory and cpu. See the slave flag: 
 --slave_subsystems
 It currently refuses to start if there are any processes in its cgroups - 
 this could be another slave or some subprocess started by a previous slave - 
 and only logs the pids of those processes.
 Improve this to log details about the processes: suggest at least the process 
 command, uid running it, and perhaps its start time.





[jira] [Updated] (MESOS-2461) Slave should provide details on processes running in its cgroups

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2461:
--
Sprint: Twitter Mesos Q1 Sprint 6

 Slave should provide details on processes running in its cgroups
 

 Key: MESOS-2461
 URL: https://issues.apache.org/jira/browse/MESOS-2461
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.21.1
Reporter: Ian Downes
Priority: Minor
  Labels: twitter

 The slave can optionally be put into its own cgroups for a list of 
 subsystems, e.g., for monitoring of memory and cpu. See the slave flag: 
 --slave_subsystems
 It currently refuses to start if there are any processes in its cgroups - 
 this could be another slave or some subprocess started by a previous slave - 
 and only logs the pids of those processes.
 Improve this to log details about the processes: suggest at least the process 
 command, uid running it, and perhaps its start time.





[jira] [Updated] (MESOS-2350) Add support for MesosContainerizerLaunch to chroot to a specified path

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2350:
--
Sprint: Twitter Mesos Q1 Sprint 3, Twitter Mesos Q1 Sprint 4, Twitter Mesos 
Q1 Sprint 5, Twitter Mesos Q1 Sprint 6  (was: Twitter Mesos Q1 Sprint 3, 
Twitter Mesos Q1 Sprint 4, Twitter Mesos Q1 Sprint 5)

 Add support for MesosContainerizerLaunch to chroot to a specified path
 --

 Key: MESOS-2350
 URL: https://issues.apache.org/jira/browse/MESOS-2350
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.21.1, 0.22.0
Reporter: Ian Downes
Assignee: Ian Downes
  Labels: twitter

 In preparation for the MesosContainerizer to support a filesystem isolator 
 the MesosContainerizerLauncher must support chrooting. Optionally, it should 
 also configure the chroot environment by (re-)mounting special filesystems 
 such as /proc and /sys and making device nodes such as /dev/zero, etc., such 
 that the chroot environment is functional.





[jira] [Updated] (MESOS-2332) Report per-container metrics for network bandwidth throttling

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2332:
--
Sprint: Twitter Mesos Q1 Sprint 2, Twitter Mesos Q1 Sprint 3, Twitter Mesos 
Q1 Sprint 4, Twitter Mesos Q1 Sprint 5, Twitter Mesos Q1 Sprint 6  (was: 
Twitter Mesos Q1 Sprint 2, Twitter Mesos Q1 Sprint 3, Twitter Mesos Q1 Sprint 
4, Twitter Mesos Q1 Sprint 5)

 Report per-container metrics for network bandwidth throttling
 -

 Key: MESOS-2332
 URL: https://issues.apache.org/jira/browse/MESOS-2332
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: features, twitter

 Export metrics from the network isolation to identify scope and duration of 
 container throttling.  
 Packet loss can be identified from the overlimits and requeues fields of the 
 htb qdisc report for the virtual interface, e.g.
 {noformat}
 $ tc -s -d qdisc show dev mesos19223
 qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 
 1 1 1
  Sent 158213287452 bytes 1030876393 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
 qdisc ingress ffff: parent ffff:fff1 
  Sent 119381747824 bytes 1144549901 pkt (dropped 2044879, overlimits 0 
 requeues 0)
  backlog 0b 0p requeues 0
 {noformat}
 Note that since a packet can be examined multiple times before transmission, 
 overlimits can exceed total packets sent.  
 Add to the port_mapping isolator usage() and the container statistics 
 protobuf. Carefully consider the naming (esp tx/rx) + commenting of the 
 protobuf fields so it's clear what these represent and how they are different 
 to the existing dropped packet counts from the network stack.





[jira] [Updated] (MESOS-1127) Implement the protobufs for the scheduler API

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-1127:
--
Sprint: Twitter Mesos Q1 Sprint 5, Twitter Mesos Q1 Sprint 6  (was: Twitter 
Mesos Q1 Sprint 5)

 Implement the protobufs for the scheduler API
 -

 Key: MESOS-1127
 URL: https://issues.apache.org/jira/browse/MESOS-1127
 Project: Mesos
  Issue Type: Task
  Components: framework
Reporter: Benjamin Hindman
Assignee: Vinod Kone
  Labels: twitter

 The default scheduler/executor interface and implementation in Mesos have a 
 few drawbacks:
 (1) The interface is fairly high-level which makes it hard to do certain 
 things, for example, handle events (callbacks) in batch. This can have a big 
 impact on the performance of schedulers (for example, writing task updates 
 that need to be persisted).
 (2) The implementation requires writing a lot of boilerplate JNI and native 
 Python wrappers when adding additional API components.
 The plan is to provide a lower-level API that can easily be used to implement 
 the higher-level API that is currently provided. This will also open the door 
 to more easily building native-language Mesos libraries (i.e., not needing 
 the C++ shim layer) and building new higher-level abstractions on top of the 
 lower-level API.





[jira] [Updated] (MESOS-2438) Improve support for streaming HTTP Responses in libprocess.

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2438:
--
Sprint: Twitter Mesos Q1 Sprint 4, Twitter Mesos Q1 Sprint 5, Twitter Mesos 
Q1 Sprint 6  (was: Twitter Mesos Q1 Sprint 4, Twitter Mesos Q1 Sprint 5)

 Improve support for streaming HTTP Responses in libprocess.
 ---

 Key: MESOS-2438
 URL: https://issues.apache.org/jira/browse/MESOS-2438
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
  Labels: twitter

 Currently libprocess' HTTP::Response supports a PIPE construct for doing 
 streaming responses:
 {code}
 struct Response
 {
   ...
   // Either provide a body, an absolute path to a file, or a
   // pipe for streaming a response. Distinguish between the cases
   // using 'type' below.
   //
   // BODY: Uses 'body' as the body of the response. These may be
   // encoded using gzip for efficiency, if 'Content-Encoding' is not
   // already specified.
   //
   // PATH: Attempts to perform a 'sendfile' operation on the file
   // found at 'path'.
   //
   // PIPE: Splices data from 'pipe' using 'Transfer-Encoding=chunked'.
   // Note that the read end of the pipe will be closed by libprocess
   // either after the write end has been closed or if the socket the
   // data is being spliced to has been closed (i.e., nobody is
   // listening any longer). This can cause writes to the pipe to
   // generate a SIGPIPE (which will terminate your program unless you
   // explicitly ignore them or handle them).
   //
   // In all cases (BODY, PATH, PIPE), you are expected to properly
   // specify the 'Content-Type' header, but the 'Content-Length' and
   // or 'Transfer-Encoding' headers will be filled in for you.
   enum {
 NONE,
 BODY,
 PATH,
 PIPE
   } type;
   ...
 };
 {code}
 This interface is too low level and difficult to program against:
 * Connection closure is signaled with SIGPIPE, which is difficult for callers 
 to deal with (must suppress SIGPIPE locally or globally in order to get EPIPE 
 instead).
 * Pipes are generally for inter-process communication, and the pipe has 
 finite size. With a blocking pipe the caller must deal with blocking when the 
 pipe's buffer limit is exceeded. With a non-blocking pipe, the caller must 
 deal with retrying the write.
 We'll want to consider a few use cases:
 # Sending an HTTP::Response with streaming data.
 # Making a request with http::get and http::post in which the data is 
 returned in a streaming manner.
 # Making a request in which the request content is streaming.
 This ticket will focus on 1 as it is required for the HTTP API.





[jira] [Updated] (MESOS-2462) Add option for Subprocess to set a death signal for the forked child

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2462:
--
Sprint: Twitter Mesos Q1 Sprint 6

 Add option for Subprocess to set a death signal for the forked child
 

 Key: MESOS-2462
 URL: https://issues.apache.org/jira/browse/MESOS-2462
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.21.1
Reporter: Ian Downes
Assignee: Jie Yu
Priority: Minor
  Labels: twitter

 Currently, children forked by the slave, including those through Subprocess, 
 will continue running if the slave exits. For some processes, including 
 helper processes like the fetcher, du, or perf, we'd like them to be 
 terminated when the slave exits.
 Add support to Subprocess to optionally set a DEATHSIG for the child, e.g., 
 setting SIGTERM would mean the child would get SIGTERM when the slave 
 terminates.
 This can be done (*after forking*) with PR_SET_PDEATHSIG. See man prctl. It 
 is preserved through an exec call.





[jira] [Updated] (MESOS-2367) Improve slave resiliency in the face of orphan containers

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2367:
--
Sprint: Twitter Mesos Q1 Sprint 5  (was: Twitter Mesos Q1 Sprint 5, Twitter 
Mesos Q1 Sprint 6)

 Improve slave resiliency in the face of orphan containers 
 --

 Key: MESOS-2367
 URL: https://issues.apache.org/jira/browse/MESOS-2367
 Project: Mesos
  Issue Type: Bug
  Components: slave
Reporter: Joe Smith
Assignee: Jie Yu
Priority: Critical

 Right now there's a case where a misbehaving executor can cause a slave 
 process to flap:
 {panel:title=Quote From [~jieyu]}
 {quote}
 1) User tries to kill an instance
 2) Slave sends {{KillTaskMessage}} to executor
 3) Executor sends kill signals to task processes
 4) Executor sends {{TASK_KILLED}} to slave
 5) Slave updates container cpu limit to be 0.01 cpus
 6) A user-process is still processing the kill signal
 7) the task process cannot exit since it has too little cpu share and is 
 throttled
 8) Executor itself terminates
 9) Slave tries to destroy the container, but cannot because the user-process 
 is stuck in the exit path.
 10) Slave restarts, and is constantly flapping because it cannot kill orphan 
 containers
 {quote}
 {panel}
 The slave's orphan container handling should be improved to deal with this 
 case despite ill-behaved users (framework writers).





[jira] [Resolved] (MESOS-713) Support for adding subsystems to existing cgroup hierarchies.

2015-03-30 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes resolved MESOS-713.
--
Resolution: Won't Fix

Remounting cgroups is not really recommended and can cause significant 
confusion to the kernel.

 Support for adding subsystems to existing cgroup hierarchies.
 -

 Key: MESOS-713
 URL: https://issues.apache.org/jira/browse/MESOS-713
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Benjamin Mahler
Priority: Minor
  Labels: newbie, twitter

 Currently if a slave is restarted with additional subsystems, it will refuse 
 to proceed if those subsystems are not attached to the existing hierarchy.
 It's possible to add subsystems to existing hierarchies via re-mounting:
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-Attaching_Subsystems_to_and_Detaching_Them_From_an_Existing_Hierarchy.html
 We can add support for this by calling mount with the MS_REMOUNT option.





[jira] [Commented] (MESOS-2200) bogus docker images result in bad error message to scheduler

2015-03-30 Thread Steven Borrelli (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387337#comment-14387337
 ] 

Steven Borrelli commented on MESOS-2200:


Wanted to +1 this issue. We have a deployment pipeline where there are times a 
user tries to deploy a nonexistent image. Right now we repeatedly get a 
TASK_FAILED in mesos with no output, so an admin has to look through docker 
logs to see what happened. 

 bogus docker images result in bad error message to scheduler
 

 Key: MESOS-2200
 URL: https://issues.apache.org/jira/browse/MESOS-2200
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker
Reporter: Jay Buffington
Assignee: Joerg Schad
  Labels: mesosphere

 When a scheduler specifies a bogus image in ContainerInfo mesos doesn't tell 
 the scheduler that the docker pull failed or why.
 This error is logged in the mesos-slave log, but it isn't given to the 
 scheduler (as far as I can tell):
 {noformat}
 E1218 23:50:55.406230  8123 slave.cpp:2730] Container 
 '8f70784c-3e40-4072-9ca2-9daed23f15ff' for executor 
 'thermos-1418946354013-xxx-xxx-curl-0-f500cc41-dd0a-4338-8cbc-d631cb588bb1' 
 of framework '20140522-213145-1749004561-5050-29512-' failed to start: 
 Failed to 'docker pull 
 docker-registry.example.com/doesntexist/hello1.1:latest': exit status = 
 exited with status 1 stderr = 2014/12/18 23:50:55 Error: image 
 doesntexist/hello1.1 not found
 {noformat}
 If the docker image is not in the registry, the scheduler should give the 
 user an error message.  If docker pull failed because of networking issues, 
 it should be retried.  Mesos should give the scheduler enough information to 
 be able to make that decision.





[jira] [Updated] (MESOS-2233) Run ASF CI mesos builds inside docker

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2233:
--
  Sprint: Twitter Mesos Q1 Sprint 6
Story Points: 5

 Run ASF CI mesos builds inside docker
 -

 Key: MESOS-2233
 URL: https://issues.apache.org/jira/browse/MESOS-2233
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone

 There are several limitations to the Mesos project's current state of CI, 
 which is run on builds.a.o:
 -- Only runs on Ubuntu
 -- Doesn't run any tests that deal with cgroups
 -- Doesn't run any tests that need root permissions
 Now that ASF CI supports docker 
 (https://issues.apache.org/jira/browse/BUILDS-25), it would be great for the 
 Mesos project to use it.





[jira] [Resolved] (MESOS-2508) Slave recovering a docker container results in Unknow container error

2015-03-30 Thread Jay Buffington (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Buffington resolved MESOS-2508.
---
Resolution: Duplicate

Closing as dup of https://issues.apache.org/jira/browse/MESOS-2215

 Slave recovering a docker container results in Unknow container error
 ---

 Key: MESOS-2508
 URL: https://issues.apache.org/jira/browse/MESOS-2508
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker, slave
Affects Versions: 0.21.1
 Environment: Ubuntu 14.04.2 LTS
 Docker 1.5.0 (same error with 1.4.1)
 Mesos 0.21.1 installed from mesosphere ubuntu repo
 Marathon 0.8.0 installed from mesosphere ubuntu repo
Reporter: Geoffroy Jabouley
Priority: Minor

 I'm seeing some error logs occurring during a slave recovery of a Mesos task 
 running inside a docker container.
 It does not impede the slave recovery process, as the mesos task is still 
 active and running on the slave after the recovery.
 But something is not working properly when the slave is recovering my 
 docker container: the slave detects my container as an Unknown container.
 Cluster status:
 - 1 mesos-master, 1 mesos-slave, 1 marathon framework running on the host.
 - checkpointing is activated on both slave and framework
 - use native docker containerizer
 - 1 mesos task, started using marathon, is running inside a docker container 
 and is monitored by the mesos-slave
 Action:
 - restart the mesos-slave process (sudo restart mesos-slave)
 Expected:
 - docker container still running
 - mesos task still running
 - no error in the mesos slave log regarding recovery process
 Seen:
 - docker container still running
 - mesos task still running
 - {color:red}Several errors *Unknown container* in the mesos slave log during 
 recovery process{color}
 ---
 For what it's worth, here are my investigations:
 1) The mesos task starts fine in the docker container 
 *e4b0de57edf3658046405eff2fbe2f91ac451e04360fc437c20fcfe448297330*. Docker 
 container name is set to *mesos-adb71dc4-c07d-42a9-8fed-264c241668ad* by 
 Mesos docker containerizer _i guess_...
 {code}
 I0317 09:56:14.300439  2784 slave.cpp:1083] Got assigned task 
 test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799 for framework 
 20150311-150951-3982541578-5050-50860-
 I0317 09:56:14.380702  2784 slave.cpp:1193] Launching task 
 test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799 for framework 
 20150311-150951-3982541578-5050-50860-
 I0317 09:56:14.384466  2784 slave.cpp:3997] Launching executor 
 test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799 of framework 
 20150311-150951-3982541578-5050-50860- in work directory 
 '/tmp/mesos/slaves/20150312-145235-3982541578-5050-1421-S0/frameworks/20150311-150951-3982541578-5050-50860-/executors/test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799/runs/adb71dc4-c07d-42a9-8fed-264c241668ad'
 I0317 09:56:14.390207  2784 slave.cpp:1316] Queuing task 
 'test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799' for executor 
 test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799 of framework 
 '20150311-150951-3982541578-5050-50860-
 I0317 09:56:14.421787  2782 docker.cpp:927] Starting container 
 'adb71dc4-c07d-42a9-8fed-264c241668ad' for task 
 'test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799' (and executor 
 'test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799') of framework 
 '20150311-150951-3982541578-5050-50860-'
 I0317 09:56:15.784143  2781 docker.cpp:633] Checkpointing pid 27080 to 
 '/tmp/mesos/meta/slaves/20150312-145235-3982541578-5050-1421-S0/frameworks/20150311-150951-3982541578-5050-50860-/executors/test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799/runs/adb71dc4-c07d-42a9-8fed-264c241668ad/pids/forked.pid'
 I0317 09:56:15.789443  2784 slave.cpp:2840] Monitoring executor 
 'test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799' of framework 
 '20150311-150951-3982541578-5050-50860-' in container 
 'adb71dc4-c07d-42a9-8fed-264c241668ad'
 I0317 09:56:15.862642  2784 slave.cpp:1860] Got registration for executor 
 'test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799' of framework 
 20150311-150951-3982541578-5050-50860- from 
 executor(1)@10.195.96.237:36021
 I0317 09:56:15.865319  2784 slave.cpp:1979] Flushing queued task 
 test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799 for executor 
 'test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799' of framework 
 20150311-150951-3982541578-5050-50860-
 I0317 09:56:15.885414  2787 slave.cpp:2215] Handling status update 
 TASK_RUNNING (UUID: 79f49cec-92c7-4660-b54e-22dd19c1e67c) for task 
 test-app-bveaf.7733257e-cc83-11e4-b930-56847afe9799 of framework 
 20150311-150951-3982541578-5050-50860- from 
 executor(1)@10.195.96.237:36021
 I0317 09:56:15.885902  2787 

[jira] [Updated] (MESOS-2571) Expose Memory Pressure in MemeIsolator

2015-03-30 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2571:
--
Sprint: Twitter Mesos Q1 Sprint 6

 Expose Memory Pressure in MemeIsolator
 --

 Key: MESOS-2571
 URL: https://issues.apache.org/jira/browse/MESOS-2571
 Project: Mesos
  Issue Type: Improvement
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter







[jira] [Comment Edited] (MESOS-2402) MesosContainerizerDestroyTest.LauncherDestroyFailure is flaky

2015-03-30 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378974#comment-14378974
 ] 

Vinod Kone edited comment on MESOS-2402 at 3/30/15 10:30 PM:
-

commit f98f26fa50e31ab399d156f942f4fb92edcd926e
Author: Vinod Kone vinodk...@gmail.com
Date:   Tue Mar 24 14:45:44 2015 -0700

Fixed flaky MesosContainerizerDestroyTest tests.

Review: https://reviews.apache.org/r/32454



was (Author: vinodkone):
commit 0c19d17eb8d24af5db45efb6e5e05de7bdfeb41b
Author: Vinod Kone vinodk...@gmail.com
Date:   Tue Mar 24 14:45:44 2015 -0700

Fixed flaky MesosContainerizerDestroyTest tests.

Review: https://reviews.apache.org/r/32454


 MesosContainerizerDestroyTest.LauncherDestroyFailure is flaky
 -

 Key: MESOS-2402
 URL: https://issues.apache.org/jira/browse/MESOS-2402
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Vinod Kone
Assignee: Vinod Kone
 Fix For: 0.23.0


 Failed to os::execvpe in childMain. Never seen this one before.
 {code}
 [ RUN  ] MesosContainerizerDestroyTest.LauncherDestroyFailure
 Using temporary directory 
 '/tmp/MesosContainerizerDestroyTest_LauncherDestroyFailure_QpjQEn'
 I0224 18:55:49.326912 21391 containerizer.cpp:461] Starting container 
 'test_container' for executor 'executor' of framework ''
 I0224 18:55:49.332252 21391 launcher.cpp:130] Forked child with pid '23496' 
 for container 'test_container'
 ABORT: (src/subprocess.cpp:165): Failed to os::execvpe in childMain
 *** Aborted at 1424832949 (unix time) try date -d @1424832949 if you are 
 using GNU date ***
 PC: @ 0x2b178c5db0d5 (unknown)
 I0224 18:55:49.340955 21392 process.cpp:2117] Dropped / Lost event for PID: 
 scheduler-509d37ac-296f-4429-b101-af433c1800e9@127.0.1.1:39647
 I0224 18:55:49.342300 21386 containerizer.cpp:911] Destroying container 
 'test_container'
 *** SIGABRT (@0x3e85bc8) received by PID 23496 (TID 0x2b178f9f0700) from 
 PID 23496; stack trace: ***
 @ 0x2b178c397cb0 (unknown)
 @ 0x2b178c5db0d5 (unknown)
 @ 0x2b178c5de83b (unknown)
 @   0x87a945 _Abort()
 @ 0x2b1789f610b9 process::childMain()
 I0224 18:55:49.391793 21386 containerizer.cpp:1120] Executor for container 
 'test_container' has exited
 I0224 18:55:49.400478 21391 process.cpp:2770] Handling HTTP event for process 
 'metrics' with path: '/metrics/snapshot'
 tests/containerizer_tests.cpp:485: Failure
 Value of: metrics.values[containerizer/mesos/container_destroy_errors]
   Actual: 16-byte object 02-00 00-00 17-2B 00-00 E0-86 0E-04 00-00 00-00
 Expected: 1u
 Which is: 1
 [  FAILED  ] MesosContainerizerDestroyTest.LauncherDestroyFailure (89 ms)
 {code}
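The flakiness above is a classic race: the test asserts on the `container_destroy_errors` counter before the asynchronous destroy path has bumped it. A generic way to deflake such assertions (a hypothetical helper sketched here, not the actual Mesos fix in r/32454) is to poll the metric until it reaches the expected value or a deadline passes:

```python
import time


def await_metric(read_metric, expected, timeout=5.0, interval=0.05):
    """Poll read_metric() until it returns `expected` or `timeout` elapses.

    Returns True if the expected value was observed in time, False otherwise.
    Mirrors AWAIT-style polling used to deflake asynchronous assertions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if read_metric() == expected:
            return True
        time.sleep(interval)
    return read_metric() == expected


# Simulated metric that becomes visible only after a few polls, like a
# counter incremented on an asynchronous destroy path.
class LazyCounter:
    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def read(self):
        self.calls += 1
        return 1 if self.calls > self.ready_after else 0


counter = LazyCounter(ready_after=3)
assert await_metric(counter.read, expected=1)
```

The key design point is that the assertion only fires after the deadline, so a slow-but-correct code path cannot fail the test.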





[jira] [Created] (MESOS-2573) Use Memory Test Helper to improve some test code.

2015-03-30 Thread Chi Zhang (JIRA)
Chi Zhang created MESOS-2573:


 Summary: Use Memory Test Helper to improve some test code.
 Key: MESOS-2573
 URL: https://issues.apache.org/jira/browse/MESOS-2573
 Project: Mesos
  Issue Type: Improvement
Reporter: Chi Zhang
Assignee: Chi Zhang
Priority: Minor








[jira] [Updated] (MESOS-2572) Add memory statistics tests.

2015-03-30 Thread Chi Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chi Zhang updated MESOS-2572:
-
Labels: twitter  (was: iso twitter)

 Add memory statistics tests.
 

 Key: MESOS-2572
 URL: https://issues.apache.org/jira/browse/MESOS-2572
 Project: Mesos
  Issue Type: Task
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter







[jira] [Commented] (MESOS-2572) Add memory statistics tests.

2015-03-30 Thread Chi Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387387#comment-14387387
 ] 

Chi Zhang commented on MESOS-2572:
--

[~jieyu], I've broken the diff up into 5 small patches. Let me know when you 
have time to take a look at them; I will then re-rebase, test, and post them.

 Add memory statistics tests.
 

 Key: MESOS-2572
 URL: https://issues.apache.org/jira/browse/MESOS-2572
 Project: Mesos
  Issue Type: Task
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: iso, twitter







[jira] [Created] (MESOS-2574) Namespace handle symlinks in port_mapping isolator should not be under /var/run/netns

2015-03-30 Thread Jie Yu (JIRA)
Jie Yu created MESOS-2574:
-

 Summary: Namespace handle symlinks in port_mapping isolator should 
not be under /var/run/netns
 Key: MESOS-2574
 URL: https://issues.apache.org/jira/browse/MESOS-2574
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu


Consider putting the symlinks under /var/run/mesos/netns instead. This is 
because the 'ip' command assumes that all files under /var/run/netns are valid 
network namespaces, without duplication, and it has commands like:

ip -all netns exec ip link

to list the links for each network namespace.
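A minimal sketch of the proposed layout (directory names hypothetical, per the ticket): keep the per-container namespace handle symlinks under a Mesos-private directory so that `ip -all netns` never enumerates them.

```python
import os
import tempfile

# Stand-ins for the real paths; the ticket proposes something like
# /var/run/mesos/netns instead of the ip-owned /var/run/netns.
root = tempfile.mkdtemp()
ip_netns_dir = os.path.join(root, "var/run/netns")           # scanned by 'ip -all netns'
mesos_netns_dir = os.path.join(root, "var/run/mesos/netns")  # private to Mesos
os.makedirs(ip_netns_dir)
os.makedirs(mesos_netns_dir)


def link_namespace_handle(container_id, nspath):
    """Symlink a container's namespace handle under the Mesos-private
    directory, keeping it out of the directory 'ip' treats as authoritative."""
    link = os.path.join(mesos_netns_dir, container_id)
    os.symlink(nspath, link)
    return link


handle = link_namespace_handle("test_container", "/proc/23496/ns/net")
assert os.path.islink(handle)
assert os.listdir(ip_netns_dir) == []  # nothing extra for 'ip' to enumerate
```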





[jira] [Commented] (MESOS-2564) Kill superfluous forward declaration comments.

2015-03-30 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387545#comment-14387545
 ] 

Benjamin Mahler commented on MESOS-2564:


I'm a +1 but I believe [~benjaminhindman] enforced this style from the 
beginning.

 Kill superfluous forward declaration comments.
 --

 Key: MESOS-2564
 URL: https://issues.apache.org/jira/browse/MESOS-2564
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov
Priority: Minor
  Labels: easyfix, newbie

 We often prepend forward declarations with a comment, which is pretty 
 useless, e.g.: 
 {code}
 // Forward declarations.
 class LogStorageProcess;
 {code}
 or
 {code}
 // Forward declarations.
 namespace registry {
 class Slaves;
 }
 class Authorizer;
 class WhitelistWatcher;
 {code}
 This JIRA aims to clean up such comments.





[jira] [Updated] (MESOS-2572) Add memory statistics tests.

2015-03-30 Thread Chi Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chi Zhang updated MESOS-2572:
-
Labels: iso twitter  (was: twitter)

 Add memory statistics tests.
 

 Key: MESOS-2572
 URL: https://issues.apache.org/jira/browse/MESOS-2572
 Project: Mesos
  Issue Type: Task
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: iso, twitter







[jira] [Commented] (MESOS-2191) Add ContainerId to the TaskStatus message

2015-03-30 Thread Marcel Neuhausler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387724#comment-14387724
 ] 

Marcel Neuhausler commented on MESOS-2191:
--

Trying to answer your questions in reverse order:

4. Why is the internal mesos container id sufficient?
I need to correlate the mesos task id with the container name in Docker: 
mesos- + mesos container id == name of the container in Docker. That name 
becomes visible to the user if, for example, you run cAdvisor on your mesos 
slaves. Obviously it would be even nicer if Mesos returned the full 
container name.

3. As it is in the code, there's one Container per Executor, so you could 
theoretically use the ExecutorID for the correlation you mention. Why is that 
not enough?
It is my understanding that the ExecutorId is equal to the TaskId, so that 
wouldn't help in figuring out what the corresponding container name in Docker 
would be.

2. Are you attempting to extract, ultimately, the docker container ID? If so, 
how would you do it?
No, I want to get the name of the container in Docker.

1. What exactly is your goal, i.e. if you had the mesos Container ID, how 
would you use it?
We have two use cases in which our own Mesos Framework needs to know the 
container name in Docker:
a) We use cAdvisor to automatically collect metrics from running docker 
containers. The collector process talks to our Framework to get the container 
name for a corresponding mesos task.
b) Networking with Project Calico: when you set up a network group in Calico 
you have to pass the Docker container name to Calico. Our Framework interacts 
with Calico and has to be able to correlate the Mesos TaskID with the Docker 
container name.

In general, I still have a hard time understanding why you try to hide a 
Mesos id even though that id becomes visible to the user, for example in 
cAdvisor as part of the docker container name. I would also expect that, for 
completeness, you would return all of the IDs in the Task message protobuf 
data-structure (messages.proto). Btw, ideally you would return the full 
container name (mesos-683fd6a3-9dbc-4180-bb64-bf8b961cc50e) in the task 
message.
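The naming convention described in the comment ("mesos-" plus the Mesos container id) can be sketched as a pair of helpers a framework might use for this correlation; the helper names are hypothetical, not Mesos API.

```python
DOCKER_NAME_PREFIX = "mesos-"


def docker_container_name(container_id):
    """Derive the Docker container name from a Mesos container id,
    e.g. 'mesos-683fd6a3-9dbc-4180-bb64-bf8b961cc50e'."""
    return DOCKER_NAME_PREFIX + container_id


def mesos_container_id(docker_name):
    """Invert the mapping, to correlate cAdvisor or Calico records
    (keyed by Docker container name) back to the Mesos container id."""
    if not docker_name.startswith(DOCKER_NAME_PREFIX):
        raise ValueError("not a Mesos-managed container: " + docker_name)
    return docker_name[len(DOCKER_NAME_PREFIX):]


cid = "683fd6a3-9dbc-4180-bb64-bf8b961cc50e"
assert docker_container_name(cid) == "mesos-" + cid
assert mesos_container_id("mesos-" + cid) == cid
```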



 Add ContainerId to the TaskStatus message
 -

 Key: MESOS-2191
 URL: https://issues.apache.org/jira/browse/MESOS-2191
 Project: Mesos
  Issue Type: Wish
  Components: containerization
Reporter: Marcel Neuhausler
Assignee: Alexander Rojas
  Labels: mesosphere

 {{TaskStatus}} provides frameworks with certain information ({{executorId}}, 
 {{slaveId}}, etc.) which is useful when collecting statistics about cluster 
 performance; however, it is difficult to associate a task with the container 
 it is executed in, since this information always stays within mesos itself. 
 It would therefore be good to provide the framework scheduler with this 
 information by adding a new field to the {{TaskStatus}} message.
 See comments for a use case.





[jira] [Commented] (MESOS-2564) Kill superfluous forward declaration comments.

2015-03-30 Thread Benjamin Hindman (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387802#comment-14387802
 ] 

Benjamin Hindman commented on MESOS-2564:
-

The intention was to force people to capture all of their forward declarations 
at the top of the file; by explicitly calling them out with a comment it was 
made clearer that that is where they belong (whereas it is already common 
convention to put your includes at the beginning of the file, although not 
required). I don't really see much value in removing these; how/why are they 
negatively impacting the code base?

 Kill superfluous forward declaration comments.
 --

 Key: MESOS-2564
 URL: https://issues.apache.org/jira/browse/MESOS-2564
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov
Priority: Minor
  Labels: easyfix, newbie

 We often prepend forward declarations with a comment, which is pretty 
 useless, e.g.: 
 {code}
 // Forward declarations.
 class LogStorageProcess;
 {code}
 or
 {code}
 // Forward declarations.
 namespace registry {
 class Slaves;
 }
 class Authorizer;
 class WhitelistWatcher;
 {code}
 This JIRA aims to clean up such comments.





[jira] [Commented] (MESOS-2542) mesos containerizer should not allow tasks to run as root inside scheduler specified rootfs

2015-03-30 Thread Jay Buffington (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387897#comment-14387897
 ] 

Jay Buffington commented on MESOS-2542:
---

Also we should consider no_new_privs.  From 
https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt

{quote}
With no_new_privs set, execve promises not to grant the privilege to do 
anything that could not have been done without the execve call.  For example, 
the setuid and setgid bits will no longer change the uid or gid; file 
capabilities will not add to the permitted set
{quote}
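A hedged sketch of what setting this bit looks like from userland (Linux-only; PR_SET_NO_NEW_PRIVS is prctl option 38, and the bit is inherited across fork and execve, so a containerizer would set it in the child before exec'ing the task):

```python
import ctypes
import multiprocessing
import sys

PR_SET_NO_NEW_PRIVS = 38  # from include/uapi/linux/prctl.h
PR_GET_NO_NEW_PRIVS = 39


def set_no_new_privs():
    """Set the no_new_privs bit in the calling process (Linux only).

    Afterwards, execve cannot grant privileges: setuid/setgid bits and
    file capabilities on executed binaries stop having any effect.
    Returns the bit read back via PR_GET_NO_NEW_PRIVS (1 when set).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_NO_NEW_PRIVS) failed")
    return libc.prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0)


def child():
    # Set the bit in a forked child, leaving the parent unaffected,
    # mirroring how a launcher would apply it only to the task process.
    sys.exit(0 if set_no_new_privs() == 1 else 1)


if __name__ == "__main__" and sys.platform.startswith("linux"):
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
    assert p.exitcode == 0
```

Note that the bit is irrevocable once set, which is exactly the property that makes it attractive for confining tasks inside a scheduler-specified rootfs.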


 mesos containerizer should not allow tasks to run as root inside scheduler 
 specified rootfs
 ---

 Key: MESOS-2542
 URL: https://issues.apache.org/jira/browse/MESOS-2542
 Project: Mesos
  Issue Type: Technical task
  Components: containerization
Reporter: Jay Buffington

 If a task has root in the container it’s fairly well documented how to break 
 out of the chroot and get root privs outside the container.  Therefore, when 
 the mesos containerizer specifies an arbitrary rootfs to chroot into we need 
 to be careful to not allow the task to get root access.  
 There are likely at least two options to consider here.  One is user 
 namespaces[1], wherein the user has “root” inside the container, but outside 
 the container that root user is mapped to an unprivileged user.  Another 
 option is to mount all user-specified rootfs with the nosuid flag and 
 strictly control /etc/passwd.
 [1] https://lwn.net/Articles/532593/


