[jira] [Updated] (MESOS-5960) Design doc for supporting seccomp in Mesos container
[ https://issues.apache.org/jira/browse/MESOS-5960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Guo updated MESOS-5960: --- Assignee: Jay Guo Component/s: containerization Issue Type: Task (was: Bug) > Design doc for supporting seccomp in Mesos container > > > Key: MESOS-5960 > URL: https://issues.apache.org/jira/browse/MESOS-5960 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jay Guo >Assignee: Jay Guo > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5960) Design doc for supporting seccomp in Mesos container
Jay Guo created MESOS-5960: -- Summary: Design doc for supporting seccomp in Mesos container Key: MESOS-5960 URL: https://issues.apache.org/jira/browse/MESOS-5960 Project: Mesos Issue Type: Bug Reporter: Jay Guo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5041) Add cgroups unified isolator
[ https://issues.apache.org/jira/browse/MESOS-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403377#comment-15403377 ] Jie Yu commented on MESOS-5041: --- commit 9567fb42062774ca841da8cbfb45119bc30a0a4d Author: haosdent huang Date: Mon Aug 1 22:03:41 2016 -0700 Implemented `CgroupsIsolatorProcess::cleanup`. Review: https://reviews.apache.org/r/49827/ > Add cgroups unified isolator > > > Key: MESOS-5041 > URL: https://issues.apache.org/jira/browse/MESOS-5041 > Project: Mesos > Issue Type: Task > Components: cgroups, isolation >Reporter: haosdent >Assignee: haosdent > Fix For: 1.1.0 > > > Implement the cgroups unified isolator for Mesos containerizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-5923) Ubuntu 14.04 LTS GPU Isolator "/run" directory is noexec
[ https://issues.apache.org/jira/browse/MESOS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402306#comment-15402306 ] Jie Yu edited comment on MESOS-5923 at 8/2/16 4:51 AM: --- commit 48a492cd9d7d0a194735b9b4107a35b489c596e1 Author: Kevin Klues Date: Mon Aug 1 09:06:07 2016 -0700 Updated NvidiaVolume to mount as 'tmpfs' if parent fs is 'noexec'. This patch is in response to an issue we ran into on Ubuntu 14.04, where '/run' is being mounted as 'noexec' (MESOS-5923). Since our NvidiaVolume is created below this mount point, we are unable to execute any binaries we add to this volume. This causes problems, for example, when trying to execute 'nvidia-smi' from within a container that has this volume mounted in. To work around this issue, we detect if any mount point above the path where we create the volume is marked as 'noexec', and if so, we create a new 'tmpfs' mount for the volume without 'noexec' set. Review: https://reviews.apache.org/r/50592/ was (Author: jieyu): commit 48a492cd9d7d0a194735b9b4107a35b489c596e1 Author: Kevin Klues Date: Mon Aug 1 09:06:07 2016 -0700 Updated NvidiaVolume to mount as 'tmpfs' if parent fs is 'noexec'. This patch is in response to an issue we ran into on Ubuntu 14.04, where '/run' is being mounted as 'noexec' (MESOS-5923). Since our NvidiaVolume is created below this mount point, we are unable to execute any binaries we add to this volume. This causes problems, for example, when trying to execute 'nvidia-smi' from within a container that has this volume mounted in. To work around this issue, we detect if any mount point above the path where we create the volume is marked as 'noexec', and if so, we create a new 'tmpfs' mount for the volume without 'noexec' set. Review: https://reviews.apache.org/r/50592/ commit ad1f610508ca669b32b1cb7a4d5baf5f3b337b70 Author: Kevin Klues Date: Mon Aug 1 09:06:04 2016 -0700 Added check for root permissions to 'NvidiaVolume::create()'. 
Review: https://reviews.apache.org/r/50644/ > Ubuntu 14.04 LTS GPU Isolator "/run" directory is noexec > > > Key: MESOS-5923 > URL: https://issues.apache.org/jira/browse/MESOS-5923 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.0.0 > Environment: Ubuntu 14.04 LTS >Reporter: Bill Zhao >Assignee: Kevin Klues > Labels: gpu, mesosphere > Fix For: 1.0.1 > > > In Ubuntu 14.04 LTS the mount for /run directory is noexec. It affect the > {{/var/run/mesos/isolators/gpu/nvidia_352.63/bin}} directory which mesos GPU > isolators depended on. > {{bill@billz:/var/run$ mount | grep noexec > proc on /proc type proc (rw,noexec,nosuid,nodev) > sysfs on /sys type sysfs (rw,noexec,nosuid,nodev) > devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620) > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)}} > The /var/run is link to /run: > {{bill@billz:/var$ ll > total 52 > drwxr-xr-x 13 root root 4096 May 5 20:00 ./ > drwxr-xr-x 27 root root 4096 Jul 14 17:29 ../ > lrwxrwxrwx 1 root root9 May 5 19:50 lock -> /run/lock/ > drwxrwxr-x 19 root syslog 4096 Jul 28 08:00 log/ > drwxr-xr-x 2 root root 4096 Aug 4 2015 opt/ > lrwxrwxrwx 1 root root4 May 5 19:50 run -> /run/}} > Current the work around is mount without noexec: > {{sudo mount -o remount,exec /run}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
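The noexec detection described in the commit message above (walk the mount table and check whether any mount point containing the volume path carries 'noexec') can be sketched in Python. This is a rough illustration, not Mesos's actual C++ implementation; the mount table is a hard-coded sample in /proc/mounts format so the snippet is self-contained.

```python
# Sample mount table in /proc/mounts format (device, mountpoint, fstype, options).
# A real implementation would read /proc/self/mounts instead.
SAMPLE_MOUNTS = """\
proc /proc proc rw,noexec,nosuid,nodev 0 0
tmpfs /run tmpfs rw,noexec,nosuid,size=10%,mode=0755 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

def parse_mounts(text):
    """Return a list of (mountpoint, set_of_options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[1], set(fields[3].split(","))))
    return entries

def under_noexec(path, mounts):
    """True if the longest mount point containing `path` carries 'noexec'."""
    candidates = [m for m in mounts
                  if path == m[0] or path.startswith(m[0].rstrip("/") + "/")]
    best = max(candidates, key=lambda m: len(m[0]), default=None)
    return best is not None and "noexec" in best[1]

mounts = parse_mounts(SAMPLE_MOUNTS)
print(under_noexec("/run/mesos/isolators/gpu/nvidia_352.63", mounts))  # True
print(under_noexec("/var/lib/mesos", mounts))                          # False
```

When such a 'noexec' ancestor is found, the fix mounts a fresh 'tmpfs' at the volume path without the flag, which is exactly the manual `mount -o remount,exec /run` workaround automated.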
[jira] [Commented] (MESOS-5388) MesosContainerizerLaunch flags execute arbitrary commands via shell
[ https://issues.apache.org/jira/browse/MESOS-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403069#comment-15403069 ] Jie Yu commented on MESOS-5388: --- commit 9c6097f063405279efc07eec22457c2059653f07 Author: Gilbert Song Date: Mon Aug 1 17:07:00 2016 -0700 Updated filesystem linux isolator pre exec commands to be non-shell. Review: https://reviews.apache.org/r/50216/ > MesosContainerizerLaunch flags execute arbitrary commands via shell > --- > > Key: MESOS-5388 > URL: https://issues.apache.org/jira/browse/MESOS-5388 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: James DeFelice >Assignee: Gilbert Song > Labels: mesosphere, security > > For example, the docker volume isolator's containerPath is appended (without > sanitation) to a command that's executed in this manner. As such, it's > possible to inject arbitrary shell commands to be executed by mesos. > https://github.com/apache/mesos/blob/17260204c833c643adf3d8f36ad8a1a606ece809/src/slave/containerizer/mesos/launch.cpp#L206 > Perhaps instead of strings these commands could/should be sent as string > arrays that could be passed as argv arguments w/o shell interpretation? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
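The argv-array suggestion at the end of the report can be demonstrated in Python on a POSIX system (illustrative only; Mesos itself is C++ and would use execvp-style calls). With a shell-string command, an attacker-controlled containerPath injects an extra command; with an argv array, the same string is a single literal argument.

```python
import subprocess

# An attacker-controlled containerPath with an embedded shell command.
container_path = "/mnt/data; echo INJECTED"

# Shell form: the string is handed to /bin/sh, so ';' splits commands
# and the injected 'echo INJECTED' actually runs.
shell_out = subprocess.run("echo " + container_path,
                           shell=True, capture_output=True, text=True).stdout

# Argv form: the path is passed as one literal argument; nothing is interpreted.
argv_out = subprocess.run(["echo", container_path],
                          capture_output=True, text=True).stdout

print(repr(shell_out))  # '/mnt/data\nINJECTED\n'
print(repr(argv_out))   # '/mnt/data; echo INJECTED\n'
```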
[jira] [Updated] (MESOS-3548) Investigate federations of Mesos masters
[ https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhilip Kumar S updated MESOS-3548: -- Description: In a large Mesos installation, the operator might want to ensure that even if the Mesos masters are inaccessible or failed, new tasks can still be scheduled (across multiple different frameworks). HA masters are only a partial solution here: the masters might still be inaccessible due to a correlated failure (e.g., Zookeeper misconfiguration/human error). To support this, we could support the notion of "hierarchies" or "federations" of Mesos masters. In a Mesos installation with 10k machines, the operator might configure 10 Mesos masters (each of which might be HA) to manage 1k machines each. Then an additional "meta-Master" would manage the allocation of cluster resources to the 10 masters. Hence, the failure of any individual master would impact 1k machines at most. The meta-master might not have a lot of work to do: e.g., it might be limited to occasionally reallocating cluster resources among the 10 masters, or ensuring that newly added cluster resources are allocated among the masters as appropriate. Hence, the failure of the meta-master would not prevent any of the individual masters from scheduling new tasks. A single framework instance probably wouldn't be able to use more resources than have been assigned to a single Master, but that seems like a reasonable restriction. This feature might also be a good fit for a multi-datacenter deployment of Mesos: each Mesos master instance would manage a single DC. Naturally, reducing the traffic between frameworks and the meta-master would be important for performance reasons in a configuration like this. Operationally, this might be simpler if Mesos processes were self-hosting ([MESOS-3547]). 
Initial Design document: https://docs.google.com/document/d/1U4IY_ObAXUPhtTa-0Rw_5zQxHDRnJFe5uFNOQ0VUcLg/edit# Initial Survey: https://goo.gl/forms/DpVRV9Zh3kunhJkP2
> Investigate federations of Mesos masters > > > Key: MESOS-3548 > URL: https://issues.apache.org/jira/browse/MESOS-3548 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Dhilip Kumar S > Labels: federation, mesosphere, multi-dc > > In a large Mesos installation, the operator might want to ensure that even if > the Mesos masters are inaccessible or failed, new tasks can still be > scheduled (across multiple different frameworks). HA masters are only a > partial solution here: the masters might still be inaccessible due to a > correlated failure (e.g., Zookeeper misconfiguration/human error). > To support this, we could support the notion of "hierarchies" or > "federations" of Mesos masters. In a Mesos installation with 10k machines, > the operator might configure 10 Mesos masters (each of which might be HA) to > manage 1k machines each. Then an additional "meta-Master" would manage the > allocation of cluster resources to the 10 masters. Hence, the failure of any > individual master would impact 1k machines at most. The meta-master might n
[jira] [Commented] (MESOS-4862) Setting failover_timeout in FrameworkInfo to Double.MAX_VALUE causes it to be set to zero
[ https://issues.apache.org/jira/browse/MESOS-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403028#comment-15403028 ] Steven Schlansker commented on MESOS-4862: -- Is this a duplicate of https://issues.apache.org/jira/browse/MESOS-1575 ? > Setting failover_timeout in FrameworkInfo to Double.MAX_VALUE causes it to be > set to zero > - > > Key: MESOS-4862 > URL: https://issues.apache.org/jira/browse/MESOS-4862 > Project: Mesos > Issue Type: Bug > Components: master, stout >Reporter: Timothy Chen > > Currently we expose framework failover_timeout as a double in Proto, and if > users set the failover_timeout to Double.MAX_VALUE, the Master will actually > set it to zero which is the complete opposite of the original intent. > The problem is that in stout/duration.hpp we only store down to the > nanoseconds with int64_t, and it gives an error when we pass double.max as it > goes out of the int64_t bounds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
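The overflow described in the report (stout's Duration stores int64 nanoseconds, which cannot hold Double.MAX_VALUE seconds) can be illustrated with a rough Python sketch. This is not stout's actual code; `to_nanos` is a made-up helper showing the validate-instead-of-truncate behavior one would want.

```python
# stout stores Duration as int64 nanoseconds; Double.MAX_VALUE seconds
# overflows that bound, which is how the timeout ended up as zero.
INT64_MAX = 2**63 - 1
MAX_SECONDS = INT64_MAX / 1e9  # largest representable duration, ~292 years

def to_nanos(seconds):
    """Reject out-of-range values instead of silently truncating to zero."""
    if seconds < 0 or seconds > MAX_SECONDS:
        raise ValueError("failover_timeout out of range: %r" % seconds)
    return int(seconds * 1e9)

double_max = 1.7976931348623157e308  # Java's Double.MAX_VALUE
try:
    to_nanos(double_max)
except ValueError as e:
    print("rejected:", e)

print(to_nanos(604800))  # one week -> 604800000000000 ns
```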
[jira] [Commented] (MESOS-4992) sandbox uri does not work outside mesos http server
[ https://issues.apache.org/jira/browse/MESOS-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403013#comment-15403013 ] Benjamin Mahler commented on MESOS-4992: It looks like this was done intentionally: {code} // When navigating directly to this page, e.g. pasting the URL into the // browser, the previous page is not a page in Mesos. In that case, navigate // home. if (!$scope.agents) { $alert.danger({ message: "Navigate to the agent's sandbox via the Mesos UI.", title: "Failed to find agents." }); return $location.path('/').replace(); } {code} >From here: >https://github.com/apache/mesos/blob/8dc71da12c9b91edd2fa6c7b9a0a088b7dbb0ad3/src/webui/master/static/js/controllers.js#L751-L760 Looking at the [commit|https://github.com/apache/mesos/commit/270b7594c8eb3dd0d4db8461c3ee1108fa16b45d] and the code, it's not clear to me why this check was introduced since it is not being done in the other controllers. > sandbox uri does not work outisde mesos http server > --- > > Key: MESOS-4992 > URL: https://issues.apache.org/jira/browse/MESOS-4992 > Project: Mesos > Issue Type: Bug > Components: webui >Affects Versions: 0.27.1 >Reporter: Stavros Kontopoulos > Labels: mesosphere > > The SandBox uri of a framework does not work if i just copy paste it to the > browser. > For example the following sandbox uri: > http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/frameworks/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009/executors/driver-20160321155016-0001/browse > should redirect to: > http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/browse?path=%2Ftmp%2Fmesos%2Fslaves%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0%2Fframeworks%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009%2Fexecutors%2Fdriver-20160321155016-0001%2Fruns%2F60533483-31fb-4353-987d-f3393911cc80 > yet it fails with the message: > "Failed to find slaves. > Navigate to the slave's sandbox via the Mesos UI." 
> and redirects to: > http://172.17.0.1:5050/#/ > This is an issue for me because I'm working on expanding the Mesos Spark UI with > the sandbox URI. The other option is to get the slave info, parse the JSON > file there, and extract the executor paths, which is not so straightforward or > elegant. Moreover, I don't see the runs/container_id in the Mesos Proto API. I > guess this is hidden info; it is the piece of info needed to re-write the URI > without redirection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
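The working redirect target quoted in the report percent-encodes the on-disk sandbox path into the `path` query parameter. A sketch of constructing that URL follows; the sandbox directory layout and the /tmp/mesos agent work_dir are assumptions read off the URLs in the report, not an API Mesos exposes.

```python
from urllib.parse import quote

# Pieces taken from the URLs quoted in the report; run_id is the component
# the reporter notes is not exposed by the Proto API.
master = "http://172.17.0.1:5050"
agent_id = "50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0"
framework_id = "50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009"
executor_id = "driver-20160321155016-0001"
run_id = "60533483-31fb-4353-987d-f3393911cc80"

# Assumed on-disk sandbox layout under the agent work_dir (/tmp/mesos here).
sandbox = "/tmp/mesos/slaves/%s/frameworks/%s/executors/%s/runs/%s" % (
    agent_id, framework_id, executor_id, run_id)

# The webui expects the path percent-encoded, slashes included (safe="").
url = "%s/#/slaves/%s/browse?path=%s" % (master, agent_id, quote(sandbox, safe=""))
print(url)
```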
[jira] [Commented] (MESOS-5959) All non-root tests fail on GPU machine
[ https://issues.apache.org/jira/browse/MESOS-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402848#comment-15402848 ] Kevin Klues commented on MESOS-5959: https://reviews.apache.org/r/50671/ https://reviews.apache.org/r/50672/ > All non-root tests fail on GPU machine > -- > > Key: MESOS-5959 > URL: https://issues.apache.org/jira/browse/MESOS-5959 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: gpu, mesosphere > Fix For: 1.0.1 > > > A recent addition to ensure that {{NvidiaVolume::create()}} ran as root broke > all non-root tests on GPU machines. The reason is that we unconditionally > create this volume so long as we detect {{nvml.isAvailable()}} which will > fail now that we are only allowed to create this volume if we have root > permissions. > We should fix this by adding the proper conditions to determine when / if we > should create this volume based on some combination of {{\-\-containerizer}} > and {{\-\-isolation}} flags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5894) cpu share should be considered distinctly from cpu allocation
[ https://issues.apache.org/jira/browse/MESOS-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402846#comment-15402846 ] Vinod Kone commented on MESOS-5894: --- Currently, an executor and any of its tasks share the same cgroup limits. The cgroup limits are increased/decreased as tasks come and go from the executor. So yes, there is no isolation between tasks of an executor and executor itself. This will change in the future when we allow tasks to have their own resource limits. See https://issues.apache.org/jira/browse/MESOS- > cpu share should be considered distinctly from cpu allocation > - > > Key: MESOS-5894 > URL: https://issues.apache.org/jira/browse/MESOS-5894 > Project: Mesos > Issue Type: Improvement > Components: allocation >Affects Versions: 0.28.2 > Environment: Linux, cgroups, docker >Reporter: Christopher Hunt > Labels: cgroups, cpu-usage, docker > > As a framework developer I wish to explicitly declare the cpu.share for a > task and its associated executor so that I may have a direct means of > controlling their respective cpu usage at runtime. > With current behaviour, I've noticed that the cgroup cpu.share for a task > includes both the executor's cpu.share and also the cpu value specified as a > task's resources. The cpu.share value appears to be calculated as a multiple > of 1024, therefore 1 cpu == 1024, 0.1 cpu = 102 and so forth. I find this > behaviour to be unexpected, and also an overloading of the meaning of the > resource cpu type. My understanding of the resource cpu type is that it is > used primarily for decrementing from the total number of cpus available to a > node, and thereby influences the resource offers made to a given framework in > consideration of other frameworks. On the other hand, cpu shares limit the > amount of cpu used at runtime. 
> By way of a solution, perhaps a new Resource type could be introduced named > "cpu-share" and optionally provided by a scheduler when constructing a > TaskInfo. The cpu share resource could also be optionally specified for the > associated executor. By not specifying the cpu share, the existing behaviour > is preserved thereby providing backward compatibility. > Related issue: https://issues.apache.org/jira/browse/MESOS-1718 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
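The 1024-shares-per-cpu mapping the reporter observed is the standard cgroup convention. A quick sketch of the arithmetic (`cpus_to_shares` is an illustrative helper, not Mesos's code; the floor of 2 is the kernel's minimum for cpu.shares):

```python
CPU_SHARES_PER_CPU = 1024  # cgroup convention: 1 cpu == 1024 shares
MIN_CPU_SHARES = 2         # kernel-enforced floor for cpu.shares

def cpus_to_shares(cpus):
    return max(int(cpus * CPU_SHARES_PER_CPU), MIN_CPU_SHARES)

print(cpus_to_shares(1))     # 1024
print(cpus_to_shares(0.1))   # 102
print(cpus_to_shares(0.001)) # 2 (kernel floor kicks in)
```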
[jira] [Updated] (MESOS-5959) All non-root tests fail on GPU machine
[ https://issues.apache.org/jira/browse/MESOS-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-5959: --- Description: A recent addition to ensure that {{NvidiaVolume::create()}} ran as root broke all non-root tests on GPU machines. The reason is that we unconditionally create this volume so long as we detect {{nvml.isAvailable()}} which will fail now that we are only allowed to create this volume if we have root permissions. We should fix this by adding the proper conditions to determine when / if we should create this volume based on some combination of {{\-\-containerizer}} and {{\-\-isolation}} flags. > All non-root tests fail on GPU machine > -- > > Key: MESOS-5959 > URL: https://issues.apache.org/jira/browse/MESOS-5959 > Project: Mesos > Issue Type: Bug >Reporter: Kevin Klues >Assignee: Kevin Klues > Labels: gpu, mesosphere > Fix For: 1.0.1 > > > A recent addition to ensure that {{NvidiaVolume::create()}} ran as root broke > all non-root tests on GPU machines. The reason is that we unconditionally > create this volume so long as we detect {{nvml.isAvailable()}} which will > fail now that we are only allowed to create this volume if we have root > permissions. > We should fix this by adding the proper conditions to determine when / if we > should create this volume based on some combination of {{\-\-containerizer}} > and {{\-\-isolation}} flags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5959) All non-root tests fail on GPU machine
Kevin Klues created MESOS-5959: -- Summary: All non-root tests fail on GPU machine Key: MESOS-5959 URL: https://issues.apache.org/jira/browse/MESOS-5959 Project: Mesos Issue Type: Bug Reporter: Kevin Klues Assignee: Kevin Klues Fix For: 1.0.1 A recent addition to ensure that {{NvidiaVolume::create() }} ran as root broke all non-root tests on GPU machines. The reason is that we unconditionally create this volume so long as we detect {{nvml.isAvailable()}} which will fail now that we are only allowed to create this volume if we have root permissions. We should fix this by adding the proper conditions to determine when / if we should create this volume based on some combination of {{\-\-containerizer}} and {{\-\-isolation}} flags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
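The conditional creation the description proposes might look roughly like the sketch below. The names and the exact gating logic are entirely hypothetical; it only illustrates the idea of creating the volume when we have root permissions and the `--containerizer`/`--isolation` flags actually call for GPU support.

```python
def should_create_nvidia_volume(is_root, nvml_available, containerizer, isolation):
    """Hypothetical gating for the fix MESOS-5959 proposes: only create the
    NvidiaVolume when we have permission to build it and the agent's flags
    indicate it will actually be used."""
    return (is_root
            and nvml_available
            and "mesos" in containerizer
            and "gpu/nvidia" in isolation)

print(should_create_nvidia_volume(False, True, "mesos", "gpu/nvidia"))              # False: non-root
print(should_create_nvidia_volume(True, True, "mesos", "cgroups/devices,gpu/nvidia"))  # True
```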
[jira] [Commented] (MESOS-5958) Reviewbot failing due to python files not being cleaned up after distclean
[ https://issues.apache.org/jira/browse/MESOS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402809#comment-15402809 ] Vinod Kone commented on MESOS-5958: --- What's a bit surprising is that both the ReviewBot jenkins job and the main Mesos job run the same command (./support/docker_build.sh) and the latter doesn't seem to error out on this failure; although both the jobs show these errors during the cleanup phase. Successful Mesos build: https://builds.apache.org/view/M-R/view/Mesos/job/Mesos/2570/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)/console {code} find python -name "build" -o -name "dist" -o -name "*.pyc" \ -o -name "*.egg-info" | xargs rm -rf test -z "libmesos_no_3rdparty.la libbuild.la liblog.la libstate.la libjava.la libexamplemodule.la libtestallocator.la libtestanonymous.la libtestauthentication.la libtestauthorizer.la libtestcontainer_logger.la libtesthook.la libtesthttpauthenticator.la libtestisolator.la libtestmastercontender.la libtestmasterdetector.la libtestqos_controller.la libtestresource_estimator.la" || rm -f libmesos_no_3rdparty.la libbuild.la liblog.la libstate.la libjava.la libexamplemodule.la libtestallocator.la libtestanonymous.la libtestauthentication.la libtestauthorizer.la libtestcontainer_logger.la libtesthook.la libtesthttpauthenticator.la libtestisolator.la libtestmastercontender.la libtestmasterdetector.la libtestqos_controller.la libtestresource_estimator.la test -z "liblogrotate_container_logger.la libfixed_resource_estimator.la libload_qos_controller.la " || rm -f liblogrotate_container_logger.la libfixed_resource_estimator.la libload_qos_controller.la rm -f mesos-fetcher mesos-executor mesos-containerizer mesos-logrotate-logger mesos-health-check mesos-usage mesos-docker-executor rm -f ./so_locations rm -f mesos-agent mesos-master mesos-slave rm -f 
./so_locations rm -f *.o rm -f *.lo rm -f *.tab.c rm -f ./so_locations test -z "" || rm -f rm -f ../include/mesos/*.o rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags test . = "../../src" || test -z "" || rm -f rm -f ../include/mesos/*.lo rm -f ../include/mesos/.deps/.dirstamp rm -f ../include/mesos/agent/*.o rm -f ../include/mesos/agent/*.lo rm -f ../include/mesos/.dirstamp rm -f ../include/mesos/allocator/*.o rm -f ../include/mesos/agent/.deps/.dirstamp rm -f ../include/mesos/agent/.dirstamp rm -f ../include/mesos/allocator/*.lo rm -f ../include/mesos/allocator/.deps/.dirstamp rm -f ../include/mesos/appc/*.o rm -f ../include/mesos/allocator/.dirstamp rm -f ../include/mesos/appc/*.lo rm -f ../include/mesos/appc/.deps/.dirstamp rm: cannot remove 'python/cli/build': Is a directory rm: cannot remove 'python/executor/build': Is a directory rm: cannot remove 'python/interface/build': Is a directory rm: cannot remove 'python/native/build': Is a directory rm: cannot remove 'python/scheduler/build': Is a directory rm -f ../include/mesos/authentication/*.o make[2]: [clean-generic] Error 1 (ignored) rm -f ../include/mesos/appc/.dirstamp rm -f ../include/mesos/authentication/*.lo rm -f ../include/mesos/authentication/.deps/.dirstamp rm -f ../include/mesos/authorizer/*.o rm -f ../include/mesos/authentication/.dirstamp rm -f ../include/mesos/authorizer/*.lo rm -f ../include/mesos/authorizer/.deps/.dirstamp rm -f ../include/mesos/containerizer/*.o rm -f ../include/mesos/authorizer/.dirstamp rm -f ../include/mesos/containerizer/*.lo rm -f ../include/mesos/containerizer/.deps/.dirstamp rm -f ../include/mesos/docker/*.o rm -f ../include/mesos/containerizer/.dirstamp rm -f ../include/mesos/docker/*.lo rm -f ../include/mesos/docker/.deps/.dirstamp rm -f ../include/mesos/docker/.dirstamp rm -f ../include/mesos/executor/*.o rm -f ../include/mesos/executor/.deps/.dirstamp rm -f ../include/mesos/executor/*.lo rm -f ../include/mesos/executor/.dirstamp rm -f ../include/mesos/fetcher/*.o rm -f 
../include/mesos/fetcher/.deps/.dirstamp rm -f ../include/mesos/fetcher/*.lo rm -f ../include/mesos/fetcher/.dirstamp rm -f ../include/mesos/maintenance/*.o rm -f ../include/mesos/maintenance/.deps/.dirstamp rm -f ../include/mesos/maintenance/*.lo rm -f ../include/mesos/maintenance/.dirstamp rm -f ../include/mesos/master/*.o rm -f ../include/mesos/master/.deps/.dirstamp rm -f ../include/mesos/master/*.lo rm -f ../include/mesos/master/.dirstamp rm -f ../include/mesos/module/*.o rm -f ../include/mesos/module/.deps/.dirstamp rm -f ../include/mesos/module/*.lo rm -f ../include/mesos/module/.dirstamp rm -f ../include/mesos/quota/*.o rm -f ../include/mesos/quota/.deps/.dirstamp rm -f ../include/mesos/quota/*.lo rm -f ../include/mesos/quota/.dirstamp rm -f ../include/mesos/scheduler/*.o rm -f ../include/mesos/scheduler/.deps/.dirstamp rm -f ../include/mesos/scheduler/*.lo rm -f ../include/mesos/scheduler/.dirstamp rm -f
[jira] [Updated] (MESOS-5958) Reviewbot failing due to python files not being cleaned up after distclean
[ https://issues.apache.org/jira/browse/MESOS-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-5958: -- Description: This is on ASF CI. https://builds.apache.org/job/mesos-reviewbot/14573/consoleFull {code} find python -name "build" -o -name "dist" -o -name "*.pyc" \ -o -name "*.egg-info" -exec rm -rf '{}' \+ test -z "libmesos_no_3rdparty.la libbuild.la liblog.la libstate.la libjava.la libexamplemodule.la libtestallocator.la libtestanonymous.la libtestauthentication.la libtestauthorizer.la libtestcontainer_logger.la libtesthook.la libtesthttpauthenticator.la libtestisolator.la libtestmastercontender.la libtestmasterdetector.la libtestqos_controller.la libtestresource_estimator.la" || rm -f libmesos_no_3rdparty.la libbuild.la liblog.la libstate.la libjava.la libexamplemodule.la libtestallocator.la libtestanonymous.la libtestauthentication.la libtestauthorizer.la libtestcontainer_logger.la libtesthook.la libtesthttpauthenticator.la libtestisolator.la libtestmastercontender.la libtestmasterdetector.la libtestqos_controller.la libtestresource_estimator.la test -z "liblogrotate_container_logger.la libfixed_resource_estimator.la libload_qos_controller.la " || rm -f liblogrotate_container_logger.la libfixed_resource_estimator.la libload_qos_controller.la rm -f mesos-fetcher mesos-executor mesos-containerizer mesos-logrotate-logger mesos-health-check mesos-usage mesos-docker-executor rm -f mesos-agent mesos-master mesos-slave rm -f ./so_locations rm -f *.o rm -f *.lo rm -f ../include/mesos/*.o rm -f ./so_locations rm -f *.tab.c test -z "" || rm -f rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags rm -f ./so_locations rm -f ../include/mesos/*.lo test . 
= "../../src" || test -z "" || rm -f rm -f ../include/mesos/.deps/.dirstamp rm -f ../include/mesos/agent/*.o rm -f ../include/mesos/agent/*.lo rm -f ../include/mesos/.dirstamp rm -f ../include/mesos/allocator/*.o rm -f ../include/mesos/agent/.deps/.dirstamp rm -f ../include/mesos/allocator/*.lo rm -f ../include/mesos/agent/.dirstamp rm -f ../include/mesos/appc/*.o rm -f ../include/mesos/allocator/.deps/.dirstamp rm -f ../include/mesos/appc/*.lo rm -f ../include/mesos/allocator/.dirstamp rm -f ../include/mesos/authentication/*.o rm -f ../include/mesos/appc/.deps/.dirstamp rm -f ../include/mesos/authentication/*.lo rm -f ../include/mesos/appc/.dirstamp rm -f ../include/mesos/authorizer/*.o rm -f ../include/mesos/authentication/.deps/.dirstamp rm -f ../include/mesos/authorizer/*.lo rm -f ../include/mesos/authentication/.dirstamp rm -f ../include/mesos/containerizer/*.o rm -f ../include/mesos/authorizer/.deps/.dirstamp rm -f ../include/mesos/containerizer/*.lo rm -f ../include/mesos/authorizer/.dirstamp rm -f ../include/mesos/docker/*.o rm -f ../include/mesos/containerizer/.deps/.dirstamp rm -f ../include/mesos/docker/*.lo rm: cannot remove 'python/cli/build': Is a directory rm: cannot remove 'python/executor/build': Is a directory rm: cannot remove 'python/interface/build': Is a directory rm: cannot remove 'python/native/build': Is a directory rm: cannot remove 'python/scheduler/build': Is a directory rm -f ../include/mesos/containerizer/.dirstamp make[2]: [clean-generic] Error 1 (ignored) rm -f ../include/mesos/executor/*.o rm -f ../include/mesos/docker/.deps/.dirstamp rm -f ../include/mesos/executor/*.lo rm -f ../include/mesos/docker/.dirstamp rm -f ../include/mesos/fetcher/*.o rm -f ../include/mesos/executor/.deps/.dirstamp rm -f ../include/mesos/fetcher/*.lo rm -f ../include/mesos/executor/.dirstamp rm -f ../include/mesos/maintenance/*.o rm -f ../include/mesos/fetcher/.deps/.dirstamp rm -f ../include/mesos/maintenance/*.lo rm -f ../include/mesos/fetcher/.dirstamp 
rm -f ../include/mesos/master/*.o rm -f ../include/mesos/maintenance/.deps/.dirstamp rm -f ../include/mesos/master/*.lo rm -f ../include/mesos/module/*.o rm -f ../include/mesos/maintenance/.dirstamp rm -f ../include/mesos/module/*.lo rm -f ../include/mesos/master/.deps/.dirstamp rm -f ../include/mesos/master/.dirstamp rm -f ../include/mesos/quota/*.o rm -f ../include/mesos/module/.deps/.dirstamp rm -f ../include/mesos/quota/*.lo rm -f ../include/mesos/module/.dirstamp rm -f ../include/mesos/scheduler/*.o rm -f ../include/mesos/quota/.deps/.dirstamp rm -f ../include/mesos/scheduler/*.lo rm -f ../include/mesos/quota/.dirstamp rm -f ../include/mesos/slave/*.o rm -f ../include/mesos/scheduler/.deps/.dirstamp rm -f ../include/mesos/scheduler/.dirstamp rm -f ../include/mesos/slave/*.lo rm -f ../include/mesos/slave/.deps/.dirstamp rm -f ../include/mesos/state/*.o rm -f ../include/mesos/slave/.dirstamp rm -f ../include/mesos/state/*.lo rm -f ../include/mesos/state/.deps/.dirstamp rm -f ../include/mesos/uri/*.o rm -f ../include/mesos/state/.dirstamp rm -f ../include/mesos/uri/*.lo rm -f ../include/mesos/uri/.deps/.dirstamp rm -f ../include/mesos/v1/*.o rm -f ../include/mesos/uri/.dirstamp rm -f ../
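The `rm: cannot remove 'python/cli/build': Is a directory` errors in the logs above come from handing directories to a non-recursive remove. The same distinction in Python terms (illustrative only, using a throwaway temp directory):

```python
import os
import shutil
import tempfile

build_dir = tempfile.mkdtemp(prefix="build-")

# A plain unlink is the analogue of `rm -f` and fails on a directory,
# just like the "Is a directory" lines in the build log.
try:
    os.remove(build_dir)
except OSError as e:
    print("os.remove failed:", e.__class__.__name__)

# The analogue of `rm -rf` recurses and succeeds.
shutil.rmtree(build_dir)
print(os.path.exists(build_dir))  # False
```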
[jira] [Assigned] (MESOS-5930) Orphan tasks shown as RUNNING have state TASK_FINISHED
[ https://issues.apache.org/jira/browse/MESOS-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar reassigned MESOS-5930: - Assignee: Anand Mazumdar > Orphan tasks shown as RUNNING have state TASK_FINISHED > -- > > Key: MESOS-5930 > URL: https://issues.apache.org/jira/browse/MESOS-5930 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 1.0.0 >Reporter: Lukas Loesche >Assignee: Anand Mazumdar > Fix For: 1.0.1 > > Attachments: Screen Shot 2016-07-29 at 19.23.49.png, Screen Shot > 2016-07-29 at 19.24.03.png, orphan-running.txt > > > On my cluster I have 111 Orphan Tasks of which some are RUNNING some are > FINISHED and some are FAILED. When I open the task details for a FINISHED > tasks the following page shows a state of TASK_FINISHED and likewise when I > open a FAILED task the details page shows TASK_FAILED. > However when I open the details for the RUNNING tasks they all have a task > state of TASK_FINISHED. None of them is in state TASK_RUNNING. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402768#comment-15402768 ] Gilbert Song commented on MESOS-5953: - A discussion is needed to decide which way we should go. > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402767#comment-15402767 ] Gilbert Song commented on MESOS-5953: - [~philwinder], I understand your concern: you want a docker image whose Dockerfile relies on the default working_dir to be runnable in the unified containerizer. We need to weigh the pros and cons: 1. Using the mesos container sandbox: Pros: keeps the semantics in mesos consistent. Any files/dirs under the sandbox will not be lost even if the container is killed (bind mounted to the host sandbox). Persistent volumes should be accessible in the container sandbox. Cons: operators may need to add one more layer to the image to specify the working_dir as "/". 2. Using "/" by default: Pros: will not break docker images which have a default entrypoint/cmd using "/" as working_dir. Cons: semantics are not guaranteed to match Docker's, and mesos sandbox semantics become inconsistent. > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
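The trade-off discussed above reduces to a fallback rule for the container working directory. A minimal sketch of the two proposed defaults (hypothetical helper and sandbox path for illustration only; the real logic lives in Mesos's C++ docker runtime isolator):

```python
# Hypothetical sketch of the two proposed defaults for a container's
# working directory; NOT the actual Mesos implementation.
MESOS_SANDBOX = "/mnt/mesos/sandbox"  # assumed sandbox mount point

def effective_workdir(image_workdir, prefer_sandbox=True):
    """Return the working directory a container would start in."""
    if image_workdir:  # a WORKDIR set in the Dockerfile always wins
        return image_workdir
    # Option 1 keeps Mesos sandbox semantics; option 2 follows the
    # Docker spec, which defaults WORKDIR to "/".
    return MESOS_SANDBOX if prefer_sandbox else "/"
```

For an image that never sets WORKDIR, option 1 yields the sandbox path and option 2 yields "/", which is exactly the divergence the ticket describes.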
[jira] [Commented] (MESOS-5792) Add mesos tests to CMake (make check)
[ https://issues.apache.org/jira/browse/MESOS-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402763#comment-15402763 ] Joseph Wu commented on MESOS-5792: -- More progress: {code} commit 7dbc74efaea3e4ec185bfbd0c503a61ac2a5f1e1 Author: Srinivas Brahmaroutu Date: Thu Jul 28 12:37:19 2016 -0700 CMake: Added `setns` and `active-user` test helper binaries. These binaries are required for `NsTest.ROOT_setns` and `SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser`. Review: https://reviews.apache.org/r/50064/ {code} {code} commit 697b55a733d15a5bdc0a524f062f3dd93263a224 Author: Srinivas Brahmaroutu Date: Thu Jul 28 14:49:41 2016 -0700 CMake: Added LogrotateContainerLogger companion executable. This binary is required for the various `LOGROTATE_*` tests. For now, this binary is not built on Windows due to some optimizations made inside the executable. Review: https://reviews.apache.org/r/50179/ {code} {code} commit dac771f1e2fafbf9ad8adfebc491933d64a21d66 Author: Srinivas Brahmaroutu Date: Thu Jul 28 15:13:42 2016 -0700 CMake: Added build script for mesos-local executable. This executable is used to run a local Mesos cluster for testing purposes. Review: https://reviews.apache.org/r/50323/ {code} {code} commit 0c2166c4e68748a285a680f32b1dbf51d865f245 Author: Srinivas Brahmaroutu Date: Thu Jul 28 15:55:48 2016 -0700 CMake: Added script to build mesos-execute. `mesos-execute` is a utility that can schedule and run a single task. Review: https://reviews.apache.org/r/50324/ {code} > Add mesos tests to CMake (make check) > - > > Key: MESOS-5792 > URL: https://issues.apache.org/jira/browse/MESOS-5792 > Project: Mesos > Issue Type: Improvement > Components: build >Reporter: Srinivas >Assignee: Srinivas > Labels: build, mesosphere > Original Estimate: 168h > Remaining Estimate: 168h > > Provide CMakeLists.txt and configuration files to build mesos tests using > CMake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5388) MesosContainerizerLaunch flags execute arbitrary commands via shell
[ https://issues.apache.org/jira/browse/MESOS-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402727#comment-15402727 ] Jie Yu commented on MESOS-5388: --- commit ca5eaad82f69309de427aab3ec2ed7976c9cc850 Author: Gilbert Song Date: Mon Aug 1 13:05:53 2016 -0700 Updated docker volume isolator to return non-shell 'pre_exec_commands'. Review: https://reviews.apache.org/r/50535/ commit 202e1933c592f456420ec1c85fd9a21d0df9 Author: Gilbert Song Date: Mon Aug 1 13:03:16 2016 -0700 Updated mesos containerizer launch execute() to return 'EXIT_FAILURE'. Review: https://reviews.apache.org/r/50534/ > MesosContainerizerLaunch flags execute arbitrary commands via shell > --- > > Key: MESOS-5388 > URL: https://issues.apache.org/jira/browse/MESOS-5388 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: James DeFelice >Assignee: Gilbert Song > Labels: mesosphere, security > > For example, the docker volume isolator's containerPath is appended (without > sanitation) to a command that's executed in this manner. As such, it's > possible to inject arbitrary shell commands to be executed by mesos. > https://github.com/apache/mesos/blob/17260204c833c643adf3d8f36ad8a1a606ece809/src/slave/containerizer/mesos/launch.cpp#L206 > Perhaps instead of strings these commands could/should be sent as string > arrays that could be passed as argv arguments w/o shell interpretation? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5933) Refactor the uri::Fetcher as a binary.
[ https://issues.apache.org/jira/browse/MESOS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402653#comment-15402653 ] Joseph Wu commented on MESOS-5933: -- You can do that by scheduling a task :) One of the motivations behind the URI fetcher is to get rid of the extra binary (harder to maintain and has some odd undesirable behavior in some configurations). By running the fetcher separately, you'll end up testing a different code path. This is especially true if the fetcher becomes pluggable. > Refactor the uri::Fetcher as a binary. > -- > > Key: MESOS-5933 > URL: https://issues.apache.org/jira/browse/MESOS-5933 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Gilbert Song >Assignee: Zhitao Li > Labels: fetcher, mesosphere > > By refactoring the uri::Fetcher as a binary, the fetcher can be used > independently. Not only mesos, but also new fetcher plugin testing, mesos cli > and many other new components in the future can re-use the binary to fetch > any URI with different schemes. Ideally, after this change, mesos cli is able > to re-use the uri::Fetcher binary to introduce new image pulling commands, > e.g., `mesos fetch -i `. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5958) Reviewbot failing due to python files not being cleaned up after distclean
Vinod Kone created MESOS-5958: - Summary: Reviewbot failing due to python files not being cleaned up after distclean Key: MESOS-5958 URL: https://issues.apache.org/jira/browse/MESOS-5958 Project: Mesos Issue Type: Bug Environment: ASF CI Reporter: Vinod Kone This is on ASF CI. {code} rm -rf ../include/mesos/.deps ../include/mesos/agent/.deps ../include/mesos/allocator/.deps ../include/mesos/appc/.deps ../include/mesos/authentication/.deps ../include/mesos/authorizer/.deps ../include/mesos/containerizer/.deps ../include/mesos/docker/.deps ../include/mesos/executor/.deps ../include/mesos/fetcher/.deps ../include/mesos/maintenance/.deps ../include/mesos/master/.deps ../include/mesos/module/.deps ../include/mesos/quota/.deps ../include/mesos/scheduler/.deps ../include/mesos/slave/.deps ../include/mesos/state/.deps ../include/mesos/uri/.deps ../include/mesos/v1/.deps ../include/mesos/v1/agent/.deps ../include/mesos/v1/allocator/.deps ../include/mesos/v1/executor/.deps ../include/mesos/v1/maintenance/.deps ../include/mesos/v1/master/.deps ../include/mesos/v1/quota/.deps ../include/mesos/v1/scheduler/.deps appc/.deps authentication/cram_md5/.deps authentication/http/.deps authorizer/.deps authorizer/local/.deps cli/.deps common/.deps docker/.deps examples/.deps exec/.deps executor/.deps files/.deps hdfs/.deps health-check/.deps hook/.deps internal/.deps java/jni/.deps jvm/.deps jvm/org/apache/.deps launcher/.deps launcher/posix/.deps linux/.deps linux/routing/.deps linux/routing/diagnosis/.deps linux/routing/filter/.deps linux/routing/link/.deps linux/routing/queueing/.deps local/.deps log/.deps log/tool/.deps logging/.deps master/.deps master/allocator/.deps master/allocator/mesos/.deps master/allocator/sorter/drf/.deps master/contender/.deps master/detector/.deps messages/.deps module/.deps sched/.deps scheduler/.deps slave/.deps slave/container_loggers/.deps slave/containerizer/.deps slave/containerizer/mesos/.deps slave/containerizer/mesos/isolators/appc/.deps 
slave/containerizer/mesos/isolators/cgroups/.deps slave/containerizer/mesos/isolators/docker/.deps slave/containerizer/mesos/isolators/docker/volume/.deps slave/containerizer/mesos/isolators/filesystem/.deps slave/containerizer/mesos/isolators/gpu/.deps slave/containerizer/mesos/isolators/namespaces/.deps slave/containerizer/mesos/isolators/network/.deps slave/containerizer/mesos/isolators/network/cni/.deps slave/containerizer/mesos/isolators/posix/.deps slave/containerizer/mesos/isolators/xfs/.deps slave/containerizer/mesos/provisioner/.deps slave/containerizer/mesos/provisioner/appc/.deps slave/containerizer/mesos/provisioner/backends/.deps slave/containerizer/mesos/provisioner/docker/.deps slave/qos_controllers/.deps slave/resource_estimators/.deps state/.deps tests/.deps tests/common/.deps tests/containerizer/.deps uri/.deps uri/fetchers/.deps usage/.deps v1/.deps version/.deps watcher/.deps zookeeper/.deps rm -f Makefile make[2]: Leaving directory `/mesos/mesos-1.1.0/_build/src' rm -f config.status config.cache config.log configure.lineno config.status.lineno rm -f Makefile ERROR: files left in build directory after distclean: ./src/python/executor/build/temp.linux-x86_64-2.7/src/mesos/executor/module.o ./src/python/executor/build/temp.linux-x86_64-2.7/src/mesos/executor/mesos_executor_driver_impl.o ./src/python/executor/build/temp.linux-x86_64-2.7/src/mesos/executor/proxy_executor.o ./src/python/executor/build/lib.linux-x86_64-2.7/mesos/executor/_executor.so ./src/python/executor/build/lib.linux-x86_64-2.7/mesos/executor/__init__.py ./src/python/executor/build/lib.linux-x86_64-2.7/mesos/__init__.py ./src/python/executor/ext_modules.pyc ./src/python/scheduler/build/temp.linux-x86_64-2.7/src/mesos/scheduler/module.o ./src/python/scheduler/build/temp.linux-x86_64-2.7/src/mesos/scheduler/mesos_scheduler_driver_impl.o ./src/python/scheduler/build/temp.linux-x86_64-2.7/src/mesos/scheduler/proxy_scheduler.o 
./src/python/scheduler/build/lib.linux-x86_64-2.7/mesos/scheduler/_scheduler.so ./src/python/scheduler/build/lib.linux-x86_64-2.7/mesos/scheduler/__init__.py ./src/python/scheduler/build/lib.linux-x86_64-2.7/mesos/__init__.py ./src/python/scheduler/ext_modules.pyc ./src/python/build/lib.linux-x86_64-2.7/mesos/__init__.py ./src/python/cli/build/lib.linux-x86_64-2.7/mesos/http.py ./src/python/cli/build/lib.linux-x86_64-2.7/mesos/cli.py ./src/python/cli/build/lib.linux-x86_64-2.7/mesos/futures.py ./src/python/cli/build/lib.linux-x86_64-2.7/mesos/__init__.py ./src/python/interface/build/lib.linux-x86_64-2.7/mesos/__init__.py ./src/python/interface/build/lib.linux-x86_64-2.7/mesos/interface/containerizer_pb2.py ./src/python/interface/build/lib.linux-x86_64-2.7/mesos/interface/mesos_pb2.py ./src/python/interface/build/lib.linux-x86_64-2.7/mesos/interface/__init__.py ./src/python/native/bui
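The Reviewbot failure above is triggered by a post-`distclean` check that lists any files still present in the build tree. The idea behind such a check can be sketched as follows (assumed logic, not the actual ASF CI script):

```python
import os

def files_left_after_distclean(root):
    """List every file still present under `root`. When run against a
    build tree after `make distclean`, anything returned here is build
    residue that the clean targets failed to remove (e.g. the python
    .o/.so/.pyc files reported in MESOS-5958)."""
    leftovers = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            leftovers.append(
                os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(leftovers)
```

A CI job would fail the build whenever this list is non-empty after distclean.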
[jira] [Commented] (MESOS-5388) MesosContainerizerLaunch flags execute arbitrary commands via shell
[ https://issues.apache.org/jira/browse/MESOS-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402575#comment-15402575 ] Jie Yu commented on MESOS-5388: --- commit 25626fcf8f63875ed0ccfe2ddb67a9998e5ba934 Author: Gilbert Song Date: Mon Aug 1 09:50:13 2016 -0700 Supported non-shell command in MesosLaunch to avoid arbitrary commands. Currently all pre_exec_commands are executed as shell commands in Mesos Launch. It is not safe because arbitrary shell command may be included in some user facing api (e.g., container_path). We should execute those command as a subprocess to prevent arbitrary shell command injection. Review: https://reviews.apache.org/r/50214/ > MesosContainerizerLaunch flags execute arbitrary commands via shell > --- > > Key: MESOS-5388 > URL: https://issues.apache.org/jira/browse/MESOS-5388 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: James DeFelice >Assignee: Gilbert Song > Labels: mesosphere, security > > For example, the docker volume isolator's containerPath is appended (without > sanitation) to a command that's executed in this manner. As such, it's > possible to inject arbitrary shell commands to be executed by mesos. > https://github.com/apache/mesos/blob/17260204c833c643adf3d8f36ad8a1a606ece809/src/slave/containerizer/mesos/launch.cpp#L206 > Perhaps instead of strings these commands could/should be sent as string > arrays that could be passed as argv arguments w/o shell interpretation? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
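The fix described in the commit above, executing pre_exec_commands as an argv array instead of a shell string, can be illustrated with a short sketch. Python's subprocess stands in for Mesos's C++ subprocess API, and `echo` stands in for a real mount helper:

```python
import subprocess

# Untrusted user-facing input, e.g. a container_path from a framework.
container_path = "/mnt/data; touch /tmp/pwned"

# Unsafe pattern: handing a concatenated string to /bin/sh means the
# "; touch /tmp/pwned" suffix runs as a second command:
#   subprocess.run("some-helper " + container_path, shell=True)

# Safe pattern: argv elements are passed to execve() verbatim and never
# parsed by a shell, so ";" stays a literal character in one argument.
result = subprocess.run(
    ["echo", container_path], capture_output=True, text=True)
```

The whole injected string comes back as a single argument rather than being interpreted, which is the behavior the review above introduces for pre_exec_commands.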
[jira] [Commented] (MESOS-5933) Refactor the uri::Fetcher as a binary.
[ https://issues.apache.org/jira/browse/MESOS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402569#comment-15402569 ] Zhitao Li commented on MESOS-5933: -- [~kaysoky], priming the cache is another good outcome, but what I want is a separate utility to quickly test different fetcher code/configurations against different registries (or whatever image store AppC uses) independently. > Refactor the uri::Fetcher as a binary. > -- > > Key: MESOS-5933 > URL: https://issues.apache.org/jira/browse/MESOS-5933 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Gilbert Song >Assignee: Zhitao Li > Labels: fetcher, mesosphere > > By refactoring the uri::Fetcher as a binary, the fetcher can be used > independently. Not only mesos, but also new fetcher plugin testing, mesos cli > and many other new components in the future can re-use the binary to fetch > any URI with different schemes. Ideally, after this change, mesos cli is able > to re-use the uri::Fetcher binary to introduce new image pulling commands, > e.g., `mesos fetch -i `. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5933) Refactor the uri::Fetcher as a binary.
[ https://issues.apache.org/jira/browse/MESOS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402565#comment-15402565 ] Joseph Wu commented on MESOS-5933: -- Another note: If the purpose of this ticket is to add a way to "prime" the (fetcher) cache, there are other ways to achieve this. You could schedule a task (i.e. a command task of {{exit 0}}) that fetches the object (into the cache). > Refactor the uri::Fetcher as a binary. > -- > > Key: MESOS-5933 > URL: https://issues.apache.org/jira/browse/MESOS-5933 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Gilbert Song >Assignee: Zhitao Li > Labels: fetcher, mesosphere > > By refactoring the uri::Fetcher as a binary, the fetcher can be used > independently. Not only mesos, but also new fetcher plugin testing, mesos cli > and many other new components in the future can re-use the binary to fetch > any URI with different schemes. Ideally, after this change, mesos cli is able > to re-use the uri::Fetcher binary to introduce new image pulling commands, > e.g., `mesos fetch -i `. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5853) http v1 API should document behavior regarding generated content-type header in the presence of errors
[ https://issues.apache.org/jira/browse/MESOS-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-5853: -- Assignee: Abhishek Dasgupta > http v1 API should document behavior regarding generated content-type header > in the presence of errors > -- > > Key: MESOS-5853 > URL: https://issues.apache.org/jira/browse/MESOS-5853 > Project: Mesos > Issue Type: Improvement > Components: documentation >Reporter: James DeFelice >Assignee: Abhishek Dasgupta > Labels: mesosphere > > Changes made as part of https://issues.apache.org/jira/browse/MESOS-3739 set > a default Content-Type header. This should be documented in the Mesos v1 HTTP > API literature so that devs implementing against the spec know what to expect. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5933) Refactor the uri::Fetcher as a binary.
[ https://issues.apache.org/jira/browse/MESOS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402556#comment-15402556 ] Gilbert Song commented on MESOS-5933: - [~haosd...@gmail.com], This is a separate issue. It is a little confusing because we have some tech debt in the fetchers, e.g., we have both the mesos fetcher and the uri fetcher. MESOS-5259 should address this and refactor the mesos fetcher into the uri fetcher. Please note that most of the tickets in the Epic MESOS-3918 have dependencies. I talked to [~klueska] about the Mesos CLI; it would be great and easy to re-use this binary in the CLI. BTW [~zhitao], please note that we need MESOS-5254 before you start working on this issue. I've already linked it as a dependency. :) > Refactor the uri::Fetcher as a binary. > -- > > Key: MESOS-5933 > URL: https://issues.apache.org/jira/browse/MESOS-5933 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Gilbert Song >Assignee: Zhitao Li > Labels: fetcher, mesosphere > > By refactoring the uri::Fetcher as a binary, the fetcher can be used > independently. Not only mesos, but also new fetcher plugin testing, mesos cli > and many other new components in the future can re-use the binary to fetch > any URI with different schemes. Ideally, after this change, mesos cli is able > to re-use the uri::Fetcher binary to introduce new image pulling commands, > e.g., `mesos fetch -i `. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5933) Refactor the uri::Fetcher as a binary.
[ https://issues.apache.org/jira/browse/MESOS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402441#comment-15402441 ] Zhitao Li commented on MESOS-5933: -- yes, in our conversation we agreed that ideally this should be able to be invoked by the new mesos cli, or built as a sub component somehow. I'm still getting familiar with that new architecture. > Refactor the uri::Fetcher as a binary. > -- > > Key: MESOS-5933 > URL: https://issues.apache.org/jira/browse/MESOS-5933 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Gilbert Song >Assignee: Zhitao Li > Labels: fetcher, mesosphere > > By refactoring the uri::Fetcher as a binary, the fetcher can be used > independently. Not only mesos, but also new fetcher plugin testing, mesos cli > and many other new components in the future can re-use the binary to fetch > any URI with different schemes. Ideally, after this change, mesos cli is able > to re-use the uri::Fetcher binary to introduce new image pulling commands, > e.g., `mesos fetch -i `. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5933) Refactor the uri::Fetcher as a binary.
[ https://issues.apache.org/jira/browse/MESOS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402429#comment-15402429 ] haosdent commented on MESOS-5933: - Got your idea now. Then they should be different things. In additionally, because we are going to implement Mesos CLI https://docs.google.com/document/d/1r6Iv4Efu8v8IBrcUTjgYkvZ32WVscgYqrD07OyIglsA/edit I think this ticket should relate to it as well. > Refactor the uri::Fetcher as a binary. > -- > > Key: MESOS-5933 > URL: https://issues.apache.org/jira/browse/MESOS-5933 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Gilbert Song >Assignee: Zhitao Li > Labels: fetcher, mesosphere > > By refactoring the uri::Fetcher as a binary, the fetcher can be used > independently. Not only mesos, but also new fetcher plugin testing, mesos cli > and many other new components in the future can re-use the binary to fetch > any URI with different schemes. Ideally, after this change, mesos cli is able > to re-use the uri::Fetcher binary to introduce new image pulling commands, > e.g., `mesos fetch -i `. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5790) Ensure all examples in Scheduler HTTP API docs are valid JSON
[ https://issues.apache.org/jira/browse/MESOS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-5790: -- Shepherd: Anand Mazumdar > Ensure all examples in Scheduler HTTP API docs are valid JSON > - > > Key: MESOS-5790 > URL: https://issues.apache.org/jira/browse/MESOS-5790 > Project: Mesos > Issue Type: Improvement >Reporter: Anand Mazumdar >Assignee: Abhishek Dasgupta > Labels: mesosphere, newbie > > Currently, there are a lot of JSON snippets in the [API Docs | > http://mesos.apache.org/documentation/latest/scheduler-http-api/ ] that are > not valid JSON i.e. have {{...}} to make the snippet succinct/easy to read. > e.g., > {code} > {{"filters" : {...} > {code} > However, this is a problem for framework developers who are trying to use the > new API. Looking at the corresponding protobuf definitions can be a good > place to start but hardly ideal. > It would be good to address the shortcomings and make the JSON snippets > complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5933) Refactor the uri::Fetcher as a binary.
[ https://issues.apache.org/jira/browse/MESOS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402390#comment-15402390 ] Zhitao Li commented on MESOS-5933: -- [~haosd...@gmail.com], from my perspective, what I want is to refactor the part that natively fetches images from a docker registry into a separate binary so that we can easily test different registries and storage options. It seems like mesos_fetcher is designed to be used for fetching `CommandInfo.URI` into the sandbox. It remains a question to me whether these two things should be in the same binary. Usually, fetched images are stored into the image `Store`, which we probably don't want executors to play with? > Refactor the uri::Fetcher as a binary. > -- > > Key: MESOS-5933 > URL: https://issues.apache.org/jira/browse/MESOS-5933 > Project: Mesos > Issue Type: Improvement > Components: fetcher >Reporter: Gilbert Song >Assignee: Zhitao Li > Labels: fetcher, mesosphere > > By refactoring the uri::Fetcher as a binary, the fetcher can be used > independently. Not only mesos, but also new fetcher plugin testing, mesos cli > and many other new components in the future can re-use the binary to fetch > any URI with different schemes. Ideally, after this change, mesos cli is able > to re-use the uri::Fetcher binary to introduce new image pulling commands, > e.g., `mesos fetch -i `. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
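A standalone fetcher binary as discussed in this thread would mainly dispatch on the URI scheme to a fetcher plugin. A hypothetical sketch of that dispatch (plugin names and table are illustrative assumptions, not the actual uri::Fetcher plugin registry):

```python
from urllib.parse import urlparse

# Hypothetical scheme -> plugin-name table; the real uri::Fetcher
# resolves plugins in C++, so this only illustrates the dispatch idea.
PLUGINS = {
    "http": "curl",
    "https": "curl",
    "docker": "docker-registry",
    "hdfs": "hadoop",
}

def pick_plugin(uri):
    """Choose which fetcher plugin should handle the given URI."""
    scheme = urlparse(uri).scheme
    if scheme not in PLUGINS:
        raise ValueError("no fetcher plugin for scheme: " + scheme)
    return PLUGINS[scheme]
```

A `mesos fetch`-style command would resolve the plugin this way and then invoke it, which is why the same binary could serve the CLI, plugin testing, and the agent alike.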
[jira] [Updated] (MESOS-5956) ABORT or report TASK_FAILED on health check creation failure
[ https://issues.apache.org/jira/browse/MESOS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-5956: --- Description: Now when a task with health check fail to create {{HealthChecker}}, we just print a warning log and continue to start the task. We should consider aborting or sending a {{TASK_FAILED}} or another appropriate {{TaskStatus}} in this case. (was: Now when a task with health check fail to create {{HealthChecker}}, we just print a warning log and continue to start the task. We should decide to abort or send a {{TASK_FAILED}} {{TaskStatus}} in this case.) > ABORT or report TASK_FAILED on health check creation failure > > > Key: MESOS-5956 > URL: https://issues.apache.org/jira/browse/MESOS-5956 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent > Labels: health-check, tech-debt > > Now when a task with health check fail to create {{HealthChecker}}, we just > print a warning log and continue to start the task. We should consider > aborting or sending a {{TASK_FAILED}} or another appropriate {{TaskStatus}} > in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5956) ABORT or report TASK_FAILED on health check creation failure
[ https://issues.apache.org/jira/browse/MESOS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-5956: --- Description: Now when a task with health check fails to create a {{HealthChecker}} instance, we just log a warning and continue to start the task. We should consider aborting or sending a {{TASK_FAILED}} or another appropriate {{TaskStatus}} in this case. (was: Now when a task with health check fail to create {{HealthChecker}}, we just print a warning log and continue to start the task. We should consider aborting or sending a {{TASK_FAILED}} or another appropriate {{TaskStatus}} in this case.) > ABORT or report TASK_FAILED on health check creation failure > > > Key: MESOS-5956 > URL: https://issues.apache.org/jira/browse/MESOS-5956 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent > Labels: health-check, tech-debt > > Now when a task with health check fails to create a {{HealthChecker}} > instance, we just log a warning and continue to start the task. We should > consider aborting or sending a {{TASK_FAILED}} or another appropriate > {{TaskStatus}} in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
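The proposed change above, failing fast instead of logging a warning and continuing, can be sketched in a few lines (all names here are hypothetical; the real launch path is C++ executor code):

```python
def launch_task(create_health_checker, run_task, send_status):
    """Hypothetical launch path: surface HealthChecker creation errors
    as a TASK_FAILED status update instead of silently continuing, as
    MESOS-5956 proposes."""
    try:
        checker = create_health_checker()
    except Exception as e:
        # Proposed behavior: report failure rather than run unchecked.
        send_status("TASK_FAILED", reason=str(e))
        return None
    return run_task(checker)
```

Whether to send TASK_FAILED or abort outright is exactly the open question in the ticket; this sketch shows only the TASK_FAILED variant.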
[jira] [Comment Edited] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402307#comment-15402307 ] Avinash Sridharan edited comment on MESOS-5953 at 8/1/16 4:09 PM: -- But I think that is the problem that [~philwinder] was alluding to, that certain Dockerfile assume that the working directory is `/`, when the WORKDIR is not specified? was (Author: avin...@mesosphere.io): But I think that is the problem that [~philwinder] was adhering to, that certain Dockerfile assume that the working directory is `/`, when the WORKDIR is not specified? > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402307#comment-15402307 ] Avinash Sridharan commented on MESOS-5953: -- But I think that is the problem that [~philwinder] was adhering to, that certain Dockerfile assume that the working directory is `/`, when the WORKDIR is not specified? > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402294#comment-15402294 ] Jie Yu commented on MESOS-5953: --- I feel like we should not strictly follow Docker engine semantics here. Mesos has this notion of sandbox ($MESOS_SANDBOX). It makes more sense to set workdir to $MESOS_SANDBOX if it's not set in Dockerfile than setting to `/`. > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402280#comment-15402280 ] haosdent commented on MESOS-5953: - I think we need to update here https://github.com/apache/mesos/blob/7864eb860cc5b6d12c4af968e85640613dc34f1d/src/slave/containerizer/mesos/isolators/docker/runtime.cpp#L384 to set the work dir to {{/}} when {{WorkingDir}} is empty in the docker image manifest. > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5957) Provide packages for Ubuntu 16.04
[ https://issues.apache.org/jira/browse/MESOS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Streichardt updated MESOS-5957: --- Description: As per dcos-community slack: ``` mop 5:23 PM for ubuntu vivid mesosphere was kind enough to host a package repo: https://open.mesosphere.com/getting-started/install/ ... will there be an update for 16.04? or did I miss something? thomas.mesosphere 5:26 PM @mop: please file an issue and assign to Artem (issues.apache.org) ``` Ubuntu 16.04 has been out for a while. This getting started tutorial https://open.mesosphere.com/getting-started/install/ still lists 15.04 as the most up-to-date Ubuntu. Having updated packages would be great! was: As per dcos-community slack: ``` mop 5:23 PM for ubuntu vivid mesosphere was kind enough to host a package repo: https://open.mesosphere.com/getting-started/install/ ... will there be an update for 16.04? or did I miss something? thomas.mesosphere 5:26 PM @mop: please file an issue and assign to Artem (issues.apache.org) ``` Ubuntu 16.04 has been out for a while. This getting started tutorial https://open.mesosphere.com/getting-started/install/ still lists 15.04 as the most up-to-date Ubuntu. Having updated packages would be great! There are multiple Artems in the list :S Not sure which Artem to assign this ticket > Provide packages for Ubuntu 16.04 > - > > Key: MESOS-5957 > URL: https://issues.apache.org/jira/browse/MESOS-5957 > Project: Mesos > Issue Type: Wish > Components: release >Reporter: Andreas Streichardt >Priority: Minor > > As per dcos-community slack: > ``` > mop > 5:23 PM for ubuntu vivid mesosphere was kind enough to host a package repo: > https://open.mesosphere.com/getting-started/install/ ... will there be an > update for 16.04? or did I miss something? > thomas.mesosphere > 5:26 PM @mop: please file an issue and assign to Artem (issues.apache.org) > ``` > Ubuntu 16.04 has been out for a while. 
> This getting started tutorial > https://open.mesosphere.com/getting-started/install/ > still lists 15.04 as the most up-to-date Ubuntu. Having updated packages > would be great! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5957) Provide packages for Ubuntu 16.04
Andreas Streichardt created MESOS-5957: -- Summary: Provide packages for Ubuntu 16.04 Key: MESOS-5957 URL: https://issues.apache.org/jira/browse/MESOS-5957 Project: Mesos Issue Type: Wish Components: release Reporter: Andreas Streichardt Priority: Minor As per dcos-community slack: ``` mop 5:23 PM for ubuntu vivid mesosphere was kind enough to host a package repo: https://open.mesosphere.com/getting-started/install/ ... will there be an update for 16.04? or did I miss something? thomas.mesosphere 5:26 PM @mop: please file an issue and assign to Artem (issues.apache.org) ``` Ubuntu 16.04 has been out for a while. This getting started tutorial https://open.mesosphere.com/getting-started/install/ still lists 15.04 as the most up-to-date Ubuntu. Having updated packages would be great! There are multiple Artems in the list :S Not sure which Artem to assign this ticket -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5956) ABORT or report TASK_FAILED on health check creation failure
[ https://issues.apache.org/jira/browse/MESOS-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5956: Labels: health-check tech-debt (was: health-check) > ABORT or report TASK_FAILED on health check creation failure > > > Key: MESOS-5956 > URL: https://issues.apache.org/jira/browse/MESOS-5956 > Project: Mesos > Issue Type: Improvement >Reporter: haosdent > Labels: health-check, tech-debt > > Now when a task with a health check fails to create a {{HealthChecker}}, we just > print a warning log and continue starting the task. We should decide whether to > abort or to send a {{TASK_FAILED}} {{TaskStatus}} in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5956) ABORT or report TASK_FAILED on health check creation failure
haosdent created MESOS-5956: --- Summary: ABORT or report TASK_FAILED on health check creation failure Key: MESOS-5956 URL: https://issues.apache.org/jira/browse/MESOS-5956 Project: Mesos Issue Type: Improvement Reporter: haosdent Now when a task with a health check fails to create a {{HealthChecker}}, we just print a warning log and continue starting the task. We should decide whether to abort or to send a {{TASK_FAILED}} {{TaskStatus}} in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5803) Command health checks do not survive after framework restart
[ https://issues.apache.org/jira/browse/MESOS-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-5803: Summary: Command health checks do not survive after framework restart (was: Command health checks do not survive after master restart) > Command health checks do not survive after framework restart > > > Key: MESOS-5803 > URL: https://issues.apache.org/jira/browse/MESOS-5803 > Project: Mesos > Issue Type: Bug >Reporter: haosdent > Labels: health-check > > Reported in https://github.com/mesosphere/marathon/issues/916 > and https://github.com/apache/mesos/pull/118 > So far the health check only sends a healthy status if the previous status > was failed or does not exist, so frameworks cannot know the health status of > tasks after a master restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5955) The "mesos-health-check" binary is not used anymore.
[ https://issues.apache.org/jira/browse/MESOS-5955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402200#comment-15402200 ] haosdent commented on MESOS-5955: - | Removed the binary way of HealthCheck in libprocess. | https://reviews.apache.org/r/50657/ | | Removed the binary way of HealthCheck in src. | https://reviews.apache.org/r/49556/ | > The "mesos-health-check" binary is not used anymore. > > > Key: MESOS-5955 > URL: https://issues.apache.org/jira/browse/MESOS-5955 > Project: Mesos > Issue Type: Improvement >Reporter: Alexander Rukletsov >Assignee: haosdent > Labels: mesosphere > > MESOS-5727 and MESOS-5954 refactored the health check code into the > {{HealthChecker}} library, hence the "mesos-health-check" binary became > unused. > While the command and docker executors could just use the library to avoid > the subprocess complexity, we may want to consider keeping a binary version > that ships with the installation, because the intention of the binary was to > allow other executors to re-use our implementation. On the other side, this > binary is ill suited to this since it uses libprocess message passing, so if > we do not have code that requires the binary it seems ok to remove it for > now. Custom executors may use the {{HealthChecker}} library directly, it is > not much more complex than using the binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402196#comment-15402196 ] haosdent commented on MESOS-5953: - [~philwinder] /tmp/mesos/sandbox should be /mnt/mesos/sandbox ? > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5955) The "mesos-health-check" binary is not used anymore.
Alexander Rukletsov created MESOS-5955: -- Summary: The "mesos-health-check" binary is not used anymore. Key: MESOS-5955 URL: https://issues.apache.org/jira/browse/MESOS-5955 Project: Mesos Issue Type: Improvement Reporter: Alexander Rukletsov Assignee: haosdent MESOS-5727 and MESOS-5954 refactored the health check code into the {{HealthChecker}} library, hence the "mesos-health-check" binary became unused. While the command and docker executors could just use the library to avoid the subprocess complexity, we may want to consider keeping a binary version that ships with the installation, because the intention of the binary was to allow other executors to re-use our implementation. On the other side, this binary is ill suited to this since it uses libprocess message passing, so if we do not have code that requires the binary it seems ok to remove it for now. Custom executors may use the {{HealthChecker}} library directly, it is not much more complex than using the binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5953) Default work dir is not root for unified containerizer and docker
Philip Winder created MESOS-5953: Summary: Default work dir is not root for unified containerizer and docker Key: MESOS-5953 URL: https://issues.apache.org/jira/browse/MESOS-5953 Project: Mesos Issue Type: Bug Components: containerization Reporter: Philip Winder According to the docker spec, the default working directory (WORKDIR) is root (/). https://docs.docker.com/engine/reference/run/#/workdir The unified containerizer with the docker runtime isolator sets the default working directory to /tmp/mesos/sandbox. Hence, dockerfiles that are relying on the default workdir will not work because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Winder updated MESOS-5953: - Description: According to the docker spec, the default working directory (WORKDIR) is root /. https://docs.docker.com/engine/reference/run/#/workdir The unified containerizer with the docker runtime isolator sets the default working directory to /tmp/mesos/sandbox. Hence, dockerfiles that are relying on the default workdir will not work because the pwd is changed by mesos. was: According to the docker spec, the default working directory (WORKDIR) is root (/). https://docs.docker.com/engine/reference/run/#/workdir The unified containerizer with the docker runtime isolator sets the default working directory to /tmp/mesos/sandbox. Hence, dockerfiles that are relying on the default workdir will not work because the pwd is changed by mesos. > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > /. https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5954) Docker executor does not use HealthChecker library.
Alexander Rukletsov created MESOS-5954: -- Summary: Docker executor does not use HealthChecker library. Key: MESOS-5954 URL: https://issues.apache.org/jira/browse/MESOS-5954 Project: Mesos Issue Type: Improvement Reporter: Alexander Rukletsov Assignee: haosdent https://github.com/apache/mesos/commit/1556d9a3a02de4e8a90b5b64d268754f95b12d77 refactored health checks into a library. Command executor uses the library instead of the "mesos-health-check" binary, docker executor should do the same for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5953) Default work dir is not root for unified containerizer and docker
[ https://issues.apache.org/jira/browse/MESOS-5953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Winder updated MESOS-5953: - Affects Version/s: 1.0.0 > Default work dir is not root for unified containerizer and docker > - > > Key: MESOS-5953 > URL: https://issues.apache.org/jira/browse/MESOS-5953 > Project: Mesos > Issue Type: Bug > Components: containerization >Affects Versions: 1.0.0 >Reporter: Philip Winder > > According to the docker spec, the default working directory (WORKDIR) is root > (/). https://docs.docker.com/engine/reference/run/#/workdir > The unified containerizer with the docker runtime isolator sets the default > working directory to /tmp/mesos/sandbox. > Hence, dockerfiles that are relying on the default workdir will not work > because the pwd is changed by mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5952) Update docs for new slave removal behavior
Neil Conway created MESOS-5952: -- Summary: Update docs for new slave removal behavior Key: MESOS-5952 URL: https://issues.apache.org/jira/browse/MESOS-5952 Project: Mesos Issue Type: Improvement Components: documentation Reporter: Neil Conway Assignee: Neil Conway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5951) Remove "strict registry" code
Neil Conway created MESOS-5951: -- Summary: Remove "strict registry" code Key: MESOS-5951 URL: https://issues.apache.org/jira/browse/MESOS-5951 Project: Mesos Issue Type: Improvement Components: master Reporter: Neil Conway Assignee: Neil Conway Once {{PARTITION_AWARE}} frameworks are supported, we should eventually remove the code that supports the "non-strict" semantics in the master. That is: 1. The master will be "strict" in Mesos 1.1, in the sense that master behavior will always reflect the content of the registry and will not change depending on whether the master has failed over. The exception here is that for non-PARTITION_AWARE frameworks, we will _only_ kill such tasks on a reregistering agent if the master hasn't failed over in the meantime. i.e., we'll remain backwards compatible with the previous "non-strict" semantics that old frameworks might depend on. 2. The "strict" semantics will be less problematic, because the master will no longer be killing tasks and shutting down agents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5950) Consider request/response for reconciliation, bulk reconcile
Neil Conway created MESOS-5950: -- Summary: Consider request/response for reconciliation, bulk reconcile Key: MESOS-5950 URL: https://issues.apache.org/jira/browse/MESOS-5950 Project: Mesos Issue Type: Improvement Components: framework api, master Reporter: Neil Conway The current task reconciliation API has a few quirks: 1. The master will sometimes use "send nothing" as a way to communicate information (MESOS-4050), which is very confusing in a distributed system that might drop messages for other reasons. 2. A framework has no way to determine when the reconciliation results for a given reconciliation request are "complete". That is, when a framework sends a reconciliation request, it starts to receive zero or more task status updates (with {{reason}} set to {{REASON_RECONCILIATION}}). The framework can't easily determine how many results it should expect to receive. 3. For efficiency (and perhaps to simplify framework logic), it might be easier to send a batch of task status updates together in a single message, rather than sending potentially tens of thousands of individual messages. For #2, arguably a framework shouldn't _need_ to know when it has seen the "complete" set of results for a reconciliation request. However, supporting a "request/reply" structure for reconciliation can simplify framework logic, especially if a framework might have multiple timers/reasons to be doing reconciliation at the same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4050) Change task reconciliation to not omit unknown tasks
[ https://issues.apache.org/jira/browse/MESOS-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4050: --- Description: If the master fails over and a framework tries to do an explicit reconciliation for a task running on an agent that has not reregistered yet (and {{agent_reregister_timeout}} has not been exceeded), the master will _not_ send a reconciliation response for that task. This is confusing for framework authors. It seems better for the master to announce all the information it has explicitly: e.g., to return "task X is in an unknown state", rather than not returning anything. Then as more information arrives (e.g., agent reregisters or task definitively dies), task state would transition appropriately. We might want to do this via a new task state, e.g., {{TASK_REREGISTER_PENDING}}. This might be consistent with changing the task states so that we capture "task is partitioned" as an explicit task state ({{TASK_UNKNOWN}} or {{TASK_WANDERING}}) -- see MESOS-4049. was: If a framework tries to reconcile the state of a task that is in an unknown state (because the agent running the task is partitioned from the master), the master will _not_ include any information about that task. This is confusing for framework authors. It seems better for the master to announce all the information it has explicitly: e.g., to return "task X is in an unknown state", rather than not returning anything. Then as more information arrives (e.g., task returns or task definitively dies), task state would transition appropriately. This might be consistent with changing the task states so that we capture "task is partitioned" as an explicit task state ({{TASK_UNKNOWN}} or {{TASK_WANDERING}}) -- see MESOS-4049. 
> Change task reconciliation to not omit unknown tasks > - > > Key: MESOS-4050 > URL: https://issues.apache.org/jira/browse/MESOS-4050 > Project: Mesos > Issue Type: Improvement > Components: framework, master >Reporter: Neil Conway > Labels: mesosphere, reconciliation > > If the master fails over and a framework tries to do an explicit > reconciliation for a task running on an agent that has not reregistered yet > (and {{agent_reregister_timeout}} has not been exceeded), the master will > _not_ send a reconciliation response for that task. > This is confusing for framework authors. It seems better for the master to > announce all the information it has explicitly: e.g., to return "task X is in > an unknown state", rather than not returning anything. Then as more > information arrives (e.g., agent reregisters or task definitively dies), task > state would transition appropriately. We might want to do this via a new task > state, e.g., {{TASK_REREGISTER_PENDING}}. > This might be consistent with changing the task states so that we capture > "task is partitioned" as an explicit task state ({{TASK_UNKNOWN}} or > {{TASK_WANDERING}}) -- see MESOS-4049. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5949) Allow frameworks to learn the time when an agent became unreachable
Neil Conway created MESOS-5949: -- Summary: Allow frameworks to learn the time when an agent became unreachable Key: MESOS-5949 URL: https://issues.apache.org/jira/browse/MESOS-5949 Project: Mesos Issue Type: Improvement Components: master Reporter: Neil Conway Assignee: Neil Conway We currently store the time at which agents become unreachable in the registry, but we don't expose that information to frameworks yet. One mechanism would be via a new optional field in {{TaskStatus}}; other mechanisms would also be possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5948) Remove rate-limiting for agent removal
Neil Conway created MESOS-5948: -- Summary: Remove rate-limiting for agent removal Key: MESOS-5948 URL: https://issues.apache.org/jira/browse/MESOS-5948 Project: Mesos Issue Type: Improvement Components: master Reporter: Neil Conway Assignee: Neil Conway If we can assume that all frameworks are {{PARTITION_AWARE}} (e.g., for Mesos 2), we can likely remove the code that applies a rate-limit to agent removal. This is because "agent removal" just means marking the agent as {{UNREACHABLE}}; because this is a non-destructive operation, we don't need to be as careful about the situations in which we do it. If a framework responds to {{UNREACHABLE}} by terminating and replacing tasks, they can (and often should) use their own safety mechanisms, whether a rate-limit or something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5344) Partition-aware Mesos frameworks
[ https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-5344: --- Epic Name: PARTITION_AWARE (was: New TaskStatuses) Description: This epic covers three related tasks: 1. Allowing partitioned agents to reregister with the master. This allows frameworks to control how tasks running on partitioned agents should be dealt with. 2. Replacing the TASK_LOST task state with a set of more granular states with more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and GONE_BY_OPERATOR. 3. Allow frameworks to be informed when a task that was running on a partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states). These new behaviors will be guarded by the {{PARTITION_AWARE}} framework capability. was: This epic covers three related tasks: 1. Allowing partitioned agents to reregister with the master. This allows frameworks to control how tasks running on partitioned agents should be dealt with. 2. Replacing the TASK_LOST task state with a set of more granular states with more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and GONE_BY_OPERATOR. 3. Allow frameworks to be informed when a task that was running on a partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states). > Partition-aware Mesos frameworks > > > Key: MESOS-5344 > URL: https://issues.apache.org/jira/browse/MESOS-5344 > Project: Mesos > Issue Type: Epic > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > This epic covers three related tasks: > 1. Allowing partitioned agents to reregister with the master. This allows > frameworks to control how tasks running on partitioned agents should be dealt > with. > 2. Replacing the TASK_LOST task state with a set of more granular states with > more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and > GONE_BY_OPERATOR. > 3. 
Allow frameworks to be informed when a task that was running on a > partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states). > These new behaviors will be guarded by the {{PARTITION_AWARE}} framework > capability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5344) Partition-aware Mesos frameworks
[ https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-5344: --- Shepherd: Vinod Kone > Partition-aware Mesos frameworks > > > Key: MESOS-5344 > URL: https://issues.apache.org/jira/browse/MESOS-5344 > Project: Mesos > Issue Type: Epic > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > This epic covers three related tasks: > 1. Allowing partitioned agents to reregister with the master. This allows > frameworks to control how tasks running on partitioned agents should be dealt > with. > 2. Replacing the TASK_LOST task state with a set of more granular states with > more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and > GONE_BY_OPERATOR. > 3. Allow frameworks to be informed when a task that was running on a > partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5344) Partition-aware Mesos frameworks
[ https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-5344: --- Description: This epic covers three related tasks: 1. Allowing partitioned agents to reregister with the master. This allows frameworks to control how tasks running on partitioned agents should be dealt with. 2. Replacing the TASK_LOST task state with a set of more granular states with more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and GONE_BY_OPERATOR. 3. Allow frameworks to be informed when a task that was running on a partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states). was: This epic covers two related tasks: 1. Clarifying the semantics of TASK_LOST, and allow frameworks to learn when a task is *truly* lost (i.e., not running), versus the current LOST semantics of "may or may not be running". 2. Allowing frameworks to control how partitioned tasks are handled. > Partition-aware Mesos frameworks > > > Key: MESOS-5344 > URL: https://issues.apache.org/jira/browse/MESOS-5344 > Project: Mesos > Issue Type: Epic > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > This epic covers three related tasks: > 1. Allowing partitioned agents to reregister with the master. This allows > frameworks to control how tasks running on partitioned agents should be dealt > with. > 2. Replacing the TASK_LOST task state with a set of more granular states with > more precise semantics: UNREACHABLE, DROPPED, UNKNOWN, GONE, and > GONE_BY_OPERATOR. > 3. Allow frameworks to be informed when a task that was running on a > partitioned agent has been terminated (GONE and GONE_BY_OPERATOR states). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5344) Partition-aware Mesos frameworks
[ https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-5344: -- Assignee: Neil Conway > Partition-aware Mesos frameworks > > > Key: MESOS-5344 > URL: https://issues.apache.org/jira/browse/MESOS-5344 > Project: Mesos > Issue Type: Epic > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > This epic covers two related tasks: > 1. Clarifying the semantics of TASK_LOST, and allow frameworks to learn when > a task is *truly* lost (i.e., not running), versus the current LOST semantics > of "may or may not be running". > 2. Allowing frameworks to control how partitioned tasks are handled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5344) Partition-aware Mesos frameworks
[ https://issues.apache.org/jira/browse/MESOS-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-5344: --- Summary: Partition-aware Mesos frameworks (was: Revise TaskStatus semantics) > Partition-aware Mesos frameworks > > > Key: MESOS-5344 > URL: https://issues.apache.org/jira/browse/MESOS-5344 > Project: Mesos > Issue Type: Epic > Components: master >Reporter: Neil Conway > Labels: mesosphere > > This epic covers two related tasks: > 1. Clarifying the semantics of TASK_LOST, and allow frameworks to learn when > a task is *truly* lost (i.e., not running), versus the current LOST semantics > of "may or may not be running". > 2. Allowing frameworks to control how partitioned tasks are handled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5947) Optimize registry to avoid unnecessary storage writes
Neil Conway created MESOS-5947: -- Summary: Optimize registry to avoid unnecessary storage writes Key: MESOS-5947 URL: https://issues.apache.org/jira/browse/MESOS-5947 Project: Mesos Issue Type: Improvement Components: master Reporter: Neil Conway Assignee: Neil Conway If we apply a sequence of registry operations and none of those operations actually modify the registry, we can skip writing out the new registry state variable to the replicated log. This can be an important optimization, e.g., when the master fails over and a large number of agents attempt to reregister simultaneously. Since those agents already appear in the "admitted" list of agents in the registry, we don't need to do any replicated log writes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5896) When starting a Mesos container with a Docker image, it does not work.
[ https://issues.apache.org/jira/browse/MESOS-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401634#comment-15401634 ] Gilbert Song commented on MESOS-5896: - [~Sunzhe], could you try using the default `--containerizers` agent flag? E.g., --containerizers=mesos BTW, are you testing with mesos-execute, or your own framework? Your ContainerInfo definition seems strange to me: `NetworkInfo` is not supposed to be available in ContainerInfo.MesosInfo. > When starting a Mesos container with a Docker image, it does not work. > --- > > Key: MESOS-5896 > URL: https://issues.apache.org/jira/browse/MESOS-5896 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 1.0.0 >Reporter: Sunzhe > Labels: containerizer > > When I create a Mesos container with a docker image, like this: > {code:title=test.json|borderStyle=solid} > { > "id": "test-mesos-container-docker-image", > "cmd": "while [ true ]; do uname -a; sleep 3; done", > "cpus": 0.5, > "mem": 32.0, > "container": { > "type": "MESOS", > "mesos": { > "image": { > "type": "DOCKER", > "docker": { > "name": "ubuntu:14.04" > } > }, > "network": "BRIDGE", > "portMappings": [ > { > "containerPort": 8080, > "hostPort": 0, > "servicePort": 10008, > "protocol": "tcp", > "labels": {} > } > ], > "privileged": false, > "parameters": [], > "forcePullImage": false > } > } > } > {code} > It does not work! It seems the Docker image does not take effect; the container > uses the host filesystem, not the Docker image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
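For context, a typical agent invocation for running Docker images under the Mesos containerizer (a configuration sketch: the flag names are the documented Mesos agent flags for Docker image support, while the master address and work dir are placeholders):

```shell
# Use the Mesos containerizer with Docker image provisioning
# (no Docker daemon involved). Adjust --master and --work_dir
# for your cluster; the values below are illustrative.
mesos-agent \
  --master=zk://master.example.com:2181/mesos \
  --containerizers=mesos \
  --image_providers=docker \
  --isolation=filesystem/linux,docker/runtime \
  --work_dir=/var/lib/mesos
```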