[jira] [Updated] (MESOS-4578) docker run -c is deprecated
[ https://issues.apache.org/jira/browse/MESOS-4578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cody Maloney updated MESOS-4578:
---
Labels: mesosphere newbie (was: mesosphere)

> docker run -c is deprecated
> ---
>
> Key: MESOS-4578
> URL: https://issues.apache.org/jira/browse/MESOS-4578
> Project: Mesos
> Issue Type: Improvement
> Components: containerization, docker
> Affects Versions: 0.26.0
> Environment: CoreOS 7
> Reporter: Cody Maloney
> Labels: mesosphere, newbie
>
> When running mesos slave with the docker containerizer enabled on CoreOS 766.4.0, launching docker containers results in the following in stderr:
> {noformat}
> Warning: '-c' is deprecated, it will be replaced by '--cpu-shares' soon. See usage.
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4578) docker run -c is deprecated
Cody Maloney created MESOS-4578: --- Summary: docker run -c is deprecated Key: MESOS-4578 URL: https://issues.apache.org/jira/browse/MESOS-4578 Project: Mesos Issue Type: Improvement Components: containerization, docker Affects Versions: 0.26.0 Environment: CoreOS 7 Reporter: Cody Maloney When running mesos slave with the docker containerizer enabled on CoreOS 766.4.0, launching docker containers results in the following in stderr: {noformat} Warning: '-c' is deprecated, it will be replaced by '--cpu-shares' soon. See usage. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
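The warning quoted above suggests spelling the flag out as '--cpu-shares' instead of '-c'. A minimal sketch of what that could look like when the argument list is assembled; {{dockerRunArgs}} is a hypothetical helper written for illustration and is not the Mesos docker containerizer code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not Mesos code): builds an argument
// vector for "docker run" that uses the long '--cpu-shares' flag rather
// than the deprecated '-c' short form.
inline std::vector<std::string> dockerRunArgs(
    const std::string& image,
    unsigned long cpuShares)
{
  assert(!image.empty());

  std::vector<std::string> argv;
  argv.push_back("docker");
  argv.push_back("run");
  argv.push_back("--cpu-shares");
  argv.push_back(std::to_string(cpuShares));
  argv.push_back(image);
  return argv;
}
```

Passing the flag and its value as separate arguments avoids any shell quoting concerns; the remaining "docker run" options are left out of the sketch.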
[jira] [Commented] (MESOS-4066) Expose when agent is recovering in the agent's /state endpoint.
[ https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127802#comment-15127802 ]

Guangya Liu commented on MESOS-4066:
---
Yes, I think it should be the /state endpoint, and I have updated the summary of this JIRA ticket.

> Expose when agent is recovering in the agent's /state endpoint.
> ---
>
> Key: MESOS-4066
> URL: https://issues.apache.org/jira/browse/MESOS-4066
> Project: Mesos
> Issue Type: Task
> Components: slave
> Reporter: Benjamin Mahler
> Assignee: Vinod Kone
> Labels: mesosphere
>
> Currently when a user is hitting /state.json on the agent, it may return partial state if the agent has failed over and is recovering. There is currently no clear way to tell if this is the case when looking at a response, so the user may incorrectly interpret the agent as being empty of tasks.
> We could consider exposing the 'state' enum of the agent in the endpoint:
> {code}
> enum State
> {
>   RECOVERING,   // Slave is doing recovery.
>   DISCONNECTED, // Slave is not connected to the master.
>   RUNNING,      // Slave has (re-)registered.
>   TERMINATING,  // Slave is shutting down.
> } state;
> {code}
> This may be a bit tricky to maintain as far as backwards-compatibility of the endpoint, if we were to alter this enum.
> Exposing this would allow users to be more informed about the state of the agent.
[jira] [Updated] (MESOS-4066) Expose when agent is recovering in the agent's /state endpoint.
[ https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guangya Liu updated MESOS-4066: --- Summary: Expose when agent is recovering in the agent's /state endpoint. (was: Expose when agent is recovering in the agent's /state.json endpoint.) > Expose when agent is recovering in the agent's /state endpoint. > --- > > Key: MESOS-4066 > URL: https://issues.apache.org/jira/browse/MESOS-4066 > Project: Mesos > Issue Type: Task > Components: slave >Reporter: Benjamin Mahler >Assignee: Vinod Kone > Labels: mesosphere > > Currently when a user is hitting /state.json on the agent, it may return > partial state if the agent has failed over and is recovering. There is > currently no clear way to tell if this is the case when looking at a > response, so the user may incorrectly interpret the agent as being empty of > tasks. > We could consider exposing the 'state' enum of the agent in the endpoint: > {code} > enum State > { > RECOVERING, // Slave is doing recovery. > DISCONNECTED, // Slave is not connected to the master. > RUNNING, // Slave has (re-)registered. > TERMINATING, // Slave is shutting down. > } state; > {code} > This may be a bit tricky to maintain as far as backwards-compatibility of the > endpoint, if we were to alter this enum. > Exposing this would allow users to be more informed about the state of the > agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
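The proposal above amounts to serializing the agent's 'state' enum into the endpoint response. A minimal sketch of that idea, assuming the four values quoted in the ticket; the {{stateName}} and {{stateField}} helpers and the "agent_state" field name are hypothetical and do not reflect what the agent actually emits:

```cpp
#include <cassert>
#include <string>

// Mirrors the enum quoted in the ticket. The real agent type may differ.
enum class State { RECOVERING, DISCONNECTED, RUNNING, TERMINATING };

// Maps each enum value to a stable string so that clients do not depend on
// the numeric values, which could shift if the enum is ever reordered.
inline std::string stateName(State state)
{
  switch (state) {
    case State::RECOVERING:   return "RECOVERING";
    case State::DISCONNECTED: return "DISCONNECTED";
    case State::RUNNING:      return "RUNNING";
    case State::TERMINATING:  return "TERMINATING";
  }
  return "UNKNOWN";
}

// Hypothetical JSON fragment. The "agent_state" key is an assumption made
// for this sketch only.
inline std::string stateField(State state)
{
  return "\"agent_state\":\"" + stateName(state) + "\"";
}
```

Emitting strings rather than integers is one way to soften the backwards-compatibility concern raised in the ticket: new values can be added without changing the meaning of existing ones.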
[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127759#comment-15127759 ]

Shuai Lin commented on MESOS-1806:
---
Hi, I think the rationale of this ticket is that some companies/organizations already have a running etcd cluster, so it would be easier for them to deploy a Mesos cluster without having to set up a dedicated ZooKeeper quorum. It doesn't mean etcd can offer something better than ZK.

> Substituting etcd for Zookeeper
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
> Issue Type: Task
> Components: leader election
> Reporter: Ed Ropple
> Assignee: Shuai Lin
> Priority: Minor
>
> eropple: Could you also file a new JIRA for Mesos to drop ZK in favor of etcd or ReplicatedLog? Would love to get some momentum going on that one.
> --
> Consider it filed. =)
[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127644#comment-15127644 ] Brandon Philips commented on MESOS-1806: The etcd v3 api is a better match for the things you are looking to do. > Substituting etcd for Zookeeper > --- > > Key: MESOS-1806 > URL: https://issues.apache.org/jira/browse/MESOS-1806 > Project: Mesos > Issue Type: Task > Components: leader election >Reporter: Ed Ropple >Assignee: Shuai Lin >Priority: Minor > >eropple: Could you also file a new JIRA for Mesos to drop ZK > in favor of etcd or ReplicatedLog? Would love to get some momentum going on > that one. > -- > Consider it filed. =) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64)
[ https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AndyPang updated MESOS-4577: Summary: libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64) (was: libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64) ) > libprocess can not run on 16-byte aligned stack mandatory > architecture(aarch64) > > > Key: MESOS-4577 > URL: https://issues.apache.org/jira/browse/MESOS-4577 > Project: Mesos > Issue Type: Bug > Components: containerization, libprocess, stout > Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 > 01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux >Reporter: AndyPang >Assignee: AndyPang > Labels: mesosphere > > mesos run in AArch64 will get error, the log is: > {code} > E0101 00:06:56.636520 32411 slave.cpp:3342] Container > 'b6be429a-08f0-4d52-b01d-abfcb6e0106b' for executor > 'hello.84d205ae-f626-11de-bd66-7a3f6cf980b9' of framework > '868b9f04-9179-427b-b050-ee8f89ffa3bd-' failed to start: Failed to fork > executor: Failed to clone child process: Failed to clone: Invalid argument > {code} > the "clone" achieve in libprocess 3rdparty stout library(in linux.hpp) > packaging a syscall "clone" : > {code:title=clone|borderStyle=solid} > inline pid_t clone(const lambda::function& func, int flags) > { > // Stack for the child. > // - unsigned long long used for best alignment. > // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux. > // > // NOTE: We need to allocate the stack dynamically. This is because > // glibc's 'clone' will modify the stack passed to it, therefore the > // stack must NOT be shared as multiple 'clone's can be invoked > // simultaneously. > int stackSize = 8 * 1024 * 1024; > unsigned long long *stack = > new unsigned long long[stackSize/sizeof(unsigned long long)]; > pid_t pid = ::clone( > childMain, > &stack[stackSize/sizeof(stack[0]) - 1], // stack grows down. 
> flags, > (void*) &func); > // If CLONE_VM is not set, ::clone would create a process which runs in a > // separate copy of the memory space of the calling process. So we destroy > the > // stack here to avoid memory leak. If CLONE_VM is set, ::clone would > create a > // thread which runs in the same memory space with the calling process. > if (!(flags & CLONE_VM)) { > delete[] stack; > } > return pid; > } > {code} > syscal "clone" parameter stack is 8-byte aligned,so if in 16-byte aligned > stack mandatory architecture(aarch64 ppc64) it will get error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)
[ https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127538#comment-15127538 ] AndyPang edited comment on MESOS-4577 at 2/2/16 4:05 AM: - the syscal "clone" achieve in /arch/arm64/kernel/process.c, in "copy_thread" function: {code} if (stack_start) { /* 16-byte aligned stack mandatory on AArch64 */ if (stack_start & 15) return -EINVAL; childregs->sp = stack_start; } {code} AArch64 the stack must be 16-byte aligned was (Author: andypang): the syscal "clone" achieve in linux-4.1.6/arch/arm64/kernel/process.c, in "copy_thread" function: {code} if (stack_start) { /* 16-byte aligned stack mandatory on AArch64 */ if (stack_start & 15) return -EINVAL; childregs->sp = stack_start; } {code} AArch64 the stack must be 16-byte aligned > libprocess can not run on 16-byte aligned stack mandatory > architecture(aarch64 ppc64) > -- > > Key: MESOS-4577 > URL: https://issues.apache.org/jira/browse/MESOS-4577 > Project: Mesos > Issue Type: Bug > Components: containerization, libprocess, stout > Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 > 01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux >Reporter: AndyPang >Assignee: AndyPang > Labels: mesosphere > > mesos run in AArch64 will get error, the log is: > {code} > E0101 00:06:56.636520 32411 slave.cpp:3342] Container > 'b6be429a-08f0-4d52-b01d-abfcb6e0106b' for executor > 'hello.84d205ae-f626-11de-bd66-7a3f6cf980b9' of framework > '868b9f04-9179-427b-b050-ee8f89ffa3bd-' failed to start: Failed to fork > executor: Failed to clone child process: Failed to clone: Invalid argument > {code} > the "clone" achieve in libprocess 3rdparty stout library(in linux.hpp) > packaging a syscall "clone" : > {code:title=clone|borderStyle=solid} > inline pid_t clone(const lambda::function& func, int flags) > { > // Stack for the child. > // - unsigned long long used for best alignment. 
> // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux. > // > // NOTE: We need to allocate the stack dynamically. This is because > // glibc's 'clone' will modify the stack passed to it, therefore the > // stack must NOT be shared as multiple 'clone's can be invoked > // simultaneously. > int stackSize = 8 * 1024 * 1024; > unsigned long long *stack = > new unsigned long long[stackSize/sizeof(unsigned long long)]; > pid_t pid = ::clone( > childMain, > &stack[stackSize/sizeof(stack[0]) - 1], // stack grows down. > flags, > (void*) &func); > // If CLONE_VM is not set, ::clone would create a process which runs in a > // separate copy of the memory space of the calling process. So we destroy > the > // stack here to avoid memory leak. If CLONE_VM is set, ::clone would > create a > // thread which runs in the same memory space with the calling process. > if (!(flags & CLONE_VM)) { > delete[] stack; > } > return pid; > } > {code} > syscal "clone" parameter stack is 8-byte aligned,so if in 16-byte aligned > stack mandatory architecture(aarch64 ppc64) it will get error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2585) Use full width for mesos div.container
[ https://issues.apache.org/jira/browse/MESOS-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-2585: Assignee: Michael Lunøe (was: haosdent) > Use full width for mesos div.container > -- > > Key: MESOS-2585 > URL: https://issues.apache.org/jira/browse/MESOS-2585 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Alson Kemp >Assignee: Michael Lunøe >Priority: Trivial > Attachments: After (patch 2).png, Narrow (current).png, Wide > (patched).png, github_full_width.png > > > I've patched our Mesos installation so that the webui takes up the full page > width and is much nicer to look at on large monitors. It's a small change. > If y'all want me to submit a PR with the update, I'll do so. > Before: > !https://issues.apache.org/jira/secure/attachment/12708818/Narrow%20%28current%29.png|width=800! > After: > !https://issues.apache.org/jira/secure/attachment/12708861/After%20%28patch%202%29.png|width=800! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2585) Use full width for mesos div.container
[ https://issues.apache.org/jira/browse/MESOS-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127629#comment-15127629 ]

haosdent commented on MESOS-2585:
---
So no matter the screen size, we always use the full width, right?

> Use full width for mesos div.container
> ---
>
> Key: MESOS-2585
> URL: https://issues.apache.org/jira/browse/MESOS-2585
> Project: Mesos
> Issue Type: Improvement
> Components: webui
> Reporter: Alson Kemp
> Assignee: haosdent
> Priority: Trivial
> Attachments: After (patch 2).png, Narrow (current).png, Wide (patched).png, github_full_width.png
>
> I've patched our Mesos installation so that the webui takes up the full page width and is much nicer to look at on large monitors. It's a small change. If y'all want me to submit a PR with the update, I'll do so.
> Before:
> !https://issues.apache.org/jira/secure/attachment/12708818/Narrow%20%28current%29.png|width=800!
> After:
> !https://issues.apache.org/jira/secure/attachment/12708861/After%20%28patch%202%29.png|width=800!
[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127593#comment-15127593 ]

Deshi Xiao commented on MESOS-1806:
---
Shuai Lin, after reading your results from implementing etcd support, I can't find any benefit in replacing ZooKeeper. Do you have the same feeling? Even though I hate ZooKeeper, the etcd implementation is not a good match for my requirements.

> Substituting etcd for Zookeeper
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
> Issue Type: Task
> Components: leader election
> Reporter: Ed Ropple
> Assignee: Shuai Lin
> Priority: Minor
>
> eropple: Could you also file a new JIRA for Mesos to drop ZK in favor of etcd or ReplicatedLog? Would love to get some momentum going on that one.
> --
> Consider it filed. =)
[jira] [Updated] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)
[ https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AndyPang updated MESOS-4577: Description: mesos run in AArch64 will get error, the log is: {code} E0101 00:06:56.636520 32411 slave.cpp:3342] Container 'b6be429a-08f0-4d52-b01d-abfcb6e0106b' for executor 'hello.84d205ae-f626-11de-bd66-7a3f6cf980b9' of framework '868b9f04-9179-427b-b050-ee8f89ffa3bd-' failed to start: Failed to fork executor: Failed to clone child process: Failed to clone: Invalid argument {code} the "clone" achieve in libprocess 3rdparty stout library(in linux.hpp) packaging a syscall "clone" : {code:title=clone|borderStyle=solid} inline pid_t clone(const lambda::function& func, int flags) { // Stack for the child. // - unsigned long long used for best alignment. // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux. // // NOTE: We need to allocate the stack dynamically. This is because // glibc's 'clone' will modify the stack passed to it, therefore the // stack must NOT be shared as multiple 'clone's can be invoked // simultaneously. int stackSize = 8 * 1024 * 1024; unsigned long long *stack = new unsigned long long[stackSize/sizeof(unsigned long long)]; pid_t pid = ::clone( childMain, &stack[stackSize/sizeof(stack[0]) - 1], // stack grows down. flags, (void*) &func); // If CLONE_VM is not set, ::clone would create a process which runs in a // separate copy of the memory space of the calling process. So we destroy the // stack here to avoid memory leak. If CLONE_VM is set, ::clone would create a // thread which runs in the same memory space with the calling process. if (!(flags & CLONE_VM)) { delete[] stack; } return pid; } {code} syscal "clone" parameter stack is 8-byte aligned,so if in 16-byte aligned stack mandatory architecture(aarch64 ppc64) it will get error. 
was: libprocess 3rdparty stout library(in linux.hpp) packaging a syscall "clone" : {code:title=clone|borderStyle=solid} inline pid_t clone(const lambda::function& func, int flags) { // Stack for the child. // - unsigned long long used for best alignment. // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux. // // NOTE: We need to allocate the stack dynamically. This is because // glibc's 'clone' will modify the stack passed to it, therefore the // stack must NOT be shared as multiple 'clone's can be invoked // simultaneously. int stackSize = 8 * 1024 * 1024; unsigned long long *stack = new unsigned long long[stackSize/sizeof(unsigned long long)]; pid_t pid = ::clone( childMain, &stack[stackSize/sizeof(stack[0]) - 1], // stack grows down. flags, (void*) &func); // If CLONE_VM is not set, ::clone would create a process which runs in a // separate copy of the memory space of the calling process. So we destroy the // stack here to avoid memory leak. If CLONE_VM is set, ::clone would create a // thread which runs in the same memory space with the calling process. if (!(flags & CLONE_VM)) { delete[] stack; } return pid; } {code} syscal "clone" parameter stack is 8-byte aligned,so if in 16-byte aligned stack mandatory architecture(aarch64 ppc64) it will get error. 
> libprocess can not run on 16-byte aligned stack mandatory > architecture(aarch64 ppc64) > -- > > Key: MESOS-4577 > URL: https://issues.apache.org/jira/browse/MESOS-4577 > Project: Mesos > Issue Type: Bug > Components: containerization, libprocess, stout > Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 > 01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux >Reporter: AndyPang >Assignee: AndyPang > Labels: mesosphere > > mesos run in AArch64 will get error, the log is: > {code} > E0101 00:06:56.636520 32411 slave.cpp:3342] Container > 'b6be429a-08f0-4d52-b01d-abfcb6e0106b' for executor > 'hello.84d205ae-f626-11de-bd66-7a3f6cf980b9' of framework > '868b9f04-9179-427b-b050-ee8f89ffa3bd-' failed to start: Failed to fork > executor: Failed to clone child process: Failed to clone: Invalid argument > {code} > the "clone" achieve in libprocess 3rdparty stout library(in linux.hpp) > packaging a syscall "clone" : > {code:title=clone|borderStyle=solid} > inline pid_t clone(const lambda::function& func, int flags) > { > // Stack for the child. > // - unsigned long long used for best alignment. > // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux. > // > // NOTE: We need to allocate the stack dynamically. This is because > // glibc's 'clone' will modify the stack passed to it, therefore the > // stack must NOT be shared as multiple 'clone's can be invoked > // simultaneously. >
[jira] [Commented] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)
[ https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127538#comment-15127538 ] AndyPang commented on MESOS-4577: - the syscal "clone" achieve in linux-4.1.6/arch/arm64/kernel/process.c, in "copy_thread" function: {code} if (stack_start) { /* 16-byte aligned stack mandatory on AArch64 */ if (stack_start & 15) return -EINVAL; childregs->sp = stack_start; } {code} AArch64 the stack must be 16-byte aligned > libprocess can not run on 16-byte aligned stack mandatory > architecture(aarch64 ppc64) > -- > > Key: MESOS-4577 > URL: https://issues.apache.org/jira/browse/MESOS-4577 > Project: Mesos > Issue Type: Bug > Components: containerization, libprocess, stout > Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 > 01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux >Reporter: AndyPang >Assignee: AndyPang > Labels: mesosphere > > libprocess 3rdparty stout library(in linux.hpp) packaging a syscall "clone" : > {code:title=clone|borderStyle=solid} > inline pid_t clone(const lambda::function& func, int flags) > { > // Stack for the child. > // - unsigned long long used for best alignment. > // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux. > // > // NOTE: We need to allocate the stack dynamically. This is because > // glibc's 'clone' will modify the stack passed to it, therefore the > // stack must NOT be shared as multiple 'clone's can be invoked > // simultaneously. > int stackSize = 8 * 1024 * 1024; > unsigned long long *stack = > new unsigned long long[stackSize/sizeof(unsigned long long)]; > pid_t pid = ::clone( > childMain, > &stack[stackSize/sizeof(stack[0]) - 1], // stack grows down. > flags, > (void*) &func); > // If CLONE_VM is not set, ::clone would create a process which runs in a > // separate copy of the memory space of the calling process. So we destroy > the > // stack here to avoid memory leak. 
If CLONE_VM is set, ::clone would > create a > // thread which runs in the same memory space with the calling process. > if (!(flags & CLONE_VM)) { > delete[] stack; > } > return pid; > } > {code} > syscal "clone" parameter stack is 8-byte aligned,so if in 16-byte aligned > stack mandatory architecture(aarch64 ppc64) it will get error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)
[ https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AndyPang reassigned MESOS-4577: --- Assignee: AndyPang > libprocess can not run on 16-byte aligned stack mandatory > architecture(aarch64 ppc64) > -- > > Key: MESOS-4577 > URL: https://issues.apache.org/jira/browse/MESOS-4577 > Project: Mesos > Issue Type: Bug > Components: containerization, libprocess, stout > Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 > 01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux >Reporter: AndyPang >Assignee: AndyPang > Labels: mesosphere > > libprocess 3rdparty stout library(in linux.hpp) packaging a syscall "clone" : > {code:title=clone|borderStyle=solid} > inline pid_t clone(const lambda::function& func, int flags) > { > // Stack for the child. > // - unsigned long long used for best alignment. > // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux. > // > // NOTE: We need to allocate the stack dynamically. This is because > // glibc's 'clone' will modify the stack passed to it, therefore the > // stack must NOT be shared as multiple 'clone's can be invoked > // simultaneously. > int stackSize = 8 * 1024 * 1024; > unsigned long long *stack = > new unsigned long long[stackSize/sizeof(unsigned long long)]; > pid_t pid = ::clone( > childMain, > &stack[stackSize/sizeof(stack[0]) - 1], // stack grows down. > flags, > (void*) &func); > // If CLONE_VM is not set, ::clone would create a process which runs in a > // separate copy of the memory space of the calling process. So we destroy > the > // stack here to avoid memory leak. If CLONE_VM is set, ::clone would > create a > // thread which runs in the same memory space with the calling process. > if (!(flags & CLONE_VM)) { > delete[] stack; > } > return pid; > } > {code} > syscal "clone" parameter stack is 8-byte aligned,so if in 16-byte aligned > stack mandatory architecture(aarch64 ppc64) it will get error. 
[jira] [Created] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)
AndyPang created MESOS-4577:
---
Summary: libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)
Key: MESOS-4577
URL: https://issues.apache.org/jira/browse/MESOS-4577
Project: Mesos
Issue Type: Bug
Components: containerization, libprocess, stout
Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
Reporter: AndyPang

The stout library under libprocess's 3rdparty directory (in linux.hpp) wraps the "clone" syscall:
{code:title=clone|borderStyle=solid}
inline pid_t clone(const lambda::function<int()>& func, int flags)
{
  // Stack for the child.
  // - unsigned long long used for best alignment.
  // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux.
  //
  // NOTE: We need to allocate the stack dynamically. This is because
  // glibc's 'clone' will modify the stack passed to it, therefore the
  // stack must NOT be shared as multiple 'clone's can be invoked
  // simultaneously.
  int stackSize = 8 * 1024 * 1024;
  unsigned long long *stack =
    new unsigned long long[stackSize/sizeof(unsigned long long)];

  pid_t pid = ::clone(
      childMain,
      &stack[stackSize/sizeof(stack[0]) - 1], // stack grows down.
      flags,
      (void*) &func);

  // If CLONE_VM is not set, ::clone would create a process which runs in a
  // separate copy of the memory space of the calling process. So we destroy
  // the stack here to avoid memory leak. If CLONE_VM is set, ::clone would
  // create a thread which runs in the same memory space with the calling
  // process.
  if (!(flags & CLONE_VM)) {
    delete[] stack;
  }

  return pid;
}
{code}
The stack pointer passed to the "clone" syscall is only 8-byte aligned, so on architectures where a 16-byte aligned stack is mandatory (aarch64, ppc64) the call fails with an error.
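The wrapper quoted above passes the address of the last 8-byte element as the stack top, which leaves the pointer 8 bytes off a 16-byte boundary whenever the end of the buffer is 16-byte aligned. A minimal sketch of one possible remedy, assuming it is enough to round the top-of-stack address down to a 16-byte boundary; {{alignedStackTop}} is a hypothetical helper and is neither stout API nor the patch attached to this ticket:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (an assumption for illustration). Returns the highest
// address not above base + size that is a multiple of `alignment`.
// `alignment` must be a power of two and `size` must be at least
// `alignment`, so that the result stays inside the buffer.
inline void* alignedStackTop(
    void* base,
    std::size_t size,
    std::size_t alignment = 16)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  assert(size >= alignment);

  std::uintptr_t top = reinterpret_cast<std::uintptr_t>(base) + size;

  // Clear the low bits to round down. The stack grows down, so the child
  // only uses addresses below the returned pointer.
  top &= ~(static_cast<std::uintptr_t>(alignment) - 1);

  return reinterpret_cast<void*>(top);
}
```

With the code quoted in the ticket, the second argument to ::clone would then become something like alignedStackTop(stack, stackSize). Rounding down gives up at most 15 bytes of an 8 MiB stack.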
[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"
[ https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127429#comment-15127429 ] haosdent commented on MESOS-4576: - +1 > Introduce a stout helper for "which" > > > Key: MESOS-4576 > URL: https://issues.apache.org/jira/browse/MESOS-4576 > Project: Mesos > Issue Type: Improvement > Components: stout >Reporter: Joseph Wu > Labels: mesosphere > > We may want to add a helper to {{stout/os.hpp}} that will natively emulate > the functionality of the Linux utility {{which}}. i.e. > {code} > Option which(const string& command) > { > Option path = os::getenv("PATH"); > // Loop through path and return the first one which os::exists(...). > return None(); > } > {code} > This helper may be useful: > * for test filters in {{src/tests/environment.cpp}} > * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}} > * the {{sha512}} utility in {{src/common/command_utils.cpp}} > * as runtime checks in the {{LogrotateContainerLogger}} > * etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-4576) Introduce a stout helper for "which"
[ https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

haosdent reassigned MESOS-4576:
---
Assignee: haosdent

> Introduce a stout helper for "which"
> ---
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
> Issue Type: Improvement
> Components: stout
> Reporter: Joseph Wu
> Assignee: haosdent
> Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate the functionality of the Linux utility {{which}}. i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.
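The loop left as a comment in the pseudocode above could be filled in roughly as follows. This is a sketch using plain POSIX calls instead of stout types: the search path is passed in explicitly rather than read from the environment, and an empty string stands in for None(). Both choices are assumptions made to keep the example self-contained, not the proposed stout signature.

```cpp
#include <cassert>
#include <sstream>
#include <string>

#include <sys/stat.h>
#include <unistd.h>

// Hypothetical stand-in for the proposed helper. Returns the first
// "<dir>/<command>" from the colon-separated `searchPath` that is a regular
// file executable by the caller, or an empty string if there is none.
inline std::string which(
    const std::string& command,
    const std::string& searchPath)
{
  assert(!command.empty());

  std::istringstream dirs(searchPath);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    if (dir.empty()) {
      dir = ".";  // An empty entry denotes the current directory.
    }

    const std::string candidate = dir + "/" + command;

    // Skip directories and other non-regular files; access(2) alone would
    // also succeed for a searchable directory.
    struct stat s;
    if (::stat(candidate.c_str(), &s) == 0 &&
        S_ISREG(s.st_mode) &&
        ::access(candidate.c_str(), X_OK) == 0) {
      return candidate;
    }
  }

  return std::string();
}
```

A caller would typically pass the value of the PATH environment variable as the second argument, for example the result of ::getenv("PATH") after checking it for null.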
[jira] [Commented] (MESOS-2585) Use full width for mesos div.container
[ https://issues.apache.org/jira/browse/MESOS-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127301#comment-15127301 ] Michael Lunøe commented on MESOS-2585: -- The problem with the suggested solution is that it overrides Bootstrap functionality and does not work with, but against Bootstrap styles. I have created a patch to show my proposed solution here: https://reviews.apache.org/r/43072/ > Use full width for mesos div.container > -- > > Key: MESOS-2585 > URL: https://issues.apache.org/jira/browse/MESOS-2585 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Alson Kemp >Assignee: haosdent >Priority: Trivial > Attachments: After (patch 2).png, Narrow (current).png, Wide > (patched).png, github_full_width.png > > > I've patched our Mesos installation so that the webui takes up the full page > width and is much nicer to look at on large monitors. It's a small change. > If y'all want me to submit a PR with the update, I'll do so. > Before: > !https://issues.apache.org/jira/secure/attachment/12708818/Narrow%20%28current%29.png|width=800! > After: > !https://issues.apache.org/jira/secure/attachment/12708861/After%20%28patch%202%29.png|width=800! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"
[ https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127287#comment-15127287 ] Greg Mann commented on MESOS-4421: -- Sure I'm happy to help review :-) > Document that /reserve, /create-volumes endpoints can return misleading > "success" > - > > Key: MESOS-4421 > URL: https://issues.apache.org/jira/browse/MESOS-4421 > Project: Mesos > Issue Type: Task > Components: documentation, master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: documentation, endpoint, mesosphere, persistent-volumes, > reservations > > The docs for the {{/reserve}} endpoint say: > {noformat} > 200 OK: Success (the requested resources have been reserved). > {noformat} > This is not true: the master returns {{200}} when the request has been > validated and a {{CheckpointResourcesMessage}} has been sent to the agent, > but the master does not attempt to verify that the message has been received > or that the agent successfully checkpointed. Same behavior applies to > {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}. > We should _either_: > 1. Accurately document what {{200}} return code means. > 2. Change the implementation to wait for the agent's next checkpoint to > succeed (and to include the effect of the operation) before returning success > to the HTTP client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-416) Ensure master / slave do not get kernel OOM before executors, by setting oom_adj control.
[ https://issues.apache.org/jira/browse/MESOS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127282#comment-15127282 ] James DeFelice commented on MESOS-416: -- FWIW kubernetes is doing this already for its important procs > Ensure master / slave do not get kernel OOM before executors, by setting > oom_adj control. > - > > Key: MESOS-416 > URL: https://issues.apache.org/jira/browse/MESOS-416 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler > Labels: mesosphere, security, twitter > > We can adjust the /proc//oom_adj control during master / slave startup, > setting it to a low value to ensure we aren't killed first during an OOM. > Relevant LWN article: http://lwn.net/Articles/317814/ > Also relevant: https://bugzilla.redhat.com/show_bug.cgi?id=239313 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-416) Ensure master / slave do not get kernel OOM before executors, by setting oom_adj control.
[ https://issues.apache.org/jira/browse/MESOS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James DeFelice updated MESOS-416: - Labels: mesosphere security twitter (was: mesosphere twitter) > Ensure master / slave do not get kernel OOM before executors, by setting > oom_adj control. > - > > Key: MESOS-416 > URL: https://issues.apache.org/jira/browse/MESOS-416 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler > Labels: mesosphere, security, twitter > > We can adjust the /proc//oom_adj control during master / slave startup, > setting it to a low value to ensure we aren't killed first during an OOM. > Relevant LWN article: http://lwn.net/Articles/317814/ > Also relevant: https://bugzilla.redhat.com/show_bug.cgi?id=239313 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4235) JSON generation performance improvement
[ https://issues.apache.org/jira/browse/MESOS-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127275#comment-15127275 ] Cong Wang commented on MESOS-4235: -- Hi, [~mcypark] Since all the subtickets are resolved, I assume this one is resolved too in the latest code base? > JSON generation performance improvement > --- > > Key: MESOS-4235 > URL: https://issues.apache.org/jira/browse/MESOS-4235 > Project: Mesos > Issue Type: Epic > Components: libprocess, master, stout >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere, scalability, twitter > > This is an epic which evolved from MESOS-2353. As mentioned in the > description of MESOS-2353, most of the work is spent performing memory > allocation/deallocation. Some preliminary efforts have been made such as > calling {{reserve}} for {{JSON::Array}}. There are still plenty of dynamic > allocations being made especially from instances of {{JSON::Object}} which > hold a {{std::map}} as a member. > The current approach being adopted is to introduce a {{jsonify}} function > which by-passes these unnecessary dynamic allocations and copying, and to > simply hold references to the underlying objects. > We plan to first introduce the {{jsonify}} function to {{stout}}, and update > master's {{state}} endpoint, then proceed to update the rest of the system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-416) Ensure master / slave do not get kernel OOM before executors, by setting oom_adj control.
[ https://issues.apache.org/jira/browse/MESOS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Maloney updated MESOS-416: --- Labels: mesosphere twitter (was: twitter) > Ensure master / slave do not get kernel OOM before executors, by setting > oom_adj control. > - > > Key: MESOS-416 > URL: https://issues.apache.org/jira/browse/MESOS-416 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler > Labels: mesosphere, twitter > > We can adjust the /proc//oom_adj control during master / slave startup, > setting it to a low value to ensure we aren't killed first during an OOM. > Relevant LWN article: http://lwn.net/Articles/317814/ > Also relevant: https://bugzilla.redhat.com/show_bug.cgi?id=239313 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-416) Ensure master / slave do not get kernel OOM before executors, by setting oom_adj control.
[ https://issues.apache.org/jira/browse/MESOS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127272#comment-15127272 ] James DeFelice commented on MESOS-416: -- AKA oom_score_adj ? > Ensure master / slave do not get kernel OOM before executors, by setting > oom_adj control. > - > > Key: MESOS-416 > URL: https://issues.apache.org/jira/browse/MESOS-416 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler > Labels: twitter > > We can adjust the /proc//oom_adj control during master / slave startup, > setting it to a low value to ensure we aren't killed first during an OOM. > Relevant LWN article: http://lwn.net/Articles/317814/ > Also relevant: https://bugzilla.redhat.com/show_bug.cgi?id=239313 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
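On the oom_score_adj question raised in the comment above: on Linux kernels since 2.6.36, /proc/<pid>/oom_score_adj (range -1000 to 1000) supersedes the deprecated oom_adj (range -17 to 15). Below is a minimal C++ sketch of how a process might lower its own score during startup. The helper name setOomScoreAdj and its path parameter are hypothetical and are not part of Mesos or stout; lowering the value normally requires CAP_SYS_RESOURCE (for example, running as root).

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (an assumption for illustration, not a Mesos API).
// Writes `score` to the given procfs control file. The default path targets
// the calling process. Returns false if the score is outside the kernel's
// accepted range or if the file cannot be opened or written.
bool setOomScoreAdj(
    int score,
    const std::string& path = "/proc/self/oom_score_adj")
{
  // The kernel only accepts values in [-1000, 1000].
  if (score < -1000 || score > 1000) {
    return false;
  }

  std::ofstream file(path);
  if (!file.is_open()) {
    return false;
  }

  // std::endl flushes, so a rejected write (e.g. missing privileges)
  // shows up in the stream state checked below.
  file << score << std::endl;
  return file.good();
}
```

A master or agent could call something like setOomScoreAdj(-1000) to opt out of OOM killing entirely, or use a milder negative value so that it is merely a less likely victim than executors left at their inherited score (typically 0).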
[jira] [Updated] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess
[ https://issues.apache.org/jira/browse/MESOS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-3570: -- Shepherd: Vinod Kone Assignee: Anand Mazumdar (was: Vinod Kone) > Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess > > > Key: MESOS-3570 > URL: https://issues.apache.org/jira/browse/MESOS-3570 > Project: Mesos > Issue Type: Bug >Reporter: Anand Mazumdar >Assignee: Anand Mazumdar > Labels: mesosphere, newbie > > Currently, the scheduler library sends calls in order by chaining them and > sending them only when it has received a response for the earlier call. This > was done because there was no HTTP Pipelining abstraction in Libprocess > {{process::post}}. > However, once {{MESOS-3332}} is resolved, we should be able to use the new > abstraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"
[ https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127221#comment-15127221 ] Artem Harutyunyan commented on MESOS-4421: -- Hey [~greggomann], could you please take a look at this one? Jie is fine with committing it once it has a Ship It from you. > Document that /reserve, /create-volumes endpoints can return misleading > "success" > - > > Key: MESOS-4421 > URL: https://issues.apache.org/jira/browse/MESOS-4421 > Project: Mesos > Issue Type: Task > Components: documentation, master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: documentation, endpoint, mesosphere, persistent-volumes, > reservations > > The docs for the {{/reserve}} endpoint say: > {noformat} > 200 OK: Success (the requested resources have been reserved). > {noformat} > This is not true: the master returns {{200}} when the request has been > validated and a {{CheckpointResourcesMessage}} has been sent to the agent, > but the master does not attempt to verify that the message has been received > or that the agent successfully checkpointed. Same behavior applies to > {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}. > We should _either_: > 1. Accurately document what {{200}} return code means. > 2. Change the implementation to wait for the agent's next checkpoint to > succeed (and to include the effect of the operation) before returning success > to the HTTP client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4053: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27 (was: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28) > MemoryPressureMesosTest tests fail on CentOS 6.6 > > > Key: MESOS-4053 > URL: https://issues.apache.org/jira/browse/MESOS-4053 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann >Assignee: Benjamin Hindman > Labels: mesosphere, test-failure > > {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and > {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It > seems that mounted cgroups are not properly cleaned up after previous tests, > so multiple hierarchies are detected and thus an error is produced: > {code} > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > ../../src/tests/mesos.cpp:849: Failure > Value of: _baseHierarchy.get() > Actual: "/cgroup" > Expected: baseHierarchy > Which is: "/tmp/mesos_test_cgroup" > - > Multiple cgroups base hierarchies detected: > '/tmp/mesos_test_cgroup' > '/cgroup' > Mesos does not support multiple cgroups base hierarchies. > Please unmount the corresponding (or all) subsystems. > - > ../../src/tests/mesos.cpp:932: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms) > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > ../../src/tests/mesos.cpp:849: Failure > Value of: _baseHierarchy.get() > Actual: "/cgroup" > Expected: baseHierarchy > Which is: "/tmp/mesos_test_cgroup" > - > Multiple cgroups base hierarchies detected: > '/tmp/mesos_test_cgroup' > '/cgroup' > Mesos does not support multiple cgroups base hierarchies. > Please unmount the corresponding (or all) subsystems. 
> - > ../../src/tests/mesos.cpp:932: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"
[ https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127138#comment-15127138 ] Jojy Varghese commented on MESOS-4576: -- Not sure. The choice between *which* and *find* is the same as the choice between these commands on a Linux machine. *which* is used when you know the command and want to find the path to the command in PATH. *find* is used when you want to search for a file/command in a path. I think, in the case of SHAxxx, what we need is a combination of which and find: we want to *find* the first command that looks like "sha\(512\)\*sum"|"openssl" in a set of paths (which could be PATH). > Introduce a stout helper for "which" > > > Key: MESOS-4576 > URL: https://issues.apache.org/jira/browse/MESOS-4576 > Project: Mesos > Issue Type: Improvement > Components: stout >Reporter: Joseph Wu > Labels: mesosphere > > We may want to add a helper to {{stout/os.hpp}} that will natively emulate > the functionality of the Linux utility {{which}}. i.e. > {code} > Option<string> which(const string& command) > { > Option<string> path = os::getenv("PATH"); > // Loop through path and return the first one which os::exists(...). > return None(); > } > {code} > This helper may be useful: > * for test filters in {{src/tests/environment.cpp}} > * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}} > * the {{sha512}} utility in {{src/common/command_utils.cpp}} > * as runtime checks in the {{LogrotateContainerLogger}} > * etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"
[ https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127107#comment-15127107 ] Joseph Wu commented on MESOS-4576: -- I think the usage for the {{sha512}} case would be multiple calls to {{os::which}}, i.e., something not-too-pretty like: {code} Option<string> whichSha = os::which("shasum"); if (whichSha.isNone()) { whichSha = os::which("sha512sum"); if (whichSha.isNone()) { whichSha = os::which("openssl"); if (whichSha.isNone()) { return Error("..."); } } } {code} > Introduce a stout helper for "which" > > > Key: MESOS-4576 > URL: https://issues.apache.org/jira/browse/MESOS-4576 > Project: Mesos > Issue Type: Improvement > Components: stout >Reporter: Joseph Wu > Labels: mesosphere > > We may want to add a helper to {{stout/os.hpp}} that will natively emulate > the functionality of the Linux utility {{which}}. i.e. > {code} > Option<string> which(const string& command) > { > Option<string> path = os::getenv("PATH"); > // Loop through path and return the first one which os::exists(...). > return None(); > } > {code} > This helper may be useful: > * for test filters in {{src/tests/environment.cpp}} > * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}} > * the {{sha512}} utility in {{src/common/command_utils.cpp}} > * as runtime checks in the {{LogrotateContainerLogger}} > * etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
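The nested checks in the comment above could perhaps be flattened into a loop over candidate names in order of preference. The following is only a sketch under stated assumptions: firstAvailable and the Lookup alias are hypothetical names, std::optional (C++17) stands in for stout's Option so the snippet compiles on its own, and the lookup function is passed in as a parameter because os::which does not exist yet.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical signature for a `which`-style lookup: maps a command name
// to its resolved path, or to nothing if the command is not found.
using Lookup =
    std::function<std::optional<std::string>(const std::string&)>;

// Hypothetical helper: returns the path of the first candidate that
// `lookup` resolves, trying candidates in the order given.
std::optional<std::string> firstAvailable(
    const std::vector<std::string>& candidates,
    const Lookup& lookup)
{
  for (const std::string& candidate : candidates) {
    std::optional<std::string> path = lookup(candidate);
    if (path.has_value()) {
      return path;
    }
  }

  // None of the candidates could be resolved.
  return std::nullopt;
}
```

With such a helper, the sha512 case would reduce to one call with the list {"shasum", "sha512sum", "openssl"} followed by a single error check. The caller would still need to know which tool was found, since openssl takes different arguments (openssl dgst) than the shasum variants.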
[jira] [Comment Edited] (MESOS-4576) Introduce a stout helper for "which"
[ https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127080#comment-15127080 ] Jojy Varghese edited comment on MESOS-4576 at 2/1/16 9:38 PM: -- Wondering if we need another interface that accepts a regular expression for the command. For example, in the case of shasum, we don't know which command to look for: *shasum* or *sha512sum* or maybe *openssl* (openssl dgst). Maybe that interface looks like *find*. was (Author: jojy): Wondering if we need another interface that accepts a regular expression for the command. For example, in the case of shasum, we don't know which command to look for: *shasum* or *sha512sum* or maybe *openssl* (openssl dgst). > Introduce a stout helper for "which" > > > Key: MESOS-4576 > URL: https://issues.apache.org/jira/browse/MESOS-4576 > Project: Mesos > Issue Type: Improvement > Components: stout >Reporter: Joseph Wu > Labels: mesosphere > > We may want to add a helper to {{stout/os.hpp}} that will natively emulate > the functionality of the Linux utility {{which}}. i.e. > {code} > Option<string> which(const string& command) > { > Option<string> path = os::getenv("PATH"); > // Loop through path and return the first one which os::exists(...). > return None(); > } > {code} > This helper may be useful: > * for test filters in {{src/tests/environment.cpp}} > * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}} > * the {{sha512}} utility in {{src/common/command_utils.cpp}} > * as runtime checks in the {{LogrotateContainerLogger}} > * etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"
[ https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127080#comment-15127080 ] Jojy Varghese commented on MESOS-4576: -- Wondering if we need another interface that accepts a regular expression for the command. For example, in the case of shasum, we don't know which command to look for: *shasum* or *sha512sum* or maybe *openssl* (openssl dgst). > Introduce a stout helper for "which" > > > Key: MESOS-4576 > URL: https://issues.apache.org/jira/browse/MESOS-4576 > Project: Mesos > Issue Type: Improvement > Components: stout >Reporter: Joseph Wu > Labels: mesosphere > > We may want to add a helper to {{stout/os.hpp}} that will natively emulate > the functionality of the Linux utility {{which}}. i.e. > {code} > Option<string> which(const string& command) > { > Option<string> path = os::getenv("PATH"); > // Loop through path and return the first one which os::exists(...). > return None(); > } > {code} > This helper may be useful: > * for test filters in {{src/tests/environment.cpp}} > * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}} > * the {{sha512}} utility in {{src/common/command_utils.cpp}} > * as runtime checks in the {{LogrotateContainerLogger}} > * etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2974) stout flags can't have their defaults reset
[ https://issues.apache.org/jira/browse/MESOS-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joerg Schad reassigned MESOS-2974: -- Assignee: Joerg Schad > stout flags can't have their defaults reset > --- > > Key: MESOS-2974 > URL: https://issues.apache.org/jira/browse/MESOS-2974 > Project: Mesos > Issue Type: Bug > Components: stout >Reporter: Joris Van Remoortere >Assignee: Joerg Schad > Labels: flags, newbie, stout > > Stout flags don't remember their default values, and so can't have their > defaults reset. This makes it hard to reset flags to their defaults between > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
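To illustrate the limitation described above, here is a rough sketch of a flag that keeps a copy of its default next to its current value so that tests can restore it. This is not stout's flags API; the class name ResettableFlag and its methods are assumptions made purely for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical flag holder (not stout's FlagsBase). It stores the default
// separately from the current value, which is what makes reset() possible.
template <typename T>
class ResettableFlag
{
public:
  explicit ResettableFlag(T defaultValue)
    : default_(defaultValue), value_(std::move(defaultValue)) {}

  // Overrides the current value, e.g. after parsing a command-line option.
  void set(T value) { value_ = std::move(value); }

  const T& get() const { return value_; }

  // Restores the value supplied at construction time.
  void reset() { value_ = default_; }

private:
  T default_;  // Remembered default; never modified after construction.
  T value_;    // Current value.
};
```

A test fixture could then call reset() on each flag during teardown, so that a value set by one test does not leak into the next.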
[jira] [Commented] (MESOS-4556) ShasumTest.SHA512SimpleFile failed on centos7.
[ https://issues.apache.org/jira/browse/MESOS-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127055#comment-15127055 ] Jie Yu commented on MESOS-4556: --- commit 8b5523d2c512cfab4b69b70960515c4a8d791d2b Author: haosdent huang Date: Mon Feb 1 13:17:00 2016 -0800 Fixed ShasumTest.SHA512SimpleFile on centos7. Review: https://reviews.apache.org/r/43014/ > ShasumTest.SHA512SimpleFile failed on centos7. > -- > > Key: MESOS-4556 > URL: https://issues.apache.org/jira/browse/MESOS-4556 > Project: Mesos > Issue Type: Bug >Reporter: Jie Yu >Assignee: haosdent > > Looks like shasum is not available on some systems. We should check if it's > available on those systems. > {noformat} > [ RUN ] ShasumTest.SHA512SimpleFile > ../../src/tests/common/command_utils_tests.cpp:237: Failure > (sha512).failure(): Subprocess 'shasum, shasum, -a, 512, /tmp/o9CZPZ/test' > failed: ABORT: (../../../3rdparty/libprocess/src/subprocess.cpp:293): Failed > to os::execvpe on path 'shasum': No such file or directory > *** Aborted at 1454097934 (unix time) try "date -d @1454097934" if you are > using GNU date *** > PC: @ 0x7f26a0ae75f7 __GI_raise > *** SIGABRT (@0x3e8258d) received by PID 9613 (TID 0x7f26a7aa78c0) from > PID 9613; stack trace: *** > @ 0x7f26a1aae100 (unknown) > @ 0x7f26a0ae75f7 __GI_raise > @ 0x7f26a0ae8ce8 __GI_abort > @ 0x998808 _Abort() > @ 0x998836 _Abort() > @ 0x7f26a678937d process::childMain() > @ 0x7f26a678f5e3 > _ZNSt5_BindIFPFiRKSsPPcS3_RK6OptionISt8functionIFivEEERKN7process10Subprocess2IO20InputFileDescriptorsERKNSD_21OutputFileDescriptorsESJ_ESsS3_S3_S8_SE_SH_SH_EE6__callIiJEJLm0ELm1ELm2ELm3ELm4ELm5ELm6T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE > @ 0x7f26a678ed52 std::_Bind<>::operator()<>() > @ 0x7f26a678e124 std::_Function_handler<>::_M_invoke() > @ 0x99b49c std::function<>::operator()() > @ 0x7f26a67890e6 process::defaultClone() > @ 0x7f26a678d922 std::_Function_handler<>::_M_invoke() > @ 0x7f26a678d1fd 
std::function<>::operator()() > @ 0x7f26a6789ed7 process::subprocess() > @ 0x7f26a57b9cec mesos::internal::command::launch() > @ 0x7f26a57bacdc mesos::internal::command::shasum() > @ 0x7f26a57bae37 mesos::internal::command::sha512() > @ 0x1440cce > mesos::internal::tests::ShasumTest_SHA512SimpleFile_Test::TestBody() > @ 0x164e8a2 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x16497f0 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x162add5 testing::Test::Run() > @ 0x162b558 testing::TestInfo::Run() > @ 0x162bb9e testing::TestCase::Run() > @ 0x1632478 testing::internal::UnitTestImpl::RunAllTests() > @ 0x164f4c7 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x164a36e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x16311be testing::UnitTest::Run() > @ 0xdeeb0a RUN_ALL_TESTS() > @ 0xdee720 main > @ 0x7f26a0ad3b15 __libc_start_main > @ 0x997599 (unknown) > [ FAILED ] ShasumTest.SHA512SimpleFile (202 ms) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"
[ https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127047#comment-15127047 ] Jie Yu commented on MESOS-4576: --- +1 We need to check if the binary is executable or not as well (using 'access'). > Introduce a stout helper for "which" > > > Key: MESOS-4576 > URL: https://issues.apache.org/jira/browse/MESOS-4576 > Project: Mesos > Issue Type: Improvement > Components: stout >Reporter: Joseph Wu > Labels: mesosphere > > We may want to add a helper to {{stout/os.hpp}} that will natively emulate > the functionality of the Linux utility {{which}}. i.e. > {code} > Option<string> which(const string& command) > { > Option<string> path = os::getenv("PATH"); > // Loop through path and return the first one which os::exists(...). > return None(); > } > {code} > This helper may be useful: > * for test filters in {{src/tests/environment.cpp}} > * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}} > * the {{sha512}} utility in {{src/common/command_utils.cpp}} > * as runtime checks in the {{LogrotateContainerLogger}} > * etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4576) Introduce a stout helper for "which"
Joseph Wu created MESOS-4576: Summary: Introduce a stout helper for "which" Key: MESOS-4576 URL: https://issues.apache.org/jira/browse/MESOS-4576 Project: Mesos Issue Type: Improvement Components: stout Reporter: Joseph Wu We may want to add a helper to {{stout/os.hpp}} that will natively emulate the functionality of the Linux utility {{which}}. i.e. {code} Option<string> which(const string& command) { Option<string> path = os::getenv("PATH"); // Loop through path and return the first one which os::exists(...). return None(); } {code} This helper may be useful: * for test filters in {{src/tests/environment.cpp}} * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}} * the {{sha512}} utility in {{src/common/command_utils.cpp}} * as runtime checks in the {{LogrotateContainerLogger}} * etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
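The pseudocode in the ticket above leaves the loop body as a comment. One possible way to fill it in is sketched below, using only the C++17 standard library and POSIX calls so that it compiles without stout. The assumptions are: std::optional stands in for stout's Option, the searchPath parameter is added for testability, and the check uses stat plus access with X_OK (following the suggestion in the comments to verify that the binary is executable) instead of os::exists. Commands that already contain a slash are not special-cased here.

```cpp
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>

#include <sys/stat.h>  // stat(), S_ISREG
#include <unistd.h>    // access(), X_OK

// Hypothetical stand-in for the proposed os::which (an illustration, not
// the eventual stout implementation). Returns the first regular,
// executable file named `command` found in the ':'-separated `searchPath`.
std::optional<std::string> which(
    const std::string& command,
    const char* searchPath = std::getenv("PATH"))
{
  if (searchPath == nullptr) {
    return std::nullopt;
  }

  std::istringstream directories(searchPath);
  std::string directory;

  // PATH entries are separated by ':' on POSIX systems.
  while (std::getline(directories, directory, ':')) {
    if (directory.empty()) {
      directory = ".";  // An empty entry denotes the current directory.
    }

    const std::string candidate = directory + "/" + command;

    // Skip directories and other non-regular files, then check that the
    // calling user is allowed to execute the candidate.
    struct stat status;
    if (::stat(candidate.c_str(), &status) == 0 &&
        S_ISREG(status.st_mode) &&
        ::access(candidate.c_str(), X_OK) == 0) {
      return candidate;
    }
  }

  return std::nullopt;
}
```

Calling which("shasum") would then yield either the resolved path or an empty optional, which a test filter or a runtime check could inspect before attempting to launch the command.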
[jira] [Assigned] (MESOS-4537) Optionally install stout and libprocess test binaries
[ https://issues.apache.org/jira/browse/MESOS-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Disha Singh reassigned MESOS-4537: -- Assignee: Disha Singh > Optionally install stout and libprocess test binaries > - > > Key: MESOS-4537 > URL: https://issues.apache.org/jira/browse/MESOS-4537 > Project: Mesos > Issue Type: Improvement > Components: build, libprocess, stout >Reporter: Benjamin Bannier >Assignee: Disha Singh >Priority: Trivial > Labels: newbie > > With MESOS-3608 we add a way to install mesos-test binaries. We should > provide the same functionality for stout and libprocess tests. Like > mesos-tests, they should also be run automatically during distcheck via > installcheck. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3831) Document operator HTTP endpoints
[ https://issues.apache.org/jira/browse/MESOS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Klues updated MESOS-3831: --- Story Points: 3 > Document operator HTTP endpoints > > > Key: MESOS-3831 > URL: https://issues.apache.org/jira/browse/MESOS-3831 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Neil Conway >Assignee: Kevin Klues >Priority: Minor > Labels: documentation, mesosphere, newbie > > These are not exhaustively documented; they probably should be. > Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described > in the reservation doc page. But it would be good to have a single page that > lists all the endpoints and their semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4071) Master crash during framework teardown ( Check failed: total.resources.contains(slaveId))
[ https://issues.apache.org/jira/browse/MESOS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4071: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Master crash during framework teardown ( Check failed: > total.resources.contains(slaveId)) > - > > Key: MESOS-4071 > URL: https://issues.apache.org/jira/browse/MESOS-4071 > Project: Mesos > Issue Type: Bug > Components: master >Affects Versions: 0.25.0 >Reporter: Mandeep Chadha >Assignee: Neil Conway > Labels: mesosphere > > Stack Trace : > NOTE : Replaced IP address with XX.XX.XX.XX > {code} > I1204 10:31:03.391127 2588810 master.cpp:5564] Processing TEARDOWN call for > framework 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST) at > scheduler-c8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237 > I1204 10:31:03.391177 2588810 master.cpp:5576] Removing framework > 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST)) at > schedulerc8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237 > I1204 10:31:03.391337 2588805 hierarchical.hpp:605] Deactivated framework > 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 > F1204 10:31:03.395500 2588810 sorter.cpp:233] Check failed: > total.resources.contains(slaveId) > *** Check failure stack trace: *** > @ 0x7f2b3dda53d8 google::LogMessage::Fail() > @ 0x7f2b3dda5327 google::LogMessage::SendToLog() > @ 0x7f2b3dda4d38 google::LogMessage::Flush() > @ 0x7f2b3dda7a6c google::LogMessageFatal::~LogMessageFatal() > @ 0x7f2b3d3351a1 > mesos::internal::master::allocator::DRFSorter::remove() > @ 0x7f2b3d0b8c29 > mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework() > @ 0x7f2b3d0ca823 > _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDES6_EEvRKNS_3PIDIT_EEMSA_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESJ_ > @ 
0x7f2b3d0dc8dc > _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_11FrameworkIDESA_EEvRKNS0_3PIDIT_EEMSE_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2 > _ > @ 0x7f2b3dd2cc35 std::function<>::operator()() > @ 0x7f2b3dd15ae5 process::ProcessBase::visit() > @ 0x7f2b3dd188e2 process::DispatchEvent::visit() > @ 0x472366 process::ProcessBase::serve() > @ 0x7f2b3dd1203f process::ProcessManager::resume() > @ 0x7f2b3dd061b2 process::internal::schedule() > @ 0x7f2b3dd63efd > _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Inde > x_tupleIJXspT_EEE > @ 0x7f2b3dd63e4d std::_Bind_simple<>::operator()() > @ 0x7f2b3dd63de6 std::thread::_Impl<>::_M_run() > @ 0x318c2b6470 (unknown) > @ 0x318b2079d1 (unknown) > @ 0x318aae8b5d (unknown) > @ (nil) (unknown) > Aborted (core dumped) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-191) Add support for multiple disk resources
[ https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van Remoortere updated MESOS-191: --- Sprint: Mesosphere Sprint 27 (was: Mesosphere Sprint 27, Mesosphere Sprint 28) > Add support for multiple disk resources > --- > > Key: MESOS-191 > URL: https://issues.apache.org/jira/browse/MESOS-191 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone >Assignee: Joris Van Remoortere > Labels: mesosphere, persistent-volumes > > It would be nice to schedule mesos tasks with fine-grained disk scheduling. > The idea is that a slave with multiple spindles would specify spindle-specific > config. Mesos would then include this info in its resource offers to > frameworks. > Official Design Doc: > https://docs.google.com/document/d/1syPxygVNEHjG6FoyqslnpUGgNpYKU9QzKBuV2yKmjfQ/edit#heading=h.4fzj9sl24cwy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3763) Need for http::put request method
[ https://issues.apache.org/jira/browse/MESOS-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3763: - Sprint: Mesosphere Sprint 27 (was: Mesosphere Sprint 27, Mesosphere Sprint 28) > Need for http::put request method > - > > Key: MESOS-3763 > URL: https://issues.apache.org/jira/browse/MESOS-3763 > Project: Mesos > Issue Type: Task >Reporter: Joerg Schad >Assignee: Yongqiao Wang >Priority: Minor > Labels: mesosphere > > We decided to create a more RESTful API for managing Quota requests. > We therefore also want to use the HTTP PUT request, and hence need to enable > libprocess/http to send PUT requests in addition to GET and POST requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2317) Remove deprecated checkpoint=false code
[ https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2317: - Sprint: Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 26, Mesosphere Sprint 27) > Remove deprecated checkpoint=false code > --- > > Key: MESOS-2317 > URL: https://issues.apache.org/jira/browse/MESOS-2317 > Project: Mesos > Issue Type: Epic >Affects Versions: 0.22.0 >Reporter: Adam B >Assignee: Joerg Schad > Labels: checkpoint, mesosphere > > Cody's plan from MESOS-444 was: > 1) -Make it so the flag can't be changed at the command line- > 2) -Remove the checkpoint variable entirely from slave/flags.hpp. This is a > fairly involved change since a number of unit tests depend on manually > setting the flag, as well as the default being non-checkpointing.- > 3) -Remove logic around checkpointing in the slave, remove logic inside the > master.- > 4) Drop the flag from the SlaveInfo struct (Will require a deprecation cycle). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2742) Architecture doc on global resources
[ https://issues.apache.org/jira/browse/MESOS-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2742: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Architecture doc on global resources > > > Key: MESOS-2742 > URL: https://issues.apache.org/jira/browse/MESOS-2742 > Project: Mesos > Issue Type: Task >Reporter: Niklas Quarfot Nielsen >Assignee: Joerg Schad > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4364) Add roles validation code to master
[ https://issues.apache.org/jira/browse/MESOS-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4364: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Add roles validation code to master > --- > > Key: MESOS-4364 > URL: https://issues.apache.org/jira/browse/MESOS-4364 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Benjamin Bannier >Assignee: Qian Zhang > Labels: mesosphere > > A {{FrameworkInfo}} can only have one of role or roles. A natural location > for this appears to be under {{validation::operation::validate}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3943) Support dynamic weight in allocator
[ https://issues.apache.org/jira/browse/MESOS-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3943: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > Support dynamic weight in allocator > --- > > Key: MESOS-3943 > URL: https://issues.apache.org/jira/browse/MESOS-3943 > Project: Mesos > Issue Type: Task >Reporter: James Wang >Assignee: Yongqiao Wang > > This JIRA will focus on updating the allocator API to support weight updates for > a role. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3193) Implement AppC image discovery.
[ https://issues.apache.org/jira/browse/MESOS-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3193: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Implement AppC image discovery. > --- > > Key: MESOS-3193 > URL: https://issues.apache.org/jira/browse/MESOS-3193 > Project: Mesos > Issue Type: Task >Reporter: Yan Xu >Assignee: Jojy Varghese > Labels: mesosphere, twitter, unified-containerizer-mvp > > Appc spec specifies two image discovery mechanisms: simple and meta > discovery. We need to have an abstraction for image discovery in AppcStore. > For MVP, we can implement the simple discovery first. > https://reviews.apache.org/r/34139/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3871) Document libprocess message delivery semantics
[ https://issues.apache.org/jira/browse/MESOS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3871: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > Document libprocess message delivery semantics > -- > > Key: MESOS-3871 > URL: https://issues.apache.org/jira/browse/MESOS-3871 > Project: Mesos > Issue Type: Documentation > Components: documentation, libprocess >Reporter: Neil Conway >Assignee: Benjamin Hindman >Priority: Minor > Labels: mesosphere > > What are the semantics of {{send()}} in libprocess? Specifically, does > libprocess guarantee that messages will not be dropped, reordered, or > duplicated? These are important properties to understand when building > software on top of libprocess. > Clearly message drops are allowed. Message reordering _appears_ to be > allowed, although it should only happen in corner cases (see MESOS-3870). > Duplicate message delivery probably can't happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3273) EventCall Test Framework is flaky
[ https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3273: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > EventCall Test Framework is flaky > - > > Key: MESOS-3273 > URL: https://issues.apache.org/jira/browse/MESOS-3273 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Affects Versions: 0.24.0 > Environment: > https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull >Reporter: Vinod Kone >Assignee: Vinod Kone > Labels: flaky-test, mesosphere, tech-debt > Attachments: asan.log > > > Observed this on ASF CI. h/t [~haosd...@gmail.com] > Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master. > {code} > [ RUN ] ExamplesTest.EventCallFramework > Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx' > I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the > driver is aborted! 
> Shutting down > Sending SIGTERM to process tree at pid 26061 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26062 > Shutting down > Killing the following process trees: > [ > ] > Sending SIGTERM to process tree at pid 26063 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26098 > Killing the following process trees: > [ > ] > Shutting down > Sending SIGTERM to process tree at pid 26099 > Killing the following process trees: > [ > ] > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on > 172.17.2.10:60249 for 16 cpus > I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR > I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0 > I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms > I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms > I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns > I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in > 8429ns > I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the > db in 4219ns > I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery > I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status > I0813 19:55:17.181970 26126 master.cpp:378] Master > 20150813-195517-167907756-60249-26100 (297daca2d01a) started on > 172.17.2.10:60249 > I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: > --acls="permissive: false > register_frameworks { > principals { > type: SOME > values: "test-principal" > } > roles { > type: SOME > values: "*" > } > } > run_tasks { > principals { > type: SOME > values: "test-principal" > } > users { > type: SOME > values: "mesos" 
> } > } > " --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate="false" --authenticate_slaves="false" > --authenticators="crammd5" > --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" > --framework_sorter="drf" --help="false" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_slave_ping_timeouts="5" --quiet="false" > --recovery_slave_removal_limit="100%" --registry="replicated_log" > --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" > --registry_strict="false" --root_submissions="true" > --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" > --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" > --zk_session_timeout="10secs" > I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated > frameworks to register > I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated > slaves to register > I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for > authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' > W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials > file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. > It is recommended that your credentials file is NOT accessible by others. > I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received > a broadcasted recover request > I0813 19:55:17.184306 26126 master.cpp:469] Using default 'crammd5' > authenticator > I0813 19:55:17.184661 26126
[jira] [Updated] (MESOS-4367) Add tracking of the role a Resource was offered for
[ https://issues.apache.org/jira/browse/MESOS-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4367: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Add tracking of the role a Resource was offered for > --- > > Key: MESOS-4367 > URL: https://issues.apache.org/jira/browse/MESOS-4367 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > Labels: mesosphere > > If a framework can have multiple roles, we need a way to identify which > of the framework's roles a resource was offered for (e.g., for resource > recovery and reconciliation). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"
[ https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4421: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Document that /reserve, /create-volumes endpoints can return misleading > "success" > - > > Key: MESOS-4421 > URL: https://issues.apache.org/jira/browse/MESOS-4421 > Project: Mesos > Issue Type: Task > Components: documentation, master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: documentation, endpoint, mesosphere, persistent-volumes, > reservations > > The docs for the {{/reserve}} endpoint say: > {noformat} > 200 OK: Success (the requested resources have been reserved). > {noformat} > This is not true: the master returns {{200}} when the request has been > validated and a {{CheckpointResourcesMessage}} has been sent to the agent, > but the master does not attempt to verify that the message has been received > or that the agent successfully checkpointed. Same behavior applies to > {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}. > We should _either_: > 1. Accurately document what {{200}} return code means. > 2. Change the implementation to wait for the agent's next checkpoint to > succeed (and to include the effect of the operation) before returning success > to the HTTP client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3568) The State (/state) endpoint should be documented
[ https://issues.apache.org/jira/browse/MESOS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3568: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > The State (/state) endpoint should be documented > > > Key: MESOS-3568 > URL: https://issues.apache.org/jira/browse/MESOS-3568 > Project: Mesos > Issue Type: Documentation > Components: documentation, master >Reporter: James Fisher >Assignee: Kevin Klues > Labels: documentation, mesosphere, newbie, tech-debt > > Our tests are using a resource `/state.json` hosted by the Mesos master. I > have searched for the documentation for this resource but have been unable to > find anything. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4366) Migrate all existing uses of FrameworkInfo.role to FrameworkInfo.roles
[ https://issues.apache.org/jira/browse/MESOS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4366: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Migrate all existing uses of FrameworkInfo.role to FrameworkInfo.roles > -- > > Key: MESOS-4366 > URL: https://issues.apache.org/jira/browse/MESOS-4366 > Project: Mesos > Issue Type: Improvement > Components: framework, master, slave >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3854) Finalize design for generalized Authorizer interface
[ https://issues.apache.org/jira/browse/MESOS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3854: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Finalize design for generalized Authorizer interface > > > Key: MESOS-3854 > URL: https://issues.apache.org/jira/browse/MESOS-3854 > Project: Mesos > Issue Type: Task > Components: security >Reporter: Bernd Mathiske >Assignee: Alexander Rojas > Labels: authorization, mesosphere > > Finalize the structure of the interface and achieve consensus on the design doc > proposed in MESOS-2949. > https://docs.google.com/document/d/1-XARWJFUq0r_TgRHz_472NvLZNjbqE4G8c2JL44OSMQ/edit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3763) Need for http::put request method
[ https://issues.apache.org/jira/browse/MESOS-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3763: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Need for http::put request method > - > > Key: MESOS-3763 > URL: https://issues.apache.org/jira/browse/MESOS-3763 > Project: Mesos > Issue Type: Task >Reporter: Joerg Schad >Assignee: Yongqiao Wang >Priority: Minor > Labels: mesosphere > > We decided to create a more RESTful API for managing quota requests. > We therefore want to use HTTP PUT requests, and hence need to enable > libprocess/http to send PUT requests in addition to GET and POST requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess
[ https://issues.apache.org/jira/browse/MESOS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3570: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess > > > Key: MESOS-3570 > URL: https://issues.apache.org/jira/browse/MESOS-3570 > Project: Mesos > Issue Type: Bug >Reporter: Anand Mazumdar >Assignee: Vinod Kone > Labels: mesosphere, newbie > > Currently, the scheduler library sends calls in order by chaining them and > sending them only when it has received a response for the earlier call. This > was done because there was no HTTP Pipelining abstraction in Libprocess > {{process::post}}. > However, once {{MESOS-3332}} is resolved, we should now be able to use the new > abstraction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2179) ExamplesTest.NoExecutorFramework terminates with segmentation fault
[ https://issues.apache.org/jira/browse/MESOS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2179: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > ExamplesTest.NoExecutorFramework terminates with segmentation fault > --- > > Key: MESOS-2179 > URL: https://issues.apache.org/jira/browse/MESOS-2179 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.22.0 > Environment: Centos7 inside Docker > Mesos master commit: 49d4553a0645624179f17ed6da8d2443e88998bf >Reporter: Cody Maloney >Assignee: Joerg Schad >Priority: Minor > Labels: flaky, mesosphere > > {code} > [ RUN ] ExamplesTest.NoExecutorFramework > ../../src/tests/script.cpp:83: Failure > Failed > no_executor_framework_test.sh terminated with signal Segmentation fault > [ FAILED ] ExamplesTest.NoExecutorFramework (2543 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4390) Shared Volumes Design Doc
[ https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4390: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Shared Volumes Design Doc > - > > Key: MESOS-4390 > URL: https://issues.apache.org/jira/browse/MESOS-4390 > Project: Mesos > Issue Type: Task >Reporter: Adam B >Assignee: Anindya Sinha > Labels: mesosphere > > Review & Approve design doc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4499) Docker provisioner store should reuse existing layers in the cache.
[ https://issues.apache.org/jira/browse/MESOS-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4499: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Docker provisioner store should reuse existing layers in the cache. > --- > > Key: MESOS-4499 > URL: https://issues.apache.org/jira/browse/MESOS-4499 > Project: Mesos > Issue Type: Bug >Reporter: Jie Yu >Assignee: Jie Yu > Labels: mesosphere > > Currently, the docker provisioner store will download all the layers > associated with an image if the image is not found locally, even though some > layers of it might already exist in the cache. > This is problematic because anytime a user deploys a new image, Mesos will > fetch all layers of that new image, even though most of the layers are > already cached locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
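The layer reuse that MESOS-4499 asks for amounts to fetching only the layers that are not yet cached. The sketch below illustrates that idea only; `layersToFetch` and the plain string layer IDs are hypothetical and do not reflect the actual provisioner store interface.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Given the ordered layer IDs of an image and the set of layer IDs
// already present in the local cache, returns the layers that still
// need to be downloaded, preserving the image's layer order.
std::vector<std::string> layersToFetch(
    const std::vector<std::string>& imageLayers,
    const std::set<std::string>& cachedLayers)
{
  std::vector<std::string> missing;

  for (const std::string& layer : imageLayers) {
    if (cachedLayers.count(layer) == 0) {
      missing.push_back(layer);
    }
  }

  return missing;
}
```

With this shape, deploying a new image whose base layers are already cached would download only the layers unique to that image.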
[jira] [Updated] (MESOS-4004) Support default entrypoint and command runtime config in Mesos containerizer
[ https://issues.apache.org/jira/browse/MESOS-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4004: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Support default entrypoint and command runtime config in Mesos containerizer > > > Key: MESOS-4004 > URL: https://issues.apache.org/jira/browse/MESOS-4004 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen >Assignee: Gilbert Song > Labels: mesosphere, unified-containerizer-mvp > > We need the Mesos containerizer to use the entrypoint and command runtime > configuration returned from the image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4439) Fix appc CachedImage image validation
[ https://issues.apache.org/jira/browse/MESOS-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4439: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Fix appc CachedImage image validation > - > > Key: MESOS-4439 > URL: https://issues.apache.org/jira/browse/MESOS-4439 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese > Labels: mesosphere, unified-containerizer-mvp > > Currently image validation is done assuming that the image's filename will > have digest (SHA-512) information. This is not part of the spec > (https://github.com/appc/spec/blob/master/spec/discovery.md). > > The spec specifies the tuple as unique identifier > for discovering an image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4233) Logging is too verbose for sysadmins / syslog
[ https://issues.apache.org/jira/browse/MESOS-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4233: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > Logging is too verbose for sysadmins / syslog > - > > Key: MESOS-4233 > URL: https://issues.apache.org/jira/browse/MESOS-4233 > Project: Mesos > Issue Type: Epic >Reporter: Cody Maloney >Assignee: Kapil Arya > Labels: mesosphere > Attachments: giant_port_range_logging > > > Currently mesos logs a lot. When launching a thousand tasks in the space of > 10 seconds it will print tens of thousands of log lines, overwhelming syslog > (there is a max rate at which a process can send stuff over a unix socket) > and not giving useful information to a sysadmin who cares about just the > high-level activity and when something goes wrong. > Note mesos also blocks writing to its log locations, so when writing a lot of > log messages, it can fill up the write buffer in the kernel, and be suspended > until the syslog agent catches up reading from the socket (GLOG does a > blocking fwrite to stderr). GLOG also has a big mutex around logging so only > one thing logs at a time. > While for "internal debugging" it is useful to see things like "message went > from internal component x to internal component y", from a sysadmin > perspective I only care about the high level actions taken (launched task for > framework x, sent offer to framework y, got task failed from host z). Note > those are what I'd expect at the "INFO" level. At the "WARNING" level I'd > expect very little to be logged / almost nothing in normal operation. Just > things like "WARN: Replicated log write took longer than expected". WARN > would also get things like backtraces on crashes and abnormal exits / abort. 
> When trying to launch 3k+ tasks inside a second, mesos logging currently > overwhelms syslog with 100k+ messages, many of which are thousands of bytes. > Sysadmins expect to be able to use syslog to monitor basic events in their > system. This is too much. > We can keep logging the messages to files, but the logging to stderr needs to > be reduced significantly (stderr gets picked up and forwarded to syslog / > central aggregation). > What I would like is to be able to set the stderr logging level to be different / > independent from the file logging level (Syslog giving the "sysadmin" > aggregated overview, files useful for debugging in depth what happened in a > cluster). A lot of what mesos currently logs at info is really debugging info > / should show up as debug log level. > Some samples of mesos logging a lot more than a sysadmin would want / expect > are attached, and some are below: > - Every task gets printed multiple times for a basic launch: > {noformat} > Dec 15 22:58:30 ip-10-0-7-60.us-west-2.compute.internal mesos-master[1311]: > I1215 22:58:29.382644 1315 master.cpp:3248] Launching task > envy.5b19a713-a37f-11e5-8b3e-0251692d6109 of framework > 5178f46d-71d6-422f-922c-5bbe82dff9cc- (marathon) > Dec 15 22:58:30 ip-10-0-7-60.us-west-2.compute.internal mesos-master[1311]: > I1215 22:58:29.382925 1315 master.hpp:176] Adding task > envy.5b1958f2-a37f-11e5-8b3e-0251692d6109 with resources cpus(*):0.0001; > mem(*):16; ports(*):[14047-14047] > {noformat} > - Every task status update prints many log lines, successful ones are part > of normal operation and maybe should be logged at info / debug levels, but > not to a sysadmin (Just show when things fail, and maybe aggregate counters > to tell of the volume of working) > - No log messages should be really big / more than 1k characters (Would > prevent the giant port list attached, make that easily discoverable / bug > filable / fixable) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4344) Allow operators to assign net_cls major handles to mesos agents
[ https://issues.apache.org/jira/browse/MESOS-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4344: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Allow operators to assign net_cls major handles to mesos agents > --- > > Key: MESOS-4344 > URL: https://issues.apache.org/jira/browse/MESOS-4344 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: container, mesosphere > > The net_cls cgroup associates a 16-bit major and 16-bit minor network handle > to packets originating from tasks associated with a specific net_cls cgroup. > In mesos we need to give the operator the ability to fix the 16-bit major > handle used in an agent (the minor handle will be allocated by the agent. See > MESOS-4345). Fixing the parent handle on the agent allows operators to > install default firewall rules using the parent handle to enforce a default > policy (say DENY ALL) for all container traffic till the container is > allocated a minor handle. > A simple way to achieve this requirement is to pass the major handle as a > flag to the agent at startup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
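To make the major/minor split in MESOS-4344 concrete, the sketch below packs two 16-bit handles into one 32-bit class ID. It assumes the major handle occupies the upper 16 bits, as in the kernel's net_cls.classid layout (0xAAAABBBB); the function names are hypothetical and are not part of Mesos.

```cpp
#include <cassert>
#include <cstdint>

// Packs a 16-bit major and a 16-bit minor handle into a 32-bit class ID,
// with the major handle in the upper half.
uint32_t makeClassId(uint16_t major, uint16_t minor)
{
  return (static_cast<uint32_t>(major) << 16) | minor;
}

// Extracts the major handle (upper 16 bits) from a class ID.
uint16_t majorHandle(uint32_t classId)
{
  return static_cast<uint16_t>(classId >> 16);
}

// Extracts the minor handle (lower 16 bits) from a class ID.
uint16_t minorHandle(uint32_t classId)
{
  return static_cast<uint16_t>(classId & 0xFFFF);
}
```

Under this layout, an operator who fixes the major handle for an agent can write firewall rules that match on the upper 16 bits alone, regardless of which minor handle the agent later assigns to a container.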
[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6
[ https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4053: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > MemoryPressureMesosTest tests fail on CentOS 6.6 > > > Key: MESOS-4053 > URL: https://issues.apache.org/jira/browse/MESOS-4053 > Project: Mesos > Issue Type: Bug > Environment: CentOS 6.6 >Reporter: Greg Mann >Assignee: Benjamin Hindman > Labels: mesosphere, test-failure > > {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and > {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It > seems that mounted cgroups are not properly cleaned up after previous tests, > so multiple hierarchies are detected and thus an error is produced: > {code} > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > ../../src/tests/mesos.cpp:849: Failure > Value of: _baseHierarchy.get() > Actual: "/cgroup" > Expected: baseHierarchy > Which is: "/tmp/mesos_test_cgroup" > - > Multiple cgroups base hierarchies detected: > '/tmp/mesos_test_cgroup' > '/cgroup' > Mesos does not support multiple cgroups base hierarchies. > Please unmount the corresponding (or all) subsystems. > - > ../../src/tests/mesos.cpp:932: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms) > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery > ../../src/tests/mesos.cpp:849: Failure > Value of: _baseHierarchy.get() > Actual: "/cgroup" > Expected: baseHierarchy > Which is: "/tmp/mesos_test_cgroup" > - > Multiple cgroups base hierarchies detected: > '/tmp/mesos_test_cgroup' > '/cgroup' > Mesos does not support multiple cgroups base hierarchies. > Please unmount the corresponding (or all) subsystems. 
> - > ../../src/tests/mesos.cpp:932: Failure > (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup > '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-191) Add support for multiple disk resources
[ https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-191: Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Add support for multiple disk resources > --- > > Key: MESOS-191 > URL: https://issues.apache.org/jira/browse/MESOS-191 > Project: Mesos > Issue Type: Epic >Reporter: Vinod Kone >Assignee: Joris Van Remoortere > Labels: mesosphere, persistent-volumes > > It would be nice to schedule mesos tasks with fine-grained disk scheduling. > The idea is, a slave with multiple spindles, would specify spindle specific > config. Mesos would then include this info in its resource offers to > frameworks. > Official Design Doc: > https://docs.google.com/document/d/1syPxygVNEHjG6FoyqslnpUGgNpYKU9QzKBuV2yKmjfQ/edit#heading=h.4fzj9sl24cwy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4005) Support workdir runtime configuration from image
[ https://issues.apache.org/jira/browse/MESOS-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4005: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Support workdir runtime configuration from image > - > > Key: MESOS-4005 > URL: https://issues.apache.org/jira/browse/MESOS-4005 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen >Assignee: Gilbert Song > Labels: mesosphere, unified-containerizer-mvp > > We need to support workdir runtime configuration returned from image such as > Dockerfile. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4564) Separate Appc protobuf messages to its own file.
[ https://issues.apache.org/jira/browse/MESOS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4564: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Separate Appc protobuf messages to its own file. > > > Key: MESOS-4564 > URL: https://issues.apache.org/jira/browse/MESOS-4564 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese > Labels: mesosphere, unified-containerizer-mvp > > It would be cleaner to keep the Appc protobuf messages separate from other > mesos messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4383) Support docker runtime configuration env var from image.
[ https://issues.apache.org/jira/browse/MESOS-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4383: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Support docker runtime configuration env var from image. > > > Key: MESOS-4383 > URL: https://issues.apache.org/jira/browse/MESOS-4383 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere, unified-containerizer-mvp > > We need to support env var configuration returned from docker image in mesos > containerizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4200) Test case(s) for weights + allocation behavior
[ https://issues.apache.org/jira/browse/MESOS-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4200: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Test case(s) for weights + allocation behavior > -- > > Key: MESOS-4200 > URL: https://issues.apache.org/jira/browse/MESOS-4200 > Project: Mesos > Issue Type: Task > Components: allocation, test >Reporter: Neil Conway >Assignee: Yongqiao Wang > Labels: mesosphere, test, weight > > As far as I can see, we currently have NO test cases for behavior when > weights are defined. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4285) Mesos command task doesn't support volumes with image
[ https://issues.apache.org/jira/browse/MESOS-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4285: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > Mesos command task doesn't support volumes with image > - > > Key: MESOS-4285 > URL: https://issues.apache.org/jira/browse/MESOS-4285 > Project: Mesos > Issue Type: Bug > Components: containerization >Reporter: Timothy Chen >Assignee: Timothy Chen > Labels: mesosphere, unified-containerizer-mvp > > Currently volumes are stripped when an image is specified running a command > task with Mesos containerizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4365) Add internal migration from role to roles to master
[ https://issues.apache.org/jira/browse/MESOS-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4365: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Add internal migration from role to roles to master > --- > > Key: MESOS-4365 > URL: https://issues.apache.org/jira/browse/MESOS-4365 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Benjamin Bannier >Assignee: Benjamin Bannier > Labels: mesosphere > > If only the {{role}} field is given, add it as single entry to {{roles}}. Add > a note to {{CHANGELOG}}/release notes on deprecation of the existing {{role}} > field. File a JIRA issue for removal of that migration code once the > deprecation cycle is over. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
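The migration step in MESOS-4365 can be sketched as below. `FrameworkInfoSketch` and `migrateRole` are hypothetical names for illustration, not the real protobuf or master code. The ticket does not say whether the old {{role}} field should be cleared afterwards, so this sketch leaves it untouched.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the FrameworkInfo protobuf; only the two
// fields discussed in the ticket are modeled.
struct FrameworkInfoSketch
{
  bool hasRole = false;            // True if the singular `role` is set.
  std::string role;                // Deprecated single-role field.
  std::vector<std::string> roles;  // New repeated field.
};

// If only `role` is given, adds it as the single entry of `roles`.
// Messages that already populate `roles` are left unchanged.
void migrateRole(FrameworkInfoSketch& info)
{
  if (info.hasRole && info.roles.empty()) {
    info.roles.push_back(info.role);
  }
}
```

After this step, the rest of the master could read `roles` exclusively, which is what would let the migration code be removed once the deprecation cycle ends.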
[jira] [Updated] (MESOS-4479) Implement reservation labels
[ https://issues.apache.org/jira/browse/MESOS-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4479: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Implement reservation labels > > > Key: MESOS-4479 > URL: https://issues.apache.org/jira/browse/MESOS-4479 > Project: Mesos > Issue Type: Improvement > Components: master >Reporter: Neil Conway >Assignee: Neil Conway > Labels: labels, mesosphere, reservations > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4377) Document units associated with resource types
[ https://issues.apache.org/jira/browse/MESOS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4377: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Document units associated with resource types > - > > Key: MESOS-4377 > URL: https://issues.apache.org/jira/browse/MESOS-4377 > Project: Mesos > Issue Type: Documentation > Components: documentation >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: documentation, mesosphere > > We should document the units associated with memory and disk resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4363) Add a roles field to FrameworkInfo
[ https://issues.apache.org/jira/browse/MESOS-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4363: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Add a roles field to FrameworkInfo > -- > > Key: MESOS-4363 > URL: https://issues.apache.org/jira/browse/MESOS-4363 > Project: Mesos > Issue Type: Improvement > Components: framework, master >Reporter: Benjamin Bannier >Assignee: Qian Zhang > Labels: mesosphere > > To represent multiple roles per framework a new repeated string field for > roles is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation
[ https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4368: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Make HierarchicalAllocatorProcess set a Resource's active role during > allocation > > > Key: MESOS-4368 > URL: https://issues.apache.org/jira/browse/MESOS-4368 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Benjamin Bannier >Assignee: Jan Schlicht > Labels: mesosphere > > The concrete implementation here depends on the implementation strategy used > to solve MESOS-4367. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4214) Introduce HTTP endpoint /weights for updating weight
[ https://issues.apache.org/jira/browse/MESOS-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4214: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > Introduce HTTP endpoint /weights for updating weight > > > Key: MESOS-4214 > URL: https://issues.apache.org/jira/browse/MESOS-4214 > Project: Mesos > Issue Type: Task >Reporter: Yongqiao Wang >Assignee: Yongqiao Wang > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4291) fs::enter(rootfs) does not work if 'rootfs' is read only.
[ https://issues.apache.org/jira/browse/MESOS-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4291: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > fs::enter(rootfs) does not work if 'rootfs' is read only. > - > > Key: MESOS-4291 > URL: https://issues.apache.org/jira/browse/MESOS-4291 > Project: Mesos > Issue Type: Bug >Reporter: Jie Yu >Assignee: Jie Yu > Labels: mesosphere, unified-containerizer-mvp > > I noticed this when I was testing the unified containerizer with the bind > mount backend and no volumes. > The current implementation of fs::enter will put the old root under > /tmp/._old_root_.XX in the new rootfs. It assumes that /tmp is writable > in the new rootfs, but this might not be true, especially if the bind mount > backend is used. > To solve the problem, what we can do is to mount tmpfs to /tmp in the new > rootfs and umount it after pivot_root. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4333) Refactor Appc provisioner tests
[ https://issues.apache.org/jira/browse/MESOS-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4333: - Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 26, Mesosphere Sprint 27) > Refactor Appc provisioner tests > - > > Key: MESOS-4333 > URL: https://issues.apache.org/jira/browse/MESOS-4333 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Jojy Varghese >Assignee: Jojy Varghese > Labels: mesosphere > > Current tests can be refactored so that we can reuse some common tasks like > test image creation. This will benefit future tests like appc image puller > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4345) Implement a network-handle manager for net_cls cgroup subsystem
[ https://issues.apache.org/jira/browse/MESOS-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4345: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Implement a network-handle manager for net_cls cgroup subsystem > --- > > Key: MESOS-4345 > URL: https://issues.apache.org/jira/browse/MESOS-4345 > Project: Mesos > Issue Type: Task > Components: containerization >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: containerizer, containers, mesosphere > > As part of implementing the net_cls cgroup isolator we need a mechanism to > manage the minor handles that will be allocated to containers when they are > associated with a net_cls cgroup. The network-handle manager needs to provide > the following functionality: > a) During normal operation keep track of the free and allocated network > handles. There can be a total of 64K such network handles. > b) On startup, learn the allocated network handle by walking the net_cls > cgroup tree for mesos and build a map of free network handles available to > the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
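The two requirements listed in MESOS-4345 (tracking free and allocated handles, and relearning allocated handles on startup) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class name `NetClsHandleManagerSketch`, its methods, and the choice to treat all 64K values as allocatable are hypothetical, and the cgroup tree walk itself is left out.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64K network handles in total, as stated in the ticket.
const uint32_t kTotalHandles = 1u << 16;

// Hypothetical handle manager; not the Mesos implementation.
class NetClsHandleManagerSketch
{
public:
  // Startup path: mark a handle found in the net_cls cgroup tree as
  // allocated. Returns false if it is out of range or already marked.
  bool recover(uint32_t handle)
  {
    if (handle >= kTotalHandles || used.test(handle)) {
      return false;
    }
    used.set(handle);
    return true;
  }

  // Normal operation: hand out the lowest free handle, or -1 if none.
  int64_t allocate()
  {
    for (uint32_t h = 0; h < kTotalHandles; ++h) {
      if (!used.test(h)) {
        used.set(h);
        return h;
      }
    }
    return -1;
  }

  // Return a handle to the free pool. Returns false if it was not allocated.
  bool release(uint32_t handle)
  {
    if (handle >= kTotalHandles || !used.test(handle)) {
      return false;
    }
    used.reset(handle);
    return true;
  }

  // Number of handles currently free.
  size_t available() const { return kTotalHandles - used.count(); }

private:
  std::bitset<kTotalHandles> used;  // One bit per handle (8 KiB).
};
```

A bitset keeps the whole table at 8 KiB, so a linear scan for a free handle is cheap; a real manager might reserve some values or use a free list instead.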
[jira] [Updated] (MESOS-4261) Remove docker auth server flag
[ https://issues.apache.org/jira/browse/MESOS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4261: - Sprint: Mesosphere Sprint 25, Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 25, Mesosphere Sprint 26, Mesosphere Sprint 27) > Remove docker auth server flag > -- > > Key: MESOS-4261 > URL: https://issues.apache.org/jira/browse/MESOS-4261 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen >Assignee: Jie Yu > Labels: mesosphere, unified-containerizer-mvp > > We currently use a configured docker auth server from a slave flag to get > token auth for docker registry. However this doesn't work for private > registries as docker registry supports sending down the correct auth server > to contact. > We should remove docker auth server flag completely and ask the docker > registry for auth server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4552) There is currently no way to get at the strings in the global process::help object programmatically.
[ https://issues.apache.org/jira/browse/MESOS-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4552: - Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > There is currently no way to get at the strings in the global process::help > object programmatically. > > > Key: MESOS-4552 > URL: https://issues.apache.org/jira/browse/MESOS-4552 > Project: Mesos > Issue Type: Improvement >Reporter: Kevin Klues >Assignee: Kevin Klues >Priority: Minor > Labels: mesosphere > > There is currently no way to extract the help strings from the help process > once they have been installed into it. The only way to get at them is to > visit the http endpoint they are associated with and pull it from there. > Moreover, there is no way to uninstall a string from this process if a route > is ever taken offline. We need support for programmatically getting/removing > strings from the help process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail
[ https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2017: - Sprint: Twitter Mesos Q4 Sprint 3, Mesosphere Sprint 27, Mesosphere Sprint 28 (was: Twitter Mesos Q4 Sprint 3, Mesosphere Sprint 27) > Segfault with "Pure virtual method called" when tests fail > -- > > Key: MESOS-2017 > URL: https://issues.apache.org/jira/browse/MESOS-2017 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 0.21.0 >Reporter: Yan Xu >Assignee: Kevin Klues > Labels: mesosphere, tests > > The most recent one: > {noformat:title=DRFAllocatorTest.DRFAllocatorProcess} > [ RUN ] DRFAllocatorTest.DRFAllocatorProcess > Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j' > I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms > I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms > I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns > I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in > 2018ns > I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the > db in 335ns > I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery > I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status > I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received > a broadcasted recover request > I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from > a replica in EMPTY status > I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to > STARTING > I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 591981ns > I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to > STARTING > I1030 05:55:06.940752 24481 
recover.cpp:463] Replica is in STARTING status > I1030 05:55:06.940820 24489 master.cpp:312] Master > 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on > 67.195.81.187:40429 > I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing > authenticated frameworks to register > I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing > authenticated slaves to register > I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for > authentication from > '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials' > I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled > I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising > offers for all slaves > I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status > received a broadcasted recover request > I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] > Initializing hierarchical allocator process with master : > master@67.195.81.187:40429 > I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from > a replica in STARTING status > I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is > master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459 > I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master! 
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar > I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar > I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING > I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 536365ns > I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to > VOTING > I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos > group > I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated > I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer > I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit > promise request with proposal 1 > I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to > leveldb took 806463ns > I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1 > I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to > fill missing position > I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit > promise request for position 0 with proposal 2 > I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to > leveldb took 603843ns > I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0 > I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request > for position 0 > I1030
[jira] [Updated] (MESOS-4066) Expose when agent is recovering in the agent's /state.json endpoint.
[ https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4066: - Sprint: Mesosphere Sprint 28 Story Points: 3 > Expose when agent is recovering in the agent's /state.json endpoint. > > > Key: MESOS-4066 > URL: https://issues.apache.org/jira/browse/MESOS-4066 > Project: Mesos > Issue Type: Task > Components: slave >Reporter: Benjamin Mahler >Assignee: Vinod Kone > Labels: mesosphere > > Currently when a user is hitting /state.json on the agent, it may return > partial state if the agent has failed over and is recovering. There is > currently no clear way to tell if this is the case when looking at a > response, so the user may incorrectly interpret the agent as being empty of > tasks. > We could consider exposing the 'state' enum of the agent in the endpoint: > {code} > enum State > { > RECOVERING, // Slave is doing recovery. > DISCONNECTED, // Slave is not connected to the master. > RUNNING, // Slave has (re-)registered. > TERMINATING, // Slave is shutting down. > } state; > {code} > This may be a bit tricky to maintain as far as backwards-compatibility of the > endpoint, if we were to alter this enum. > Exposing this would allow users to be more informed about the state of the > agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
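One way the proposal in MESOS-4066 might be realized is to expose the enum as a string in the endpoint's response. The sketch below is hypothetical: `stateName` and `isRecovering` are illustrative helpers, not functions from the Mesos code base, and only the enum values come from the ticket.

```cpp
#include <cassert>
#include <string>

// Enum values as quoted in the ticket.
enum State
{
  RECOVERING,    // Slave is doing recovery.
  DISCONNECTED,  // Slave is not connected to the master.
  RUNNING,       // Slave has (re-)registered.
  TERMINATING    // Slave is shutting down.
};

// Hypothetical helper: the name that would be written into the response.
inline std::string stateName(State state)
{
  switch (state) {
    case RECOVERING:   return "RECOVERING";
    case DISCONNECTED: return "DISCONNECTED";
    case RUNNING:      return "RUNNING";
    case TERMINATING:  return "TERMINATING";
  }
  return "UNKNOWN";
}

// Hypothetical helper: lets a client decide whether the task list it
// received may be partial.
inline bool isRecovering(State state)
{
  return state == RECOVERING;
}
```

Emitting names rather than integer values would mean that reordering the enum does not silently change what clients see, although adding or renaming a value would still be a compatibility concern, as the ticket notes.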
[jira] [Updated] (MESOS-2971) Implement OverlayFS based provisioner backend
[ https://issues.apache.org/jira/browse/MESOS-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-2971: - Sprint: (was: Mesosphere Sprint 27) > Implement OverlayFS based provisioner backend > - > > Key: MESOS-2971 > URL: https://issues.apache.org/jira/browse/MESOS-2971 > Project: Mesos > Issue Type: Improvement >Reporter: Timothy Chen >Assignee: Mei Wan > Labels: mesosphere, twitter, unified-containerizer-mvp > > Part of the image provisioning process is to call a backend to create a root > filesystem based on the image's on-disk layout. > The problem with the copy backend is that it wastes both IO and space, > and the bind backend can only deal with one layer. > An OverlayFS backend allows us to use the filesystem to merge multiple > filesystems into one efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4544) Propose design doc for agent partitioning behavior
[ https://issues.apache.org/jira/browse/MESOS-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4544: --- Shepherd: Vinod Kone > Propose design doc for agent partitioning behavior > -- > > Key: MESOS-4544 > URL: https://issues.apache.org/jira/browse/MESOS-4544 > Project: Mesos > Issue Type: Task > Components: general >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1471) Document replicated log design/internals
[ https://issues.apache.org/jira/browse/MESOS-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-1471: - Sprint: Q3 Sprint 1 (was: Q3 Sprint 1, Mesosphere Sprint 28) > Document replicated log design/internals > > > Key: MESOS-1471 > URL: https://issues.apache.org/jira/browse/MESOS-1471 > Project: Mesos > Issue Type: Documentation > Components: documentation, replicated log >Reporter: Benjamin Mahler >Assignee: Neil Conway > Labels: documentation, mesosphere > > The replicated log could benefit from some documentation. In particular, how > does it work? What do operators need to know? Possibly there is some overlap > with our future maintenance documentation in MESOS-1470. > I believe [~jieyu] has some unpublished work that could be leveraged here! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3413) Docker containerizer does not symlink persistent volumes into sandbox
[ https://issues.apache.org/jira/browse/MESOS-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-3413: - Shepherd: Jie Yu > Docker containerizer does not symlink persistent volumes into sandbox > - > > Key: MESOS-3413 > URL: https://issues.apache.org/jira/browse/MESOS-3413 > Project: Mesos > Issue Type: Bug > Components: containerization, docker, slave >Affects Versions: 0.23.0 >Reporter: Max Neunhöffer >Assignee: Timothy Chen > Labels: docker, mesosphere, persistent-volumes > Original Estimate: 1h > Remaining Estimate: 1h > > For the ArangoDB framework I am trying to use the persistent primitives. > Nearly everything is working, but I am missing a crucial piece at the end: I have > successfully created a persistent disk resource and have set the persistence > and volume information in the DiskInfo message. However, I do not see any way > to find out what directory on the host the mesos slave has reserved for us. I > know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles//_ but we > have no way to query this information anywhere. The docker containerizer does > not automatically mount this directory into our docker container, or symlink > it into our sandbox. Therefore, I have essentially no access to it. Note that > the mesos containerizer (which I cannot use for other reasons) seems to > create a symlink in the sandbox to the actual path for the persistent volume. > With that, I could mount the volume into our docker container and all would > be well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4570) DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.
[ https://issues.apache.org/jira/browse/MESOS-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4570: - Sprint: Mesosphere Sprint 28 > DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky. > - > > Key: MESOS-4570 > URL: https://issues.apache.org/jira/browse/MESOS-4570 > Project: Mesos > Issue Type: Bug > Environment: Debian 8 >Reporter: Till Toenshoff >Assignee: Gilbert Song > Labels: flaky-test > > {noformat} > ../configure --enable-ssl --enable-libevent && make check > {noformat} > {noformat} > --gtest_repeat=-1 --gtest_break_on_failure > --gtest_filter=DockerFetcherPluginTest.INTERNET_CURL_FetchImage > {noformat} > Failed at the 22nd run. > {noformat} > [ RUN ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage > ../../src/tests/uri_fetcher_tests.cpp:276: Failure > Failed to wait 15secs for fetcher.get()->fetch(uri, dir) > *** Aborted at 1454207653 (unix time) try "date -d @1454207653" if you are > using GNU date *** > PC: @ 0x167023a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 19868 (TID 0x7f500fc877c0) from PID 0; > stack trace: *** > @ 0x7f5008f368d0 (unknown) > @ 0x167023a testing::UnitTest::AddTestPartResult() > @ 0x1664c73 testing::internal::AssertHelper::operator=() > @ 0x146ac6f > mesos::internal::tests::DockerFetcherPluginTest_INTERNET_CURL_FetchImage_Test::TestBody() > @ 0x168dc70 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x1688cc8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x166a013 testing::Test::Run() > @ 0x166a7a1 testing::TestInfo::Run() > @ 0x166addc testing::TestCase::Run() > @ 0x167172b testing::internal::UnitTestImpl::RunAllTests() > @ 0x168e8ff > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x168981e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x167045b testing::UnitTest::Run() > @ 0xe2d476 RUN_ALL_TESTS() > @ 0xe2d08c main > @ 0x7f5008b9fb45 (unknown) > @ 
0x9c6bf9 (unknown) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4570) DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.
[ https://issues.apache.org/jira/browse/MESOS-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4570: - Shepherd: Jie Yu Assignee: Gilbert Song Story Points: 1 > DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky. > - > > Key: MESOS-4570 > URL: https://issues.apache.org/jira/browse/MESOS-4570 > Project: Mesos > Issue Type: Bug > Environment: Debian 8 >Reporter: Till Toenshoff >Assignee: Gilbert Song > Labels: flaky-test > > {noformat} > ../configure --enable-ssl --enable-libevent && make check > {noformat} > {noformat} > --gtest_repeat=-1 --gtest_break_on_failure > --gtest_filter=DockerFetcherPluginTest.INTERNET_CURL_FetchImage > {noformat} > Failed at the 22nd run. > {noformat} > [ RUN ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage > ../../src/tests/uri_fetcher_tests.cpp:276: Failure > Failed to wait 15secs for fetcher.get()->fetch(uri, dir) > *** Aborted at 1454207653 (unix time) try "date -d @1454207653" if you are > using GNU date *** > PC: @ 0x167023a testing::UnitTest::AddTestPartResult() > *** SIGSEGV (@0x0) received by PID 19868 (TID 0x7f500fc877c0) from PID 0; > stack trace: *** > @ 0x7f5008f368d0 (unknown) > @ 0x167023a testing::UnitTest::AddTestPartResult() > @ 0x1664c73 testing::internal::AssertHelper::operator=() > @ 0x146ac6f > mesos::internal::tests::DockerFetcherPluginTest_INTERNET_CURL_FetchImage_Test::TestBody() > @ 0x168dc70 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x1688cc8 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x166a013 testing::Test::Run() > @ 0x166a7a1 testing::TestInfo::Run() > @ 0x166addc testing::TestCase::Run() > @ 0x167172b testing::internal::UnitTestImpl::RunAllTests() > @ 0x168e8ff > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > @ 0x168981e > testing::internal::HandleExceptionsInMethodIfSupported<>() > @ 0x167045b testing::UnitTest::Run() > @ 0xe2d476 RUN_ALL_TESTS() > @ 0xe2d08c main > @ 
0x7f5008b9fb45 (unknown) > @ 0x9c6bf9 (unknown) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4575) Fix Appc image caching to share with image fetcher
Jojy Varghese created MESOS-4575: Summary: Fix Appc image caching to share with image fetcher Key: MESOS-4575 URL: https://issues.apache.org/jira/browse/MESOS-4575 Project: Mesos Issue Type: Improvement Reporter: Jojy Varghese Assignee: Jojy Varghese As Appc image fetcher is being developed, Image cache needs to be shared between store and the image fetcher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4066) Expose when agent is recovering in the agent's /state.json endpoint.
[ https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126896#comment-15126896 ] Joerg Schad commented on MESOS-4066: Short question: hasn't state.json been deprecated in favor of /state (https://github.com/apache/mesos/blob/master/docs/upgrades.md#upgrading-from-024x-to-025x)? > Expose when agent is recovering in the agent's /state.json endpoint. > > > Key: MESOS-4066 > URL: https://issues.apache.org/jira/browse/MESOS-4066 > Project: Mesos > Issue Type: Task > Components: slave >Reporter: Benjamin Mahler >Assignee: Vinod Kone > Labels: mesosphere > > Currently when a user is hitting /state.json on the agent, it may return > partial state if the agent has failed over and is recovering. There is > currently no clear way to tell if this is the case when looking at a > response, so the user may incorrectly interpret the agent as being empty of > tasks. > We could consider exposing the 'state' enum of the agent in the endpoint: > {code} > enum State > { > RECOVERING, // Slave is doing recovery. > DISCONNECTED, // Slave is not connected to the master. > RUNNING, // Slave has (re-)registered. > TERMINATING, // Slave is shutting down. > } state; > {code} > This may be a bit tricky to maintain as far as backwards-compatibility of the > endpoint, if we were to alter this enum. > Exposing this would allow users to be more informed about the state of the > agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4066) Expose when agent is recovering in the agent's /state.json endpoint.
[ https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4066: - Shepherd: Benjamin Mahler Assignee: Vinod Kone > Expose when agent is recovering in the agent's /state.json endpoint. > > > Key: MESOS-4066 > URL: https://issues.apache.org/jira/browse/MESOS-4066 > Project: Mesos > Issue Type: Task > Components: slave >Reporter: Benjamin Mahler >Assignee: Vinod Kone > Labels: mesosphere > > Currently when a user is hitting /state.json on the agent, it may return > partial state if the agent has failed over and is recovering. There is > currently no clear way to tell if this is the case when looking at a > response, so the user may incorrectly interpret the agent as being empty of > tasks. > We could consider exposing the 'state' enum of the agent in the endpoint: > {code} > enum State > { > RECOVERING, // Slave is doing recovery. > DISCONNECTED, // Slave is not connected to the master. > RUNNING, // Slave has (re-)registered. > TERMINATING, // Slave is shutting down. > } state; > {code} > This may be a bit tricky to maintain as far as backwards-compatibility of the > endpoint, if we were to alter this enum. > Exposing this would allow users to be more informed about the state of the > agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4554) Investigate test suite crashes after ZK socket disconnections.
[ https://issues.apache.org/jira/browse/MESOS-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-4554: -- Sprint: (was: Mesosphere Sprint 28) > Investigate test suite crashes after ZK socket disconnections. > -- > > Key: MESOS-4554 > URL: https://issues.apache.org/jira/browse/MESOS-4554 > Project: Mesos > Issue Type: Bug >Reporter: Anand Mazumdar > Labels: flaky-test, mesosphere > > Showed up on ASF CI: > https://builds.apache.org/job/Mesos/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/1579/console > The test crashed with the following logs: > {code} > [ RUN ] ContentType/ExecutorHttpApiTest.DefaultAccept/1 > I0129 02:00:35.137161 31926 leveldb.cpp:174] Opened db in 118.902333ms > I0129 02:00:35.187021 31926 leveldb.cpp:181] Compacted db in 49.836241ms > I0129 02:00:35.187088 31926 leveldb.cpp:196] Created db iterator in 33825ns > I0129 02:00:35.187109 31926 leveldb.cpp:202] Seeked to beginning of db in > 7965ns > I0129 02:00:35.187121 31926 leveldb.cpp:271] Iterated through 0 keys in the > db in 6350ns > I0129 02:00:35.187165 31926 replica.cpp:779] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0129 02:00:35.188433 31950 recover.cpp:447] Starting replica recovery > I0129 02:00:35.188796 31950 recover.cpp:473] Replica is in EMPTY status > I0129 02:00:35.190021 31949 replica.cpp:673] Replica in EMPTY status received > a broadcasted recover request from (11817)@172.17.0.3:60904 > I0129 02:00:35.190569 31958 recover.cpp:193] Received a recover response from > a replica in EMPTY status > I0129 02:00:35.190994 31959 recover.cpp:564] Updating replica status to > STARTING > I0129 02:00:35.191522 31953 master.cpp:374] Master > 823f2212-bf28-4dd6-959d-796029d32afb (90665f991b70) started on > 172.17.0.3:60904 > I0129 02:00:35.191640 31953 master.cpp:376] Flags at startup: --acls="" > --allocation_interval="1secs" 
--allocator="HierarchicalDRF" > --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" > --authenticators="crammd5" --authorizers="local" > --credentials="/tmp/B9O6zq/credentials" --framework_sorter="drf" > --help="false" --hostname_lookup="true" --http_authenticators="basic" > --initialize_driver_logging="true" --log_auto_initialize="true" > --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" > --quiet="false" --recovery_slave_removal_limit="100%" > --registry="replicated_log" --registry_fetch_timeout="1mins" > --registry_store_timeout="25secs" --registry_strict="true" > --root_submissions="true" --slave_ping_timeout="15secs" > --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" > --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" > --work_dir="/tmp/B9O6zq/master" --zk_session_timeout="10secs" > I0129 02:00:35.191926 31953 master.cpp:421] Master only allowing > authenticated frameworks to register > I0129 02:00:35.191936 31953 master.cpp:426] Master only allowing > authenticated slaves to register > I0129 02:00:35.191943 31953 credentials.hpp:35] Loading credentials for > authentication from '/tmp/B9O6zq/credentials' > I0129 02:00:35.192229 31953 master.cpp:466] Using default 'crammd5' > authenticator > I0129 02:00:35.192366 31953 master.cpp:535] Using default 'basic' HTTP > authenticator > I0129 02:00:35.192530 31953 master.cpp:569] Authorization enabled > I0129 02:00:35.192719 31950 whitelist_watcher.cpp:77] No whitelist given > I0129 02:00:35.192756 31957 hierarchical.cpp:144] Initialized hierarchical > allocator process > I0129 02:00:35.194291 31955 master.cpp:1710] The newly elected leader is > master@172.17.0.3:60904 with id 823f2212-bf28-4dd6-959d-796029d32afb > I0129 02:00:35.194335 31955 master.cpp:1723] Elected as the leading master! 
> I0129 02:00:35.194350 31955 master.cpp:1468] Recovering from registrar > I0129 02:00:35.194545 31958 registrar.cpp:307] Recovering registrar > I0129 02:00:35.220226 31948 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 29.150097ms > I0129 02:00:35.220262 31948 replica.cpp:320] Persisted replica status to > STARTING > I0129 02:00:35.220484 31959 recover.cpp:473] Replica is in STARTING status > I0129 02:00:35.221220 31954 replica.cpp:673] Replica in STARTING status > received a broadcasted recover request from (11819)@172.17.0.3:60904 > I0129 02:00:35.221539 31959 recover.cpp:193] Received a recover response from > a replica in STARTING status > I0129 02:00:35.221871 31954 recover.cpp:564] Updating replica status to VOTING > I0129 02:00:35.245329 31949 leveldb.cpp:304] Persisting metadata (8 bytes) to > leveldb took 23.326002ms > I0129 02:0
[jira] [Updated] (MESOS-4545) Propose design doc for reliable floating point behavior
[ https://issues.apache.org/jira/browse/MESOS-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4545: --- Sprint: Mesosphere Sprint 28 (was: Mesosphere Sprint 27) > Propose design doc for reliable floating point behavior > --- > > Key: MESOS-4545 > URL: https://issues.apache.org/jira/browse/MESOS-4545 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Neil Conway > Labels: mesosphere, resources > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4544) Propose design doc for agent partitioning behavior
[ https://issues.apache.org/jira/browse/MESOS-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway updated MESOS-4544: --- Story Points: 8 (was: 9) > Propose design doc for agent partitioning behavior > -- > > Key: MESOS-4544 > URL: https://issues.apache.org/jira/browse/MESOS-4544 > Project: Mesos > Issue Type: Task > Components: general >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4487) Introduce status() interface in `Containerizer`
[ https://issues.apache.org/jira/browse/MESOS-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4487: -- Story Points: 2 (was: 3) > Introduce status() interface in `Containerizer` > --- > > Key: MESOS-4487 > URL: https://issues.apache.org/jira/browse/MESOS-4487 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: containerizer, mesosphere > > In the Containerizer, during container isolation, the isolators end up > modifying the state of the containers. Examples would be IP address > allocation to a container by the network isolator, or net_cls handle > allocation by the cgroups/net_cls isolator. > Often the state of the container needs to be exposed to operators > through the state.json endpoint. For example, operators or frameworks might want > to know the IP address configured on a particular container, or the net_cls > handle associated with a container to configure the right TC rules. However, > at present, there is no clean interface for the slave to retrieve the state > of a container from the Containerizer for any of the launched containers. > Thus, we need to introduce a `status` interface in the `Containerizer` base > class, in order for the slave to expose container state information in its > state.json. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
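For readers following MESOS-4487, here is a minimal sketch of what a `status` query on a containerizer base class might look like. Every name in it (`ContainerStatus`, its fields, `InMemoryContainerizer`, `recordIp`, `recordNetCls`) is hypothetical and chosen for illustration only; the ticket does not specify a signature, and a real implementation would most likely be asynchronous rather than returning a pointer.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for per-container state produced by
// isolators. An empty string or a zero class ID means "not set".
struct ContainerStatus {
  std::string ipAddress;       // e.g. filled in by a network isolator
  uint32_t netClsClassId = 0;  // e.g. filled in by a net_cls isolator
};

// Sketch of a base class exposing a `status` query (assumed shape, not
// the Mesos API). Synchronous here purely for brevity.
class Containerizer {
public:
  virtual ~Containerizer() = default;

  // Returns the recorded status, or nullptr for an unknown container.
  virtual const ContainerStatus* status(
      const std::string& containerId) const = 0;
};

// Toy implementation that remembers what each isolator reported.
class InMemoryContainerizer : public Containerizer {
public:
  void recordIp(const std::string& containerId, std::string ip) {
    statuses_[containerId].ipAddress = std::move(ip);
  }

  void recordNetCls(const std::string& containerId, uint32_t classId) {
    statuses_[containerId].netClsClassId = classId;
  }

  const ContainerStatus* status(
      const std::string& containerId) const override {
    auto it = statuses_.find(containerId);
    return it == statuses_.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, ContainerStatus> statuses_;
};
```

With something of this shape, the slave could look up a container by ID and copy whatever fields are set into its state.json output, without needing to know which isolator produced them.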
[jira] [Assigned] (MESOS-4544) Propose design doc for agent partitioning behavior
[ https://issues.apache.org/jira/browse/MESOS-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Conway reassigned MESOS-4544: -- Assignee: Neil Conway > Propose design doc for agent partitioning behavior > -- > > Key: MESOS-4544 > URL: https://issues.apache.org/jira/browse/MESOS-4544 > Project: Mesos > Issue Type: Task > Components: general >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4531) Document multi-disk support.
[ https://issues.apache.org/jira/browse/MESOS-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4531: - Sprint: Mesosphere Sprint 28 > Document multi-disk support. > > > Key: MESOS-4531 > URL: https://issues.apache.org/jira/browse/MESOS-4531 > Project: Mesos > Issue Type: Task > Components: documentation >Reporter: Jie Yu >Assignee: Joris Van Remoortere > Labels: documentation, mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4531) Document multi-disk support.
[ https://issues.apache.org/jira/browse/MESOS-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Harutyunyan updated MESOS-4531: - Sprint: (was: Mesosphere Sprint 28) > Document multi-disk support. > > > Key: MESOS-4531 > URL: https://issues.apache.org/jira/browse/MESOS-4531 > Project: Mesos > Issue Type: Task > Components: documentation >Reporter: Jie Yu >Assignee: Joris Van Remoortere > Labels: documentation, mesosphere > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-3003) Support mounting in default configuration files/volumes into every new container
[ https://issues.apache.org/jira/browse/MESOS-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-3003: -- Sprint: (was: Mesosphere Sprint 27) > Support mounting in default configuration files/volumes into every new > container > > > Key: MESOS-3003 > URL: https://issues.apache.org/jira/browse/MESOS-3003 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Timothy Chen > Labels: mesosphere, unified-containerizer-mvp > > Most container images leave out system configuration (e.g., /etc/*) and expect > the container runtime to mount in specific configuration files from the host, > such as /etc/resolv.conf, when needed. > We need to support mounting in specific configuration files for the command > executor to work, and also allow the user to optionally define other > configuration files to mount in as well via flags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
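The idea in MESOS-3003 of combining built-in default mounts with user-specified ones can be sketched roughly as follows. This is only an illustration under stated assumptions: `ConfigMount`, `configMounts`, and the comma-separated flag format are all hypothetical, and the only path taken from the ticket is /etc/resolv.conf.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one host file bind-mounted into a container.
struct ConfigMount {
  std::string hostPath;
  std::string containerPath;
};

// Builds the list of configuration mounts for a new container: the
// built-in default first, then any extra paths from a comma-separated
// flag value (assumed format). Each file is mounted at the same path
// inside the container; empty entries and duplicates are skipped.
std::vector<ConfigMount> configMounts(const std::string& extraPaths) {
  std::vector<ConfigMount> mounts;
  mounts.push_back(ConfigMount{"/etc/resolv.conf", "/etc/resolv.conf"});

  std::stringstream stream(extraPaths);
  std::string path;
  while (std::getline(stream, path, ',')) {
    if (path.empty()) {
      continue;
    }

    bool seen = false;
    for (const ConfigMount& mount : mounts) {
      if (mount.hostPath == path) {
        seen = true;
        break;
      }
    }

    if (!seen) {
      mounts.push_back(ConfigMount{path, path});
    }
  }

  return mounts;
}
```

Keeping the defaults at the front of the list means the command executor's required files are always present, while operator-supplied paths only ever add to the set.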
[jira] [Updated] (MESOS-4490) Get container status information in slave.
[ https://issues.apache.org/jira/browse/MESOS-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4490: -- Story Points: 3 > Get container status information in slave. > --- > > Key: MESOS-4490 > URL: https://issues.apache.org/jira/browse/MESOS-4490 > Project: Mesos > Issue Type: Improvement > Components: containerization >Reporter: Avinash Sridharan >Assignee: Avinash Sridharan > Labels: mesosphere > > As part of MESOS-4487 an interface will be introduced into the `Containerizer` > to allow agents to retrieve container state information. The agent needs to > use this interface to retrieve container state information during status > updates from the executor. The container state information can then be used by > the agent to expose various isolator-specific configuration (e.g., the IP > address allocated by network isolators, net_cls handles allocated by the > `cgroups/net_cls` isolator) that has been applied to the container, in the > state.json endpoint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-4517) Introduce docker runtime isolator.
[ https://issues.apache.org/jira/browse/MESOS-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-4517: -- Story Points: 3 > Introduce docker runtime isolator. > -- > > Key: MESOS-4517 > URL: https://issues.apache.org/jira/browse/MESOS-4517 > Project: Mesos > Issue Type: Bug > Components: isolation >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: mesosphere > > Currently docker image default configurations are included in `ProvisionInfo`. > We should copy the necessary config from `ProvisionInfo` into `ContainerInfo`, > and handle all of this runtime information inside the docker runtime isolator. > The isolator should return a `ContainerLaunchInfo` containing `working_dir`, `env`, the merged > `commandInfo`, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
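To make the "merged `commandInfo`" idea in MESOS-4517 concrete, here is a rough sketch of one way image defaults and a task's command could be combined. The types `ImageDefaults`, `TaskCommand`, and `LaunchInfo` and the function `mergeLaunchInfo` are hypothetical stand-ins, not the Mesos protobufs. The merge rules are also an assumption, modeled on common Docker behavior: task environment variables override image ones, and task arguments replace the image's default command while the entrypoint is kept.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the defaults recorded in a Docker image.
struct ImageDefaults {
  std::vector<std::string> entrypoint;
  std::vector<std::string> cmd;
  std::map<std::string, std::string> env;
  std::string workingDir;
};

// Hypothetical stand-in for what the task itself asks to run.
struct TaskCommand {
  std::vector<std::string> arguments;
  std::map<std::string, std::string> env;
};

// Hypothetical stand-in for the launch information an isolator returns.
struct LaunchInfo {
  std::vector<std::string> argv;
  std::map<std::string, std::string> env;
  std::string workingDir;
};

// Merges image defaults with the task's command (assumed rules, see above).
LaunchInfo mergeLaunchInfo(const ImageDefaults& image, const TaskCommand& task) {
  LaunchInfo info;
  info.workingDir = image.workingDir;

  // Start from the image environment, then let the task override it.
  info.env = image.env;
  for (const auto& entry : task.env) {
    info.env[entry.first] = entry.second;
  }

  // Keep the entrypoint; use task arguments if given, else the image cmd.
  info.argv = image.entrypoint;
  const std::vector<std::string>& tail =
      task.arguments.empty() ? image.cmd : task.arguments;
  info.argv.insert(info.argv.end(), tail.begin(), tail.end());

  return info;
}
```

Doing the merge in one place, as the ticket proposes for the isolator, keeps the precedence rules out of the launcher and makes them easy to test in isolation.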