[jira] [Updated] (MESOS-4578) docker run -c is deprecated

2016-02-01 Thread Cody Maloney (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cody Maloney updated MESOS-4578:

Labels: mesosphere newbie  (was: mesosphere)

> docker run -c is deprecated
> ---
>
> Key: MESOS-4578
> URL: https://issues.apache.org/jira/browse/MESOS-4578
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, docker
>Affects Versions: 0.26.0
> Environment: CoreOS 7
>Reporter: Cody Maloney
>  Labels: mesosphere, newbie
>
> When running mesos slave with the docker containerizer enabled on CoreOS 
> 766.4.0, launching docker containers results in the following in stderr:
> {noformat}
> Warning: '-c' is deprecated, it will be replaced by '--cpu-shares' soon. See usage.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4578) docker run -c is deprecated

2016-02-01 Thread Cody Maloney (JIRA)
Cody Maloney created MESOS-4578:
---

 Summary: docker run -c is deprecated
 Key: MESOS-4578
 URL: https://issues.apache.org/jira/browse/MESOS-4578
 Project: Mesos
  Issue Type: Improvement
  Components: containerization, docker
Affects Versions: 0.26.0
 Environment: CoreOS 7
Reporter: Cody Maloney


When running mesos slave with the docker containerizer enabled on CoreOS 
766.4.0, launching docker containers results in the following in stderr:
{noformat}
Warning: '-c' is deprecated, it will be replaced by '--cpu-shares' soon. See usage.
{noformat}
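
One possible direction, sketched under assumptions: wherever the command line for "docker run" is assembled, the long option --cpu-shares (named in the warning itself) could be emitted in place of the deprecated -c. The helper name dockerRunArgs and its parameters are hypothetical and are not taken from the Mesos source.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not the Mesos implementation): builds an argument
// vector for "docker run" that uses the long "--cpu-shares" option rather
// than the deprecated "-c" short form.
inline std::vector<std::string> dockerRunArgs(
    const std::string& image,
    uint64_t cpuShares)
{
  return {
    "docker",
    "run",
    "--cpu-shares",
    std::to_string(cpuShares),
    image
  };
}
```

Older Docker releases that predate the long option would need a version check before switching; that is outside this sketch.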





[jira] [Commented] (MESOS-4066) Expose when agent is recovering in the agent's /state endpoint.

2016-02-01 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127802#comment-15127802
 ] 

Guangya Liu commented on MESOS-4066:


Yes, I think it should be the /state endpoint, and I have updated the summary 
of this JIRA ticket accordingly.

> Expose when agent is recovering in the agent's /state endpoint.
> ---
>
> Key: MESOS-4066
> URL: https://issues.apache.org/jira/browse/MESOS-4066
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Benjamin Mahler
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Currently when a user is hitting /state.json on the agent, it may return 
> partial state if the agent has failed over and is recovering. There is 
> currently no clear way to tell if this is the case when looking at a 
> response, so the user may incorrectly interpret the agent as being empty of 
> tasks.
> We could consider exposing the 'state' enum of the agent in the endpoint:
> {code}
>   enum State
>   {
>     RECOVERING,   // Slave is doing recovery.
>     DISCONNECTED, // Slave is not connected to the master.
>     RUNNING,      // Slave has (re-)registered.
>     TERMINATING,  // Slave is shutting down.
>   } state;
> {code}
> This may be a bit tricky to maintain as far as backwards-compatibility of the 
> endpoint, if we were to alter this enum.
> Exposing this would allow users to be more informed about the state of the 
> agent.
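
A minimal sketch of how the enum above could be turned into a string before it is written to the endpoint. The function name stateToString, the use of a scoped enum, and the "UNKNOWN" fallback are assumptions made for illustration; they are not the agreed design.

```cpp
#include <string>

// Mirrors the enum quoted in the issue; declared as a scoped enum here
// (an assumption) so the sketch is self-contained.
enum class State
{
  RECOVERING,   // Slave is doing recovery.
  DISCONNECTED, // Slave is not connected to the master.
  RUNNING,      // Slave has (re-)registered.
  TERMINATING   // Slave is shutting down.
};

// Hypothetical helper: returns the name that could be exposed in the
// endpoint's response for the given state.
inline std::string stateToString(State state)
{
  switch (state) {
    case State::RECOVERING:   return "RECOVERING";
    case State::DISCONNECTED: return "DISCONNECTED";
    case State::RUNNING:      return "RUNNING";
    case State::TERMINATING:  return "TERMINATING";
  }
  return "UNKNOWN"; // Only reached if an out-of-range value is cast in.
}
```

Emitting names rather than numeric values would keep the response readable if enumerators were later reordered, although renaming or removing one would still be a compatibility break, as the description notes.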





[jira] [Updated] (MESOS-4066) Expose when agent is recovering in the agent's /state endpoint.

2016-02-01 Thread Guangya Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guangya Liu updated MESOS-4066:
---
Summary: Expose when agent is recovering in the agent's /state endpoint.  
(was: Expose when agent is recovering in the agent's /state.json endpoint.)






[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper

2016-02-01 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127759#comment-15127759
 ] 

Shuai Lin commented on MESOS-1806:
--

Hi, I think the rationale for this ticket is that some companies and 
organizations already run an etcd cluster, so it would be easier for them to 
deploy a Mesos cluster without having to set up a dedicated ZooKeeper quorum. 
It doesn't mean that etcd offers anything better than ZK. 

> Substituting etcd for Zookeeper
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Task
>  Components: leader election
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)





[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper

2016-02-01 Thread Brandon Philips (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127644#comment-15127644
 ] 

Brandon Philips commented on MESOS-1806:


The etcd v3 api is a better match for the things you are looking to do.









[jira] [Updated] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64)

2016-02-01 Thread AndyPang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AndyPang updated MESOS-4577:

Summary: libprocess can not run on 16-byte aligned stack mandatory 
architecture(aarch64)   (was: libprocess can not run on 16-byte aligned stack 
mandatory architecture(aarch64 ppc64) )

> libprocess can not run on 16-byte aligned stack mandatory 
> architecture(aarch64) 
> 
>
> Key: MESOS-4577
> URL: https://issues.apache.org/jira/browse/MESOS-4577
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, libprocess, stout
> Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 
> 01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
>Reporter: AndyPang
>Assignee: AndyPang
>  Labels: mesosphere
>
> Running Mesos on AArch64 fails with the following error in the log:
> {code}
> E0101 00:06:56.636520 32411 slave.cpp:3342] Container 
> 'b6be429a-08f0-4d52-b01d-abfcb6e0106b' for executor 
> 'hello.84d205ae-f626-11de-bd66-7a3f6cf980b9' of framework 
> '868b9f04-9179-427b-b050-ee8f89ffa3bd-' failed to start: Failed to fork 
> executor: Failed to clone child process: Failed to clone: Invalid argument 
> {code}
> The stout library bundled in libprocess's 3rdparty directory (in linux.hpp) 
> implements "clone" as a wrapper around the "clone" syscall:
> {code:title=clone|borderStyle=solid}
> inline pid_t clone(const lambda::function<int()>& func, int flags)
> {
>   // Stack for the child.
>   // - unsigned long long used for best alignment.
>   // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux.
>   //
>   // NOTE: We need to allocate the stack dynamically. This is because
>   // glibc's 'clone' will modify the stack passed to it, therefore the
>   // stack must NOT be shared as multiple 'clone's can be invoked
>   // simultaneously.
>   int stackSize = 8 * 1024 * 1024;
>   unsigned long long *stack =
>     new unsigned long long[stackSize/sizeof(unsigned long long)];
>   pid_t pid = ::clone(
>       childMain,
>       &stack[stackSize/sizeof(stack[0]) - 1],  // stack grows down.
>       flags,
>       (void*) &func);
>   // If CLONE_VM is not set, ::clone would create a process which runs in a
>   // separate copy of the memory space of the calling process. So we destroy the
>   // stack here to avoid memory leak. If CLONE_VM is set, ::clone would create a
>   // thread which runs in the same memory space with the calling process.
>   if (!(flags & CLONE_VM)) {
>     delete[] stack;
>   }
>   return pid;
> }
> {code}
> The stack argument passed to the "clone" syscall is only 8-byte aligned, so 
> the call fails on architectures that require a 16-byte aligned stack 
> (aarch64, ppc64).
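
The misalignment can be checked with plain arithmetic. The sketch below assumes that unsigned long long is 8 bytes and, for the example address, that the array returned by new[] happens to start on a 16-byte boundary (common on 64-bit Linux, but not something the snippet above guarantees). The names topOffsetBytes and isAligned are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>

// Same sizes as in the quoted clone() wrapper.
constexpr std::size_t kStackSize = 8 * 1024 * 1024;
constexpr std::size_t kElements = kStackSize / sizeof(unsigned long long);

// Byte offset of &stack[kElements - 1] from the start of the array.
constexpr std::size_t topOffsetBytes()
{
  return (kElements - 1) * sizeof(unsigned long long);
}

// True if `address` is a multiple of `alignment`.
constexpr bool isAligned(std::uintptr_t address, std::size_t alignment)
{
  return address % alignment == 0;
}
```

With 8-byte elements the offset is 8 MiB minus 8 bytes, which leaves a remainder of 8 when divided by 16. A 16-byte aligned array start therefore yields a stack pointer that is 8-byte aligned but not 16-byte aligned.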





[jira] [Comment Edited] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)

2016-02-01 Thread AndyPang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127538#comment-15127538
 ] 

AndyPang edited comment on MESOS-4577 at 2/2/16 4:05 AM:
-

On arm64 the "clone" syscall is handled in /arch/arm64/kernel/process.c, in the 
"copy_thread" function:
{code}
    if (stack_start) {
        /* 16-byte aligned stack mandatory on AArch64 */
        if (stack_start & 15)
            return -EINVAL;
        childregs->sp = stack_start;
    }
{code}
On AArch64 the stack must be 16-byte aligned.


was (Author: andypang):
On arm64 the "clone" syscall is handled in 
linux-4.1.6/arch/arm64/kernel/process.c, in the "copy_thread" function:
{code}
    if (stack_start) {
        /* 16-byte aligned stack mandatory on AArch64 */
        if (stack_start & 15)
            return -EINVAL;
        childregs->sp = stack_start;
    }
{code}
On AArch64 the stack must be 16-byte aligned.






[jira] [Updated] (MESOS-2585) Use full width for mesos div.container

2016-02-01 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-2585:

Assignee: Michael Lunøe  (was: haosdent)

> Use full width for mesos div.container
> --
>
> Key: MESOS-2585
> URL: https://issues.apache.org/jira/browse/MESOS-2585
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Alson Kemp
>Assignee: Michael Lunøe
>Priority: Trivial
> Attachments: After (patch 2).png, Narrow (current).png, Wide 
> (patched).png, github_full_width.png
>
>
> I've patched our Mesos installation so that the webui takes up the full page 
> width and is much nicer to look at on large monitors.  It's a small change.  
> If y'all want me to submit a PR with the update, I'll do so.
> Before:
> !https://issues.apache.org/jira/secure/attachment/12708818/Narrow%20%28current%29.png|width=800!
> After:
> !https://issues.apache.org/jira/secure/attachment/12708861/After%20%28patch%202%29.png|width=800!





[jira] [Commented] (MESOS-2585) Use full width for mesos div.container

2016-02-01 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127629#comment-15127629
 ] 

haosdent commented on MESOS-2585:
-

So no matter the screen size, we always use the full width, right?

> Use full width for mesos div.container
> --
>
> Key: MESOS-2585
> URL: https://issues.apache.org/jira/browse/MESOS-2585
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Alson Kemp
>Assignee: haosdent
>Priority: Trivial
> Attachments: After (patch 2).png, Narrow (current).png, Wide 
> (patched).png, github_full_width.png
>
>
> I've patched our Mesos installation so that the webui takes up the full page 
> width and is much nicer to look at on large monitors.  It's a small change.  
> If y'all want me to submit a PR with the update, I'll do so.
> Before:
> !https://issues.apache.org/jira/secure/attachment/12708818/Narrow%20%28current%29.png|width=800!
> After:
> !https://issues.apache.org/jira/secure/attachment/12708861/After%20%28patch%202%29.png|width=800!





[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper

2016-02-01 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127593#comment-15127593
 ] 

Deshi Xiao commented on MESOS-1806:
---

Shuai Lin,

From reading your results on implementing etcd support, I can't find any 
benefit in replacing ZooKeeper. Do you feel the same? Even though I dislike 
ZooKeeper, the etcd implementation is not a good match for my requirements.






[jira] [Updated] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)

2016-02-01 Thread AndyPang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AndyPang updated MESOS-4577:

Description: 
Running Mesos on AArch64 fails with the following error in the log:
{code}
E0101 00:06:56.636520 32411 slave.cpp:3342] Container 
'b6be429a-08f0-4d52-b01d-abfcb6e0106b' for executor 
'hello.84d205ae-f626-11de-bd66-7a3f6cf980b9' of framework 
'868b9f04-9179-427b-b050-ee8f89ffa3bd-' failed to start: Failed to fork 
executor: Failed to clone child process: Failed to clone: Invalid argument 
{code}
The stout library bundled in libprocess's 3rdparty directory (in linux.hpp) 
implements "clone" as a wrapper around the "clone" syscall:
{code:title=clone|borderStyle=solid}
inline pid_t clone(const lambda::function<int()>& func, int flags)
{
  // Stack for the child.
  // - unsigned long long used for best alignment.
  // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux.
  //
  // NOTE: We need to allocate the stack dynamically. This is because
  // glibc's 'clone' will modify the stack passed to it, therefore the
  // stack must NOT be shared as multiple 'clone's can be invoked
  // simultaneously.
  int stackSize = 8 * 1024 * 1024;
  unsigned long long *stack =
    new unsigned long long[stackSize/sizeof(unsigned long long)];

  pid_t pid = ::clone(
      childMain,
      &stack[stackSize/sizeof(stack[0]) - 1],  // stack grows down.
      flags,
      (void*) &func);

  // If CLONE_VM is not set, ::clone would create a process which runs in a
  // separate copy of the memory space of the calling process. So we destroy the
  // stack here to avoid memory leak. If CLONE_VM is set, ::clone would create a
  // thread which runs in the same memory space with the calling process.
  if (!(flags & CLONE_VM)) {
    delete[] stack;
  }

  return pid;
}
{code}
The stack argument passed to the "clone" syscall is only 8-byte aligned, so 
the call fails on architectures that require a 16-byte aligned stack 
(aarch64, ppc64).

  was:
The stout library bundled in libprocess's 3rdparty directory (in linux.hpp) 
wraps the "clone" syscall:
{code:title=clone|borderStyle=solid}
inline pid_t clone(const lambda::function<int()>& func, int flags)
{
  // Stack for the child.
  // - unsigned long long used for best alignment.
  // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux.
  //
  // NOTE: We need to allocate the stack dynamically. This is because
  // glibc's 'clone' will modify the stack passed to it, therefore the
  // stack must NOT be shared as multiple 'clone's can be invoked
  // simultaneously.
  int stackSize = 8 * 1024 * 1024;
  unsigned long long *stack =
    new unsigned long long[stackSize/sizeof(unsigned long long)];

  pid_t pid = ::clone(
      childMain,
      &stack[stackSize/sizeof(stack[0]) - 1],  // stack grows down.
      flags,
      (void*) &func);

  // If CLONE_VM is not set, ::clone would create a process which runs in a
  // separate copy of the memory space of the calling process. So we destroy the
  // stack here to avoid memory leak. If CLONE_VM is set, ::clone would create a
  // thread which runs in the same memory space with the calling process.
  if (!(flags & CLONE_VM)) {
    delete[] stack;
  }

  return pid;
}
{code}
The stack argument passed to the "clone" syscall is only 8-byte aligned, so 
the call fails on architectures that require a 16-byte aligned stack 
(aarch64, ppc64).



[jira] [Commented] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)

2016-02-01 Thread AndyPang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127538#comment-15127538
 ] 

AndyPang commented on MESOS-4577:
-

On arm64 the "clone" syscall is handled in 
linux-4.1.6/arch/arm64/kernel/process.c, in the "copy_thread" function:
{code}
    if (stack_start) {
        /* 16-byte aligned stack mandatory on AArch64 */
        if (stack_start & 15)
            return -EINVAL;
        childregs->sp = stack_start;
    }
{code}
On AArch64 the stack must be 16-byte aligned.
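
Given the kernel check quoted above, one way the caller could satisfy it is to round the top-of-stack address down to a 16-byte boundary before passing it to ::clone. The following is only a sketch of that idea, not necessarily the fix adopted for this issue; the helper name alignedStackTop and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the highest address within
// [base, base + size] that is a multiple of `alignment`.
// `alignment` must be a power of two, and `size` must be at least
// `alignment` so that the result cannot fall below `base`.
inline void* alignedStackTop(
    void* base,
    std::size_t size,
    std::size_t alignment = 16)
{
  std::uintptr_t top = reinterpret_cast<std::uintptr_t>(base) + size;

  // Clear the low bits so the address passes a check like
  // "if (stack_start & 15) return -EINVAL;".
  top &= ~(static_cast<std::uintptr_t>(alignment) - 1);

  return reinterpret_cast<void*>(top);
}
```

In the clone() wrapper from the issue description, the result of alignedStackTop(stack, stackSize) could be passed as the child stack in place of the address of the last array element; rounding down wastes at most 15 bytes of the 8 MiB allocation.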

> libprocess can not run on 16-byte aligned stack mandatory 
> architecture(aarch64 ppc64) 
> --
>
> Key: MESOS-4577
> URL: https://issues.apache.org/jira/browse/MESOS-4577
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, libprocess, stout
> Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 
> 01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
>Reporter: AndyPang
>Assignee: AndyPang
>  Labels: mesosphere
>
> The stout library bundled in libprocess's 3rdparty directory (in linux.hpp) 
> wraps the "clone" syscall:
> {code:title=clone|borderStyle=solid}
> inline pid_t clone(const lambda::function<int()>& func, int flags)
> {
>   // Stack for the child.
>   // - unsigned long long used for best alignment.
>   // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux.
>   //
>   // NOTE: We need to allocate the stack dynamically. This is because
>   // glibc's 'clone' will modify the stack passed to it, therefore the
>   // stack must NOT be shared as multiple 'clone's can be invoked
>   // simultaneously.
>   int stackSize = 8 * 1024 * 1024;
>   unsigned long long *stack =
>     new unsigned long long[stackSize/sizeof(unsigned long long)];
>   pid_t pid = ::clone(
>       childMain,
>       &stack[stackSize/sizeof(stack[0]) - 1],  // stack grows down.
>       flags,
>       (void*) &func);
>   // If CLONE_VM is not set, ::clone would create a process which runs in a
>   // separate copy of the memory space of the calling process. So we destroy the
>   // stack here to avoid memory leak. If CLONE_VM is set, ::clone would create a
>   // thread which runs in the same memory space with the calling process.
>   if (!(flags & CLONE_VM)) {
>     delete[] stack;
>   }
>   return pid;
> }
> {code}
> The stack argument passed to the "clone" syscall is only 8-byte aligned, so 
> the call fails on architectures that require a 16-byte aligned stack 
> (aarch64, ppc64).





[jira] [Assigned] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)

2016-02-01 Thread AndyPang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AndyPang reassigned MESOS-4577:
---

Assignee: AndyPang






[jira] [Created] (MESOS-4577) libprocess can not run on 16-byte aligned stack mandatory architecture(aarch64 ppc64)

2016-02-01 Thread AndyPang (JIRA)
AndyPang created MESOS-4577:
---

 Summary: libprocess can not run on 16-byte aligned stack mandatory 
architecture(aarch64 ppc64) 
 Key: MESOS-4577
 URL: https://issues.apache.org/jira/browse/MESOS-4577
 Project: Mesos
  Issue Type: Bug
  Components: containerization, libprocess, stout
 Environment: Linux 10-175-112-202 4.1.6-rc3.aarch64 #1 SMP Mon Oct 12 
01:43:03 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
Reporter: AndyPang


The stout library bundled in libprocess's 3rdparty directory (in linux.hpp) 
wraps the "clone" syscall:
{code:title=clone|borderStyle=solid}
inline pid_t clone(const lambda::function<int()>& func, int flags)
{
  // Stack for the child.
  // - unsigned long long used for best alignment.
  // - 8 MiB appears to be the default for "ulimit -s" on OSX and Linux.
  //
  // NOTE: We need to allocate the stack dynamically. This is because
  // glibc's 'clone' will modify the stack passed to it, therefore the
  // stack must NOT be shared as multiple 'clone's can be invoked
  // simultaneously.
  int stackSize = 8 * 1024 * 1024;
  unsigned long long *stack =
    new unsigned long long[stackSize/sizeof(unsigned long long)];

  pid_t pid = ::clone(
      childMain,
      &stack[stackSize/sizeof(stack[0]) - 1],  // stack grows down.
      flags,
      (void*) &func);

  // If CLONE_VM is not set, ::clone would create a process which runs in a
  // separate copy of the memory space of the calling process. So we destroy the
  // stack here to avoid memory leak. If CLONE_VM is set, ::clone would create a
  // thread which runs in the same memory space with the calling process.
  if (!(flags & CLONE_VM)) {
    delete[] stack;
  }

  return pid;
}
{code}
The stack argument passed to the "clone" syscall is only 8-byte aligned, so 
the call fails on architectures that require a 16-byte aligned stack 
(aarch64, ppc64).





[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"

2016-02-01 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127429#comment-15127429
 ] 

haosdent commented on MESOS-4576:
-

+1

> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.
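
The loop left as a comment in the proposal could look roughly like the sketch below. To stay self-contained it uses only the C++ standard library and POSIX calls instead of stout, and it returns an empty string where the proposal returns None(). The names whichIn and isExecutableFile, and the split into a PATH-taking helper, are assumptions for illustration.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

#include <sys/stat.h> // stat(), S_ISREG (POSIX)
#include <unistd.h>   // access(), X_OK (POSIX)

// True if `path` names a regular file the caller may execute.
inline bool isExecutableFile(const std::string& path)
{
  struct stat s;
  return ::stat(path.c_str(), &s) == 0 &&
         S_ISREG(s.st_mode) &&
         ::access(path.c_str(), X_OK) == 0;
}

// Searches the colon-separated directory list `path` and returns the
// first "<dir>/<command>" that is an executable file, or an empty
// string if there is none.
inline std::string whichIn(const std::string& command, const std::string& path)
{
  std::istringstream directories(path);
  std::string directory;
  while (std::getline(directories, directory, ':')) {
    if (directory.empty()) {
      directory = "."; // An empty PATH entry means the current directory.
    }
    const std::string candidate = directory + "/" + command;
    if (isExecutableFile(candidate)) {
      return candidate;
    }
  }
  return "";
}

// Convenience wrapper that reads the PATH environment variable.
inline std::string which(const std::string& command)
{
  const char* path = std::getenv("PATH");
  return path == nullptr ? "" : whichIn(command, path);
}
```

Taking the search path as a parameter keeps the lookup testable without modifying the process environment; a stout version would presumably return Option<string> and build on os::getenv and os::exists as in the proposal.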





[jira] [Assigned] (MESOS-4576) Introduce a stout helper for "which"

2016-02-01 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-4576:
---

Assignee: haosdent

> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>Assignee: haosdent
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.





[jira] [Commented] (MESOS-2585) Use full width for mesos div.container

2016-02-01 Thread Michael Lunøe (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127301#comment-15127301
 ] 

Michael Lunøe commented on MESOS-2585:
--

The problem with the suggested solution is that it overrides Bootstrap 
functionality and works against Bootstrap styles rather than with them.
I have created a patch to show my proposed solution here: 
https://reviews.apache.org/r/43072/






[jira] [Commented] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"

2016-02-01 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127287#comment-15127287
 ] 

Greg Mann commented on MESOS-4421:
--

Sure I'm happy to help review :-)

> Document that /reserve, /create-volumes endpoints can return misleading 
> "success"
> -
>
> Key: MESOS-4421
> URL: https://issues.apache.org/jira/browse/MESOS-4421
> Project: Mesos
>  Issue Type: Task
>  Components: documentation, master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: documentation, endpoint, mesosphere, persistent-volumes, 
> reservations
>
> The docs for the {{/reserve}} endpoint say:
> {noformat}
> 200 OK: Success (the requested resources have been reserved).
> {noformat}
> This is not true: the master returns {{200}} when the request has been 
> validated and a {{CheckpointResourcesMessage}} has been sent to the agent, 
> but the master does not attempt to verify that the message has been received 
> or that the agent successfully checkpointed. Same behavior applies to 
> {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}.
> We should _either_:
> 1. Accurately document what {{200}} return code means.
> 2. Change the implementation to wait for the agent's next checkpoint to 
> succeed (and to include the effect of the operation) before returning success 
> to the HTTP client.





[jira] [Commented] (MESOS-416) Ensure master / slave do not get kernel OOM before executors, by setting oom_adj control.

2016-02-01 Thread James DeFelice (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127282#comment-15127282
 ] 

James DeFelice commented on MESOS-416:
--

FWIW kubernetes is doing this already for its important procs

> Ensure master / slave do not get kernel OOM before executors, by setting 
> oom_adj control.
> -
>
> Key: MESOS-416
> URL: https://issues.apache.org/jira/browse/MESOS-416
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: mesosphere, security, twitter
>
> We can adjust the /proc/<pid>/oom_adj control during master / slave startup, 
> setting it to a low value to ensure we aren't killed first during an OOM.
> Relevant LWN article: http://lwn.net/Articles/317814/
> Also relevant: https://bugzilla.redhat.com/show_bug.cgi?id=239313





[jira] [Updated] (MESOS-416) Ensure master / slave do not get kernel OOM before executors, by setting oom_adj control.

2016-02-01 Thread James DeFelice (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James DeFelice updated MESOS-416:
-
Labels: mesosphere security twitter  (was: mesosphere twitter)

> Ensure master / slave do not get kernel OOM before executors, by setting 
> oom_adj control.
> -
>
> Key: MESOS-416
> URL: https://issues.apache.org/jira/browse/MESOS-416
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: mesosphere, security, twitter
>
> We can adjust the /proc/<pid>/oom_adj control during master / slave startup, 
> setting it to a low value to ensure we aren't killed first during an OOM.
> Relevant LWN article: http://lwn.net/Articles/317814/
> Also relevant: https://bugzilla.redhat.com/show_bug.cgi?id=239313





[jira] [Commented] (MESOS-4235) JSON generation performance improvement

2016-02-01 Thread Cong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127275#comment-15127275
 ] 

Cong Wang commented on MESOS-4235:
--

Hi, [~mcypark]
Since all the subtickets are resolved, I assume this one is resolved too in the 
latest code base?

> JSON generation performance improvement
> ---
>
> Key: MESOS-4235
> URL: https://issues.apache.org/jira/browse/MESOS-4235
> Project: Mesos
>  Issue Type: Epic
>  Components: libprocess, master, stout
>Reporter: Michael Park
>Assignee: Michael Park
>  Labels: mesosphere, scalability, twitter
>
> This is an epic which evolved from MESOS-2353. As mentioned in the 
> description of MESOS-2353, most of the work is spent performing memory 
> allocation/deallocation. Some preliminary efforts have been made such as 
> calling {{reserve}} for {{JSON::Array}}. There are still plenty of dynamic 
> allocations being made especially from instances of {{JSON::Object}} which 
> hold a {{std::map}} as a member.
> The current approach being adopted is to introduce a {{jsonify}} function 
> which by-passes these unnecessary dynamic allocations and copying, and to 
> simply hold references to the underlying objects.
> We plan to first introduce the {{jsonify}} function to {{stout}}, and update 
> master's {{state}} endpoint, then proceed to update the rest of the system.





[jira] [Updated] (MESOS-416) Ensure master / slave do not get kernel OOM before executors, by setting oom_adj control.

2016-02-01 Thread Cody Maloney (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cody Maloney updated MESOS-416:
---
Labels: mesosphere twitter  (was: twitter)

> Ensure master / slave do not get kernel OOM before executors, by setting 
> oom_adj control.
> -
>
> Key: MESOS-416
> URL: https://issues.apache.org/jira/browse/MESOS-416
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: mesosphere, twitter
>
> We can adjust the /proc/<pid>/oom_adj control during master / slave startup, 
> setting it to a low value to ensure we aren't killed first during an OOM.
> Relevant LWN article: http://lwn.net/Articles/317814/
> Also relevant: https://bugzilla.redhat.com/show_bug.cgi?id=239313





[jira] [Commented] (MESOS-416) Ensure master / slave do not get kernel OOM before executors, by setting oom_adj control.

2016-02-01 Thread James DeFelice (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127272#comment-15127272
 ] 

James DeFelice commented on MESOS-416:
--

AKA oom_score_adj ?
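
For reference, a minimal sketch of what setting the newer control could look like. On current Linux kernels /proc/<pid>/oom_score_adj accepts values from -1000 to 1000 and supersedes the deprecated oom_adj. The helper name setOomScoreAdj and its path parameter are assumptions for illustration, not Mesos code, and lowering the value normally requires CAP_SYS_RESOURCE.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of Mesos): writes `value` to the given
// oom_score_adj file. The path is a parameter only so the function can be
// exercised against an ordinary file; by default it targets the calling
// process itself.
bool setOomScoreAdj(
    int value,
    const std::string& path = "/proc/self/oom_score_adj")
{
  // The kernel accepts values in the range [-1000, 1000].
  if (value < -1000 || value > 1000) {
    return false;
  }

  std::ofstream out(path);
  if (!out.is_open()) {
    return false;
  }

  // std::endl flushes, so a write rejected by the kernel (for example when
  // lowering the value without sufficient privileges) shows up in good().
  out << value << std::endl;
  return out.good();
}
```

A master or slave could call something like setOomScoreAdj(-500) early in startup and log a warning if it returns false. The value -500 is only an example, not a recommendation from this ticket.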

> Ensure master / slave do not get kernel OOM before executors, by setting 
> oom_adj control.
> -
>
> Key: MESOS-416
> URL: https://issues.apache.org/jira/browse/MESOS-416
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: twitter
>
> We can adjust the /proc/<pid>/oom_adj control during master / slave startup, 
> setting it to a low value to ensure we aren't killed first during an OOM.
> Relevant LWN article: http://lwn.net/Articles/317814/
> Also relevant: https://bugzilla.redhat.com/show_bug.cgi?id=239313





[jira] [Updated] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess

2016-02-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3570:
--
Shepherd: Vinod Kone
Assignee: Anand Mazumdar  (was: Vinod Kone)

> Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess
> 
>
> Key: MESOS-3570
> URL: https://issues.apache.org/jira/browse/MESOS-3570
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: mesosphere, newbie
>
> Currently, the scheduler library sends calls in order by chaining them and 
> sending them only when it has received a response for the earlier call. This 
> was done because there was no HTTP Pipelining abstraction in Libprocess 
> {{process::post}}.
> However, once {{MESOS-3332}} is resolved, we should be able to use the new 
> abstraction.





[jira] [Commented] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"

2016-02-01 Thread Artem Harutyunyan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127221#comment-15127221
 ] 

Artem Harutyunyan commented on MESOS-4421:
--

Hey [~greggomann], could you please take a look at this one? Jie is fine with 
committing it once it has a Ship It from you. 

> Document that /reserve, /create-volumes endpoints can return misleading 
> "success"
> -
>
> Key: MESOS-4421
> URL: https://issues.apache.org/jira/browse/MESOS-4421
> Project: Mesos
>  Issue Type: Task
>  Components: documentation, master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: documentation, endpoint, mesosphere, persistent-volumes, 
> reservations
>
> The docs for the {{/reserve}} endpoint say:
> {noformat}
> 200 OK: Success (the requested resources have been reserved).
> {noformat}
> This is not true: the master returns {{200}} when the request has been 
> validated and a {{CheckpointResourcesMessage}} has been sent to the agent, 
> but the master does not attempt to verify that the message has been received 
> or that the agent successfully checkpointed. Same behavior applies to 
> {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}.
> We should _either_:
> 1. Accurately document what {{200}} return code means.
> 2. Change the implementation to wait for the agent's next checkpoint to 
> succeed (and to include the effect of the operation) before returning success 
> to the HTTP client.





[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4053:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27  (was: Mesosphere Sprint 
26, Mesosphere Sprint 27, Mesosphere Sprint 28)

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}





[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"

2016-02-01 Thread Jojy Varghese (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127138#comment-15127138
 ] 

Jojy Varghese commented on MESOS-4576:
--

Not sure. The choice between *which* and *find* is the same as the choice 
between these commands on a Linux machine. *which* is used when you know the 
command and want to find its path in PATH. *find* is used when you want to 
search for a file/command under a path. 

I think, in the case of SHAxxx, what we need is a combination of which and 
find: we want to *find* the first command that looks like 
"sha\(512\)\*sum"|"openssl" in a set of paths (which could be PATH).
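
A rough sketch of what such a find-style lookup could look like, using POSIX directory calls and std::regex. The function name findFirstMatching is hypothetical and none of this is existing stout API; an empty string stands in for None().

```cpp
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>

#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the path of the first executable regular file
// whose name fully matches `pattern`, searching `directories` in order.
// Returns an empty string when nothing matches. Within a single directory
// the order of entries is whatever readdir() yields, so "first" is only
// well defined across directories.
std::string findFirstMatching(
    const std::vector<std::string>& directories,
    const std::regex& pattern)
{
  for (const std::string& directory : directories) {
    DIR* dir = ::opendir(directory.c_str());
    if (dir == nullptr) {
      continue; // Skip missing or unreadable directories.
    }

    std::string found;
    while (struct dirent* entry = ::readdir(dir)) {
      const std::string name = entry->d_name;
      if (!std::regex_match(name, pattern)) {
        continue;
      }

      const std::string candidate = directory + "/" + name;
      struct stat s;
      if (::stat(candidate.c_str(), &s) == 0 &&
          S_ISREG(s.st_mode) &&
          ::access(candidate.c_str(), X_OK) == 0) {
        found = candidate;
        break;
      }
    }

    ::closedir(dir);

    if (!found.empty()) {
      return found;
    }
  }

  return "";
}
```

For the SHA case the pattern might be something like std::regex("sha(512)?sum|openssl"), with the directory list taken from PATH.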

> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.





[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"

2016-02-01 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127107#comment-15127107
 ] 

Joseph Wu commented on MESOS-4576:
--

I think the usage for the {{sha512}} case would be multiple calls to 
{{os::which}}.  
i.e. Something not-too-pretty like: 
{code}
Option<string> whichSha = os::which("shasum");

if (whichSha.isNone()) {
  whichSha = os::which("sha512sum");

  if (whichSha.isNone()) {
    whichSha = os::which("openssl");

    if (whichSha.isNone()) {
      return Error("...");
    }
  }
}
{code}
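
One possible way to flatten the nesting is to loop over the candidate names. This is a standalone sketch, not stout code: firstAvailable is a hypothetical name, the lookup function is passed in (it would be os::which if that helper existed), and an empty string stands in for None().

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: tries each candidate command in order and returns the
// first non-empty path that `lookup` reports, or an empty string if none of
// the candidates can be found.
std::string firstAvailable(
    const std::vector<std::string>& candidates,
    const std::function<std::string(const std::string&)>& lookup)
{
  for (const std::string& candidate : candidates) {
    const std::string path = lookup(candidate);
    if (!path.empty()) {
      return path;
    }
  }

  return "";
}
```

The {{sha512}} case would then be a single call with {"shasum", "sha512sum", "openssl"}, returning an Error only when the result is empty.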

> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.





[jira] [Comment Edited] (MESOS-4576) Introduce a stout helper for "which"

2016-02-01 Thread Jojy Varghese (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127080#comment-15127080
 ] 

Jojy Varghese edited comment on MESOS-4576 at 2/1/16 9:38 PM:
--

Wondering if we need another interface that accepts a regular expression for 
the command. For example, in the case of shasum, we don't know which command to 
look for: *shasum*, *sha512sum*, or maybe *openssl* (openssl dgst).

Maybe that interface looks like *find*.


was (Author: jojy):
Wondering if we need another interface that accepts a regular expression for 
the command. For example, in the case of shasum, we don't know which command to 
look for: *shasum*, *sha512sum*, or maybe *openssl* (openssl dgst).

> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.





[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"

2016-02-01 Thread Jojy Varghese (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127080#comment-15127080
 ] 

Jojy Varghese commented on MESOS-4576:
--

Wondering if we need another interface that accepts a regular expression for 
the command. For example, in the case of shasum, we don't know which command to 
look for: *shasum*, *sha512sum*, or maybe *openssl* (openssl dgst).

> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.





[jira] [Assigned] (MESOS-2974) stout flags can't have their defaults reset

2016-02-01 Thread Joerg Schad (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Schad reassigned MESOS-2974:
--

Assignee: Joerg Schad

> stout flags can't have their defaults reset
> ---
>
> Key: MESOS-2974
> URL: https://issues.apache.org/jira/browse/MESOS-2974
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Joris Van Remoortere
>Assignee: Joerg Schad
>  Labels: flags, newbie, stout
>
> Stout flags don't remember their default values, and so can't have their 
> defaults reset. This makes it hard to reset flags to their defaults between 
> tests.
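
To illustrate the idea only: a flag that keeps a copy of its default can be reset trivially. The class below is a hypothetical standalone sketch, not stout's flags API, and the name ResettableFlag is made up for this example.

```cpp
#include <string>
#include <utility>

// Hypothetical illustration: a flag value that remembers the default it was
// constructed with, so tests can restore it between runs.
template <typename T>
class ResettableFlag
{
public:
  explicit ResettableFlag(T defaultValue)
    : default_(std::move(defaultValue)), value_(default_) {}

  void set(T value) { value_ = std::move(value); }

  const T& get() const { return value_; }

  // Restores the value the flag was constructed with.
  void reset() { value_ = default_; }

private:
  const T default_; // Declared first so it is initialized before value_.
  T value_;
};
```

A test fixture could then call reset() on each flag in its teardown instead of reconstructing the whole flags object.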





[jira] [Commented] (MESOS-4556) ShasumTest.SHA512SimpleFile failed on centos7.

2016-02-01 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127055#comment-15127055
 ] 

Jie Yu commented on MESOS-4556:
---

commit 8b5523d2c512cfab4b69b70960515c4a8d791d2b
Author: haosdent huang 
Date:   Mon Feb 1 13:17:00 2016 -0800

Fixed ShasumTest.SHA512SimpleFile on centos7.

Review: https://reviews.apache.org/r/43014/

> ShasumTest.SHA512SimpleFile failed on centos7.
> --
>
> Key: MESOS-4556
> URL: https://issues.apache.org/jira/browse/MESOS-4556
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: haosdent
>
> Looks like shasum is not available on some systems. We should check if it's 
> available on those systems.
> {noformat}
> [ RUN  ] ShasumTest.SHA512SimpleFile
> ../../src/tests/common/command_utils_tests.cpp:237: Failure
> (sha512).failure(): Subprocess 'shasum, shasum, -a, 512, /tmp/o9CZPZ/test' 
> failed: ABORT: (../../../3rdparty/libprocess/src/subprocess.cpp:293): Failed 
> to os::execvpe on path 'shasum': No such file or directory
> *** Aborted at 1454097934 (unix time) try "date -d @1454097934" if you are 
> using GNU date ***
> PC: @ 0x7f26a0ae75f7 __GI_raise
> *** SIGABRT (@0x3e8258d) received by PID 9613 (TID 0x7f26a7aa78c0) from 
> PID 9613; stack trace: ***
> @ 0x7f26a1aae100 (unknown)
> @ 0x7f26a0ae75f7 __GI_raise
> @ 0x7f26a0ae8ce8 __GI_abort
> @   0x998808 _Abort()
> @   0x998836 _Abort()
> @ 0x7f26a678937d process::childMain()
> @ 0x7f26a678f5e3 
> _ZNSt5_BindIFPFiRKSsPPcS3_RK6OptionISt8functionIFivEEERKN7process10Subprocess2IO20InputFileDescriptorsERKNSD_21OutputFileDescriptorsESJ_ESsS3_S3_S8_SE_SH_SH_EE6__callIiJEJLm0ELm1ELm2ELm3ELm4ELm5ELm6T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0x7f26a678ed52 std::_Bind<>::operator()<>()
> @ 0x7f26a678e124 std::_Function_handler<>::_M_invoke()
> @   0x99b49c std::function<>::operator()()
> @ 0x7f26a67890e6 process::defaultClone()
> @ 0x7f26a678d922 std::_Function_handler<>::_M_invoke()
> @ 0x7f26a678d1fd std::function<>::operator()()
> @ 0x7f26a6789ed7 process::subprocess()
> @ 0x7f26a57b9cec mesos::internal::command::launch()
> @ 0x7f26a57bacdc mesos::internal::command::shasum()
> @ 0x7f26a57bae37 mesos::internal::command::sha512()
> @  0x1440cce 
> mesos::internal::tests::ShasumTest_SHA512SimpleFile_Test::TestBody()
> @  0x164e8a2 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x16497f0 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x162add5 testing::Test::Run()
> @  0x162b558 testing::TestInfo::Run()
> @  0x162bb9e testing::TestCase::Run()
> @  0x1632478 testing::internal::UnitTestImpl::RunAllTests()
> @  0x164f4c7 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x164a36e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x16311be testing::UnitTest::Run()
> @   0xdeeb0a RUN_ALL_TESTS()
> @   0xdee720 main
> @ 0x7f26a0ad3b15 __libc_start_main
> @   0x997599 (unknown)
> [  FAILED  ] ShasumTest.SHA512SimpleFile (202 ms)
> {noformat}





[jira] [Commented] (MESOS-4576) Introduce a stout helper for "which"

2016-02-01 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127047#comment-15127047
 ] 

Jie Yu commented on MESOS-4576:
---

+1

We need to check if the binary is executable or not as well (using 'access').
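
A minimal sketch of how the PATH walk and the access(2) check could fit together. It is standalone C++ rather than stout code: an empty string stands in for None(), and isExecutableFile plus the two-argument which overload are names assumed for illustration. Commands that already contain a '/' are not special-cased here.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <cstdlib>
#include <sstream>
#include <string>

// True if `path` names a regular file that the caller is allowed to execute.
bool isExecutableFile(const std::string& path)
{
  struct stat s;
  if (::stat(path.c_str(), &s) != 0 || !S_ISREG(s.st_mode)) {
    return false;
  }

  return ::access(path.c_str(), X_OK) == 0;
}

// Searches the colon-separated `searchPath` in order and returns the first
// "<directory>/<command>" that is an executable file, or an empty string.
std::string which(const std::string& command, const std::string& searchPath)
{
  std::istringstream directories(searchPath);
  std::string directory;

  while (std::getline(directories, directory, ':')) {
    if (directory.empty()) {
      directory = "."; // An empty PATH entry refers to the current directory.
    }

    const std::string candidate = directory + "/" + command;
    if (isExecutableFile(candidate)) {
      return candidate;
    }
  }

  return "";
}

// Convenience overload that reads the search path from the environment.
std::string which(const std::string& command)
{
  const char* path = std::getenv("PATH");
  return path == nullptr ? "" : which(command, path);
}
```

Checking S_ISREG as well as access() matters because directories usually pass the X_OK test too.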


> Introduce a stout helper for "which"
> 
>
> Key: MESOS-4576
> URL: https://issues.apache.org/jira/browse/MESOS-4576
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Joseph Wu
>  Labels: mesosphere
>
> We may want to add a helper to {{stout/os.hpp}} that will natively emulate 
> the functionality of the Linux utility {{which}}.  i.e.
> {code}
> Option<string> which(const string& command)
> {
>   Option<string> path = os::getenv("PATH");
>   // Loop through path and return the first one which os::exists(...).
>   return None();
> }
> {code}
> This helper may be useful:
> * for test filters in {{src/tests/environment.cpp}}
> * a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
> * the {{sha512}} utility in {{src/common/command_utils.cpp}}
> * as runtime checks in the {{LogrotateContainerLogger}}
> * etc.





[jira] [Created] (MESOS-4576) Introduce a stout helper for "which"

2016-02-01 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-4576:


 Summary: Introduce a stout helper for "which"
 Key: MESOS-4576
 URL: https://issues.apache.org/jira/browse/MESOS-4576
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Reporter: Joseph Wu


We may want to add a helper to {{stout/os.hpp}} that will natively emulate the 
functionality of the Linux utility {{which}}.  i.e.
{code}
Option<string> which(const string& command)
{
  Option<string> path = os::getenv("PATH");

  // Loop through path and return the first one which os::exists(...).

  return None();
}
{code}

This helper may be useful:
* for test filters in {{src/tests/environment.cpp}}
* a few tests in {{src/tests/containerizer/port_mapping_tests.cpp}}
* the {{sha512}} utility in {{src/common/command_utils.cpp}}
* as runtime checks in the {{LogrotateContainerLogger}}
* etc.





[jira] [Assigned] (MESOS-4537) Optionally install stout and libprocess test binaries

2016-02-01 Thread Disha Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Disha Singh reassigned MESOS-4537:
--

Assignee: Disha Singh

> Optionally install stout and libprocess test binaries
> -
>
> Key: MESOS-4537
> URL: https://issues.apache.org/jira/browse/MESOS-4537
> Project: Mesos
>  Issue Type: Improvement
>  Components: build, libprocess, stout
>Reporter: Benjamin Bannier
>Assignee: Disha Singh
>Priority: Trivial
>  Labels: newbie
>
> With MESOS-3608 we add a way to install mesos-tests binaries. We should 
> provide the same functionality for the stout and libprocess tests. Like 
> mesos-tests, they should also be run automatically during distcheck via 
> installcheck.





[jira] [Updated] (MESOS-3831) Document operator HTTP endpoints

2016-02-01 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-3831:
---
Story Points: 3

> Document operator HTTP endpoints
> 
>
> Key: MESOS-3831
> URL: https://issues.apache.org/jira/browse/MESOS-3831
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Kevin Klues
>Priority: Minor
>  Labels: documentation, mesosphere, newbie
>
> These are not exhaustively documented; they probably should be.
> Some endpoints have docs: e.g., {{/reserve}} and {{/unreserve}} are described 
> in the reservation doc page. But it would be good to have a single page that 
> lists all the endpoints and their semantics.





[jira] [Updated] (MESOS-4071) Master crash during framework teardown ( Check failed: total.resources.contains(slaveId))

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4071:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Master crash during framework teardown ( Check failed: 
> total.resources.contains(slaveId))
> -
>
> Key: MESOS-4071
> URL: https://issues.apache.org/jira/browse/MESOS-4071
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.25.0
>Reporter: Mandeep Chadha
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Stack Trace :
> NOTE : Replaced IP address with XX.XX.XX.XX 
> {code}
> I1204 10:31:03.391127 2588810 master.cpp:5564] Processing TEARDOWN call for 
> framework 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 
> (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST) at 
> scheduler-c8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237
> I1204 10:31:03.391177 2588810 master.cpp:5576] Removing framework 
> 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014 
> (mloop-coprocesses-183c4999-9ce9-47b2-bc96-a865c672fcbb (TEST)) at 
> schedulerc8ab2103-cf36-40d8-8a2d-a6b69a8fc...@xx.xx.xx.xx:35237
> I1204 10:31:03.391337 2588805 hierarchical.hpp:605] Deactivated framework 
> 61ce62d1-7418-4ae1-aa78-a8ebf75ad502-0014
> F1204 10:31:03.395500 2588810 sorter.cpp:233] Check failed: 
> total.resources.contains(slaveId)
> *** Check failure stack trace: ***
> @ 0x7f2b3dda53d8  google::LogMessage::Fail()
> @ 0x7f2b3dda5327  google::LogMessage::SendToLog()
> @ 0x7f2b3dda4d38  google::LogMessage::Flush()
> @ 0x7f2b3dda7a6c  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f2b3d3351a1  
> mesos::internal::master::allocator::DRFSorter::remove()
> @ 0x7f2b3d0b8c29  
> mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
> @ 0x7f2b3d0ca823 
> _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_11FrameworkIDES6_EEvRKNS_3PIDIT_EEMSA_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESJ_
> @ 0x7f2b3d0dc8dc  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_11FrameworkIDESA_EEvRKNS0_3PIDIT_EEMSE_FvT0_ET1_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f2b3dd2cc35  std::function<>::operator()()
> @ 0x7f2b3dd15ae5  process::ProcessBase::visit()
> @ 0x7f2b3dd188e2  process::DispatchEvent::visit()
> @   0x472366  process::ProcessBase::serve()
> @ 0x7f2b3dd1203f  process::ProcessManager::resume()
> @ 0x7f2b3dd061b2  process::internal::schedule()
> @ 0x7f2b3dd63efd  
> _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7f2b3dd63e4d  std::_Bind_simple<>::operator()()
> @ 0x7f2b3dd63de6  std::thread::_Impl<>::_M_run()
> @   0x318c2b6470  (unknown)
> @   0x318b2079d1  (unknown)
> @   0x318aae8b5d  (unknown)
> @  (nil)  (unknown)
> Aborted (core dumped)
> {code}





[jira] [Updated] (MESOS-191) Add support for multiple disk resources

2016-02-01 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-191:
---
Sprint: Mesosphere Sprint 27  (was: Mesosphere Sprint 27, Mesosphere Sprint 
28)

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Epic
>Reporter: Vinod Kone
>Assignee: Joris Van Remoortere
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify 
> spindle-specific config. Mesos would then include this info in its resource 
> offers to frameworks. 
> Official Design Doc: 
> https://docs.google.com/document/d/1syPxygVNEHjG6FoyqslnpUGgNpYKU9QzKBuV2yKmjfQ/edit#heading=h.4fzj9sl24cwy





[jira] [Updated] (MESOS-3763) Need for http::put request method

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3763:
-
Sprint: Mesosphere Sprint 27  (was: Mesosphere Sprint 27, Mesosphere Sprint 
28)

> Need for http::put request method
> -
>
> Key: MESOS-3763
> URL: https://issues.apache.org/jira/browse/MESOS-3763
> Project: Mesos
>  Issue Type: Task
>Reporter: Joerg Schad
>Assignee: Yongqiao Wang
>Priority: Minor
>  Labels: mesosphere
>
> We decided to create a more RESTful API for managing quota requests. 
> We therefore want to use the HTTP PUT request, and hence need to enable 
> libprocess/http to send PUT requests in addition to GET and POST requests.





[jira] [Updated] (MESOS-2317) Remove deprecated checkpoint=false code

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-2317:
-
Sprint: Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, 
Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 
10, Mesosphere Sprint 11, Mesosphere Sprint 26, Mesosphere Sprint 27, 
Mesosphere Sprint 28  (was: Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 
7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, 
Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 26, Mesosphere 
Sprint 27)

> Remove deprecated checkpoint=false code
> ---
>
> Key: MESOS-2317
> URL: https://issues.apache.org/jira/browse/MESOS-2317
> Project: Mesos
>  Issue Type: Epic
>Affects Versions: 0.22.0
>Reporter: Adam B
>Assignee: Joerg Schad
>  Labels: checkpoint, mesosphere
>
> Cody's plan from MESOS-444 was:
> 1) -Make it so the flag can't be changed at the command line-
> 2) -Remove the checkpoint variable entirely from slave/flags.hpp. This is a 
> fairly involved change since a number of unit tests depend on manually 
> setting the flag, as well as the default being non-checkpointing.-
> 3) -Remove logic around checkpointing in the slave, remove logic inside the 
> master.-
> 4) Drop the flag from the SlaveInfo struct (Will require a deprecation cycle).





[jira] [Updated] (MESOS-2742) Architecture doc on global resources

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-2742:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Architecture doc on global resources
> 
>
> Key: MESOS-2742
> URL: https://issues.apache.org/jira/browse/MESOS-2742
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>Assignee: Joerg Schad
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4364) Add roles validation code to master

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4364:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Add roles validation code to master
> ---
>
> Key: MESOS-4364
> URL: https://issues.apache.org/jira/browse/MESOS-4364
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> A {{FrameworkInfo}} can only have one of role or roles. A natural location 
> for this check appears to be under {{validation::operation::validate}}.
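
A minimal sketch of the check described above, written in Python rather than Mesos's C++. The dict is a hypothetical stand-in for the {{FrameworkInfo}} message, and the function name and error text are illustrative assumptions, not Mesos code:

```python
from typing import Optional


def validate_framework_roles(framework_info: dict) -> Optional[str]:
    """Return an error string if both 'role' and 'roles' are set, else None.

    `framework_info` is a hypothetical stand-in for the FrameworkInfo message.
    """
    has_role = framework_info.get("role") is not None
    has_roles = bool(framework_info.get("roles"))
    if has_role and has_roles:
        return "FrameworkInfo must not set both 'role' and 'roles'"
    return None
```

Returning an optional error keeps the caller in charge of how the failure is reported; whether a message with neither field set should be accepted is left open here and treated as valid.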





[jira] [Updated] (MESOS-3943) Support dynamic weight in allocator

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3943:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> Support dynamic weight in allocator
> ---
>
> Key: MESOS-3943
> URL: https://issues.apache.org/jira/browse/MESOS-3943
> Project: Mesos
>  Issue Type: Task
>Reporter: James Wang
>Assignee: Yongqiao Wang
>
> This JIRA will focus on updating the allocator API to support updating the 
> weight of a role.





[jira] [Updated] (MESOS-3193) Implement AppC image discovery.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3193:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Implement AppC image discovery.
> ---
>
> Key: MESOS-3193
> URL: https://issues.apache.org/jira/browse/MESOS-3193
> Project: Mesos
>  Issue Type: Task
>Reporter: Yan Xu
>Assignee: Jojy Varghese
>  Labels: mesosphere, twitter, unified-containerizer-mvp
>
> Appc spec specifies two image discovery mechanisms: simple and meta 
> discovery. We need to have an abstraction for image discovery in AppcStore. 
> For MVP, we can implement the simple discovery first.
> https://reviews.apache.org/r/34139/





[jira] [Updated] (MESOS-3871) Document libprocess message delivery semantics

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3871:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> Document libprocess message delivery semantics
> --
>
> Key: MESOS-3871
> URL: https://issues.apache.org/jira/browse/MESOS-3871
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, libprocess
>Reporter: Neil Conway
>Assignee: Benjamin Hindman
>Priority: Minor
>  Labels: mesosphere
>
> What are the semantics of {{send()}} in libprocess? Specifically, does 
> libprocess guarantee that messages will not be dropped, reordered, or 
> duplicated? These are important properties to understand when building 
> software on top of libprocess.
> Clearly message drops are allowed. Message reordering _appears_ to be 
> allowed, although it should only happen in corner cases (see MESOS-3870). 
> Duplicate message delivery probably can't happen.





[jira] [Updated] (MESOS-3273) EventCall Test Framework is flaky

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3273:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> EventCall Test Framework is flaky
> -
>
> Key: MESOS-3273
> URL: https://issues.apache.org/jira/browse/MESOS-3273
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>  Labels: flaky-test, mesosphere, tech-debt
> Attachments: asan.log
>
>
> Observed this on ASF CI. h/t [~haosd...@gmail.com]
> Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master.
> {code}
> [ RUN  ] ExamplesTest.EventCallFramework
> Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx'
> I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the 
> driver is aborted!
> Shutting down
> Sending SIGTERM to process tree at pid 26061
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26062
> Shutting down
> Killing the following process trees:
> [ 
> ]
> Sending SIGTERM to process tree at pid 26063
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26098
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26099
> Killing the following process trees:
> [ 
> ]
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on 
> 172.17.2.10:60249 for 16 cpus
> I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR
> I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0
> I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms
> I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms
> I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns
> I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in 
> 8429ns
> I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 4219ns
> I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery
> I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status
> I0813 19:55:17.181970 26126 master.cpp:378] Master 
> 20150813-195517-167907756-60249-26100 (297daca2d01a) started on 
> 172.17.2.10:60249
> I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: 
> --acls="permissive: false
> register_frameworks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   roles {
> type: SOME
> values: "*"
>   }
> }
> run_tasks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   users {
> type: SOME
> values: "mesos"
>   }
> }
> " --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" 
> --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" 
> --zk_session_timeout="10secs"
> I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated 
> frameworks to register
> I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated 
> slaves to register
> I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials'
> W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials 
> file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. 
> It is recommended that your credentials file is NOT accessible by others.
> I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0813 19:55:17.184306 26126 master.cpp:469] Using default 'crammd5' 
> authenticator
> I0813 19:55:17.184661 26126

[jira] [Updated] (MESOS-4367) Add tracking of the role a Resource was offered for

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4367:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Add tracking of the role a Resource was offered for
> ---
>
> Key: MESOS-4367
> URL: https://issues.apache.org/jira/browse/MESOS-4367
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> If a framework can have multiple roles, we need a way to identify which 
> of the framework's roles a resource was offered for (e.g., for resource 
> recovery and reconciliation).





[jira] [Updated] (MESOS-4421) Document that /reserve, /create-volumes endpoints can return misleading "success"

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4421:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Document that /reserve, /create-volumes endpoints can return misleading 
> "success"
> -
>
> Key: MESOS-4421
> URL: https://issues.apache.org/jira/browse/MESOS-4421
> Project: Mesos
>  Issue Type: Task
>  Components: documentation, master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: documentation, endpoint, mesosphere, persistent-volumes, 
> reservations
>
> The docs for the {{/reserve}} endpoint say:
> {noformat}
> 200 OK: Success (the requested resources have been reserved).
> {noformat}
> This is not true: the master returns {{200}} when the request has been 
> validated and a {{CheckpointResourcesMessage}} has been sent to the agent, 
> but the master does not attempt to verify that the message has been received 
> or that the agent successfully checkpointed. Same behavior applies to 
> {{/unreserve}}, {{/create-volumes}}, and {{/destroy-volumes}}.
> We should _either_:
> 1. Accurately document what the {{200}} return code means.
> 2. Change the implementation to wait for the agent's next checkpoint to 
> succeed (and to include the effect of the operation) before returning success 
> to the HTTP client.
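
Until either option lands, a client can treat {{200}} as "request accepted" and confirm the effect separately. A rough Python sketch of that pattern follows; `is_applied` is a hypothetical caller-supplied check (for example, one that looks for the reserved resources in the cluster state), and nothing here is part of the Mesos API:

```python
import time
from typing import Callable


def wait_until_applied(is_applied: Callable[[], bool],
                       timeout_secs: float = 5.0,
                       interval_secs: float = 0.1) -> bool:
    """Poll `is_applied` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_secs
    while True:
        if is_applied():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_secs)
```

The check always runs at least once, so a zero timeout still reports an operation that has already taken effect.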





[jira] [Updated] (MESOS-3568) The State (/state) endpoint should be documented

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3568:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> The State (/state) endpoint should be documented
> 
>
> Key: MESOS-3568
> URL: https://issues.apache.org/jira/browse/MESOS-3568
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, master
>Reporter: James Fisher
>Assignee: Kevin Klues
>  Labels: documentation, mesosphere, newbie, tech-debt
>
> Our tests are using a resource `/state.json` hosted by the Mesos master. I 
> have searched for the documentation for this resource but have been unable to 
> find anything.





[jira] [Updated] (MESOS-4366) Migrate all existing uses of FrameworkInfo.role to FrameworkInfo.roles

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4366:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Migrate all existing uses of FrameworkInfo.role to FrameworkInfo.roles
> --
>
> Key: MESOS-4366
> URL: https://issues.apache.org/jira/browse/MESOS-4366
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, master, slave
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-3854) Finalize design for generalized Authorizer interface

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3854:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Finalize design for generalized Authorizer interface
> 
>
> Key: MESOS-3854
> URL: https://issues.apache.org/jira/browse/MESOS-3854
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Bernd Mathiske
>Assignee: Alexander Rojas
>  Labels: authorization, mesosphere
>
> Finalize the structure of the interface and achieve consensus on the design 
> doc proposed in MESOS-2949.
> https://docs.google.com/document/d/1-XARWJFUq0r_TgRHz_472NvLZNjbqE4G8c2JL44OSMQ/edit





[jira] [Updated] (MESOS-3763) Need for http::put request method

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3763:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Need for http::put request method
> -
>
> Key: MESOS-3763
> URL: https://issues.apache.org/jira/browse/MESOS-3763
> Project: Mesos
>  Issue Type: Task
>Reporter: Joerg Schad
>Assignee: Yongqiao Wang
>Priority: Minor
>  Labels: mesosphere
>
> We decided to create a more RESTful API for managing quota requests. 
> We therefore want to use HTTP PUT requests, and hence need to enable 
> libprocess/http to send PUT requests in addition to GET and POST requests.





[jira] [Updated] (MESOS-3570) Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3570:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Make Scheduler Library use HTTP Pipelining Abstraction in Libprocess
> 
>
> Key: MESOS-3570
> URL: https://issues.apache.org/jira/browse/MESOS-3570
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Vinod Kone
>  Labels: mesosphere, newbie
>
> Currently, the scheduler library sends calls in order by chaining them and 
> sending them only when it has received a response for the earlier call. This 
> was done because there was no HTTP Pipelining abstraction in Libprocess 
> {{process::post}}.
> However, once {{MESOS-3332}} is resolved, we should be able to use the new 
> abstraction.





[jira] [Updated] (MESOS-2179) ExamplesTest.NoExecutorFramework terminates with segmentation fault

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-2179:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> ExamplesTest.NoExecutorFramework terminates with segmentation fault
> ---
>
> Key: MESOS-2179
> URL: https://issues.apache.org/jira/browse/MESOS-2179
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
> Environment: Centos7 inside Docker
> Mesos master commit: 49d4553a0645624179f17ed6da8d2443e88998bf
>Reporter: Cody Maloney
>Assignee: Joerg Schad
>Priority: Minor
>  Labels: flaky, mesosphere
>
> {code}
> [ RUN  ] ExamplesTest.NoExecutorFramework
> ../../src/tests/script.cpp:83: Failure
> Failed
> no_executor_framework_test.sh terminated with signal Segmentation fault
> [  FAILED  ] ExamplesTest.NoExecutorFramework (2543 ms)
> {code}





[jira] [Updated] (MESOS-4390) Shared Volumes Design Doc

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4390:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Shared Volumes Design Doc
> -
>
> Key: MESOS-4390
> URL: https://issues.apache.org/jira/browse/MESOS-4390
> Project: Mesos
>  Issue Type: Task
>Reporter: Adam B
>Assignee: Anindya Sinha
>  Labels: mesosphere
>
> Review & Approve design doc





[jira] [Updated] (MESOS-4499) Docker provisioner store should reuse existing layers in the cache.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4499:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Docker provisioner store should reuse existing layers in the cache.
> ---
>
> Key: MESOS-4499
> URL: https://issues.apache.org/jira/browse/MESOS-4499
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
>
> Currently, the docker provisioner store will download all the layers 
> associated with an image if the image is not found locally, even though some 
> layers of it might already exist in the cache.
> This is problematic because anytime a user deploys a new image, Mesos will 
> fetch all layers of that new image, even though most of the layers are 
> already cached locally.
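
The behaviour asked for amounts to fetching only the layers that are missing from the cache. A small Python sketch under that assumption; the function name and the use of plain string layer IDs are illustrative and do not reflect the provisioner's actual interfaces:

```python
from typing import Iterable, List, Set


def layers_to_fetch(image_layers: Iterable[str], cached: Set[str]) -> List[str]:
    """Return the layer IDs of an image that are not already cached, in order."""
    missing = []
    seen = set()
    for layer_id in image_layers:
        # Skip layers present in the cache and layers already scheduled.
        if layer_id not in cached and layer_id not in seen:
            missing.append(layer_id)
            seen.add(layer_id)
    return missing
```

For an image whose layers are mostly shared with previously deployed images, the returned list is short, which is the saving the issue describes.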





[jira] [Updated] (MESOS-4004) Support default entrypoint and command runtime config in Mesos containerizer

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4004:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Support default entrypoint and command runtime config in Mesos containerizer
> 
>
> Key: MESOS-4004
> URL: https://issues.apache.org/jira/browse/MESOS-4004
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Gilbert Song
>  Labels: mesosphere, unified-containerizer-mvp
>
> We need to use the entrypoint and command runtime configuration returned from 
> the image in the Mesos containerizer.





[jira] [Updated] (MESOS-4439) Fix appc CachedImage image validation

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4439:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Fix appc CachedImage image validation
> -
>
> Key: MESOS-4439
> URL: https://issues.apache.org/jira/browse/MESOS-4439
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere, unified-containerizer-mvp
>
> Currently, image validation assumes that the image's filename contains 
> digest (SHA-512) information. This is not part of the spec
> (https://github.com/appc/spec/blob/master/spec/discovery.md).
> 
> The spec specifies the tuple  as unique identifier 
> for  discovering an image.





[jira] [Updated] (MESOS-4233) Logging is too verbose for sysadmins / syslog

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4233:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> Logging is too verbose for sysadmins / syslog
> -
>
> Key: MESOS-4233
> URL: https://issues.apache.org/jira/browse/MESOS-4233
> Project: Mesos
>  Issue Type: Epic
>Reporter: Cody Maloney
>Assignee: Kapil Arya
>  Labels: mesosphere
> Attachments: giant_port_range_logging
>
>
> Currently mesos logs a lot. When launching a thousand tasks in the space of 
> 10 seconds it will print tens of thousands of log lines, overwhelming syslog 
> (there is a max rate at which a process can send stuff over a unix socket) 
> and not giving useful information to a sysadmin who cares about just the 
> high-level activity and when something goes wrong.
> Note mesos also blocks writing to its log locations, so when writing a lot of 
> log messages, it can fill up the write buffer in the kernel, and be suspended 
> until the syslog agent catches up reading from the socket (GLOG does a 
> blocking fwrite to stderr). GLOG also has a big mutex around logging so only 
> one thing logs at a time.
> While for "internal debugging" it is useful to see things like "message went 
> from internal component x to internal component y", from a sysadmin 
> perspective I only care about the high-level actions taken (launched task for 
> framework x, sent offer to framework y, got task failed from host z). Note 
> those are what I'd expect at the "INFO" level. At the "WARNING" level I'd 
> expect very little to be logged / almost nothing in normal operation. Just 
> things like "WARN: Replicated log write took longer than expected". WARN 
> would also get things like backtraces on crashes and abnormal exits / abort.
> When trying to launch 3k+ tasks inside a second, mesos logging currently 
> overwhelms syslog with 100k+ messages, many of which are thousands of bytes. 
> Sysadmins expect to be able to use syslog to monitor basic events in their 
> system. This is too much.
> We can keep logging the messages to files, but the logging to stderr needs to 
> be reduced significantly (stderr gets picked up and forwarded to syslog / 
> central aggregation).
> What I would like is if I can set the stderr logging level to be different / 
> independent from the file logging level (Syslog giving the "sysadmin" 
> aggregated overview, files useful for debugging in depth what happened in a 
> cluster). A lot of what mesos currently logs at info is really debugging info 
> / should show up as debug log level.
> Some samples of mesos logging a lot more than a sysadmin would want / expect 
> are attached, and some are below:
>  - Every task gets printed multiple times for a basic launch:
> {noformat}
> Dec 15 22:58:30 ip-10-0-7-60.us-west-2.compute.internal mesos-master[1311]: 
> I1215 22:58:29.382644  1315 master.cpp:3248] Launching task 
> envy.5b19a713-a37f-11e5-8b3e-0251692d6109 of framework 
> 5178f46d-71d6-422f-922c-5bbe82dff9cc- (marathon)
> Dec 15 22:58:30 ip-10-0-7-60.us-west-2.compute.internal mesos-master[1311]: 
> I1215 22:58:29.382925  1315 master.hpp:176] Adding task 
> envy.5b1958f2-a37f-11e5-8b3e-0251692d6109 with resources cpus(*):0.0001; 
> mem(*):16; ports(*):[14047-14047]
> {noformat}
>  - Every task status update prints many log lines. Successful ones are part 
> of normal operation and maybe should be logged at info / debug levels, but 
> not to a sysadmin (just show when things fail, and maybe aggregate counters 
> to tell of the volume of working).
>  - No log messages should be really big / more than 1k characters (this would 
> prevent the giant port list attached and make it easily discoverable / bug 
> fileable / fixable). 
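
The request for independent stderr and file log levels can be illustrated with Python's standard `logging` module. This is only an analogy for the behaviour being asked for, not GLOG or Mesos code; the function name is made up, and in-memory streams stand in for stderr and the log file:

```python
import io
import logging


def make_logger(name, console_stream, file_stream,
                console_level=logging.WARNING, file_level=logging.INFO):
    """Build a logger whose two handlers filter at independent levels."""
    logger = logging.getLogger(name)
    # The logger itself must pass everything either handler wants to see.
    logger.setLevel(min(console_level, file_level))
    logger.propagate = False
    logger.handlers.clear()

    console = logging.StreamHandler(console_stream)  # stands in for stderr/syslog
    console.setLevel(console_level)
    detailed = logging.StreamHandler(file_stream)    # stands in for the log file
    detailed.setLevel(file_level)
    for handler in (console, detailed):
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

With the defaults above, an INFO message such as a task launch reaches only the detailed stream, while a WARNING reaches both.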





[jira] [Updated] (MESOS-4344) Allow operators to assign net_cls major handles to mesos agents

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4344:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Allow operators to assign net_cls major handles to mesos agents
> ---
>
> Key: MESOS-4344
> URL: https://issues.apache.org/jira/browse/MESOS-4344
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: container, mesosphere
>
> The net_cls cgroup associates a 16-bit major and 16-bit minor network handle 
> with packets originating from tasks associated with a specific net_cls cgroup. 
> In Mesos we need to give the operator the ability to fix the 16-bit major 
> handle used in an agent (the minor handle will be allocated by the agent; see 
> MESOS-4345). Fixing the parent handle on the agent allows operators to 
> install default firewall rules using the parent handle to enforce a default 
> policy (say DENY ALL) for all container traffic until the container is 
> allocated a minor handle. 
> A simple way to achieve this requirement is to pass the major handle as a 
> flag to the agent at startup. 
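
For reference, the kernel's net_cls class ID packs the two handles into one 32-bit value of the form 0xAAAABBBB, with the major handle in the upper 16 bits and the minor handle in the lower 16. A short Python sketch of that packing; the function name is illustrative and not taken from Mesos:

```python
def net_cls_classid(major: int, minor: int) -> int:
    """Pack 16-bit major and minor handles into a 32-bit net_cls class ID."""
    for name, value in (("major", major), ("minor", minor)):
        if not 0 <= value <= 0xFFFF:
            raise ValueError(f"{name} handle must fit in 16 bits: {value}")
    return (major << 16) | minor
```

With the major handle fixed per agent, every class ID the agent hands out shares the same upper 16 bits, which is what lets a firewall rule match on the parent handle alone.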





[jira] [Updated] (MESOS-4053) MemoryPressureMesosTest tests fail on CentOS 6.6

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4053:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> MemoryPressureMesosTest tests fail on CentOS 6.6
> 
>
> Key: MESOS-4053
> URL: https://issues.apache.org/jira/browse/MESOS-4053
> Project: Mesos
>  Issue Type: Bug
> Environment: CentOS 6.6
>Reporter: Greg Mann
>Assignee: Benjamin Hindman
>  Labels: mesosphere, test-failure
>
> {{MemoryPressureMesosTest.CGROUPS_ROOT_Statistics}} and 
> {{MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery}} fail on CentOS 6.6. It 
> seems that mounted cgroups are not properly cleaned up after previous tests, 
> so multiple hierarchies are detected and thus an error is produced:
> {code}
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (12 ms)
> [ RUN  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery
> ../../src/tests/mesos.cpp:849: Failure
> Value of: _baseHierarchy.get()
>   Actual: "/cgroup"
> Expected: baseHierarchy
> Which is: "/tmp/mesos_test_cgroup"
> -
> Multiple cgroups base hierarchies detected:
>   '/tmp/mesos_test_cgroup'
>   '/cgroup'
> Mesos does not support multiple cgroups base hierarchies.
> Please unmount the corresponding (or all) subsystems.
> -
> ../../src/tests/mesos.cpp:932: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery (7 ms)
> {code}





[jira] [Updated] (MESOS-191) Add support for multiple disk resources

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-191:

Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Epic
>Reporter: Vinod Kone
>Assignee: Joris Van Remoortere
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule Mesos tasks with fine-grained disk scheduling. 
> The idea is that a slave with multiple spindles would specify spindle-specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 
> Official Design Doc: 
> https://docs.google.com/document/d/1syPxygVNEHjG6FoyqslnpUGgNpYKU9QzKBuV2yKmjfQ/edit#heading=h.4fzj9sl24cwy





[jira] [Updated] (MESOS-4005) Support workdir runtime configuration from image

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4005:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Support workdir runtime configuration from image 
> -
>
> Key: MESOS-4005
> URL: https://issues.apache.org/jira/browse/MESOS-4005
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Gilbert Song
>  Labels: mesosphere, unified-containerizer-mvp
>
> We need to support the workdir runtime configuration returned from an image, 
> such as one built from a Dockerfile.





[jira] [Updated] (MESOS-4564) Separate Appc protobuf messages to its own file.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4564:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Separate Appc protobuf messages to its own file.
> 
>
> Key: MESOS-4564
> URL: https://issues.apache.org/jira/browse/MESOS-4564
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere, unified-containerizer-mvp
>
> It would be cleaner to keep the Appc protobuf messages separate from other 
> mesos messages.





[jira] [Updated] (MESOS-4383) Support docker runtime configuration env var from image.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4383:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Support docker runtime configuration env var from image.
> 
>
> Key: MESOS-4383
> URL: https://issues.apache.org/jira/browse/MESOS-4383
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: mesosphere, unified-containerizer-mvp
>
> We need to support the env var configuration returned from a docker image in 
> the mesos containerizer.





[jira] [Updated] (MESOS-4200) Test case(s) for weights + allocation behavior

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4200:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Test case(s) for weights + allocation behavior
> --
>
> Key: MESOS-4200
> URL: https://issues.apache.org/jira/browse/MESOS-4200
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, test
>Reporter: Neil Conway
>Assignee: Yongqiao Wang
>  Labels: mesosphere, test, weight
>
> As far as I can see, we currently have NO test cases for behavior when 
> weights are defined.





[jira] [Updated] (MESOS-4285) Mesos command task doesn't support volumes with image

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4285:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> Mesos command task doesn't support volumes with image
> -
>
> Key: MESOS-4285
> URL: https://issues.apache.org/jira/browse/MESOS-4285
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Timothy Chen
>  Labels: mesosphere, unified-containerizer-mvp
>
> Currently, volumes are stripped when an image is specified while running a 
> command task with the Mesos containerizer. 





[jira] [Updated] (MESOS-4365) Add internal migration from role to roles to master

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4365:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Add internal migration from role to roles to master
> ---
>
> Key: MESOS-4365
> URL: https://issues.apache.org/jira/browse/MESOS-4365
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> If only the {{role}} field is given, add it as a single entry to {{roles}}. 
> Add a note to {{CHANGELOG}}/release notes on deprecation of the existing 
> {{role}} field. File a JIRA issue for removal of that migration code once the 
> deprecation cycle is over.
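
The migration rule described above can be sketched roughly as follows. This is only an illustration: `FrameworkInfoSketch` and `migrateRole` are hypothetical names, and a plain struct stands in for the real protobuf message, with an empty string meaning "role not set".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the framework description. The real message is a
// protobuf; this struct only models the two fields discussed in the issue.
struct FrameworkInfoSketch {
  std::string role;                // Deprecated single-role field ("" = unset).
  std::vector<std::string> roles;  // New multi-role field.
};

// If only `role` is given, copy it into `roles` as the single entry.
// An explicitly populated `roles` is left untouched.
inline void migrateRole(FrameworkInfoSketch& info) {
  if (info.roles.empty() && !info.role.empty()) {
    info.roles.push_back(info.role);
  }

  // Postcondition: a framework that named a role always ends up with roles.
  assert(info.role.empty() || !info.roles.empty());
}
```

What should happen when both fields are set and disagree is not covered by the issue text, so the sketch simply leaves `roles` as given.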





[jira] [Updated] (MESOS-4479) Implement reservation labels

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4479:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Implement reservation labels
> 
>
> Key: MESOS-4479
> URL: https://issues.apache.org/jira/browse/MESOS-4479
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: labels, mesosphere, reservations
>






[jira] [Updated] (MESOS-4377) Document units associated with resource types

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4377:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Document units associated with resource types
> -
>
> Key: MESOS-4377
> URL: https://issues.apache.org/jira/browse/MESOS-4377
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: documentation, mesosphere
>
> We should document the units associated with memory and disk resources.





[jira] [Updated] (MESOS-4363) Add a roles field to FrameworkInfo

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4363:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Add a roles field to FrameworkInfo
> --
>
> Key: MESOS-4363
> URL: https://issues.apache.org/jira/browse/MESOS-4363
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, master
>Reporter: Benjamin Bannier
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> To represent multiple roles per framework a new repeated string field for 
> roles is needed.





[jira] [Updated] (MESOS-4368) Make HierarchicalAllocatorProcess set a Resource's active role during allocation

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4368:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Make HierarchicalAllocatorProcess set a Resource's active role during 
> allocation
> 
>
> Key: MESOS-4368
> URL: https://issues.apache.org/jira/browse/MESOS-4368
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Bannier
>Assignee: Jan Schlicht
>  Labels: mesosphere
>
> The concrete implementation here depends on the implementation strategy used 
> to solve MESOS-4367.





[jira] [Updated] (MESOS-4214) Introduce HTTP endpoint /weights for updating weight

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4214:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> Introduce HTTP endpoint /weights for updating weight
> 
>
> Key: MESOS-4214
> URL: https://issues.apache.org/jira/browse/MESOS-4214
> Project: Mesos
>  Issue Type: Task
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
>






[jira] [Updated] (MESOS-4291) fs::enter(rootfs) does not work if 'rootfs' is read only.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4291:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> fs::enter(rootfs) does not work if 'rootfs' is read only.
> -
>
> Key: MESOS-4291
> URL: https://issues.apache.org/jira/browse/MESOS-4291
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere, unified-containerizer-mvp
>
> I noticed this when I was testing the unified containerizer with the bind 
> mount backend and no volumes.
> The current implementation of fs::enter will put the old root under 
> /tmp/._old_root_.XX in the new rootfs. It assumes that /tmp is writable 
> in the new rootfs, but this might not be true, especially if the bind mount 
> backend is used.
> To solve the problem, what we can do is to mount tmpfs to /tmp in the new 
> rootfs and umount it after pivot_root.





[jira] [Updated] (MESOS-4333) Refactor Appc provisioner tests

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4333:
-
Sprint: Mesosphere Sprint 26, Mesosphere Sprint 27, Mesosphere Sprint 28  
(was: Mesosphere Sprint 26, Mesosphere Sprint 27)

> Refactor Appc provisioner tests  
> -
>
> Key: MESOS-4333
> URL: https://issues.apache.org/jira/browse/MESOS-4333
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere
>
> Current tests can be refactored so that we can reuse some common tasks like 
> test image creation. This will benefit future tests like appc image puller 
> tests.





[jira] [Updated] (MESOS-4345) Implement a network-handle manager for net_cls cgroup subsystem

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4345:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> Implement a network-handle manager for net_cls cgroup subsystem
> ---
>
> Key: MESOS-4345
> URL: https://issues.apache.org/jira/browse/MESOS-4345
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: containerizer, containers, mesosphere
>
> As part of implementing the net_cls cgroup isolator we need a mechanism to 
> manage the minor handles that will be allocated to containers when they are 
> associated with a net_cls cgroup. The network-handle manager needs to provide 
> the following functionality:
> a) During normal operation, keep track of the free and allocated network 
> handles. There can be a total of 64K such network handles.
> b) On startup, learn the allocated network handles by walking the net_cls 
> cgroup tree for mesos, and build a map of free network handles available to 
> the agent. 
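
The two requirements above (tracking free and allocated handles, and rebuilding that view on startup) could be sketched along these lines. The class and method names are hypothetical, and the sketch treats the full 64K range as usable; whether some handles must be reserved, and how the cgroup tree is actually walked, is left out.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical manager for the 64K handle space described in the issue.
class HandleManagerSketch {
public:
  static constexpr std::size_t kTotal = 1u << 16;  // 64K handles.

  // Startup path: mark the handles that were found in use, e.g. while
  // walking the net_cls cgroup tree (the walk itself is not shown).
  void recover(const std::vector<uint16_t>& inUse) {
    for (uint16_t handle : inUse) {
      allocated_.set(handle);
    }
  }

  // Returns the lowest free handle, or -1 when the space is exhausted.
  int allocate() {
    for (std::size_t i = 0; i < kTotal; ++i) {
      if (!allocated_.test(i)) {
        allocated_.set(i);
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  // Returns a previously allocated handle to the free pool.
  void release(uint16_t handle) {
    assert(allocated_.test(handle));
    allocated_.reset(handle);
  }

  // Number of handles currently available to the agent.
  std::size_t freeCount() const { return kTotal - allocated_.count(); }

private:
  std::bitset<kTotal> allocated_;  // One bit per handle; set = allocated.
};
```

A bitset keeps the whole map in 8 KB and makes recovery a matter of setting bits; a linear scan in `allocate()` is the simplest choice, not necessarily the fastest.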





[jira] [Updated] (MESOS-4261) Remove docker auth server flag

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4261:
-
Sprint: Mesosphere Sprint 25, Mesosphere Sprint 26, Mesosphere Sprint 27, 
Mesosphere Sprint 28  (was: Mesosphere Sprint 25, Mesosphere Sprint 26, 
Mesosphere Sprint 27)

> Remove docker auth server flag
> --
>
> Key: MESOS-4261
> URL: https://issues.apache.org/jira/browse/MESOS-4261
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>Assignee: Jie Yu
>  Labels: mesosphere, unified-containerizer-mvp
>
> We currently use a docker auth server configured through a slave flag to get 
> token auth for the docker registry. However, this doesn't work for private 
> registries, since a docker registry can itself send down the correct auth 
> server to contact.
> We should remove the docker auth server flag completely and ask the docker 
> registry for the auth server.
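
As a rough illustration of asking the registry for the auth server: with Docker registry token authentication, an unauthenticated request is answered with a `WWW-Authenticate: Bearer realm="...",service="..."` challenge, where `realm` names the token server. The helper below is a hypothetical sketch (it is not Mesos code, and `authParam` is an invented name) that pulls one quoted parameter out of such a header value; it does no real HTTP header parsing and would also match a key that is a suffix of a longer key.

```cpp
#include <string>

// Extracts the value of `key="..."` from a WWW-Authenticate header value.
// Returns an empty string if the key is absent or the quote is unterminated.
inline std::string authParam(const std::string& header,
                             const std::string& key) {
  const std::string needle = key + "=\"";
  const std::size_t start = header.find(needle);
  if (start == std::string::npos) {
    return "";
  }

  const std::size_t begin = start + needle.size();
  const std::size_t end = header.find('"', begin);
  if (end == std::string::npos) {
    return "";
  }

  return header.substr(begin, end - begin);
}
```

For example, calling `authParam(value, "realm")` on the challenge returned by a private registry would yield that registry's own token endpoint, which is the information the slave flag currently hard-codes.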





[jira] [Updated] (MESOS-4552) There is currently no way to get at the strings in the global process::help object programmatically.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4552:
-
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 28  (was: Mesosphere Sprint 
27)

> There is currently no way to get at the strings in the global process::help 
> object programmatically.
> 
>
> Key: MESOS-4552
> URL: https://issues.apache.org/jira/browse/MESOS-4552
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Minor
>  Labels: mesosphere
>
> There is currently no way to extract the help strings from the help process 
> once they have been installed into it.  The only way to get at them is to 
> visit the http endpoint they are associated with and pull it from there.  
> Moreover, there is no way to uninstall a string from this process if a route 
> is ever taken offline.  We need support for programmatically getting/removing 
> strings from the help process.
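
The requested capability amounts to a registry that supports lookup and removal in addition to installation. The sketch below is purely illustrative: `HelpRegistrySketch` and its methods are hypothetical and are not the actual process::help interface.

```cpp
#include <map>
#include <string>

// Hypothetical registry of help strings keyed by route.
class HelpRegistrySketch {
public:
  // Installs (or overwrites) the help string for a route.
  void add(const std::string& route, const std::string& text) {
    strings_[route] = text;
  }

  // Copies the help string for `route` into `out` and returns true,
  // or returns false if nothing is installed for that route.
  bool get(const std::string& route, std::string* out) const {
    const auto it = strings_.find(route);
    if (it == strings_.end()) {
      return false;
    }
    *out = it->second;
    return true;
  }

  // Uninstalls the help string, e.g. when a route is taken offline.
  // Returns true if something was removed.
  bool remove(const std::string& route) {
    return strings_.erase(route) > 0;
  }

private:
  std::map<std::string, std::string> strings_;
};
```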





[jira] [Updated] (MESOS-2017) Segfault with "Pure virtual method called" when tests fail

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-2017:
-
Sprint: Twitter Mesos Q4 Sprint 3, Mesosphere Sprint 27, Mesosphere Sprint 
28  (was: Twitter Mesos Q4 Sprint 3, Mesosphere Sprint 27)

> Segfault with "Pure virtual method called" when tests fail
> --
>
> Key: MESOS-2017
> URL: https://issues.apache.org/jira/browse/MESOS-2017
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Yan Xu
>Assignee: Kevin Klues
>  Labels: mesosphere, tests
>
> The most recent one:
> {noformat:title=DRFAllocatorTest.DRFAllocatorProcess}
> [ RUN  ] DRFAllocatorTest.DRFAllocatorProcess
> Using temporary directory '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j'
> I1030 05:55:06.934813 24459 leveldb.cpp:176] Opened db in 3.175202ms
> I1030 05:55:06.935925 24459 leveldb.cpp:183] Compacted db in 1.077924ms
> I1030 05:55:06.935976 24459 leveldb.cpp:198] Created db iterator in 16460ns
> I1030 05:55:06.935995 24459 leveldb.cpp:204] Seeked to beginning of db in 
> 2018ns
> I1030 05:55:06.936005 24459 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 335ns
> I1030 05:55:06.936039 24459 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1030 05:55:06.936705 24480 recover.cpp:437] Starting replica recovery
> I1030 05:55:06.937023 24480 recover.cpp:463] Replica is in EMPTY status
> I1030 05:55:06.938158 24475 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I1030 05:55:06.938859 24482 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I1030 05:55:06.939486 24474 recover.cpp:554] Updating replica status to 
> STARTING
> I1030 05:55:06.940249 24489 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 591981ns
> I1030 05:55:06.940274 24489 replica.cpp:320] Persisted replica status to 
> STARTING
> I1030 05:55:06.940752 24481 recover.cpp:463] Replica is in STARTING status
> I1030 05:55:06.940820 24489 master.cpp:312] Master 
> 20141030-055506-3142697795-40429-24459 (pomona.apache.org) started on 
> 67.195.81.187:40429
> I1030 05:55:06.940871 24489 master.cpp:358] Master only allowing 
> authenticated frameworks to register
> I1030 05:55:06.940891 24489 master.cpp:363] Master only allowing 
> authenticated slaves to register
> I1030 05:55:06.940908 24489 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/DRFAllocatorTest_DRFAllocatorProcess_BI905j/credentials'
> I1030 05:55:06.941215 24489 master.cpp:392] Authorization enabled
> I1030 05:55:06.941751 24475 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I1030 05:55:06.942227 24474 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I1030 05:55:06.942401 24476 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.187:40429
> I1030 05:55:06.942895 24483 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I1030 05:55:06.943035 24474 master.cpp:1242] The newly elected leader is 
> master@67.195.81.187:40429 with id 20141030-055506-3142697795-40429-24459
> I1030 05:55:06.943063 24474 master.cpp:1255] Elected as the leading master!
> I1030 05:55:06.943079 24474 master.cpp:1073] Recovering from registrar
> I1030 05:55:06.943313 24480 registrar.cpp:313] Recovering registrar
> I1030 05:55:06.943455 24475 recover.cpp:554] Updating replica status to VOTING
> I1030 05:55:06.944144 24474 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 536365ns
> I1030 05:55:06.944172 24474 replica.cpp:320] Persisted replica status to 
> VOTING
> I1030 05:55:06.944355 24489 recover.cpp:568] Successfully joined the Paxos 
> group
> I1030 05:55:06.944576 24489 recover.cpp:452] Recover process terminated
> I1030 05:55:06.945155 24486 log.cpp:656] Attempting to start the writer
> I1030 05:55:06.947013 24473 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I1030 05:55:06.947854 24473 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 806463ns
> I1030 05:55:06.947883 24473 replica.cpp:342] Persisted promised to 1
> I1030 05:55:06.948547 24481 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I1030 05:55:06.950269 24479 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I1030 05:55:06.950933 24479 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 603843ns
> I1030 05:55:06.950961 24479 replica.cpp:676] Persisted action at 0
> I1030 05:55:06.952180 24476 replica.cpp:508] Replica received write request 
> for position 0
> I1030 

[jira] [Updated] (MESOS-4066) Expose when agent is recovering in the agent's /state.json endpoint.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4066:
-
  Sprint: Mesosphere Sprint 28
Story Points: 3

> Expose when agent is recovering in the agent's /state.json endpoint.
> 
>
> Key: MESOS-4066
> URL: https://issues.apache.org/jira/browse/MESOS-4066
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Benjamin Mahler
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Currently when a user is hitting /state.json on the agent, it may return 
> partial state if the agent has failed over and is recovering. There is 
> currently no clear way to tell if this is the case when looking at a 
> response, so the user may incorrectly interpret the agent as being empty of 
> tasks.
> We could consider exposing the 'state' enum of the agent in the endpoint:
> {code}
>   enum State
>   {
>     RECOVERING,   // Slave is doing recovery.
>     DISCONNECTED, // Slave is not connected to the master.
>     RUNNING,      // Slave has (re-)registered.
>     TERMINATING,  // Slave is shutting down.
>   } state;
> {code}
> This may be a bit tricky to maintain as far as backwards-compatibility of the 
> endpoint, if we were to alter this enum.
> Exposing this would allow users to be more informed about the state of the 
> agent.
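
One way to soften the backwards-compatibility concern is to expose the state as a string rather than as the enum's numeric value, so that reordering or extending the enum does not change what existing clients see. The sketch below assumes this approach; `stateName` and `stateJsonField` are hypothetical helpers, and the JSON field name `state` is an assumption rather than something the issue specifies.

```cpp
#include <string>

// Mirrors the enum quoted in the issue description.
enum State { RECOVERING, DISCONNECTED, RUNNING, TERMINATING };

// Maps each state to a stable string that is independent of the
// enumerator's numeric value.
inline std::string stateName(State state) {
  switch (state) {
    case RECOVERING:   return "RECOVERING";
    case DISCONNECTED: return "DISCONNECTED";
    case RUNNING:      return "RUNNING";
    case TERMINATING:  return "TERMINATING";
  }
  return "UNKNOWN";  // Only reachable for out-of-range values.
}

// Builds the JSON member an endpoint response could carry.
inline std::string stateJsonField(State state) {
  return "\"state\":\"" + stateName(state) + "\"";
}
```

A client could then treat any response whose state is `RECOVERING` as partial instead of concluding that the agent has no tasks.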





[jira] [Updated] (MESOS-2971) Implement OverlayFS based provisioner backend

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-2971:
-
Sprint:   (was: Mesosphere Sprint 27)

> Implement OverlayFS based provisioner backend
> -
>
> Key: MESOS-2971
> URL: https://issues.apache.org/jira/browse/MESOS-2971
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Timothy Chen
>Assignee: Mei Wan
>  Labels: mesosphere, twitter, unified-containerizer-mvp
>
> Part of the image provisioning process is to call a backend to create a root 
> filesystem based on the image's on-disk layout.
> The problem with the copy backend is that it wastes both IO and space, and 
> the bind backend can only deal with one layer.
> An overlayfs backend allows us to utilize the filesystem to merge multiple 
> filesystems into one efficiently.
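
To make the layer merging concrete: an overlay mount takes its read-only layers as a colon-separated `lowerdir` list, with the top-most layer listed first, plus an `upperdir` and a `workdir` for writes. The helper below is a hypothetical sketch that only builds that option string from a list of image layers; it performs no mount, does not escape `:` or `,` inside paths, and multiple lower layers need a kernel that supports them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the option string for an overlay mount.
// `layers` is ordered bottom-most first (the order in which image layers are
// usually applied); overlayfs wants the top-most layer first in `lowerdir`,
// so the list is emitted in reverse.
inline std::string overlayOptions(const std::vector<std::string>& layers,
                                  const std::string& upper,
                                  const std::string& work) {
  assert(!layers.empty());

  std::string options = "lowerdir=";
  for (auto it = layers.rbegin(); it != layers.rend(); ++it) {
    if (it != layers.rbegin()) {
      options += ":";
    }
    options += *it;
  }

  options += ",upperdir=" + upper + ",workdir=" + work;
  return options;
}
```

The resulting string is what would be passed as the data argument of an overlay mount; unlike the copy backend, nothing is duplicated on disk, and unlike bind, any number of layers can be stacked.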





[jira] [Updated] (MESOS-4544) Propose design doc for agent partitioning behavior

2016-02-01 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4544:
---
Shepherd: Vinod Kone

> Propose design doc for agent partitioning behavior
> --
>
> Key: MESOS-4544
> URL: https://issues.apache.org/jira/browse/MESOS-4544
> Project: Mesos
>  Issue Type: Task
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-1471) Document replicated log design/internals

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-1471:
-
Sprint: Q3 Sprint 1  (was: Q3 Sprint 1, Mesosphere Sprint 28)

> Document replicated log design/internals
> 
>
> Key: MESOS-1471
> URL: https://issues.apache.org/jira/browse/MESOS-1471
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation, replicated log
>Reporter: Benjamin Mahler
>Assignee: Neil Conway
>  Labels: documentation, mesosphere
>
> The replicated log could benefit from some documentation. In particular, how 
> does it work? What do operators need to know? Possibly there is some overlap 
> with our future maintenance documentation in MESOS-1470.
> I believe [~jieyu] has some unpublished work that could be leveraged here!





[jira] [Updated] (MESOS-3413) Docker containerizer does not symlink persistent volumes into sandbox

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3413:
-
Shepherd: Jie Yu

> Docker containerizer does not symlink persistent volumes into sandbox
> -
>
> Key: MESOS-3413
> URL: https://issues.apache.org/jira/browse/MESOS-3413
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker, slave
>Affects Versions: 0.23.0
>Reporter: Max Neunhöffer
>Assignee: Timothy Chen
>  Labels: docker, mesosphere, persistent-volumes
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For the ArangoDB framework I am trying to use the persistent primitives. 
> Nearly everything is working, but I am missing a crucial piece at the end: I 
> have successfully created a persistent disk resource and have set the 
> persistence and volume information in the DiskInfo message. However, I do not 
> see any way to find out what directory on the host the mesos slave has 
> reserved for us. I know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles//_ but we 
> have no way to query this information anywhere. The docker containerizer does 
> not automatically mount this directory into our docker container, or symlink 
> it into our sandbox. Therefore, I have essentially no access to it. Note that 
> the mesos containerizer (which I cannot use for other reasons) seems to 
> create a symlink in the sandbox to the actual path for the persistent volume. 
> With that, I could mount the volume into our docker container and all would 
> be well.





[jira] [Updated] (MESOS-4570) DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4570:
-
Sprint: Mesosphere Sprint 28

> DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.
> -
>
> Key: MESOS-4570
> URL: https://issues.apache.org/jira/browse/MESOS-4570
> Project: Mesos
>  Issue Type: Bug
> Environment: Debian 8
>Reporter: Till Toenshoff
>Assignee: Gilbert Song
>  Labels: flaky-test
>
> {noformat}
> ../configure --enable-ssl --enable-libevent && make check
> {noformat}
> {noformat}
> --gtest_repeat=-1 --gtest_break_on_failure  
> --gtest_filter=DockerFetcherPluginTest.INTERNET_CURL_FetchImage 
> {noformat}
> Failed at the 22nd run. 
> {noformat}
> [ RUN  ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage
> ../../src/tests/uri_fetcher_tests.cpp:276: Failure
> Failed to wait 15secs for fetcher.get()->fetch(uri, dir)
> *** Aborted at 1454207653 (unix time) try "date -d @1454207653" if you are 
> using GNU date ***
> PC: @  0x167023a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 19868 (TID 0x7f500fc877c0) from PID 0; 
> stack trace: ***
> @ 0x7f5008f368d0 (unknown)
> @  0x167023a testing::UnitTest::AddTestPartResult()
> @  0x1664c73 testing::internal::AssertHelper::operator=()
> @  0x146ac6f 
> mesos::internal::tests::DockerFetcherPluginTest_INTERNET_CURL_FetchImage_Test::TestBody()
> @  0x168dc70 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x1688cc8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x166a013 testing::Test::Run()
> @  0x166a7a1 testing::TestInfo::Run()
> @  0x166addc testing::TestCase::Run()
> @  0x167172b testing::internal::UnitTestImpl::RunAllTests()
> @  0x168e8ff 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x168981e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x167045b testing::UnitTest::Run()
> @   0xe2d476 RUN_ALL_TESTS()
> @   0xe2d08c main
> @ 0x7f5008b9fb45 (unknown)
> @   0x9c6bf9 (unknown)
> {noformat}





[jira] [Updated] (MESOS-4570) DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4570:
-
Shepherd: Jie Yu
Assignee: Gilbert Song
Story Points: 1

> DockerFetcherPluginTest.INTERNET_CURL_FetchImage seems flaky.
> -
>
> Key: MESOS-4570
> URL: https://issues.apache.org/jira/browse/MESOS-4570
> Project: Mesos
>  Issue Type: Bug
> Environment: Debian 8
>Reporter: Till Toenshoff
>Assignee: Gilbert Song
>  Labels: flaky-test
>
> {noformat}
> ../configure --enable-ssl --enable-libevent && make check
> {noformat}
> {noformat}
> --gtest_repeat=-1 --gtest_break_on_failure  
> --gtest_filter=DockerFetcherPluginTest.INTERNET_CURL_FetchImage 
> {noformat}
> Failed at the 22nd run. 
> {noformat}
> [ RUN  ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage
> ../../src/tests/uri_fetcher_tests.cpp:276: Failure
> Failed to wait 15secs for fetcher.get()->fetch(uri, dir)
> *** Aborted at 1454207653 (unix time) try "date -d @1454207653" if you are 
> using GNU date ***
> PC: @  0x167023a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 19868 (TID 0x7f500fc877c0) from PID 0; 
> stack trace: ***
> @ 0x7f5008f368d0 (unknown)
> @  0x167023a testing::UnitTest::AddTestPartResult()
> @  0x1664c73 testing::internal::AssertHelper::operator=()
> @  0x146ac6f 
> mesos::internal::tests::DockerFetcherPluginTest_INTERNET_CURL_FetchImage_Test::TestBody()
> @  0x168dc70 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x1688cc8 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x166a013 testing::Test::Run()
> @  0x166a7a1 testing::TestInfo::Run()
> @  0x166addc testing::TestCase::Run()
> @  0x167172b testing::internal::UnitTestImpl::RunAllTests()
> @  0x168e8ff 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x168981e 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x167045b testing::UnitTest::Run()
> @   0xe2d476 RUN_ALL_TESTS()
> @   0xe2d08c main
> @ 0x7f5008b9fb45 (unknown)
> @   0x9c6bf9 (unknown)
> {noformat}





[jira] [Created] (MESOS-4575) Fix Appc image caching to share with image fetcher

2016-02-01 Thread Jojy Varghese (JIRA)
Jojy Varghese created MESOS-4575:


 Summary: Fix Appc image caching to share with image fetcher
 Key: MESOS-4575
 URL: https://issues.apache.org/jira/browse/MESOS-4575
 Project: Mesos
  Issue Type: Improvement
Reporter: Jojy Varghese
Assignee: Jojy Varghese


As the Appc image fetcher is being developed, the image cache needs to be 
shared between the store and the image fetcher.





[jira] [Commented] (MESOS-4066) Expose when agent is recovering in the agent's /state.json endpoint.

2016-02-01 Thread Joerg Schad (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126896#comment-15126896
 ] 

Joerg Schad commented on MESOS-4066:


Short question: hasn't /state.json been deprecated in favor of /state 
(https://github.com/apache/mesos/blob/master/docs/upgrades.md#upgrading-from-024x-to-025x)?

> Expose when agent is recovering in the agent's /state.json endpoint.
> 
>
> Key: MESOS-4066
> URL: https://issues.apache.org/jira/browse/MESOS-4066
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Benjamin Mahler
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Currently when a user is hitting /state.json on the agent, it may return 
> partial state if the agent has failed over and is recovering. There is 
> currently no clear way to tell if this is the case when looking at a 
> response, so the user may incorrectly interpret the agent as being empty of 
> tasks.
> We could consider exposing the 'state' enum of the agent in the endpoint:
> {code}
>   enum State
>   {
>     RECOVERING,   // Slave is doing recovery.
>     DISCONNECTED, // Slave is not connected to the master.
>     RUNNING,      // Slave has (re-)registered.
>     TERMINATING,  // Slave is shutting down.
>   } state;
> {code}
> This may be a bit tricky to maintain as far as backwards-compatibility of the 
> endpoint, if we were to alter this enum.
> Exposing this would allow users to be more informed about the state of the 
> agent.
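
A minimal sketch of how the enum quoted above might be surfaced in the endpoint. Everything here is an assumption made for illustration, not the Mesos implementation: the names AgentState, stateName, stateJson and mayHavePartialState are hypothetical. The idea shown is to expose the state as a string rather than as the enum's integer value, so that reordering or extending the enum later does not silently change what existing clients read.

```cpp
#include <string>

// Hypothetical mirror of the `State` enum quoted in the issue.
enum class AgentState { RECOVERING, DISCONNECTED, RUNNING, TERMINATING };

// Map each value to a fixed string. Clients then depend on the names,
// not on the numeric order of the enum members.
inline std::string stateName(AgentState s) {
  switch (s) {
    case AgentState::RECOVERING:   return "RECOVERING";
    case AgentState::DISCONNECTED: return "DISCONNECTED";
    case AgentState::RUNNING:      return "RUNNING";
    case AgentState::TERMINATING:  return "TERMINATING";
  }
  return "UNKNOWN";
}

// Hypothetical JSON fragment that an endpoint could include in its response.
inline std::string stateJson(AgentState s) {
  return "{\"state\":\"" + stateName(s) + "\"}";
}

// A client-side check: while the agent is recovering, an empty task list
// should not be read as "no tasks".
inline bool mayHavePartialState(AgentState s) {
  return s == AgentState::RECOVERING;
}
```

With something like this, a client that sees "RECOVERING" could retry later instead of treating the agent as empty.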





[jira] [Updated] (MESOS-4066) Expose when agent is recovering in the agent's /state.json endpoint.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4066:
-
Shepherd: Benjamin Mahler
Assignee: Vinod Kone

> Expose when agent is recovering in the agent's /state.json endpoint.
> 
>
> Key: MESOS-4066
> URL: https://issues.apache.org/jira/browse/MESOS-4066
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Benjamin Mahler
>Assignee: Vinod Kone
>  Labels: mesosphere
>
> Currently when a user is hitting /state.json on the agent, it may return 
> partial state if the agent has failed over and is recovering. There is 
> currently no clear way to tell if this is the case when looking at a 
> response, so the user may incorrectly interpret the agent as being empty of 
> tasks.
> We could consider exposing the 'state' enum of the agent in the endpoint:
> {code}
>   enum State
>   {
>     RECOVERING,   // Slave is doing recovery.
>     DISCONNECTED, // Slave is not connected to the master.
>     RUNNING,      // Slave has (re-)registered.
>     TERMINATING,  // Slave is shutting down.
>   } state;
> {code}
> This may be a bit tricky to maintain as far as backwards-compatibility of the 
> endpoint, if we were to alter this enum.
> Exposing this would allow users to be more informed about the state of the 
> agent.





[jira] [Updated] (MESOS-4554) Investigate test suite crashes after ZK socket disconnections.

2016-02-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4554:
--
Sprint:   (was: Mesosphere Sprint 28)

> Investigate test suite crashes after ZK socket disconnections.
> --
>
> Key: MESOS-4554
> URL: https://issues.apache.org/jira/browse/MESOS-4554
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: flaky-test, mesosphere
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/1579/console
> The test crashed with the following logs:
> {code}
> [ RUN  ] ContentType/ExecutorHttpApiTest.DefaultAccept/1
> I0129 02:00:35.137161 31926 leveldb.cpp:174] Opened db in 118.902333ms
> I0129 02:00:35.187021 31926 leveldb.cpp:181] Compacted db in 49.836241ms
> I0129 02:00:35.187088 31926 leveldb.cpp:196] Created db iterator in 33825ns
> I0129 02:00:35.187109 31926 leveldb.cpp:202] Seeked to beginning of db in 
> 7965ns
> I0129 02:00:35.187121 31926 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 6350ns
> I0129 02:00:35.187165 31926 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0129 02:00:35.188433 31950 recover.cpp:447] Starting replica recovery
> I0129 02:00:35.188796 31950 recover.cpp:473] Replica is in EMPTY status
> I0129 02:00:35.190021 31949 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (11817)@172.17.0.3:60904
> I0129 02:00:35.190569 31958 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0129 02:00:35.190994 31959 recover.cpp:564] Updating replica status to 
> STARTING
> I0129 02:00:35.191522 31953 master.cpp:374] Master 
> 823f2212-bf28-4dd6-959d-796029d32afb (90665f991b70) started on 
> 172.17.0.3:60904
> I0129 02:00:35.191640 31953 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/B9O6zq/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/B9O6zq/master" --zk_session_timeout="10secs"
> I0129 02:00:35.191926 31953 master.cpp:421] Master only allowing 
> authenticated frameworks to register
> I0129 02:00:35.191936 31953 master.cpp:426] Master only allowing 
> authenticated slaves to register
> I0129 02:00:35.191943 31953 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/B9O6zq/credentials'
> I0129 02:00:35.192229 31953 master.cpp:466] Using default 'crammd5' 
> authenticator
> I0129 02:00:35.192366 31953 master.cpp:535] Using default 'basic' HTTP 
> authenticator
> I0129 02:00:35.192530 31953 master.cpp:569] Authorization enabled
> I0129 02:00:35.192719 31950 whitelist_watcher.cpp:77] No whitelist given
> I0129 02:00:35.192756 31957 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0129 02:00:35.194291 31955 master.cpp:1710] The newly elected leader is 
> master@172.17.0.3:60904 with id 823f2212-bf28-4dd6-959d-796029d32afb
> I0129 02:00:35.194335 31955 master.cpp:1723] Elected as the leading master!
> I0129 02:00:35.194350 31955 master.cpp:1468] Recovering from registrar
> I0129 02:00:35.194545 31958 registrar.cpp:307] Recovering registrar
> I0129 02:00:35.220226 31948 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 29.150097ms
> I0129 02:00:35.220262 31948 replica.cpp:320] Persisted replica status to 
> STARTING
> I0129 02:00:35.220484 31959 recover.cpp:473] Replica is in STARTING status
> I0129 02:00:35.221220 31954 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (11819)@172.17.0.3:60904
> I0129 02:00:35.221539 31959 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0129 02:00:35.221871 31954 recover.cpp:564] Updating replica status to VOTING
> I0129 02:00:35.245329 31949 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 23.326002ms
> I0129 02:0

[jira] [Updated] (MESOS-4545) Propose design doc for reliable floating point behavior

2016-02-01 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4545:
---
Sprint: Mesosphere Sprint 28  (was: Mesosphere Sprint 27)

> Propose design doc for reliable floating point behavior
> ---
>
> Key: MESOS-4545
> URL: https://issues.apache.org/jira/browse/MESOS-4545
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Neil Conway
>  Labels: mesosphere, resources
>






[jira] [Updated] (MESOS-4544) Propose design doc for agent partitioning behavior

2016-02-01 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-4544:
---
Story Points: 8  (was: 9)

> Propose design doc for agent partitioning behavior
> --
>
> Key: MESOS-4544
> URL: https://issues.apache.org/jira/browse/MESOS-4544
> Project: Mesos
>  Issue Type: Task
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4487) Introduce status() interface in `Containerizer`

2016-02-01 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4487:
--
Story Points: 2  (was: 3)

> Introduce status() interface in `Containerizer`
> ---
>
> Key: MESOS-4487
> URL: https://issues.apache.org/jira/browse/MESOS-4487
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: containerizer, mesosphere
>
> In the Containerizer, during container isolation, the isolators end up 
> modifying the state of the containers. Examples include IP address 
> allocation to a container by the network isolator, or net_cls handle 
> allocation by the cgroups/net_cls isolator. 
> Often the state of the container needs to be exposed to operators 
> through the state.json endpoint. For example, operators or frameworks might 
> want to know the IP address configured on a particular container, or the 
> net_cls handle associated with a container, in order to configure the right 
> TC rules. However, at present there is no clean interface for the slave to 
> retrieve the state of any launched container from the Containerizer. 
> Thus, we need to introduce a `status` interface in the `Containerizer` base 
> class so that the slave can expose container state information in its 
> state.json.
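
A rough sketch of what such a `status` interface could look like. This is an illustration under stated assumptions, not the actual Mesos API: the ContainerStatus fields, the synchronous bool-returning signature, and the InMemoryContainerizer class are all hypothetical, and the real interface may well be asynchronous.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-container state filled in by isolators.
struct ContainerStatus {
  std::vector<std::string> ipAddresses;  // e.g. set by a network isolator
  bool hasNetClsHandle = false;          // true once a net_cls handle is set
  std::uint32_t netClsHandle = 0;        // e.g. set by a net_cls isolator
};

// Hypothetical base class with the proposed `status` entry point.
class Containerizer {
public:
  virtual ~Containerizer() {}

  // Returns true and fills `status` if the container is known,
  // false otherwise.
  virtual bool status(const std::string& containerId,
                      ContainerStatus* status) const = 0;
};

// Toy implementation that keeps statuses in memory, for illustration only.
class InMemoryContainerizer : public Containerizer {
public:
  void record(const std::string& containerId, const ContainerStatus& s) {
    statuses_[containerId] = s;
  }

  bool status(const std::string& containerId,
              ContainerStatus* status) const override {
    std::map<std::string, ContainerStatus>::const_iterator it =
        statuses_.find(containerId);
    if (it == statuses_.end()) {
      return false;
    }
    *status = it->second;
    return true;
  }

private:
  std::map<std::string, ContainerStatus> statuses_;
};
```

The slave would call status() for each launched container and copy the result into its state.json output.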





[jira] [Assigned] (MESOS-4544) Propose design doc for agent partitioning behavior

2016-02-01 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-4544:
--

Assignee: Neil Conway

> Propose design doc for agent partitioning behavior
> --
>
> Key: MESOS-4544
> URL: https://issues.apache.org/jira/browse/MESOS-4544
> Project: Mesos
>  Issue Type: Task
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4531) Document multi-disk support.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4531:
-
Sprint: Mesosphere Sprint 28

> Document multi-disk support.
> 
>
> Key: MESOS-4531
> URL: https://issues.apache.org/jira/browse/MESOS-4531
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Reporter: Jie Yu
>Assignee: Joris Van Remoortere
>  Labels: documentation, mesosphere
>






[jira] [Updated] (MESOS-4531) Document multi-disk support.

2016-02-01 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-4531:
-
Sprint:   (was: Mesosphere Sprint 28)

> Document multi-disk support.
> 
>
> Key: MESOS-4531
> URL: https://issues.apache.org/jira/browse/MESOS-4531
> Project: Mesos
>  Issue Type: Task
>  Components: documentation
>Reporter: Jie Yu
>Assignee: Joris Van Remoortere
>  Labels: documentation, mesosphere
>






[jira] [Updated] (MESOS-3003) Support mounting in default configuration files/volumes into every new container

2016-02-01 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-3003:
--
Sprint:   (was: Mesosphere Sprint 27)

> Support mounting in default configuration files/volumes into every new 
> container
> 
>
> Key: MESOS-3003
> URL: https://issues.apache.org/jira/browse/MESOS-3003
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Timothy Chen
>  Labels: mesosphere, unified-containerizer-mvp
>
> Most container images leave out system configuration (e.g., /etc/*) and 
> expect the container runtime to mount in specific configuration files, such 
> as /etc/resolv.conf from the host, when needed.
> We need to support mounting in specific configuration files for the command 
> executor to work, and also allow the user to optionally define other 
> configuration files to mount in via flags.





[jira] [Updated] (MESOS-4490) Get container status information in slave.

2016-02-01 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4490:
--
Story Points: 3

> Get container status information in slave. 
> ---
>
> Key: MESOS-4490
> URL: https://issues.apache.org/jira/browse/MESOS-4490
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> As part of MESOS-4487, an interface will be introduced into the 
> `Containerizer` to allow agents to retrieve container state information. The 
> agent needs to use this interface to retrieve container state information 
> during status updates from the executor. The agent can then use the container 
> state information to expose, in the state.json endpoint, the 
> isolator-specific configuration that has been applied to the container (e.g., 
> the IP address allocated by network isolators, or net_cls handles allocated 
> by the `cgroups/net_cls` isolator).





[jira] [Updated] (MESOS-4517) Introduce docker runtime isolator.

2016-02-01 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4517:
--
Story Points: 3

> Introduce docker runtime isolator.
> --
>
> Key: MESOS-4517
> URL: https://issues.apache.org/jira/browse/MESOS-4517
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: mesosphere
>
> Currently, the default configuration of a docker image is included in 
> `ProvisionInfo`. We should copy the necessary config from `ProvisionInfo` 
> into `ContainerInfo`, and handle all of this runtime information inside the 
> docker runtime isolator. It should return a `ContainerLaunchInfo` containing 
> `working_dir`, `env`, and the merged `commandInfo`, etc.
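
A small sketch of the kind of merge such an isolator might perform. The issue does not spell out the merge rules, so the rule used here (task-supplied environment variables override image defaults, and the working directory comes from the image) is purely an assumption, and ImageDefaults, LaunchInfo and prepareLaunch are hypothetical names rather than Mesos types.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Environment;

// Hypothetical stand-in for the image defaults carried in `ProvisionInfo`.
struct ImageDefaults {
  std::string workingDir;
  Environment env;
};

// Hypothetical stand-in for the `ContainerLaunchInfo` the isolator returns.
struct LaunchInfo {
  std::string workingDir;
  Environment env;
};

// Start from the image defaults, then let task-supplied variables override
// them. This precedence is an assumption made for the sketch.
inline LaunchInfo prepareLaunch(const ImageDefaults& image,
                                const Environment& taskEnv) {
  LaunchInfo info;
  info.workingDir = image.workingDir;
  info.env = image.env;
  for (Environment::const_iterator it = taskEnv.begin();
       it != taskEnv.end(); ++it) {
    info.env[it->first] = it->second;
  }
  return info;
}
```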




