[jira] [Commented] (MESOS-3177) Make Mesos own configuration of roles/weights

2015-09-17 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791639#comment-14791639
 ] 

Yong Qiao Wang commented on MESOS-3177:
---

Thanks, [~cmaloney], for your kind reply. I have some questions and comments 
on your thoughts above:

1. As we know, roles and weights are currently not persisted in the replicated 
log. Do you mean that we should persist them?

2. If yes for #1, then I think the initial replicated log entries for roles and 
weights are created when the Mesos master starts for the first time, and their 
content should be the roles and weights specified by the --roles and --weights 
flags. Is that right?

3. To add a new role ("add_role"), only two places need to change at the code 
level:

Add a new HTTP endpoint in master.cpp that adds a new item to 
{code}
hashmap roles;
{code}

and call the allocator to update the RoleSorter.

4. To remove an existing role ("remove_role"), I think the following must be 
ensured before the role is removed: 
  - Kill all tasks that use resources reserved by this role;
  - Shut down all executors that use resources reserved by this role;
  - Unreserve the resources dynamically reserved for this role;
  - Destroy the persistent volumes that use resources reserved by this role;
  - Remove all frameworks associated with this role?
  - Remove the ACLs related to this role;

5. Do you mean authorization rather than authentication in the comments above?

[~cmaloney], any comments on my thoughts above are welcome.

> Make Mesos own configuration of roles/weights
> -
>
> Key: MESOS-3177
> URL: https://issues.apache.org/jira/browse/MESOS-3177
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Cody Maloney
>Assignee: Thomas Rampelberg
>  Labels: mesosphere
>
> All roles and weights must currently be specified up-front when starting 
> Mesos masters. In addition, they should be consistent on every 
> master, otherwise unexpected behavior could occur (You can have them be 
> inconsistent for some upgrade paths / changing the set).
> This makes it hard to introduce new groups of machines under new roles 
> dynamically (Have to generate a new master configuration, deploy that, before 
> we can connect slaves with a new role to the cluster).
> Ideally an administrator can manually add / remove / edit roles and have the 
> settings replicated / passed to all masters in the cluster by Mesos. 
> Effectively Mesos takes ownership of the setting, rather than requiring it to 
> be done externally.
> In addition, if a new slave joins the cluster with an unexpected / new role 
> that should just work, making it much easier to introduce machines with new 
> roles. (Policy around whether or not a slave can cause creation of a new 
> role, a given slave can register with a given role, etc. is out of scope, and 
> would be controls in the general registration process).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3177) Make Mesos own configuration of roles/weights

2015-09-17 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791661#comment-14791661
 ] 

Yong Qiao Wang commented on MESOS-3177:
---

In addition, when we remove an existing role, we also need to call the related 
slaves to release the resources previously reserved by that role. [~cmaloney], 
any thoughts on this? Thanks! 



[jira] [Commented] (MESOS-3450) Update Mesos C++ Style Guide for namespace usage

2015-09-17 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791747#comment-14791747
 ] 

Guangya Liu commented on MESOS-3450:


RR: https://reviews.apache.org/r/38452/ 

[~bmahler] can you please help review? Thanks!

> Update Mesos C++ Style Guide for namespace usage
> 
>
> Key: MESOS-3450
> URL: https://issues.apache.org/jira/browse/MESOS-3450
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.25.0
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 0.25.0
>
>
> Discussed with [~bmahler]: the current C++ style guide does not cover 
> namespace usage, so we need to update the document to tell developers how to 
> use namespaces in future code.
> In general we avoid 'using namespace foo' statements as they are not explicit 
> about which symbols are pulled in, and they can often pull in a lot of 
> symbols, which sometimes leads to conflicts.
> We're going to need namespace aliases to help pull in subnamespaces, e.g. 
> namespace http = process::http; but this isn't in the style guide yet, so we 
> need to update the style guide to reflect this.
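
To illustrate the distinction drawn in the description, here is a minimal sketch. The `example::http` namespace and everything in it are hypothetical stand-ins (they are not taken from the Mesos tree); only the `namespace http = ...;` alias form comes from the ticket.

```cpp
#include <cassert>
#include <string>

// Hypothetical nested namespaces standing in for something like
// process::http; none of these names come from the Mesos sources.
namespace example {
namespace http {

struct Response
{
  int code;
  std::string body;
};

inline Response OK(const std::string& body)
{
  return Response{200, body};
}

} // namespace http {
} // namespace example {

// Preferred: an alias names exactly one namespace and imports no symbols.
namespace http = example::http;

// Also explicit: a using-declaration imports one named symbol.
using std::string;

// Avoided: `using namespace example::http;` would import every symbol
// declared there, which is where accidental name clashes come from.

inline string describe(const http::Response& response)
{
  return std::to_string(response.code) + " " + response.body;
}
```

With the alias in place, call sites stay short (`http::OK(...)`) while still showing where each symbol comes from.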





[jira] [Updated] (MESOS-3433) Unmount irrelevant host mounts in the new container's mount namespace.

2015-09-17 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-3433:
--
Summary: Unmount irrelevant host mounts in the new container's mount 
namespace.  (was: Unmount work dir and persistent volume mounts of other 
containers in the new mount namespace.)

> Unmount irrelevant host mounts in the new container's mount namespace.
> --
>
> Key: MESOS-3433
> URL: https://issues.apache.org/jira/browse/MESOS-3433
> Project: Mesos
>  Issue Type: Task
>Reporter: Yan Xu
>Assignee: Yan Xu
>  Labels: twitter
>
> As described in this 
> [TODO|https://github.com/apache/mesos/blob/e601e469c64594dd8339352af405cbf26a574ea8/src/slave/containerizer/isolators/filesystem/linux.cpp#L418]:
> {noformat:title=}
>   // TODO(jieyu): Try to unmount work directory mounts and persistent
>   // volume mounts for other containers to release the extra
>   // references to those mounts.
> {noformat}
> This will be a best-effort attempt to alleviate the race condition between 
> the provisioner's container cleanup and new containers copying the host 
> mount table.





[jira] [Created] (MESOS-3452) Do not use "using namespace foo" for statements

2015-09-17 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-3452:
--

 Summary: Do not use "using namespace foo" for statements
 Key: MESOS-3452
 URL: https://issues.apache.org/jira/browse/MESOS-3452
 Project: Mesos
  Issue Type: Bug
  Components: general
Affects Versions: 0.26.0
Reporter: Guangya Liu
 Fix For: 0.26.0


MESOS-3450 updates the C++ style guide to describe how to use namespaces in 
Mesos; it would be better to also update all existing namespace usage to follow 
the new style.





[jira] [Commented] (MESOS-3293) Failing ROOT_ tests on CentOS 7.1 - LimitedCpuIsolatorTest

2015-09-17 Thread Jian Qiu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791784#comment-14791784
 ] 

Jian Qiu commented on MESOS-3293:
-

RR: https://reviews.apache.org/r/38454/

> Failing ROOT_ tests on CentOS 7.1 - LimitedCpuIsolatorTest
> --
>
> Key: MESOS-3293
> URL: https://issues.apache.org/jira/browse/MESOS-3293
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker, test
>Affects Versions: 0.23.0, 0.24.0
> Environment: CentOS Linux release 7.1
> Linux 3.10.0
>Reporter: Marco Massenzio
>Assignee: Jian Qiu
>Priority: Blocker
>  Labels: mesosphere, tech-debt
> Attachments: 20150818-mesos-tests.log
>
>
> h2. LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> This is one of several failing ROOT tests: we want to track them 
> *individually* and for each of them decide whether to:
> * fix;
> * remove; OR
> * redesign.
> (full verbose logs attached)
> h2. Steps to Reproduce
> Completely cleaned the build, removed directory, clean pull from {{master}} 
> (SHA: {{fb93d93}}) - same results, 9 failed tests:
> {noformat}
> [==] 751 tests from 114 test cases ran. (231218 ms total)
> [  PASSED  ] 742 tests.
> [  FAILED  ] 9 tests, listed below:
> [  FAILED  ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [  FAILED  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where 
> TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess
> [  FAILED  ] ContainerizerTest.ROOT_CGROUPS_BalloonFramework
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost
> [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem
> [  FAILED  ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs
>  9 FAILED TESTS
>   YOU HAVE 10 DISABLED TESTS
> {noformat}





[jira] [Created] (MESOS-3453) Add patch file to silence deprecation warnings when we compile protobufs on Windows

2015-09-17 Thread Alex Clemmer (JIRA)
Alex Clemmer created MESOS-3453:
---

 Summary: Add patch file to silence deprecation warnings when we 
compile protobufs on Windows
 Key: MESOS-3453
 URL: https://issues.apache.org/jira/browse/MESOS-3453
 Project: Mesos
  Issue Type: Task
  Components: build
Reporter: Alex Clemmer
Assignee: Alex Clemmer


Right now when you compile Protobuf v2.5.0, it gives you deprecation warnings 
because stdext was removed. You can silence these, but it will require either 
submitting a PR to the project or adding a patch file to be applied to the repo 
when you untar it.





[jira] [Commented] (MESOS-3037) Add a QUIESCE call to the scheduler

2015-09-17 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802837#comment-14802837
 ] 

Guangya Liu commented on MESOS-3037:


[~vinodkone] Could you please help review, so that all of the patches can land 
by Monday?

> Add a QUIESCE call to the scheduler
> ---
>
> Key: MESOS-3037
> URL: https://issues.apache.org/jira/browse/MESOS-3037
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.25.0
>Reporter: Vinod Kone
>Assignee: Guangya Liu
>  Labels: September23th
> Fix For: 0.25.0
>
>
> SUPPRESS call is the complement to the current REVIVE call i.e., it will 
> inform Mesos to stop sending offers to the framework. 
> For the scheduler driver to send only Call messages (MESOS-2913), 
> DeactivateFrameworkMessage needs to be converted to Call(s). We can implement 
> this by having the driver send a SUPPRESS call followed by a DECLINE call for 
> outstanding offers.
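
The relationship between SUPPRESS, REVIVE and declining outstanding offers can be sketched with a toy model. This is only an illustration under stated assumptions: the `FrameworkOffers` class and its methods are hypothetical and do not correspond to any Mesos API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Toy model only: tracks whether a framework wants offers and which
// offers it currently holds. All names here are hypothetical.
class FrameworkOffers
{
public:
  // Records an offer unless the framework has suppressed offers.
  // Returns true if the offer was delivered.
  bool offer(const std::string& offerId)
  {
    if (suppressed) {
      return false;
    }
    outstanding.insert(offerId);
    return true;
  }

  // SUPPRESS: stop sending offers to this framework.
  void suppress() { suppressed = true; }

  // REVIVE: the complement of SUPPRESS; offers may flow again.
  void revive() { suppressed = false; }

  // DECLINE: give back a single outstanding offer.
  void decline(const std::string& offerId) { outstanding.erase(offerId); }

  // Deactivation as described in the ticket: a SUPPRESS followed by a
  // DECLINE for every outstanding offer.
  void deactivate()
  {
    suppress();
    outstanding.clear();
  }

  std::size_t outstandingCount() const { return outstanding.size(); }

private:
  bool suppressed = false;
  std::set<std::string> outstanding;
};
```

In this model a deactivated framework holds no offers and receives none until `revive()` is called.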





[jira] [Created] (MESOS-3456) Provide common primitives can be used to write a new Mesos Framework

2015-09-17 Thread Micheal Benedict (JIRA)
Micheal Benedict created MESOS-3456:
---

 Summary: Provide common primitives can be used to write a new 
Mesos Framework
 Key: MESOS-3456
 URL: https://issues.apache.org/jira/browse/MESOS-3456
 Project: Mesos
  Issue Type: Epic
  Components: framework
Reporter: Micheal Benedict


(draft description - subject to change as more details are collected)

Mesos Frameworks tend to require some common functionality such as
1. ACL
2. Quota (provisioning of quota on behalf of user of framework)
3. Usage Metering per user of framework

Framework writers have expressed pain around having to rewrite some features 
which already exist in other frameworks.

The goal of this effort should be to provide a set of common libs to enable 
consistency across various frameworks on common features (ex, ACL) so as to not 
reinvent the wheel.





[jira] [Created] (MESOS-3455) Higher level construct for expressing process dispatch

2015-09-17 Thread Jojy Varghese (JIRA)
Jojy Varghese created MESOS-3455:


 Summary: Higher level construct for expressing process dispatch
 Key: MESOS-3455
 URL: https://issues.apache.org/jira/browse/MESOS-3455
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Reporter: Jojy Varghese
Assignee: Jojy Varghese


Since Mesos code is based on the actor model and dispatching an interface
asynchronously is a large part of the code base, generalizing the concept of
asynchronously dispatching an interface would eliminate the need for manual
programming of the dispatch boilerplate.

An example usage:

For a simple interface like:

class Interface
{
  virtual Future writeToFile(const char* data) = 0;
  virtual ~Interface();
};

Today the developer has to do the following:

a. Write a wrapper class that implements the same interface to add the
dispatching boilerplate.

b. Spend precious time in reviews.

c. Risk introducing bugs.

None of the above steps adds any value to the executable binary.

The wrapper class would look like:

// -- hpp file
class InterfaceProcess;

class InterfaceImpl : public Interface
{
public:
  Try create(const Flags& flags);

  virtual Future writeToFile(const char* data);

  ~InterfaceImpl();

private:
  Owned process;
};

// -- cpp file
Try create(const Flags& flags)
{
  // Code to create the InterfaceProcess class.
}

Future InterfaceImpl::writeToFile(const char* data)
{
  process->dispatch(
    ::writeToFile,
    data);
}

InterfaceImpl::InterfaceImpl()
{
  // Code to spawn the process
}

InterfaceImpl::~InterfaceImpl()
{
  // Code to stop the process.
}

At the caller/client site, the code would look like:

Try in = InterfaceImpl::create(flags);

Future result =
  in->writeToFile(data);

Proposal

We should use C++'s rich language semantics to express the intent and avoid
the boilerplate we write manually.

The basic intent of the code that leads to all the boilerplate above is:

a. An interface that provides a set of functionality.

b. An implementation of the interface.

c. Ability to dispatch that interface asynchronously using an actor.

C++ has a rich set of generics that can be used to express the above.
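
As a rough illustration of how generics can absorb the per-interface boilerplate, here is a minimal sketch. It is not the design proposed in this ticket and does not use libprocess: it assumes a single `std::thread` worker as a stand-in for a process/actor, `std::future` as a stand-in for `Future`, and the names `Dispatcher` and `MemoryWriter` are hypothetical. `writeToFile` takes a `std::string` here rather than `const char*` so that the argument is safely copied into the queued call.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Owns one T and runs every dispatched member call on a single worker
// thread, so calls on the object are serialized as they would be in an
// actor. Hypothetical sketch; not a libprocess class.
template <typename T>
class Dispatcher
{
public:
  Dispatcher() : worker([this]() { run(); }) {}

  ~Dispatcher()
  {
    {
      std::lock_guard<std::mutex> lock(mutex);
      stopping = true;
    }
    ready.notify_one();
    worker.join(); // Remaining queued calls are drained before exit.
  }

  // Queues `method(args...)` on the owned object and returns a future
  // for its result. Arguments are copied into the queued call.
  template <typename R, typename... Params, typename... Args>
  std::future<R> dispatch(R (T::*method)(Params...), Args&&... args)
  {
    auto task = std::make_shared<std::packaged_task<R()>>(
        std::bind(method, &object, std::forward<Args>(args)...));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex);
      calls.push([task]() { (*task)(); });
    }
    ready.notify_one();
    return result;
  }

private:
  void run()
  {
    for (;;) {
      std::function<void()> call;
      {
        std::unique_lock<std::mutex> lock(mutex);
        ready.wait(lock, [this]() { return stopping || !calls.empty(); });
        if (calls.empty()) {
          return; // Stopping and nothing left to run.
        }
        call = std::move(calls.front());
        calls.pop();
      }
      call();
    }
  }

  T object;
  std::mutex mutex;
  std::condition_variable ready;
  std::queue<std::function<void()>> calls;
  bool stopping = false;
  std::thread worker; // Declared last: starts after the members above.
};

// Hypothetical implementation of the interface; it appends to an
// in-memory buffer instead of touching the filesystem.
class MemoryWriter
{
public:
  std::size_t writeToFile(const std::string& data)
  {
    contents += data;
    return contents.size();
  }

private:
  std::string contents;
};
```

A caller would write `dispatcher.dispatch(&MemoryWriter::writeToFile, std::string("abc"))` and wait on the returned future; no hand-written wrapper class is needed for each new interface.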

Components

ProcessDispatcher

This component will "dispatch" an interface implementation asynchronously using 
the process framework.

This component can be expressed as:

ProcessDispatcher

DispatchInterface

Any interface that provides an implementation that can be "dispatched" can be
expressed using this component.

This component can be expressed as:

Dispatchable

Usage:

Simple usage

Try> dispatcher =
  ProcessDispatcher::create(flags);

Future result =

[jira] [Updated] (MESOS-3455) Higher level construct for expressing process dispatch

2015-09-17 Thread Jojy Varghese (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jojy Varghese updated MESOS-3455:
-
Description: 
Since Mesos code is based on the actor model and dispatching an interface
asynchronously is a large part of the code base, generalizing the concept of
asynchronously dispatching an interface would eliminate the need for manual
programming of the dispatch boilerplate.

An example usage:

For a simple interface like:

{code}
class Interface
{
  virtual Future writeToFile(const char* data) = 0;
  virtual ~Interface();
};
{code}

Today the developer has to do the following:

a. Write a wrapper class that implements the same interface to add the
dispatching boilerplate.

b. Spend precious time in reviews.

c. Risk introducing bugs.

None of the above steps adds any value to the executable binary.

The wrapper class would look like:

// -- hpp file
class InterfaceProcess;

class InterfaceImpl : public Interface
{
public:
  Try create(const Flags& flags);

  virtual Future writeToFile(const char* data);

  ~InterfaceImpl();

private:
  Owned process;
};

// -- cpp file
Try create(const Flags& flags)
{
  // Code to create the InterfaceProcess class.
}

Future InterfaceImpl::writeToFile(const char* data)
{
  process->dispatch(
    ::writeToFile,
    data);
}

InterfaceImpl::InterfaceImpl()
{
  // Code to spawn the process
}

InterfaceImpl::~InterfaceImpl()
{
  // Code to stop the process.
}

At the caller/client site, the code would look like:

Try in = InterfaceImpl::create(flags);

Future result =
  in->writeToFile(data);

Proposal

We should use C++'s rich language semantics to express the intent and avoid
the boilerplate we write manually.

The basic intent of the code that leads to all the boilerplate above is:

a. An interface that provides a set of functionality.

b. An implementation of the interface.

c. Ability to dispatch that interface asynchronously using an actor.

C++ has a rich set of generics that can be used to express the above.

Components

ProcessDispatcher

This component will "dispatch" an interface implementation asynchronously using 
the process framework.

This component can be expressed as:

ProcessDispatcher

DispatchInterface

Any interface that provides an implementation that can be "dispatched" can be
expressed using this component.

This component can be expressed as:

Dispatchable

Usage:

Simple usage

Try> dispatcher =
  ProcessDispatcher::create(flags);

Future result =
  dispatcher->dispatch(
    Interface::writeToFile,

[jira] [Commented] (MESOS-2224) Add explanatory comments for Allocator interface

2015-09-17 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802882#comment-14802882
 ] 

Alexander Rukletsov commented on MESOS-2224:


I'm not sure, Niklas. I also don't think it's high priority for 0.25, we can 
postpone.

> Add explanatory comments for Allocator interface
> 
>
> Key: MESOS-2224
> URL: https://issues.apache.org/jira/browse/MESOS-2224
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Affects Versions: 0.25.0
>Reporter: Alexander Rukletsov
>Assignee: Guangya Liu
>Priority: Minor
> Fix For: 0.25.0
>
>
> Allocator is the public API and it would be great to have comments on all 
> calls to be implemented.





[jira] [Created] (MESOS-3454) Remove duplicated logic in Flags::load

2015-09-17 Thread Klaus Ma (JIRA)
Klaus Ma created MESOS-3454:
---

 Summary: Remove duplicated logic in Flags::load
 Key: MESOS-3454
 URL: https://issues.apache.org/jira/browse/MESOS-3454
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Klaus Ma
Assignee: Klaus Ma
Priority: Minor


In {{flags.hpp}}, there are two functions with almost the same logic; this 
ticket is used to merge the duplicated part.

{code}
inline Try FlagsBase::load(
const Option& prefix,
int* argc,
char*** argv,
bool unknowns,
bool duplicates)
...
inline Try FlagsBase::load(
const Option& prefix,
int argc,
const char* const *argv,
bool unknowns,
bool duplicates)
...
{code}
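
One way such a merge could look, sketched with free functions rather than the real `FlagsBase` members: both overloads forward to one shared helper. Everything here is an assumption made for illustration, including the helper name `parseFlags`, the simplified `--name=value` parsing, and the idea that the pointer-taking overload exists so that it can strip consumed flags from `argv`. The real code also handles the prefix, unknowns and duplicates parameters, which are omitted.

```cpp
#include <cassert>
#include <map>
#include <string>

// Shared helper holding the logic both overloads need: collect
// "--name=value" (or bare "--name") arguments into a map.
// Hypothetical and much simpler than the real flag loading.
inline std::map<std::string, std::string> parseFlags(
    int argc,
    const char* const* argv)
{
  std::map<std::string, std::string> values;

  for (int i = 1; i < argc; i++) {
    const std::string arg(argv[i]);

    if (arg.compare(0, 2, "--") != 0) {
      continue; // Not a flag.
    }

    const std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos) {
      values[arg.substr(2)] = "true"; // Bare boolean flag.
    } else {
      values[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
    }
  }

  return values;
}

// Read-only overload: forwards straight to the helper.
inline std::map<std::string, std::string> load(
    int argc,
    const char* const* argv)
{
  return parseFlags(argc, argv);
}

// Mutating overload: reuses the helper, then removes the flags it
// consumed from argv (assumed behavior for this sketch).
inline std::map<std::string, std::string> load(int* argc, char*** argv)
{
  std::map<std::string, std::string> values = parseFlags(*argc, *argv);

  int kept = 1; // argv[0] is always kept.
  for (int i = 1; i < *argc; i++) {
    if (std::string((*argv)[i]).compare(0, 2, "--") != 0) {
      (*argv)[kept++] = (*argv)[i];
    }
  }
  *argc = kept;

  return values;
}
```

The parsing rules then live in one place, and each overload only adds what is specific to its signature.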





[jira] [Updated] (MESOS-3136) COMMAND health checks with Marathon 0.10.0 are broken

2015-09-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-3136:

Attachment: MESOS-3136_0_24_0.patch

The backport patch for 0.24.0-rc1. Test with
{code}
sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="HealthCheckTest*" --verbose
{code}

> COMMAND health checks with Marathon 0.10.0 are broken
> -
>
> Key: MESOS-3136
> URL: https://issues.apache.org/jira/browse/MESOS-3136
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Dr. Stefan Schimanski
>Assignee: haosdent
>Priority: Critical
> Attachments: MESOS-3136_0_24_0.patch
>
>
> When deploying Mesos 0.23rc4 with latest Marathon 0.10.0 RC3 command health 
> check stop working. Rolling back to Mesos 0.22.1 fixes the problem.
> Containerizer is Docker.
> All packages are from official Mesosphere Ubuntu 14.04 sources.
> The issue must be analyzed further.





[jira] [Comment Edited] (MESOS-3136) COMMAND health checks with Marathon 0.10.0 are broken

2015-09-17 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803296#comment-14803296
 ] 

haosdent edited comment on MESOS-3136 at 9/17/15 6:06 PM:
--

The backport patches for 0.23.0 and 0.24.0-rc1 are in the attached files. Test with
{code}
sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="HealthCheckTest*" --verbose
{code}


was (Author: haosd...@gmail.com):
The backport patch for 0.24.0-rc1. Test with
{code}
sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="HealthCheckTest*" --verbose
{code}

> COMMAND health checks with Marathon 0.10.0 are broken
> -
>
> Key: MESOS-3136
> URL: https://issues.apache.org/jira/browse/MESOS-3136
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Dr. Stefan Schimanski
>Assignee: haosdent
>Priority: Critical
> Attachments: MESOS-3136_0_23_0.patch, MESOS-3136_0_24_0.patch
>
>
> When deploying Mesos 0.23rc4 with latest Marathon 0.10.0 RC3 command health 
> check stop working. Rolling back to Mesos 0.22.1 fixes the problem.
> Containerizer is Docker.
> All packages are from official Mesosphere Ubuntu 14.04 sources.
> The issue must be analyzed further.





[jira] [Commented] (MESOS-3280) Master fails to access replicated log after network partition

2015-09-17 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804028#comment-14804028
 ] 

Jie Yu commented on MESOS-3280:
---

I'd be happy to assist as well. It would be useful to attach the master's log 
(the part related to the replicated log).

> Master fails to access replicated log after network partition
> -
>
> Key: MESOS-3280
> URL: https://issues.apache.org/jira/browse/MESOS-3280
> Project: Mesos
>  Issue Type: Bug
>  Components: master, replicated log
>Affects Versions: 0.23.0
> Environment: Zookeeper version 3.4.5--1
>Reporter: Bernd Mathiske
>Assignee: Neil Conway
>  Labels: mesosphere
>
> In a 5 node cluster with 3 masters and 2 slaves, and ZK on each node, when a 
> network partition is forced, all the masters apparently lose access to their 
> replicated log. The leading master halts. Unknown reasons, but presumably 
> related to replicated log access. The others fail to recover from the 
> replicated log. Unknown reasons. This could have to do with ZK setup, but it 
> might also be a Mesos bug. 
> This was observed in a Chronos test drive scenario described in detail here:
> https://github.com/mesos/chronos/issues/511
> With setup instructions here:
> https://github.com/mesos/chronos/issues/508





[jira] [Updated] (MESOS-2906) Slave : Synchronous Validation for Calls

2015-09-17 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2906:
--
Description: 
The /call endpoint on the slave will return a 202 Accepted code, but it has to 
do some basic validation first. If validation fails, it will return a 
{{BadRequest}} to the client.

- We need to create the required infrastructure to validate the request and 
then process it, similar to {{src/master/validation.cpp}} in the {{namespace 
scheduler}}, i.e. check that the protobuf is properly initialized, has the 
required attributes set for the call message, etc.

  was:
/call endpoint on the slave will return a 202 accepted code but has to do some 
basic validations before. In case of invalidation it will return a 4xx code.  

- We need to create the required infrastructure to validate the request and 
then process it.


> Slave : Synchronous Validation for Calls
> 
>
> Key: MESOS-2906
> URL: https://issues.apache.org/jira/browse/MESOS-2906
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>  Labels: HTTP, mesosphere
>
> /call endpoint on the slave will return a 202 accepted code but has to do 
> some basic validations before. In case of invalidation it will return a 
> {{BadRequest}} back to the client.
> - We need to create the required infrastructure to validate the request and 
> then process it similar to {{src/master/validation.cpp}} in the {{namespace 
> scheduler}} i.e. check if the protobuf is properly initialized, has the 
> required attributes set pertaining to the call message etc.
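
The validate-then-accept flow described above can be sketched as follows. This is a simplified illustration only: the `Call` struct is a hypothetical stand-in for the protobuf message, the field names and error strings are invented, and `respond` returns a bare status code rather than a real HTTP response object.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for a Call protobuf message.
struct Call
{
  enum Type { UNKNOWN, SUBSCRIBE, UPDATE, MESSAGE };

  Type type = UNKNOWN;
  bool hasFrameworkId = false;
  bool hasUpdate = false; // Payload required by UPDATE calls.
};

// Returns an empty string when the call passes validation, otherwise
// a description of the first problem found.
inline std::string validate(const Call& call)
{
  if (call.type == Call::UNKNOWN) {
    return "Expecting 'type' to be present";
  }

  if (!call.hasFrameworkId) {
    return "Expecting 'framework_id' to be present";
  }

  if (call.type == Call::UPDATE && !call.hasUpdate) {
    return "Expecting 'update' to be present";
  }

  return "";
}

// Validation runs synchronously, before the call is queued for
// processing: 400 (BadRequest) on failure, 202 (Accepted) otherwise.
inline int respond(const Call& call)
{
  return validate(call).empty() ? 202 : 400;
}
```

Keeping validation in a separate function means the endpoint handler only decides between the two status codes.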





[jira] [Updated] (MESOS-3280) Master fails to access replicated log after network partition

2015-09-17 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-3280:
---
Component/s: replicated log



[jira] [Updated] (MESOS-3136) COMMAND health checks with Marathon 0.10.0 are broken

2015-09-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-3136:

Attachment: MESOS-3136_0_23_0.patch



[jira] [Commented] (MESOS-3177) Make Mesos own configuration of roles/weights

2015-09-17 Thread Cody Maloney (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804289#comment-14804289
 ] 

Cody Maloney commented on MESOS-3177:
-

Currently the mesos master doesn't keep track of roles it knows of explicitly, 
just roles which it says it should know about passed in via the flag. Storing 
them in the replicated log would be my preferred place to put / persist them.

If they are persisted in the replicated log and that is the authoritative 
source for them, I'd rather not have them be flags to the mesos master anymore, 
as after the first mesos master start those flags would be meaningless and lead 
to a potentially bad user experience (I set the flags on mesos master but they 
aren't applying!?!?!). 

There is a `mesos-log` command that already exists, and there has been some 
design discussion that initialization of the replicated log shouldn't be 
implicit in master startup (it can potentially lead to bad cluster/error cases 
for some node replacement scenarios).

I would suggest only allowing adding roles in v1. Removing roles will require 
revoking offers, which sort of exists with inverse offers that recently became 
available, but is going to be a lot of engineering.

For other things you're going to need a Mesos Shepherd going forward for more 
design review, building out a proper design proposal, and getting things landed 
in time.



[jira] [Updated] (MESOS-3015) Add hooks for Slave exits

2015-09-17 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-3015:
--
Assignee: Niklas Quarfot Nielsen  (was: Kapil Arya)
Target Version/s: 0.25.0

> Add hooks for Slave exits
> -
>
> Key: MESOS-3015
> URL: https://issues.apache.org/jira/browse/MESOS-3015
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Niklas Quarfot Nielsen
>  Labels: mesosphere
>
> The hook will be triggered on slave exits. A master hook module can use this 
> to do Slave-specific cleanups.
> In our particular use case, the hook would trigger cleanup of IPs assigned to 
> the given Slave (see the [design doc | 
> https://docs.google.com/document/d/17mXtAmdAXcNBwp_JfrxmZcQrs7EO6ancSbejrqjLQ0g/edit#]).





[jira] [Assigned] (MESOS-2906) Slave : Synchronous Validation for Calls

2015-09-17 Thread Isabel Jimenez (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabel Jimenez reassigned MESOS-2906:
-

Assignee: Isabel Jimenez  (was: Anand Mazumdar)

> Slave : Synchronous Validation for Calls
> 
>
> Key: MESOS-2906
> URL: https://issues.apache.org/jira/browse/MESOS-2906
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Isabel Jimenez
>  Labels: HTTP, mesosphere
>
> The /call endpoint on the slave returns a 202 Accepted code, but it has to 
> perform some basic validation first. If validation fails, it returns a 
> {{BadRequest}} to the client.
> - We need to create the required infrastructure to validate the request and 
> then process it, similar to {{src/master/validation.cpp}} in the {{namespace 
> scheduler}}, i.e. check that the protobuf is properly initialized, has the 
> required attributes set for the call message, etc.
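
As a rough illustration of the synchronous validation described in MESOS-2906, the check could look something like the sketch below. It is plain C++ with no Mesos or protobuf dependencies: `Call`, `validate` and `respond` are hypothetical names, and the fields being checked are assumptions made for the example, not the slave's actual validation rules.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the protobuf Call message; the real message
// and its fields are defined in the Mesos protobuf files.
struct Call {
  enum Type { UNKNOWN, SUBSCRIBE, UPDATE };
  Type type = UNKNOWN;
  std::string frameworkId;  // Empty means "not set"; assumed always required.
  std::string taskId;       // Empty means "not set"; assumed required for UPDATE.
};

// Returns a description of the first problem found, or an empty string
// if the call passes every check.
std::string validate(const Call& call) {
  if (call.type == Call::UNKNOWN) {
    return "Expecting 'type' to be set";
  }
  if (call.frameworkId.empty()) {
    return "Expecting 'framework_id' to be present";
  }
  if (call.type == Call::UPDATE && call.taskId.empty()) {
    return "Expecting 'task_id' to be present for UPDATE";
  }
  return "";
}

// Maps the validation result to the status codes named in the ticket:
// 400 (BadRequest) on failure, 202 (Accepted) once validation has passed.
int respond(const Call& call) {
  return validate(call).empty() ? 202 : 400;
}
```

The point of the split is that `validate` runs before the 202 is sent, so a malformed call is rejected synchronously and never queued for processing.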





[jira] [Assigned] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-17 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-3430:
-

Assignee: Jie Yu  (was: Michael Park)

> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Jie Yu
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}





[jira] [Updated] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-17 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-3430:
--
Sprint: Twitter Mesos Q3 Sprint 5  (was: Mesosphere Sprint 19)

> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Jie Yu
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}





[jira] [Commented] (MESOS-3430) LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails on CentOS 7.1

2015-09-17 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804652#comment-14804652
 ] 

Jie Yu commented on MESOS-3430:
---

This is committed. Reopen if you still see the issue.

commit 5f4e1fadc012a833674c7894975e23b3761633c5
Author: Jie Yu 
Date:   Thu Sep 17 14:48:39 2015 -0700

Fixed MESOS-3430 by making the sandbox/work directory mount as shared and 
slave.

Review: https://reviews.apache.org/r/38471

> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem fails 
> on CentOS 7.1
> --
>
> Key: MESOS-3430
> URL: https://issues.apache.org/jira/browse/MESOS-3430
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Marco Massenzio
>Assignee: Jie Yu
>  Labels: ROOT_Tests, flaky-test
> Attachments: verbose.log
>
>
> Just ran ROOT tests on CentOS 7.1 and had the following failure (clean build, 
> just pulled from {{master}}):
> {noformat}
> [ RUN  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> ../../src/tests/containerizer/filesystem_isolator_tests.cpp:498: Failure
> (wait).failure(): Failed to clean up an isolator when destroying container 
> '366b6d37-b326-4ed1-8a5f-43d483dbbace' :Failed to unmount volume 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Failed to unmount 
> '/tmp/LinuxFilesystemIsolatorTest_ROOT_PersistentVolumeWithoutRootFilesystem_KXgvoH/sandbox/volume':
>  Invalid argument
> ../../src/tests/utils.cpp:75: Failure
> os::rmdir(sandbox.get()): Device or resource busy
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem (1943 
> ms)
> [--] 1 test from LinuxFilesystemIsolatorTest (1943 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (1951 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] 
> LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
> {noformat}





[jira] [Commented] (MESOS-3451) `sudo make check` fails after Isolator changes with port mapping isolator

2015-09-17 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804636#comment-14804636
 ] 

Jie Yu commented on MESOS-3451:
---

ExampleTest.TestFramework
ExampleTest.JavaFramework

These two tests fail when the tests are run as root. I believe this has 
nothing to do with the network isolator: they will fail as root on Linux even 
if the port mapping network isolator is not built in. You should be able to 
reproduce this on a Linux host.

The reason is that the Linux launcher is used for those two tests, which 
breaks because multiple slaves run on the same machine and all use the shared 
freezer cgroup. We cannot simply use LinuxLauncher whenever the slave runs 
as root.

> `sudo make check` fails after Isolator changes with port mapping isolator
> -
>
> Key: MESOS-3451
> URL: https://issues.apache.org/jira/browse/MESOS-3451
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>Priority: Blocker
>
> When configured with `--with-network-isolation`, `sudo make check` fails with 
> a few tests. This is related to the following recent commits :
> e047f7d69b5297cc787487b6093119a3be517e48
> fc541a9a97eb1d86c27452019ff217eed11ed5a3
> 6923bb3e8cfbddde9fbabc6ca4edc29d9fc96c06





[jira] [Created] (MESOS-3457) Add flag to disable hostname lookup

2015-09-17 Thread Marco Massenzio (JIRA)
Marco Massenzio created MESOS-3457:
--

 Summary: Add flag to disable hostname lookup
 Key: MESOS-3457
 URL: https://issues.apache.org/jira/browse/MESOS-3457
 Project: Mesos
  Issue Type: Improvement
Reporter: Cody Maloney
Assignee: Marco Massenzio


In testing / building DCOS we've found that we need to set --hostname 
explicitly on the masters. For our uses IP and `hostname` must always be the 
same thing. 

More generally, under certain circumstances, dynamic lookup of {{hostname}}, 
while successful, provides undesirable results; in those circumstances we 
would also like to be able to just set the hostname to the chosen IP address 
(possibly set via the {{\-\-ip_discovery_command}} method).

We suggest adding a {{\-\-no-hostname-lookup}} flag. 
Note that we can introduce this flag as {{\-\-hostname-lookup}} with a default 
of 'true' (which is the current semantics); that way someone can pass 
{{\-\-no-hostname-lookup}} or {{\-\-hostname-lookup=false}}.
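
To make the proposed semantics concrete, here is a minimal sketch of a boolean flag that defaults to true and can be negated in either of the two ways mentioned above. It is standalone C++ and does not use the Mesos flag-parsing code; `parseBoolFlag` and `advertisedHostname` are hypothetical helpers written for this example only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Parses a boolean flag `name` from `args`. Accepts `--name`,
// `--name=true`, `--name=false` and `--no-name`; returns `defaultValue`
// when the flag is absent. If the flag is given several times, the last
// occurrence wins.
bool parseBoolFlag(
    const std::vector<std::string>& args,
    const std::string& name,
    bool defaultValue) {
  bool value = defaultValue;
  for (const std::string& arg : args) {
    if (arg == "--" + name || arg == "--" + name + "=true") {
      value = true;
    } else if (arg == "--no-" + name || arg == "--" + name + "=false") {
      value = false;
    }
  }
  return value;
}

// Chooses the hostname to advertise. `lookedUp` stands in for the result
// of the dynamic lookup; when lookups are disabled the IP address is
// used verbatim instead.
std::string advertisedHostname(
    bool hostnameLookup,
    const std::string& ip,
    const std::string& lookedUp) {
  return hostnameLookup ? lookedUp : ip;
}
```

With a default of true, existing deployments keep the current behaviour, and only operators who pass the negated form get the IP address as the hostname.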






[jira] [Created] (MESOS-3460) Update Java Test Framework Support QuiesceOffer and reviveOffer

2015-09-17 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-3460:
--

 Summary: Update Java Test Framework Support QuiesceOffer and 
reviveOffer
 Key: MESOS-3460
 URL: https://issues.apache.org/jira/browse/MESOS-3460
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.26.0
Reporter: Guangya Liu
 Fix For: 0.26.0


This is a follow-up to https://reviews.apache.org/r/38120/ ; we need to add 
Java framework support for quiesceOffers.





[jira] [Updated] (MESOS-2406) Add CLI tool for creating persistent volumes for pre-existing data

2015-09-17 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-2406:

Assignee: (was: Klaus Ma)

> Add CLI tool for creating persistent volumes for pre-existing data
> --
>
> Key: MESOS-2406
> URL: https://issues.apache.org/jira/browse/MESOS-2406
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>
> This is for the case where the user has some pre-existing data under a 
> certain directory (e.g., /var/lib/cassandra) and wants to expose that 
> directory as a persistent volume to the framework.





[jira] [Commented] (MESOS-3046) Stout's UUID re-seeds a new random generator during each call to UUID::random.

2015-09-17 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804781#comment-14804781
 ] 

Klaus Ma commented on MESOS-3046:
-

[~bmahler], would you help review the patch? Maybe we can release it in 
0.25.0.

> Stout's UUID re-seeds a new random generator during each call to UUID::random.
> --
>
> Key: MESOS-3046
> URL: https://issues.apache.org/jira/browse/MESOS-3046
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Mahler
>Assignee: Klaus Ma
>  Labels: newbie, twitter
> Attachments: tl.cpp
>
>
> Per [~StephanErb] and [~kevints]'s observations on MESOS-2940, stout's UUID 
> abstraction is re-seeding the random generator during each call to 
> {{UUID::random()}}, which is really expensive.
> This is confirmed in the perf graph from MESOS-2940.
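
For readers unfamiliar with the cost being described, the difference between the two patterns can be sketched with the standard `<random>` header. This is not stout's code and does not use its UUID type: `Uuid`, `randomReseeding` and `randomReusing` are hypothetical names, and the per-thread generator is only one possible way to avoid the repeated seeding.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

typedef std::array<std::uint8_t, 16> Uuid;

// Fills a version 4 (random) UUID from the given generator.
template <typename Generator>
Uuid fromGenerator(Generator& generator) {
  std::uniform_int_distribution<int> byte(0, 255);
  Uuid uuid;
  for (std::uint8_t& b : uuid) {
    b = static_cast<std::uint8_t>(byte(generator));
  }
  uuid[6] = static_cast<std::uint8_t>((uuid[6] & 0x0F) | 0x40);  // Version 4.
  uuid[8] = static_cast<std::uint8_t>((uuid[8] & 0x3F) | 0x80);  // RFC 4122 variant.
  return uuid;
}

// Draws one seed value from the system entropy source.
unsigned int seed() {
  std::random_device device;
  return device();
}

// The pattern described in the ticket: a new generator is constructed
// and seeded on every call.
Uuid randomReseeding() {
  std::mt19937 generator(seed());
  return fromGenerator(generator);
}

// One generator per thread, seeded on first use and reused afterwards.
Uuid randomReusing() {
  thread_local std::mt19937 generator(seed());
  return fromGenerator(generator);
}
```

Both functions return well-formed random UUIDs; the second simply pays for opening the entropy source and initializing the generator state once per thread instead of once per UUID.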





[jira] [Assigned] (MESOS-1279) Add resize task primitive

2015-09-17 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-1279:
---

Assignee: Klaus Ma

> Add resize task primitive
> -
>
> Key: MESOS-1279
> URL: https://issues.apache.org/jira/browse/MESOS-1279
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api, master, slave
>Reporter: Niklas Quarfot Nielsen
>Assignee: Klaus Ma
>  Labels: mesosphere, myriad
>
> As mentioned in MESOS-938, one way to support task replacement and scaling 
> could be to split the responsibility into several smaller primitives, which 
> would 1) reduce complexity, 2) make it easier to comprehend, and 3) make the 
> implementation easier and more incremental.
> resizeTask() would be the primitive to either
> 1) Scale a running task's resources down
> 2) Scale a running task's resources up by using extra auxiliary offers.





[jira] [Commented] (MESOS-2845) Command tasks lead to a mixing of revocable / non-revocable cpus and memory within the container.

2015-09-17 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804801#comment-14804801
 ] 

Klaus Ma commented on MESOS-2845:
-

[~vi...@twitter.com]/[~idownes], do you have comments on this proposal?

> Command tasks lead to a mixing of revocable / non-revocable cpus and memory 
> within the container.
> -
>
> Key: MESOS-2845
> URL: https://issues.apache.org/jira/browse/MESOS-2845
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Benjamin Mahler
>Assignee: Klaus Ma
>  Labels: twitter
>
> Due to the hack 
> [here|https://github.com/apache/mesos/blob/9a5788801e7fc95fce99749a23803fc52c67c0ce/src/slave/slave.cpp#L3101],
>  where we add a small set of resources into the command executor:
> {code}
> ExecutorInfo Slave::getExecutorInfo(
>     const FrameworkID& frameworkId,
>     const TaskInfo& task)
> {
>   if (task.has_command()) {
>     ...
>     // XXX: These are always non-revocable.
>     // Add an allowance for the command executor. This does lead to a
>     // small overcommit of resources.
>     executor.mutable_resources()->MergeFrom(
>         Resources::parse(
>           "cpus:" + stringify(DEFAULT_EXECUTOR_CPUS) + ";" +
>           "mem:" + stringify(DEFAULT_EXECUTOR_MEM.megabytes())).get());
>   }
>   ...
> }
> {code}
> The obvious extension here would be to make these revocable, but it would be 
> great to remove this hack entirely.
> Seems to originate in [r/22251|https://reviews.apache.org/r/22251/] from 
> MESOS-1417.
> FYI [~idownes] [~jieyu]





[jira] [Created] (MESOS-3461) Update Python Test Framework Support QuiesceOffer and reviveOffer

2015-09-17 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-3461:
--

 Summary: Update Python Test Framework Support QuiesceOffer and 
reviveOffer
 Key: MESOS-3461
 URL: https://issues.apache.org/jira/browse/MESOS-3461
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.26.0
Reporter: Guangya Liu
 Fix For: 0.26.0


This is a follow-up to https://reviews.apache.org/r/38121/ ; we need to add 
Python framework support for quiesceOffers.





[jira] [Commented] (MESOS-3177) Make Mesos own configuration of roles/weights

2015-09-17 Thread Yong Qiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804888#comment-14804888
 ] 

Yong Qiao Wang commented on MESOS-3177:
---

Thanks [~cmaloney] for your quick reply.

[~thomasr], are you working on this ticket now? If you do not have time for 
it at the moment, I would like to reassign the ticket to myself and try to 
propose a detailed design. Thanks! 

> Make Mesos own configuration of roles/weights
> -
>
> Key: MESOS-3177
> URL: https://issues.apache.org/jira/browse/MESOS-3177
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Cody Maloney
>Assignee: Thomas Rampelberg
>  Labels: mesosphere
>
> All roles and weights must currently be specified up-front when starting 
> Mesos masters. In addition, they should be consistent on every 
> master, otherwise unexpected behavior could occur (You can have them be 
> inconsistent for some upgrade paths / changing the set).
> This makes it hard to introduce new groups of machines under new roles 
> dynamically (Have to generate a new master configuration, deploy that, before 
> we can connect slaves with a new role to the cluster).
> Ideally an administrator can manually add / remove / edit roles and have the 
> settings replicated / passed to all masters in the cluster by Mesos. 
> Effectively Mesos takes ownership of the setting, rather than requiring it to 
> be done externally.
> In addition, if a new slave joins the cluster with an unexpected / new role 
> that should just work, making it much easier to introduce machines with new 
> roles. (Policy around whether or not a slave can cause creation of a new 
> role, a given slave can register with a given role, etc. is out of scope, and 
> would be controls in the general registration process).





[jira] [Created] (MESOS-3458) Segfault when accepting or declining inverse offers

2015-09-17 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3458:


 Summary: Segfault when accepting or declining inverse offers
 Key: MESOS-3458
 URL: https://issues.apache.org/jira/browse/MESOS-3458
 Project: Mesos
  Issue Type: Bug
Reporter: Joseph Wu
Assignee: Joseph Wu


Discovered while writing a test for filters (in regards to inverse offers).

Fix here: https://reviews.apache.org/r/38470/





[jira] [Updated] (MESOS-3458) Segfault when accepting or declining inverse offers

2015-09-17 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3458:
-
Priority: Blocker  (was: Major)

> Segfault when accepting or declining inverse offers
> ---
>
> Key: MESOS-3458
> URL: https://issues.apache.org/jira/browse/MESOS-3458
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Blocker
>
> Discovered while writing a test for filters (in regards to inverse offers).
> Fix here: https://reviews.apache.org/r/38470/





[jira] [Updated] (MESOS-3459) Change /machine/up and /machine/down endpoints to take an array

2015-09-17 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3459:
-
Description: 
With [MESOS-3312] committed, the {{/machine/up}} and {{/machine/down}} 
endpoints should also take an input as an array.

It is important to change this before maintenance primitives are released:
https://reviews.apache.org/r/38011/

Also, a minor change to the error message from these endpoints:
https://reviews.apache.org/r/37969/

  was:
With [MESOS-3312] committed, the {{/machine/up}} and {{/machine/down}} 
endpoints should also take an input as an array.

It is important to change this before maintenance primitives are released.

https://reviews.apache.org/r/38011/


> Change /machine/up and /machine/down endpoints to take an array
> ---
>
> Key: MESOS-3459
> URL: https://issues.apache.org/jira/browse/MESOS-3459
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> With [MESOS-3312] committed, the {{/machine/up}} and {{/machine/down}} 
> endpoints should also take an input as an array.
> It is important to change this before maintenance primitives are released:
> https://reviews.apache.org/r/38011/
> Also, a minor change to the error message from these endpoints:
> https://reviews.apache.org/r/37969/





[jira] [Updated] (MESOS-3458) Segfault when accepting or declining inverse offers

2015-09-17 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3458:
-
Labels: mesosphere  (was: )

> Segfault when accepting or declining inverse offers
> ---
>
> Key: MESOS-3458
> URL: https://issues.apache.org/jira/browse/MESOS-3458
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Blocker
>  Labels: mesosphere
>
> Discovered while writing a test for filters (in regards to inverse offers).
> Fix here: https://reviews.apache.org/r/38470/





[jira] [Updated] (MESOS-3458) Segfault when accepting or declining inverse offers

2015-09-17 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3458:
-
Component/s: master

> Segfault when accepting or declining inverse offers
> ---
>
> Key: MESOS-3458
> URL: https://issues.apache.org/jira/browse/MESOS-3458
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Blocker
>  Labels: mesosphere
>
> Discovered while writing a test for filters (in regards to inverse offers).
> Fix here: https://reviews.apache.org/r/38470/





[jira] [Updated] (MESOS-2875) Add containerId to ResourceUsage to enable QoS controller to target a container

2015-09-17 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-2875:

Fix Version/s: 0.25.0

> Add containerId to ResourceUsage to enable QoS controller to target a 
> container
> ---
>
> Key: MESOS-2875
> URL: https://issues.apache.org/jira/browse/MESOS-2875
> Project: Mesos
>  Issue Type: Improvement
>  Components: oversubscription, slave
>Affects Versions: 0.25.0
>Reporter: Niklas Quarfot Nielsen
>Assignee: Klaus Ma
>  Labels: race-condition, slave
> Fix For: 0.25.0
>
>
> We should ensure that we are addressing the _container_ which the QoS 
> controller intended to kill. Without this check, we may run into a scenario 
> where the executor has terminated and one with the same id has started in the 
> interim i.e. running in a different container than the one the QoS controller 
> targeted.
> This most likely requires us to add containerId to the ResourceUsage message 
> and encode the containerID in the QoS Correction message.
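
The check described in MESOS-2875 can be illustrated with a small sketch. It assumes the correction carries the container ID that the QoS controller observed, as the ticket proposes; `Correction`, `RunningContainers` and `shouldKill` are hypothetical simplifications and not the actual protobuf messages or slave code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a QoS correction. The real
// message is a protobuf; only the two fields needed here are modelled.
struct Correction {
  std::string executorId;
  std::string containerId;  // The container the QoS controller observed.
};

// Maps an executor ID to the ID of the container it currently runs in.
typedef std::map<std::string, std::string> RunningContainers;

// Returns true only if the executor still runs in the container that
// the correction targeted.
bool shouldKill(
    const RunningContainers& running,
    const Correction& correction) {
  RunningContainers::const_iterator it = running.find(correction.executorId);
  if (it == running.end()) {
    return false;  // The executor has already terminated.
  }
  // An executor restarted under the same ID gets a different container
  // ID, so a stale correction no longer matches.
  return it->second == correction.containerId;
}
```

Comparing container IDs rather than executor IDs is what closes the race: the executor ID can be reused after a restart, while the container ID identifies one specific run.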





[jira] [Created] (MESOS-3462) Containerization issues with mesos running on CoreOS

2015-09-17 Thread Francis Chuang (JIRA)
Francis Chuang created MESOS-3462:
-

 Summary: Containerization issues with mesos running on CoreOS
 Key: MESOS-3462
 URL: https://issues.apache.org/jira/browse/MESOS-3462
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.24.0
 Environment: CoreOS 801.0.0 64-bit
Reporter: Francis Chuang


These are the steps I used to build mesos 0.24.0 on Ubuntu 15.04 64-bit:

wget http://www.apache.org/dist/mesos/0.24.0/mesos-0.24.0.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-1.5.2.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-util-1.5.4.tar.gz
wget http://mirror.ventraip.net.au/apache/subversion/subversion-1.9.0.tar.gz
wget http://www.sqlite.org/sqlite-amalgamation-3071501.zip
wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz

mkdir /tmp/mesos-build
cd /tmp/mesos-build

# Build apr
tar zxf apr-$APR_VERSION.tar.gz
cd apr-$APR_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr
make
make install
cd ..

# Build apr-util
tar zxf apr-util-$APR_UTIL_VERSION.tar.gz
cd apr-util-$APR_UTIL_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr-util 
--with-apr=/tmp/mesos-build/apr
make
make install
cd ..

# Build libsasl2
tar zxf cyrus-sasl-$SASL_VERSION.tar.gz
cd cyrus-sasl-$SASL_VERSION
./configure CC=gcc-4.8 CPPFLAGS=-I/usr/include/openssl 
--prefix=/tmp/mesos-build/sasl2 --enable-cram
make
make install
cd ..

# Build subversion
tar zxf subversion-$SVN_VERSION.tar.gz
unzip sqlite-amalgamation-$SQLITE_AMALGATION_VERSION.zip
mv sqlite-amalgamation-$SQLITE_AMALGATION_VERSION/ 
subversion-$SVN_VERSION/sqlite-amalgamation/
cd subversion-$SVN_VERSION
./configure CC=gcc-4.8 CXX=g++-4.8 --prefix=/tmp/mesos-build/svn 
--with-apr=/tmp/mesos-build/apr --with-apr-util=/tmp/mesos-build/apr-util 
--with-sasl=/tmp/mesos-build/sasl2
make
make install
cd ..

# Build curl
tar zxf curl-$CURL_VERSION.tar.gz
cd curl-$CURL_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/curl
make
make install
cd ..

# Build mesos
tar zxf mesos-$MESOS_VERSION.tar.gz
cd mesos-$MESOS_VERSION
mkdir build
cd build
../configure CC=gcc-4.8 CXX=g++-4.8 LD_LIBRARY_PATH=/tmp/mesos-build/sasl2/lib 
SASL_PATH=/tmp/mesos-build/sasl2/lib/sasl2 --prefix=/tmp/mesos-build/mesos 
--with-svn=/tmp/mesos-build/svn --with-apr=/tmp/mesos-build/apr 
--with-sasl=/tmp/mesos-build/sasl2/ --with-curl=/tmp/mesos-build/curl
make
make install
cd ..
cd ..

# Copy shared objects into mesos build
cp apr/lib/libapr-1.so.0.5.2 mesos/lib/libapr-1.so.0
cp apr-util/lib/libaprutil-1.so.0.5.4 mesos/lib/libaprutil-1.so.0
cp sasl2/lib/libsasl2.so.3.0.0 mesos/lib/libsasl2.so.3
cp svn/lib/libsvn_delta-1.so.0.0.0 mesos/lib/libsvn_delta-1.so.0
cp svn/lib/libsvn_subr-1.so.0.0.0 mesos/lib/libsvn_subr-1.so.0

I then compressed the build into an archive and distributed it onto my CoreOS 
nodes.

Once I have the archive extracted on each node, I start the master and slaves:

/opt/mesos/sbin/mesos-master --zk=zk://192.168.33.10/mesos --quorum=1 
--hostname=192.168.33.10 --ip=192.168.33.10 
--webui_dir=/opt/mesos/share/mesos/webui --cluster=mesos

/opt/mesos/sbin/mesos-slave --hostname=192.168.33.11 --ip=192.168.33.11 
--master=zk://192.168.33.10/mesos 
--executor_environment_variables='{"LD_LIBRARY_PATH": "/opt/mesos/lib", "PATH": 
"/opt/java/bin:/usr/sbin:/usr/bin"}' --containerizers=docker,mesos 
--executor_registration_timeout=60mins --launcher_dir=/opt/mesos/libexec/mesos/

In addition, the following environment variables are set:
LD_LIBRARY_PATH=/opt/mesos/lib/
JAVA_HOME=/opt/java
MESOS_NATIVE_JAVA_LIBRARY=/opt/mesos/lib/libmesos.so

I am finding that when I run mesos-hdfs from https://github.com/mesosphere/hdfs, 
the scheduler starts properly and launches the executors. However, the 
executors fail and terminate without writing any error to stderr or 
stdout.

I have reproduced the same problem with mesos 0.24, 0.23 and 0.22.1

If I install mesos onto an Ubuntu machine (tried 14.04 and 15.04 64-bit) using 
the apt repositories, this problem does not happen.

I am not well-versed with mesos internals, but it was pointed out that it's 
most likely a containerization issue. This issue on github documents the 
process of trying to get mesos-hdfs to work on my compiled mesos binaries: 
https://github.com/mesosphere/hdfs/issues/194





[jira] [Updated] (MESOS-3462) Containerization issues with mesos running on CoreOS

2015-09-17 Thread Francis Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Chuang updated MESOS-3462:
--
Description: 
These are the steps I used to build mesos 0.24.0 on Ubuntu 15.04 64-bit:

wget http://www.apache.org/dist/mesos/0.24.0/mesos-0.24.0.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-1.5.2.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-util-1.5.4.tar.gz
wget http://mirror.ventraip.net.au/apache/subversion/subversion-1.9.0.tar.gz
wget http://www.sqlite.org/sqlite-amalgamation-3071501.zip
wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz

mkdir /tmp/mesos-build
cd /tmp/mesos-build

- Build apr
tar zxf apr-$APR_VERSION.tar.gz
cd apr-$APR_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr
make
make install
cd ..

- Build apr-util
tar zxf apr-util-$APR_UTIL_VERSION.tar.gz
cd apr-util-$APR_UTIL_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr-util 
--with-apr=/tmp/mesos-build/apr
make
make install
cd ..

# Build libsasl2
tar zxf cyrus-sasl-$SASL_VERSION.tar.gz
cd cyrus-sasl-$SASL_VERSION
./configure CC=gcc-4.8 CPPFLAGS=-I/usr/include/openssl 
--prefix=/tmp/mesos-build/sasl2 --enable-cram
make
make install
cd ..

- Build subversion
tar zxf subversion-$SVN_VERSION.tar.gz
unzip sqlite-amalgamation-$SQLITE_AMALGATION_VERSION.zip
mv sqlite-amalgamation-$SQLITE_AMALGATION_VERSION/ 
subversion-$SVN_VERSION/sqlite-amalgamation/
cd subversion-$SVN_VERSION
./configure CC=gcc-4.8 CXX=g++-4.8 --prefix=/tmp/mesos-build/svn 
--with-apr=/tmp/mesos-build/apr --with-apr-util=/tmp/mesos-build/apr-util 
--with-sasl=/tmp/mesos-build/sasl2
make
make install
cd ..

- Build curl
tar zxf curl-$CURL_VERSION.tar.gz
cd curl-$CURL_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/curl
make
make install
cd ..

# Build mesos
tar zxf mesos-$MESOS_VERSION.tar.gz
cd mesos-$MESOS_VERSION
mkdir build
cd build
../configure CC=gcc-4.8 CXX=g++-4.8 LD_LIBRARY_PATH=/tmp/mesos-build/sasl2/lib 
SASL_PATH=/tmp/mesos-build/sasl2/lib/sasl2 --prefix=/tmp/mesos-build/mesos 
--with-svn=/tmp/mesos-build/svn --with-apr=/tmp/mesos-build/apr 
--with-sasl=/tmp/mesos-build/sasl2/ --with-curl=/tmp/mesos-build/curl
make
make install
cd ..
cd ..

- Copy shared objects into mesos build
cp apr/lib/libapr-1.so.0.5.2 mesos/lib/libapr-1.so.0
cp apr-util/lib/libaprutil-1.so.0.5.4 mesos/lib/libaprutil-1.so.0
cp sasl2/lib/libsasl2.so.3.0.0 mesos/lib/libsasl2.so.3
cp svn/lib/libsvn_delta-1.so.0.0.0 mesos/lib/libsvn_delta-1.so.0
cp svn/lib/libsvn_subr-1.so.0.0.0 mesos/lib/libsvn_subr-1.so.0

I then compressed the build into an archive and distributed it onto my CoreOS 
nodes.

Once I have the archive extracted on each node, I start the master and slaves:

/opt/mesos/sbin/mesos-master --zk=zk://192.168.33.10/mesos --quorum=1 \
  --hostname=192.168.33.10 --ip=192.168.33.10 \
  --webui_dir=/opt/mesos/share/mesos/webui --cluster=mesos

/opt/mesos/sbin/mesos-slave --hostname=192.168.33.11 --ip=192.168.33.11 \
  --master=zk://192.168.33.10/mesos \
  --executor_environment_variables='{"LD_LIBRARY_PATH": "/opt/mesos/lib", "PATH": "/opt/java/bin:/usr/sbin:/usr/bin"}' \
  --containerizers=docker,mesos \
  --executor_registration_timeout=60mins --launcher_dir=/opt/mesos/libexec/mesos/

In addition, the following environment variables are set:
LD_LIBRARY_PATH=/opt/mesos/lib/
JAVA_HOME=/opt/java
MESOS_NATIVE_JAVA_LIBRARY=/opt/mesos/lib/libmesos.so

I am finding that when I run mesos-hdfs from https://github.com/mesosphere/hdfs, 
the scheduler starts properly and launches the executors. However, the 
executors fail and terminate without writing any error to stderr or 
stdout.

I have reproduced the same problem with mesos 0.24, 0.23 and 0.22.1.

If I install mesos on an Ubuntu machine (tried 14.04 and 15.04 64-bit) using 
the apt repositories, this problem does not happen.

I am not well-versed in mesos internals, but it was pointed out that it's 
most likely a containerization issue. This issue on GitHub documents the 
process of trying to get mesos-hdfs to work on my compiled mesos binaries: 
https://github.com/mesosphere/hdfs/issues/194

  was:
These are the steps I used to build mesos 0.24.0 on Ubuntu 15.04 64-bit:

wget http://www.apache.org/dist/mesos/0.24.0/mesos-0.24.0.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-1.5.2.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-util-1.5.4.tar.gz
wget http://mirror.ventraip.net.au/apache/subversion/subversion-1.9.0.tar.gz
wget http://www.sqlite.org/sqlite-amalgamation-3071501.zip
wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz

mkdir /tmp/mesos-build
cd /tmp/mesos-build

# Build apr
tar zxf apr-$APR_VERSION.tar.gz
cd apr-$APR_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr
make
make install
cd ..

# Build apr-util
tar zxf apr-util-$APR_UTIL_VERSION.tar.gz
cd apr-util-$APR_UTIL_VERSION
./configure CC=gcc-4.8 

[jira] [Updated] (MESOS-3462) Containerization issues with mesos running on CoreOS

2015-09-17 Thread Francis Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Chuang updated MESOS-3462:
--
Description: 
These are the steps I used to build mesos 0.24.0 on Ubuntu 15.04 64-bit:

wget http://www.apache.org/dist/mesos/0.24.0/mesos-0.24.0.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-1.5.2.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-util-1.5.4.tar.gz
wget http://mirror.ventraip.net.au/apache/subversion/subversion-1.9.0.tar.gz
wget http://www.sqlite.org/sqlite-amalgamation-3071501.zip
wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz

mkdir /tmp/mesos-build
cd /tmp/mesos-build
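
The build steps in this description reference several *_VERSION shell variables that are never defined. A minimal sketch of the missing definitions, assuming the versions are the ones in the wget URLs above (the variable names are copied from the steps, including the SQLITE_AMALGATION spelling):

```shell
# Versions inferred from the wget URLs in this description (an assumption;
# the report itself never sets these variables).
MESOS_VERSION=0.24.0
APR_VERSION=1.5.2
APR_UTIL_VERSION=1.5.4
SVN_VERSION=1.9.0
SQLITE_AMALGATION_VERSION=3071501   # spelling matches the build steps
SASL_VERSION=2.1.26
# CURL_VERSION is not stated anywhere in the report; set it to match
# whichever curl tarball you downloaded.
```

With these set, a line such as "tar zxf apr-$APR_VERSION.tar.gz" expands to the tarball name fetched by wget.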

- Build apr
tar zxf apr-$APR_VERSION.tar.gz
cd apr-$APR_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr
make
make install
cd ..

- Build apr-util
tar zxf apr-util-$APR_UTIL_VERSION.tar.gz
cd apr-util-$APR_UTIL_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr-util \
  --with-apr=/tmp/mesos-build/apr
make
make install
cd ..

- Build libsasl2
tar zxf cyrus-sasl-$SASL_VERSION.tar.gz
cd cyrus-sasl-$SASL_VERSION
./configure CC=gcc-4.8 CPPFLAGS=-I/usr/include/openssl \
  --prefix=/tmp/mesos-build/sasl2 --enable-cram
make
make install
cd ..

- Build subversion
tar zxf subversion-$SVN_VERSION.tar.gz
unzip sqlite-amalgamation-$SQLITE_AMALGATION_VERSION.zip
mv sqlite-amalgamation-$SQLITE_AMALGATION_VERSION/ \
  subversion-$SVN_VERSION/sqlite-amalgamation/
cd subversion-$SVN_VERSION
./configure CC=gcc-4.8 CXX=g++-4.8 --prefix=/tmp/mesos-build/svn \
  --with-apr=/tmp/mesos-build/apr --with-apr-util=/tmp/mesos-build/apr-util \
  --with-sasl=/tmp/mesos-build/sasl2
make
make install
cd ..

- Build curl
tar zxf curl-$CURL_VERSION.tar.gz
cd curl-$CURL_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/curl
make
make install
cd ..

- Build mesos
tar zxf mesos-$MESOS_VERSION.tar.gz
cd mesos-$MESOS_VERSION
mkdir build
cd build
../configure CC=gcc-4.8 CXX=g++-4.8 LD_LIBRARY_PATH=/tmp/mesos-build/sasl2/lib \
  SASL_PATH=/tmp/mesos-build/sasl2/lib/sasl2 --prefix=/tmp/mesos-build/mesos \
  --with-svn=/tmp/mesos-build/svn --with-apr=/tmp/mesos-build/apr \
  --with-sasl=/tmp/mesos-build/sasl2/ --with-curl=/tmp/mesos-build/curl
make
make install
cd ..
cd ..

- Copy shared objects into mesos build
cp apr/lib/libapr-1.so.0.5.2 mesos/lib/libapr-1.so.0
cp apr-util/lib/libaprutil-1.so.0.5.4 mesos/lib/libaprutil-1.so.0
cp sasl2/lib/libsasl2.so.3.0.0 mesos/lib/libsasl2.so.3
cp svn/lib/libsvn_delta-1.so.0.0.0 mesos/lib/libsvn_delta-1.so.0
cp svn/lib/libsvn_subr-1.so.0.0.0 mesos/lib/libsvn_subr-1.so.0

I then compressed the build into an archive and distributed it to my CoreOS 
nodes.

Once I have the archive extracted on each node, I start the master and slaves:

/opt/mesos/sbin/mesos-master --zk=zk://192.168.33.10/mesos --quorum=1 \
  --hostname=192.168.33.10 --ip=192.168.33.10 \
  --webui_dir=/opt/mesos/share/mesos/webui --cluster=mesos

/opt/mesos/sbin/mesos-slave --hostname=192.168.33.11 --ip=192.168.33.11 \
  --master=zk://192.168.33.10/mesos \
  --executor_environment_variables='{"LD_LIBRARY_PATH": "/opt/mesos/lib", "PATH": "/opt/java/bin:/usr/sbin:/usr/bin"}' \
  --containerizers=docker,mesos \
  --executor_registration_timeout=60mins --launcher_dir=/opt/mesos/libexec/mesos/

In addition, the following environment variables are set:
LD_LIBRARY_PATH=/opt/mesos/lib/
JAVA_HOME=/opt/java
MESOS_NATIVE_JAVA_LIBRARY=/opt/mesos/lib/libmesos.so

I am finding that when I run mesos-hdfs from https://github.com/mesosphere/hdfs, 
the scheduler starts properly and launches the executors. However, the 
executors fail and terminate without writing any error to stderr or 
stdout.
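
Because the binaries were built on Ubuntu and copied to CoreOS, one possible first check is whether every shared library they need resolves on the target node. This is only a diagnostic sketch, not a confirmed cause of the failure; missing_libs is a hypothetical helper name, and it assumes ldd is available on the node:

```shell
# Print the names of shared libraries that the dynamic linker cannot resolve.
# Reads `ldd` output on stdin; unresolved entries look like
# "libfoo.so.1 => not found".
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Hypothetical usage on a node, with paths taken from the report:
#   LD_LIBRARY_PATH=/opt/mesos/lib ldd /opt/mesos/lib/libmesos.so | missing_libs
```

An empty result only means the linker found every dependency of that one file; the binaries under the launcher directory would need the same check.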

I have reproduced the same problem with mesos 0.24, 0.23 and 0.22.1.

If I install mesos on an Ubuntu machine (tried 14.04 and 15.04 64-bit) using 
the apt repositories, this problem does not happen.

I am not well-versed in mesos internals, but it was pointed out that it's 
most likely a containerization issue. This issue on GitHub documents the 
process of trying to get mesos-hdfs to work on my compiled mesos binaries: 
https://github.com/mesosphere/hdfs/issues/194

  was:
These are the steps I used to build mesos 0.24.0 on Ubuntu 15.04 64-bit:

wget http://www.apache.org/dist/mesos/0.24.0/mesos-0.24.0.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-1.5.2.tar.gz
wget http://mirror.ventraip.net.au/apache/apr/apr-util-1.5.4.tar.gz
wget http://mirror.ventraip.net.au/apache/subversion/subversion-1.9.0.tar.gz
wget http://www.sqlite.org/sqlite-amalgamation-3071501.zip
wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz

mkdir /tmp/mesos-build
cd /tmp/mesos-build

- Build apr
tar zxf apr-$APR_VERSION.tar.gz
cd apr-$APR_VERSION
./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr
make
make install
cd ..

- Build apr-util
tar zxf apr-util-$APR_UTIL_VERSION.tar.gz
cd apr-util-$APR_UTIL_VERSION
./configure CC=gcc-4.8 

[jira] [Assigned] (MESOS-1961) Ensure executor state is correctly reconciled between master and slave.

2015-09-17 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-1961:
---

Assignee: Klaus Ma

> Ensure executor state is correctly reconciled between master and slave.
> ---
>
> Key: MESOS-1961
> URL: https://issues.apache.org/jira/browse/MESOS-1961
> Project: Mesos
>  Issue Type: Epic
>  Components: master, slave
>Reporter: Benjamin Mahler
>Assignee: Klaus Ma
>
> The master and slave should correctly reconcile the state of executors, much 
> like the master and slave now correctly reconcile task state (MESOS-1407).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3462) Containerization issues with mesos running on CoreOS

2015-09-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-3462:
---

Assignee: haosdent

> Containerization issues with mesos running on CoreOS
> 
>
> Key: MESOS-3462
> URL: https://issues.apache.org/jira/browse/MESOS-3462
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.24.0
> Environment: CoreOS 801.0.0 64-bit
>Reporter: Francis Chuang
>Assignee: haosdent
>
> These are the steps I used to build mesos 0.24.0 on Ubuntu 15.04 64-bit:
> wget http://www.apache.org/dist/mesos/0.24.0/mesos-0.24.0.tar.gz
> wget http://mirror.ventraip.net.au/apache/apr/apr-1.5.2.tar.gz
> wget http://mirror.ventraip.net.au/apache/apr/apr-util-1.5.4.tar.gz
> wget http://mirror.ventraip.net.au/apache/subversion/subversion-1.9.0.tar.gz
> wget http://www.sqlite.org/sqlite-amalgamation-3071501.zip
> wget ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz
> mkdir /tmp/mesos-build
> cd /tmp/mesos-build
> - Build apr
> tar zxf apr-$APR_VERSION.tar.gz
> cd apr-$APR_VERSION
> ./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr
> make
> make install
> cd ..
> - Build apr-util
> tar zxf apr-util-$APR_UTIL_VERSION.tar.gz
> cd apr-util-$APR_UTIL_VERSION
> ./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/apr-util 
> --with-apr=/tmp/mesos-build/apr
> make
> make install
> cd ..
> - Build libsasl2
> tar zxf cyrus-sasl-$SASL_VERSION.tar.gz
> cd cyrus-sasl-$SASL_VERSION
> ./configure CC=gcc-4.8 CPPFLAGS=-I/usr/include/openssl 
> --prefix=/tmp/mesos-build/sasl2 --enable-cram
> make
> make install
> cd ..
> - Build subversion
> tar zxf subversion-$SVN_VERSION.tar.gz
> unzip sqlite-amalgamation-$SQLITE_AMALGATION_VERSION.zip
> mv sqlite-amalgamation-$SQLITE_AMALGATION_VERSION/ 
> subversion-$SVN_VERSION/sqlite-amalgamation/
> cd subversion-$SVN_VERSION
> ./configure CC=gcc-4.8 CXX=g++-4.8 --prefix=/tmp/mesos-build/svn 
> --with-apr=/tmp/mesos-build/apr --with-apr-util=/tmp/mesos-build/apr-util 
> --with-sasl=/tmp/mesos-build/sasl2
> make
> make install
> cd ..
> - Build curl
> tar zxf curl-$CURL_VERSION.tar.gz
> cd curl-$CURL_VERSION
> ./configure CC=gcc-4.8 --prefix=/tmp/mesos-build/curl
> make
> make install
> cd ..
> - Build mesos
> tar zxf mesos-$MESOS_VERSION.tar.gz
> cd mesos-$MESOS_VERSION
> mkdir build
> cd build
> ../configure CC=gcc-4.8 CXX=g++-4.8 
> LD_LIBRARY_PATH=/tmp/mesos-build/sasl2/lib 
> SASL_PATH=/tmp/mesos-build/sasl2/lib/sasl2 --prefix=/tmp/mesos-build/mesos 
> --with-svn=/tmp/mesos-build/svn --with-apr=/tmp/mesos-build/apr 
> --with-sasl=/tmp/mesos-build/sasl2/ --with-curl=/tmp/mesos-build/curl
> make
> make install
> cd ..
> cd ..
> - Copy shared objects into mesos build
> cp apr/lib/libapr-1.so.0.5.2 mesos/lib/libapr-1.so.0
> cp apr-util/lib/libaprutil-1.so.0.5.4 mesos/lib/libaprutil-1.so.0
> cp sasl2/lib/libsasl2.so.3.0.0 mesos/lib/libsasl2.so.3
> cp svn/lib/libsvn_delta-1.so.0.0.0 mesos/lib/libsvn_delta-1.so.0
> cp svn/lib/libsvn_subr-1.so.0.0.0 mesos/lib/libsvn_subr-1.so.0
> I then compressed the build into an archive and distributed it to my CoreOS 
> nodes.
> Once I have the archive extracted on each node, I start the master and slaves:
> /opt/mesos/sbin/mesos-master --zk=zk://192.168.33.10/mesos --quorum=1 
> --hostname=192.168.33.10 --ip=192.168.33.10 
> --webui_dir=/opt/mesos/share/mesos/webui --cluster=mesos
> /opt/mesos/sbin/mesos-slave --hostname=192.168.33.11 --ip=192.168.33.11 
> --master=zk://192.168.33.10/mesos 
> --executor_environment_variables='{"LD_LIBRARY_PATH": "/opt/mesos/lib", 
> "PATH": "/opt/java/bin:/usr/sbin:/usr/bin"}' --containerizers=docker,mesos 
> --executor_registration_timeout=60mins 
> --launcher_dir=/opt/mesos/libexec/mesos/
> In addition, the following environment variables are set:
> LD_LIBRARY_PATH=/opt/mesos/lib/
> JAVA_HOME=/opt/java
> MESOS_NATIVE_JAVA_LIBRARY=/opt/mesos/lib/libmesos.so
> I am finding that when I run mesos-hdfs from 
> https://github.com/mesosphere/hdfs, the scheduler starts properly and 
> launches the executors. However, the executors fail and terminate 
> without writing any error to stderr or stdout.
> I have reproduced the same problem with mesos 0.24, 0.23 and 0.22.1.
> If I install mesos on an Ubuntu machine (tried 14.04 and 15.04 64-bit) using 
> the apt repositories, this problem does not happen.
> I am not well-versed in mesos internals, but it was pointed out that it's 
> most likely a containerization issue. This issue on GitHub documents the 
> process of trying to get mesos-hdfs to work on my compiled mesos binaries: 
> https://github.com/mesosphere/hdfs/issues/194





[jira] [Commented] (MESOS-1961) Ensure executor state is correctly reconciled between master and slave.

2015-09-17 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804926#comment-14804926
 ] 

Klaus Ma commented on MESOS-1961:
-

[~bmahler], would you shepherd this enhancement? I would like to draft a design 
doc.

> Ensure executor state is correctly reconciled between master and slave.
> ---
>
> Key: MESOS-1961
> URL: https://issues.apache.org/jira/browse/MESOS-1961
> Project: Mesos
>  Issue Type: Epic
>  Components: master, slave
>Reporter: Benjamin Mahler
>Assignee: Klaus Ma
>
> The master and slave should correctly reconcile the state of executors, much 
> like the master and slave now correctly reconcile task state (MESOS-1407).





[jira] [Assigned] (MESOS-2728) Introduce concept of cluster wide resources.

2015-09-17 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma reassigned MESOS-2728:
---

Assignee: Klaus Ma

> Introduce concept of cluster wide resources.
> 
>
> Key: MESOS-2728
> URL: https://issues.apache.org/jira/browse/MESOS-2728
> Project: Mesos
>  Issue Type: Epic
>Reporter: Joerg Schad
>Assignee: Klaus Ma
>  Labels: mesosphere
>
> There are resources which are not provided by a single node. Consider, for 
> example, the external network bandwidth of a cluster. Being a limited resource, 
> it makes sense for Mesos to manage it, but it is still not a resource 
> offered by a single node. A cluster-wide resource is still consumed by a 
> task, and when that task completes, the resource is then available to be 
> allocated to another framework/task.
> Use Cases:
> 1. Network Bandwidth
> 2. IP Addresses
> 3. Global Service Ports
> 4. Distributed File System Storage
> 5. Software Licences





[jira] [Commented] (MESOS-2930) Allow the Resource Estimator to express over-allocation of revocable resources.

2015-09-17 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804942#comment-14804942
 ] 

Klaus Ma commented on MESOS-2930:
-

I'm thinking of using this to adjust resources dynamically. For example, if the 
hardware changes, we could increase or decrease resources accordingly without 
restarting the slave.

> Allow the Resource Estimator to express over-allocation of revocable 
> resources.
> ---
>
> Key: MESOS-2930
> URL: https://issues.apache.org/jira/browse/MESOS-2930
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Benjamin Mahler
>Assignee: Klaus Ma
>
> Currently the resource estimator returns the amount of oversubscription 
> resources that are available. Since resources cannot be negative, this allows 
> the resource estimator to express only the following:
> (1) Return empty resources: We are fully allocated for oversubscription 
> resources.
> (2) Return non-empty resources: We are under-allocated for oversubscription 
> resources. In other words, some are available.
> However, there is an additional situation that we cannot express:
> (3) Analogous to returning non-empty "negative" resources: We are 
> over-allocated for oversubscription resources. Do not re-offer any of the 
> over-allocated oversubscription resources that are recovered.
> Without (3), the slave can only shrink the total pool of oversubscription 
> resources by returning (1) as resources are recovered, until the pool is 
> shrunk to the desired size. However, this approach is only best-effort: it's 
> possible for a framework to launch more tasks in the window of time (15 
> seconds by default) between the slave's polls of the estimator.





[jira] [Commented] (MESOS-2560) Remove RunTaskMessage.framework_id

2015-09-17 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804947#comment-14804947
 ] 

Adam B commented on MESOS-2560:
---

Not actively in progress and not critical for 0.25, so deferring.

> Remove RunTaskMessage.framework_id
> --
>
> Key: MESOS-2560
> URL: https://issues.apache.org/jira/browse/MESOS-2560
> Project: Mesos
>  Issue Type: Task
>  Components: framework
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> The previous release doesn't use framework_id and so it can be safely removed.
> This should land only after https://issues.apache.org/jira/browse/MESOS-2559 
> has been shipped.


