[jira] [Commented] (MESOS-1930) Expose TASK_KILLED reason.

2014-11-04 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195985#comment-14195985
 ] 

Alexander Rukletsov commented on MESOS-1930:


"Reason" may not be the precise term for what I mean. If a task is killed, I 
would like feedback on how it terminated: gracefully, or forcibly after a timeout. 
For CommandExecutor this is supported via signal escalation. If an 
executor doesn't distinguish between soft and hard kills, it can simply ignore 
the second-tier state. How about introducing something like 
{{REASON_KILL_TIMEOUT}}?

 Expose TASK_KILLED reason.
 --

 Key: MESOS-1930
 URL: https://issues.apache.org/jira/browse/MESOS-1930
 Project: Mesos
  Issue Type: Story
Reporter: Alexander Rukletsov
Assignee: Dominic Hamon
Priority: Minor

 A task process may be killed by SIGTERM or SIGKILL. The only way to 
 check how the task process exited is to examine the message: 
 {{status.message().find("Terminated")}}. However, a task may not run in its 
 own process, hence the executor may not be able to provide an exit status. 
 What we actually want is an artificial task exit status that is generated by 
 the executor.
 This may be resolved by adding second-tier states or state explanations. Here 
 is a link to a discussion: https://reviews.apache.org/r/26382/
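
A minimal C++ sketch of what such a second-tier state could look like. The `KillOutcome` enum and the `classifyKill` helper are hypothetical names used only for illustration; they are not part of the Mesos API. The sketch assumes an executor that escalates from SIGTERM to SIGKILL after a timeout and knows which signal finally ended the task.

```cpp
#include <csignal>

// Hypothetical second-tier state for a killed task (not a Mesos type).
enum class KillOutcome {
  GRACEFUL,  // Task went away after the soft kill (SIGTERM).
  TIMEOUT,   // Executor escalated to a hard kill (SIGKILL) after a timeout.
  UNKNOWN    // Executor does not distinguish soft from hard kills.
};

// Maps the signal that finally ended the task to an outcome.
// Assumes an executor that escalates SIGTERM -> SIGKILL on timeout.
inline KillOutcome classifyKill(int terminatingSignal)
{
  switch (terminatingSignal) {
    case SIGTERM: return KillOutcome::GRACEFUL;
    case SIGKILL: return KillOutcome::TIMEOUT;
    default:      return KillOutcome::UNKNOWN;
  }
}
```

An executor that does not distinguish the two cases would always report the `UNKNOWN` value, which a scheduler could ignore.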



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2038) Remove dead code in Slave::_runTask

2014-11-04 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195993#comment-14195993
 ] 

Bernd Mathiske commented on MESOS-2038:
---

https://reviews.apache.org/r/27567/

 Remove dead code in Slave::_runTask
 ---

 Key: MESOS-2038
 URL: https://issues.apache.org/jira/browse/MESOS-2038
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.21.0
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
Priority: Trivial
  Labels: newbie
 Fix For: 0.22.0

   Original Estimate: 1h
  Remaining Estimate: 1h

 While fixing MESOS-947, I overlooked that the code in question became dead 
 code. Coverity caught this. 
 At the top of _runTask(), there is now a test for whether framework is NULL, 
 and in every case where it is NULL, the method exits locally before reaching 
 the code in question (see below). So we should be able to remove this 
 code without effect:
 -  if (framework == NULL) {
 -    framework = new Framework(this, frameworkId, frameworkInfo, pid);
 -    frameworks[frameworkId] = framework;
 -  }
 -
 -  CHECK_NOTNULL(framework);
 -
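
The reasoning above can be illustrated with a simplified C++ sketch. The `Framework` struct and `runTask` function here are hypothetical stand-ins, not the actual Slave code; the sketch only assumes that the method returns early whenever the framework pointer is null.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-in for the slave's Framework type.
struct Framework { std::string id; };

// Simplified stand-in for Slave::_runTask(). Returns false on the early
// exit and true if execution would continue to the task launch.
inline bool runTask(
    const std::map<std::string, Framework*>& frameworks,
    const std::string& frameworkId)
{
  auto it = frameworks.find(frameworkId);
  Framework* framework = (it != frameworks.end()) ? it->second : nullptr;

  if (framework == nullptr) {
    return false;  // Local exit: nothing below runs with a null framework.
  }

  // A second `if (framework == nullptr)` check placed here could never be
  // true, which is why such a block is dead code.
  return true;
}
```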





[jira] [Updated] (MESOS-1935) Replace hard-coded reap interval with a constant

2014-11-04 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1935:
---
Description: With https://issues.apache.org/jira/browse/MESOS-1846 
implemented, replace the hard-coded value for the maximal reap interval (1s) 
with the constant from {{reap.hpp}}. This will mostly affect tests.  (was: With 
https://issues.apache.org/jira/browse/MESOS-1846 implemented, replace the 
hard-coded value for the maximal reap interval (1s) with the constant from 
{{reap.hpp}}.)

 Replace hard-coded reap interval with a constant
 

 Key: MESOS-1935
 URL: https://issues.apache.org/jira/browse/MESOS-1935
 Project: Mesos
  Issue Type: Task
  Components: test
Reporter: Alexander Rukletsov
Priority: Trivial
  Labels: newbie

 With https://issues.apache.org/jira/browse/MESOS-1846 implemented, replace 
 the hard-coded value for the maximal reap interval (1s) with the constant 
 from {{reap.hpp}}. This will mostly affect tests.





[jira] [Issue Comment Deleted] (MESOS-1807) Disallow executors with cpu only or memory only resources

2014-11-04 Thread Timothy St. Clair (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy St. Clair updated MESOS-1807:
-
Comment: was deleted

(was: Was the original goal of cpu+memory only offers to enable resizing?  The 
behavior change had caused other frameworks to fail, namely Spark.  

Ideally I would love to have some integration testing on mods like this.  
[~tnachen] ^ 


)

 Disallow executors with cpu only or memory only resources
 -

 Key: MESOS-1807
 URL: https://issues.apache.org/jira/browse/MESOS-1807
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Vinod Kone
  Labels: newbie

 Currently the master allows executors to be launched with either only cpus or 
 only memory, but we shouldn't allow that.
 This is because an executor is an actual unix process that is launched by the 
 slave. If an executor doesn't specify cpus, what should the cpu limits be 
 for that executor when there are no tasks running on it? If no cpu limits are 
 set, it might starve other executors/tasks on the slave, violating 
 isolation guarantees. The same goes for memory. Moreover, the current 
 containerizer/isolator code will throw failures when using such an executor, 
 e.g., when the last task on the executor finishes and Containerizer::update() 
 is called with 0 cpus or 0 mem.
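
A rough C++ sketch of the kind of check the master could apply. `ScalarResources` and `hasCpusAndMem` are hypothetical names for illustration, and the resource bag is simplified to a map from resource name to scalar amount rather than the real Mesos `Resources` type.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified resource bag: resource name -> scalar amount.
using ScalarResources = std::map<std::string, double>;

// Returns true only if the executor specifies both cpus and mem with
// amounts greater than zero.
inline bool hasCpusAndMem(const ScalarResources& executor)
{
  auto positive = [&executor](const std::string& name) {
    auto it = executor.find(name);
    return it != executor.end() && it->second > 0.0;
  };

  return positive("cpus") && positive("mem");
}
```

With such a check, an executor declaring only cpus (or only mem) would be rejected before it reaches the containerizer.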





[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources

2014-11-04 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196173#comment-14196173
 ] 

Timothy St. Clair commented on MESOS-1807:
--

Was the original goal of cpu+memory only offers to enable resizing?  The 
behavior change had caused other frameworks to fail, namely Spark.  

Ideally I would love to have some integration testing on mods like this.  
[~tnachen] ^ 




 Disallow executors with cpu only or memory only resources
 -

 Key: MESOS-1807
 URL: https://issues.apache.org/jira/browse/MESOS-1807
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Vinod Kone
  Labels: newbie

 Currently the master allows executors to be launched with either only cpus or 
 only memory, but we shouldn't allow that.
 This is because an executor is an actual unix process that is launched by the 
 slave. If an executor doesn't specify cpus, what should the cpu limits be 
 for that executor when there are no tasks running on it? If no cpu limits are 
 set, it might starve other executors/tasks on the slave, violating 
 isolation guarantees. The same goes for memory. Moreover, the current 
 containerizer/isolator code will throw failures when using such an executor, 
 e.g., when the last task on the executor finishes and Containerizer::update() 
 is called with 0 cpus or 0 mem.





[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources

2014-11-04 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196174#comment-14196174
 ] 

Timothy St. Clair commented on MESOS-1807:
--

Was the original goal of cpu+memory only offers to enable resizing?  The 
behavior change had caused other frameworks to fail, namely Spark.  

Ideally I would love to have some integration testing on mods like this.  
[~tnachen] ^ 




 Disallow executors with cpu only or memory only resources
 -

 Key: MESOS-1807
 URL: https://issues.apache.org/jira/browse/MESOS-1807
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Vinod Kone
  Labels: newbie

 Currently the master allows executors to be launched with either only cpus or 
 only memory, but we shouldn't allow that.
 This is because an executor is an actual unix process that is launched by the 
 slave. If an executor doesn't specify cpus, what should the cpu limits be 
 for that executor when there are no tasks running on it? If no cpu limits are 
 set, it might starve other executors/tasks on the slave, violating 
 isolation guarantees. The same goes for memory. Moreover, the current 
 containerizer/isolator code will throw failures when using such an executor, 
 e.g., when the last task on the executor finishes and Containerizer::update() 
 is called with 0 cpus or 0 mem.





[jira] [Updated] (MESOS-2037) Update docs/configuration.md

2014-11-04 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-2037:
--
Sprint: Mesosphere Q4 Sprint 2

 Update docs/configuration.md
 

 Key: MESOS-2037
 URL: https://issues.apache.org/jira/browse/MESOS-2037
 Project: Mesos
  Issue Type: Documentation
Reporter: Kapil Arya
Assignee: Kapil Arya
Priority: Blocker

 Update documentation for configuration flags (docs/configuration.md) to 
 reflect the current state. https://reviews.apache.org/r/27556/





[jira] [Commented] (MESOS-1974) Refactor the C++ 'Resources' abstraction.

2014-11-04 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196425#comment-14196425
 ] 

Jie Yu commented on MESOS-1974:
---

https://reviews.apache.org/r/27555

 Refactor the C++ 'Resources' abstraction.
 -

 Key: MESOS-1974
 URL: https://issues.apache.org/jira/browse/MESOS-1974
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu
Assignee: Jie Yu

 The existing C++ 'Resources' interfaces are poorly designed. Some of them are 
 confusing and unintuitive. Some of them are overloaded with too much 
 functionality. For instance,
 {noformat}
 bool operator <= (const Resource& left, const Resource& right);
 {noformat}
 This interface is non-intuitive because A <= B doesn't imply !(B <= A).
 {noformat}
 Resource operator + (const Resource& left, const Resource& right);
 {noformat}
 This one is also non-intuitive because if 'left' is not compatible with 
 'right', the result is 'left' (why not 'right'?). The same goes for operator '-'.
 {noformat}
 Option<Resource> Resources::get(const Resource& r) const;
 {noformat}
 This one assumes Resources is flattened, but it might not be.
 As we start to introduce persistent disk resources (MESOS-1554), things will 
 get more complicated. For example, one may want two kinds of 'disk()' 
 functions: one that returns the ephemeral disk bytes (with no disk info), and 
 one that returns the total disk bytes (including ones that have disk info). We 
 may want to introduce a concept on Resource that indicates that a resource 
 cannot be merged or split (e.g., atomic?).
 Since we need to change this class anyway, I want to take this chance to 
 refactor it.
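
One way to see why a comparison operator on resources can surprise readers: a containment-style comparison is only a partial order, so two resource sets may be incomparable in both directions. The C++ sketch below uses a hypothetical `ScalarResources` map and an `isSubset` helper; neither is the Mesos `Resources` class, and a named function is used here only as one possible alternative to an operator spelling.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified resource bag: resource name -> scalar amount.
using ScalarResources = std::map<std::string, double>;

// Returns true if every amount in `left` is covered by `right`.
// Resources missing from `right` count as zero.
inline bool isSubset(const ScalarResources& left, const ScalarResources& right)
{
  for (const auto& entry : left) {
    auto it = right.find(entry.first);
    const double available = (it != right.end()) ? it->second : 0.0;
    if (entry.second > available) {
      return false;
    }
  }
  return true;
}
```

For example, `{cpus: 2}` and `{mem: 512}` are not subsets of each other, so neither "less than or equal" direction holds, unlike with ordinary numbers.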





[jira] [Commented] (MESOS-1768) Provide default image settings for containerizers

2014-11-04 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196519#comment-14196519
 ] 

Timothy Chen commented on MESOS-1768:
-

[~idownes], this is already merged, right?

 Provide default image settings for containerizers
 -

 Key: MESOS-1768
 URL: https://issues.apache.org/jira/browse/MESOS-1768
 Project: Mesos
  Issue Type: Story
Reporter: Timothy Chen
Assignee: Ian Downes

 We want to be able to specify a default image setting for all the mesos 
 containerizers (external, mesos, docker, etc).





[jira] [Updated] (MESOS-1886) Allow docker pull on each run to be configurable

2014-11-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-1886:

Summary: Allow docker pull on each run to be configurable  (was: Always 
`docker pull` if explicit :latest tag is present)

 Allow docker pull on each run to be configurable
 

 Key: MESOS-1886
 URL: https://issues.apache.org/jira/browse/MESOS-1886
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Affects Versions: 0.20.1
Reporter: Chris Heller
Assignee: Timothy Chen
Priority: Minor
  Labels: docker

 With 0.20.1 the behavior of a docker container has changed (see MESOS-1762).
 This change brings the docker behavior more in line with that of {{docker 
 run}}.
 I propose that if the image given explicitly has the :latest tag, this should 
 signal to mesos that an unconditional `docker pull` should be done on the 
 image, and if the pull fails for any reason (e.g., the registry is 
 unavailable), we fall back to the current behavior.
 This would break slightly with the semantics of how the docker command line 
 operates, but the alternative is to require explicit tags on every release, 
 which is a hindrance when developing a new image, or to log in to each 
 node and run an explicit `docker pull`.
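
The pull decision discussed here could be sketched in C++ roughly as follows. `hasExplicitLatestTag` and `shouldPull` are hypothetical helpers, and the `forcePull` flag is an assumed name for a configurable "pull on each run" setting; none of these are taken from the Mesos source. The fallback to the current behavior when the pull fails is not modeled.

```cpp
#include <string>

// Returns true if the image reference ends with an explicit ":latest" tag.
inline bool hasExplicitLatestTag(const std::string& image)
{
  const std::string suffix = ":latest";
  return image.size() >= suffix.size() &&
         image.compare(image.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical policy: pull when configured to, when the image is tagged
// ":latest" explicitly, or when no local copy of the image exists.
inline bool shouldPull(
    const std::string& image,
    bool forcePull,
    bool imageCachedLocally)
{
  return forcePull || hasExplicitLatestTag(image) || !imageCachedLocally;
}
```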





[jira] [Assigned] (MESOS-1886) Always `docker pull` if explicit :latest tag is present

2014-11-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen reassigned MESOS-1886:
---

Assignee: Timothy Chen

 Always `docker pull` if explicit :latest tag is present
 -

 Key: MESOS-1886
 URL: https://issues.apache.org/jira/browse/MESOS-1886
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Affects Versions: 0.20.1
Reporter: Chris Heller
Assignee: Timothy Chen
Priority: Minor
  Labels: docker

 With 0.20.1 the behavior of a docker container has changed (see MESOS-1762).
 This change brings the docker behavior more in line with that of {{docker 
 run}}.
 I propose that if the image given explicitly has the :latest tag, this should 
 signal to mesos that an unconditional `docker pull` should be done on the 
 image, and if the pull fails for any reason (e.g., the registry is 
 unavailable), we fall back to the current behavior.
 This would break slightly with the semantics of how the docker command line 
 operates, but the alternative is to require explicit tags on every release, 
 which is a hindrance when developing a new image, or to log in to each 
 node and run an explicit `docker pull`.





[jira] [Updated] (MESOS-2018) Dynamic Reservations

2014-11-04 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-2018:

Epic Name: Dynamic Reservations  (was: Dynamic Resource Reservations)
  Summary: Dynamic Reservations  (was: Dynamic Resource Reservations)

 Dynamic Reservations
 

 Key: MESOS-2018
 URL: https://issues.apache.org/jira/browse/MESOS-2018
 Project: Mesos
  Issue Type: Epic
  Components: allocation, framework, master, slave
Reporter: Adam B
Assignee: Michael Park
  Labels: offer, persistence, reservations, resource, stateful, 
 storage

 This is a feature to provide better support for running stateful services on 
 Mesos such as HDFS (Distributed Filesystem), Cassandra (Distributed 
 Database), or MySQL (Local Database).
 Current resource reservations (henceforth called static reservations) are 
 statically determined by the slave operator at slave start time, and 
 individual frameworks have no authority to reserve resources themselves.
 Dynamic reservations allow a framework to dynamically/lazily reserve offered 
 resources at task launch time, such that when that task completes, those 
 resources will only be re-offered to the same framework (or other frameworks 
 with the same role).
 This is especially useful if the framework's task stored some state on the 
 slave, and needs a guaranteed set of resources reserved so that it can 
 re-launch a task on the same slave to recover that state.
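
The re-offer rule described above can be sketched in C++ as a simple role check. `ReservedResource` and `canOffer` are hypothetical names for illustration; the only assumption carried over from Mesos is that the role "*" denotes unreserved resources.

```cpp
#include <string>

// Hypothetical, simplified view of a resource and the role it is reserved for.
struct ReservedResource
{
  std::string role;  // "*" means the resource is unreserved.
  double amount;
};

// A reserved resource is only offered to frameworks with the same role;
// unreserved resources may be offered to any framework.
inline bool canOffer(
    const ReservedResource& resource,
    const std::string& frameworkRole)
{
  return resource.role == "*" || resource.role == frameworkRole;
}
```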





[jira] [Updated] (MESOS-1886) Allow docker pull on each run to be configurable

2014-11-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-1886:

Sprint: Mesosphere Q4 Sprint 2 - 11/14

 Allow docker pull on each run to be configurable
 

 Key: MESOS-1886
 URL: https://issues.apache.org/jira/browse/MESOS-1886
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Affects Versions: 0.20.1
Reporter: Chris Heller
Assignee: Timothy Chen
Priority: Minor
  Labels: docker

 With 0.20.1 the behavior of a docker container has changed (see MESOS-1762).
 This change brings the docker behavior more in line with that of {{docker 
 run}}.
 I propose that if the image given explicitly has the :latest tag, this should 
 signal to mesos that an unconditional `docker pull` should be done on the 
 image, and if the pull fails for any reason (e.g., the registry is 
 unavailable), we fall back to the current behavior.
 This would break slightly with the semantics of how the docker command line 
 operates, but the alternative is to require explicit tags on every release, 
 which is a hindrance when developing a new image, or to log in to each 
 node and run an explicit `docker pull`.





[jira] [Updated] (MESOS-2002) Module loading within frameworks

2014-11-04 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-2002:
--
Sprint: Mesosphere Q4 Sprint 2 - 11/14

 Module loading within frameworks 
 -

 Key: MESOS-2002
 URL: https://issues.apache.org/jira/browse/MESOS-2002
 Project: Mesos
  Issue Type: Improvement
  Components: framework, modules
Reporter: Till Toenshoff
Assignee: Kapil Arya
Priority: Blocker

 Frameworks should be granted the capability to load modules. 
 h4.Motivation
 Allowing a modularized Authenticatee to cover framework authentication 
 against the master.





[jira] [Assigned] (MESOS-2001) Authenticatee modules similar to Authenticator modules

2014-11-04 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff reassigned MESOS-2001:
-

Assignee: Till Toenshoff

 Authenticatee modules similar to Authenticator modules
 --

 Key: MESOS-2001
 URL: https://issues.apache.org/jira/browse/MESOS-2001
 Project: Mesos
  Issue Type: Epic
  Components: modules
Reporter: Till Toenshoff
Assignee: Till Toenshoff
  Labels: authentication, module

 To cover complete module-based authentication, we will need to allow 
 for authenticatee modules just as we do for authenticator modules.
 h4.Motivation
 Allow third parties to quickly develop and plug in new authentication 
 methods. The modularized Authenticatee API will lower the barrier for the 
 community to provide new methods to Mesos. An example of such an additional, 
 next-step module could be PAM (LDAP, MySQL, NIS, UNIX) backed authentication. 
 cyrus-sasl2 itself already offers more than half a dozen mechanisms via its 
 standard plugins, and these could be triggered by additional Authenticator / 
 Authenticatee modules. cyrus-sasl2 supports even more mechanisms when 
 custom built (about a full dozen), but we do not want to bundle 
 cyrus-sasl2 to enforce custom builds. Alternative authentication methods 
 (especially non-SASL based ones) may bring in new dependencies that we don't 
 want to enforce on all of our users. Mesos users may be required to use custom 
 authentication techniques due to strict security policies.





[jira] [Updated] (MESOS-1886) Allow docker pull on each run to be configurable

2014-11-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-1886:

Shepherd: Benjamin Hindman

 Allow docker pull on each run to be configurable
 

 Key: MESOS-1886
 URL: https://issues.apache.org/jira/browse/MESOS-1886
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Affects Versions: 0.20.1
Reporter: Chris Heller
Assignee: Timothy Chen
Priority: Minor
  Labels: docker

 With 0.20.1 the behavior of a docker container has changed (see MESOS-1762).
 This change brings the docker behavior more in line with that of {{docker 
 run}}.
 I propose that if the image given explicitly has the :latest tag, this should 
 signal to mesos that an unconditional `docker pull` should be done on the 
 image, and if the pull fails for any reason (e.g., the registry is 
 unavailable), we fall back to the current behavior.
 This would break slightly with the semantics of how the docker command line 
 operates, but the alternative is to require explicit tags on every release, 
 which is a hindrance when developing a new image, or to log in to each 
 node and run an explicit `docker pull`.





[jira] [Updated] (MESOS-2016) docker_name_prefix is too generic

2014-11-04 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-2016:

Shepherd: Benjamin Hindman

 docker_name_prefix is too generic
 -

 Key: MESOS-2016
 URL: https://issues.apache.org/jira/browse/MESOS-2016
 Project: Mesos
  Issue Type: Bug
Reporter: Jay Buffington
Assignee: Timothy Chen

 From docker.hpp and docker.cpp:
 {quote}
 // Prefix used to name Docker containers in order to distinguish those
 // created by Mesos from those created manually.
 extern std::string DOCKER_NAME_PREFIX;
 // TODO(benh): At some point to run multiple slaves we'll need to make
 // the Docker container name creation include the slave ID.
 string DOCKER_NAME_PREFIX = "mesos-";
 {quote}
 This name is too generic. A common pattern in docker land is to run 
 everything in a container and use volume mounts to share sockets to do RPC 
 between containers. CoreOS has popularized this technique. 
 Inevitably, what people do is start a container named "mesos-slave", which 
 runs the docker containerizer recovery code, which removes all containers 
 whose names start with "mesos-". And then they ask, "huh, why did my 
 mesos-slave docker container die? I don't see any error messages..."
 Ideally, we should do what Ben suggested and add the slave id to the name 
 prefix.
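
A minimal C++ sketch of the suggested fix, under the assumption that the slave ID is appended to the prefix with a "." separator. The `containerPrefix`, `startsWith`, and `ownedBySlave` helpers and the separator are hypothetical choices for illustration, not the actual Mesos naming scheme.

```cpp
#include <string>

const std::string DOCKER_NAME_PREFIX = "mesos-";

// Hypothetical: builds a per-slave prefix such as "mesos-<slaveId>.".
inline std::string containerPrefix(const std::string& slaveId)
{
  return DOCKER_NAME_PREFIX + slaveId + ".";
}

// Returns true if `s` begins with `prefix`.
inline bool startsWith(const std::string& s, const std::string& prefix)
{
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Recovery would only touch containers carrying this slave's prefix.
inline bool ownedBySlave(
    const std::string& containerName,
    const std::string& slaveId)
{
  return startsWith(containerName, containerPrefix(slaveId));
}
```

With this scheme a manually started container named "mesos-slave" still matches the generic prefix, but it no longer matches the per-slave prefix used during recovery.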





[jira] [Created] (MESOS-2039) Create a Design Doc

2014-11-04 Thread Michael Park (JIRA)
Michael Park created MESOS-2039:
---

 Summary: Create a Design Doc
 Key: MESOS-2039
 URL: https://issues.apache.org/jira/browse/MESOS-2039
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Michael Park
Assignee: Michael Park


A design doc to be shared with the community for the Dynamic Reservation epic.





[jira] [Updated] (MESOS-1291) Use clang-format to automatically format code to style

2014-11-04 Thread John Pampuch (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Pampuch updated MESOS-1291:

Sprint: Mesosphere Q4 Sprint 2 - 11/14

 Use clang-format to automatically format code to style
 --

 Key: MESOS-1291
 URL: https://issues.apache.org/jira/browse/MESOS-1291
 Project: Mesos
  Issue Type: Improvement
  Components: technical debt
Reporter: Dominic Hamon
Assignee: Michael Park
  Labels: style

 Instead of relying on a script to check and report style errors, we should 
 move to a workflow that allows people to write code how they feel comfortable 
 and then automatically format it to conform to our style guide.
 The Chromium style from clang-format 
 (http://clang.llvm.org/docs/ClangFormat.html) is very close to our style 
 except for the dropped braces on class, struct, and function definitions, and 
 two lines of whitespace between method definitions outside a class. As such, 
 we should consider adopting clang-format and patching it to include a Mesos 
 style variant.
 It can be run as part of post-reviews or as a git commit hook, or manually 
 from within editors.





[jira] [Updated] (MESOS-2002) Module loading within frameworks

2014-11-04 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2002:
--
Target Version/s: 0.22.0
Shepherd: Adam B

 Module loading within frameworks 
 -

 Key: MESOS-2002
 URL: https://issues.apache.org/jira/browse/MESOS-2002
 Project: Mesos
  Issue Type: Improvement
  Components: framework, modules
Reporter: Till Toenshoff
Assignee: Kapil Arya
Priority: Blocker

 Frameworks should be granted the capability to load modules. 
 h4.Motivation
 Allowing a modularized Authenticatee to cover framework authentication 
 against the master.





[jira] [Resolved] (MESOS-1839) Modify configure.ac to fix --with-sasl

2014-11-04 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park resolved MESOS-1839.
-
  Resolution: Fixed
   Fix Version/s: 0.22.0
Target Version/s: 0.22.0

 Modify configure.ac to fix --with-sasl
 --

 Key: MESOS-1839
 URL: https://issues.apache.org/jira/browse/MESOS-1839
 Project: Mesos
  Issue Type: Bug
  Components: build
Reporter: Michael Park
Assignee: Michael Park
Priority: Minor
 Fix For: 0.22.0


 Specifying a custom {{libcurl}} directory via {{\-\-with-curl}} works well, 
 but {{--with-sasl}} doesn't work.
 {{libcurl}} installed at {{$HOME/libcurl}} and {{libsasl2}} installed at 
 {{$HOME/libsasl}}.
 Ran: {{../configure --with-curl=$HOME/libcurl --with-sasl=$HOME/libsasl}}
 {quote}
 checking for curl_global_init in -lcurl... yes
 checking for sasl_done in -lsasl2... no
 configure: error: cannot find libsasl2
 ---
 We need libsasl2 for authentication!
 ---
 {quote}





[jira] [Resolved] (MESOS-1181) Improve cpplint rule coverage

2014-11-04 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B resolved MESOS-1181.
---
   Resolution: Fixed
Fix Version/s: 0.21.0

Closing now, since we have made significant progress in style-checking in the 
past few releases.
We can open new tickets for individual changes we'd like to make, new rules to 
add, etc.

 Improve cpplint rule coverage
 -

 Key: MESOS-1181
 URL: https://issues.apache.org/jira/browse/MESOS-1181
 Project: Mesos
  Issue Type: Improvement
Reporter: Adam B
Assignee: Adam B
Priority: Minor
  Labels: lint, style
 Fix For: 0.21.0


 ReviewBot is checking our patches' style for us, and we can check it 
 ourselves with support/mesos-style.py, but there are only a few rules enabled 
 right now. I plan to enable more rules, fixing lint errors as needed. I'd 
 also like to add new style rules for things like single/double line spacing.





[jira] [Closed] (MESOS-1630) Remove framework from completedFrameworks if framework re-registers.

2014-11-04 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B closed MESOS-1630.
-
  Resolution: Won't Fix
Target Version/s:   (was: 0.20.0)

Closing in favor of MESOS-1219 and MESOS-1719.

 Remove framework from completedFrameworks if framework re-registers.
 

 Key: MESOS-1630
 URL: https://issues.apache.org/jira/browse/MESOS-1630
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.14.0, 0.14.1, 0.14.2, 0.17.0, 0.16.0, 0.15.0, 0.18.0, 
 0.18.1, 0.18.2, 0.19.0, 0.19.1
Reporter: Benjamin Hindman
Assignee: Bernd Mathiske
Priority: Critical

 If a framework gets removed, for example, because it unregisters with the 
 master (i.e., due to MESOS-1550), but then the same framework ID is reused 
 when a framework re-registers (which we currently allow), then we should 
 remove the framework from Master::frameworks.completed. Otherwise, when a 
 slave re-registers, Master::reconcile will notice that the slave is running 
 tasks from a completed framework and tell the slave to shut down that 
 framework, thus shutting down all of the tasks.
 This should be easily fixed by removing the framework from 
 frameworks.completed when a framework re-registers with the same ID as a 
 completed framework. 
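
The proposed fix can be sketched in C++ as follows. `MasterSketch` is a hypothetical, heavily simplified stand-in that tracks only framework IDs; it is not the real Master class, and the member names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the master's framework bookkeeping.
struct MasterSketch
{
  std::vector<std::string> active;     // IDs of registered frameworks.
  std::vector<std::string> completed;  // IDs of completed frameworks.

  // On re-registration, drop the ID from the completed list so that a later
  // reconciliation does not treat the framework's tasks as orphaned.
  void reregister(const std::string& frameworkId)
  {
    completed.erase(
        std::remove(completed.begin(), completed.end(), frameworkId),
        completed.end());
    active.push_back(frameworkId);
  }

  bool isCompleted(const std::string& frameworkId) const
  {
    return std::find(completed.begin(), completed.end(), frameworkId) !=
           completed.end();
  }
};
```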





[jira] [Updated] (MESOS-1839) Modify configure.ac to fix --with-sasl

2014-11-04 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-1839:

Shepherd: Benjamin Hindman

 Modify configure.ac to fix --with-sasl
 --

 Key: MESOS-1839
 URL: https://issues.apache.org/jira/browse/MESOS-1839
 Project: Mesos
  Issue Type: Bug
  Components: build
Reporter: Michael Park
Assignee: Michael Park
Priority: Minor
 Fix For: 0.22.0


 Specifying a custom {{libcurl}} directory via {{\-\-with-curl}} works well, 
 but {{--with-sasl}} doesn't work.
 {{libcurl}} installed at {{$HOME/libcurl}} and {{libsasl2}} installed at 
 {{$HOME/libsasl}}.
 Ran: {{../configure --with-curl=$HOME/libcurl --with-sasl=$HOME/libsasl}}
 {quote}
 checking for curl_global_init in -lcurl... yes
 checking for sasl_done in -lsasl2... no
 configure: error: cannot find libsasl2
 ---
 We need libsasl2 for authentication!
 ---
 {quote}





[jira] [Updated] (MESOS-2039) Create a Design Doc for Dynamic Reservations

2014-11-04 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-2039:

Summary: Create a Design Doc for Dynamic Reservations  (was: Create a 
Design Doc)

 Create a Design Doc for Dynamic Reservations
 

 Key: MESOS-2039
 URL: https://issues.apache.org/jira/browse/MESOS-2039
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Michael Park
Assignee: Michael Park

 A design doc to be shared with the community for the Dynamic Reservation epic.





[jira] [Created] (MESOS-2040) Authenticatee Module: Integrate authenticatee module in slave

2014-11-04 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2040:
-

 Summary: Authenticatee Module: Integrate authenticatee module in 
slave
 Key: MESOS-2040
 URL: https://issues.apache.org/jira/browse/MESOS-2040
 Project: Mesos
  Issue Type: Bug
Reporter: Till Toenshoff
Assignee: Till Toenshoff


Allow for slave authentication via authenticatee module.





[jira] [Created] (MESOS-2041) Authenticatee Module: Integrate authenticatee module in tests

2014-11-04 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2041:
-

 Summary: Authenticatee Module: Integrate authenticatee module in 
tests
 Key: MESOS-2041
 URL: https://issues.apache.org/jira/browse/MESOS-2041
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff
Assignee: Till Toenshoff


Make the authenticatee module testable.





[jira] [Updated] (MESOS-2040) Authenticatee Module: Integrate authenticatee module in slave

2014-11-04 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-2040:
--
Issue Type: Improvement  (was: Bug)

 Authenticatee Module: Integrate authenticatee module in slave
 -

 Key: MESOS-2040
 URL: https://issues.apache.org/jira/browse/MESOS-2040
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff
Assignee: Till Toenshoff

 Allow for slave authentication via authenticatee module.





[jira] [Created] (MESOS-2042) Authenticatee Module: Integrate authenticatee module in scheduler

2014-11-04 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-2042:
-

 Summary: Authenticatee Module: Integrate authenticatee module in 
scheduler
 Key: MESOS-2042
 URL: https://issues.apache.org/jira/browse/MESOS-2042
 Project: Mesos
  Issue Type: Improvement
  Components: modules
Reporter: Till Toenshoff
Assignee: Till Toenshoff


Allow for frameworks to use the authenticatee module.





[jira] [Commented] (MESOS-2034) Documentation for isolator namespaces/pid.

2014-11-04 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196658#comment-14196658
 ] 

Ian Downes commented on MESOS-2034:
---

https://reviews.apache.org/r/27585/

 Documentation for isolator namespaces/pid.
 --

 Key: MESOS-2034
 URL: https://issues.apache.org/jira/browse/MESOS-2034
 Project: Mesos
  Issue Type: Documentation
Affects Versions: 0.21.0
Reporter: Ian Downes
Assignee: Ian Downes







[jira] [Commented] (MESOS-2033) Documentation for isolator filesystem/shared.

2014-11-04 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196657#comment-14196657
 ] 

Ian Downes commented on MESOS-2033:
---

https://reviews.apache.org/r/27584/

 Documentation for isolator filesystem/shared.
 -

 Key: MESOS-2033
 URL: https://issues.apache.org/jira/browse/MESOS-2033
 Project: Mesos
  Issue Type: Documentation
Affects Versions: 0.21.0
Reporter: Ian Downes
Assignee: Ian Downes







[jira] [Created] (MESOS-2043) framework auth fail with timeout error and never get authenticated

2014-11-04 Thread Bhuvan Arumugam (JIRA)
Bhuvan Arumugam created MESOS-2043:
--

 Summary: framework auth fail with timeout error and never get 
authenticated
 Key: MESOS-2043
 URL: https://issues.apache.org/jira/browse/MESOS-2043
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.21.0
Reporter: Bhuvan Arumugam


I'm facing this issue in master as of 
https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4

As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm 
running 1 master and 1 scheduler (Aurora). Framework authentication fails 
due to a timeout.

Error on the Mesos master:

{code}
I1104 19:37:17.741449  8329 master.cpp:3874] Authenticating 
scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083
I1104 19:37:17.741585  8329 master.cpp:3885] Using default CRAM-MD5 
authenticator
I1104 19:37:17.742106  8336 authenticator.hpp:169] Creating new server SASL 
connection
W1104 19:37:22.742959  8329 master.cpp:3953] Authentication timed out
W1104 19:37:22.743548  8329 master.cpp:3930] Failed to authenticate 
scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: 
Authentication discarded
{code}

Error on the scheduler:
{code}
I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master 
master@MASTER_IP:PORT
I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL 
connection
I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL authentication 
mechanisms: CRAM-MD5
I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate 
with mechanism 'CRAM-MD5'
W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out
I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master 
master@MASTER_IP:PORT: Authentication discarded
{code}

It looks like two instances of the same framework, 
{{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} and 
{{scheduler-d2d4437b-d375-4467-a583-362152fe065a}}, are trying to 
authenticate and failing.
{code}
W1104 19:36:30.769420  8319 master.cpp:3930] Failed to authenticate 
scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to 
communicate with authenticatee
I1104 19:36:42.701441  8328 master.cpp:3860] Queuing up authentication request 
from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 because 
authentication is still in progress
{code}

Restarting the master and the scheduler didn't fix it. 

This particular issue happens with 1 master and 1 scheduler, after MESOS-1866 
was fixed.
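
In the excerpts above, each side logs "Authentication timed out" roughly five seconds after it starts authenticating. A minimal C++ sketch for checking such gaps from the log prefixes follows. It assumes the usual glog prefix layout (severity letter, MMDD, then HH:MM:SS.uuuuuu) and ignores day rollover; the helper name glogMicrosOfDay is hypothetical and not part of Mesos or glog.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: parses a glog-style prefix such as
// "I1104 19:37:17.741449" and returns the time of day in microseconds.
// Returns -1 when the line does not start with that layout.
inline long long glogMicrosOfDay(const std::string& line) {
  char level = 0;
  int month = 0, day = 0, hour = 0, minute = 0, second = 0, micros = 0;

  // Severity letter, MMDD, whitespace, HH:MM:SS, then six fractional digits.
  const int matched = std::sscanf(
      line.c_str(), "%c%2d%2d %2d:%2d:%2d.%6d",
      &level, &month, &day, &hour, &minute, &second, &micros);

  if (matched != 7) {
    return -1;
  }

  return ((hour * 60LL + minute) * 60LL + second) * 1000000LL + micros;
}
```

Subtracting the value for the master's "Authenticating" line (19:37:17.741449) from the value for its "Authentication timed out" line (19:37:22.742959) gives 5001510 microseconds. The same helper can be used to line up master and scheduler logs when comparing attempts.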





[jira] [Commented] (MESOS-830) ExamplesTest.JavaFramework is flaky

2014-11-04 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196845#comment-14196845
 ] 

Till Toenshoff commented on MESOS-830:
--

I also see this one failing a lot on OS X. 

I just ran it with --gtest_repeat=20 and got 19 failures and 1 pass.

{noformat}
[ RUN  ] ExamplesTest.JavaFramework
Using temporary directory '/tmp/ExamplesTest_JavaFramework_DCnIxN'
Enabling authentication for the framework
I1104 22:21:37.416721 112590848 leveldb.cpp:176] Opened db in 2428us
I1104 22:21:37.417207 112590848 leveldb.cpp:183] Compacted db in 454us
I1104 22:21:37.417244 112590848 leveldb.cpp:198] Created db iterator in 15us
I1104 22:21:37.417258 112590848 leveldb.cpp:204] Seeked to beginning of db in 
7us
I1104 22:21:37.417268 112590848 leveldb.cpp:273] Iterated through 0 keys in the 
db in 8us
I1104 22:21:37.417317 112590848 replica.cpp:741] Replica recovered with log 
positions 0 - 0 with 1 holes and 0 unlearned
I1104 22:21:37.417966 503451648 recover.cpp:437] Starting replica recovery
I1104 22:21:37.418251 503451648 recover.cpp:463] Replica is in EMPTY status
I1104 22:21:37.419044 502378496 replica.cpp:638] Replica in EMPTY status 
received a broadcasted recover request
I1104 22:21:37.419242 506134528 recover.cpp:188] Received a recover response 
from a replica in EMPTY status
I1104 22:21:37.419445 504524800 recover.cpp:554] Updating replica status to 
STARTING
I1104 22:21:37.419777 505597952 leveldb.cpp:306] Persisting metadata (8 bytes) 
to leveldb took 231us
I1104 22:21:37.419802 505597952 replica.cpp:320] Persisted replica status to 
STARTING
I1104 22:21:37.419909 503988224 recover.cpp:463] Replica is in STARTING status
I1104 22:21:37.420393 502378496 replica.cpp:638] Replica in STARTING status 
received a broadcasted recover request
I1104 22:21:37.420555 503988224 recover.cpp:188] Received a recover response 
from a replica in STARTING status
I1104 22:21:37.420811 502915072 recover.cpp:554] Updating replica status to 
VOTING
I1104 22:21:37.421128 505597952 leveldb.cpp:306] Persisting metadata (8 bytes) 
to leveldb took 193us
I1104 22:21:37.421161 505597952 replica.cpp:320] Persisted replica status to 
VOTING
I1104 22:21:37.421190 504524800 recover.cpp:568] Successfully joined the Paxos 
group
I1104 22:21:37.421301 504524800 recover.cpp:452] Recover process terminated
I1104 22:21:37.425765 502378496 master.cpp:318] Master 
20141104-222137-347252928-55703-8935 (lobomacpro2.fritz.box) started on 
192.168.178.20:55703
I1104 22:21:37.425830 502378496 master.cpp:364] Master only allowing 
authenticated frameworks to register
I1104 22:21:37.425843 502378496 master.cpp:371] Master allowing unauthenticated 
slaves to register
I1104 22:21:37.425854 502378496 credentials.hpp:36] Loading credentials for 
authentication from '/tmp/ExamplesTest_JavaFramework_DCnIxN/credentials'
W1104 22:21:37.425889 502378496 credentials.hpp:51] Permissions on credentials 
file '/tmp/ExamplesTest_JavaFramework_DCnIxN/credentials' are too open. It is 
recommended that your credentials file is NOT accessible by others.
I1104 22:21:37.425921 502378496 master.cpp:408] Authorization enabled
I1104 22:21:37.426417 112590848 containerizer.cpp:100] Using isolation: 
posix/cpu,posix/mem
I1104 22:21:37.427026 504524800 slave.cpp:169] Slave started on 
1)@192.168.178.20:55703
I1104 22:21:37.427248 504524800 slave.cpp:289] Slave resources: cpus(*):2; 
mem(*):10240; disk(*):470808; ports(*):[31000-32000]
I1104 22:21:37.427533 112590848 containerizer.cpp:100] Using isolation: 
posix/cpu,posix/mem
I1104 22:21:37.428071 503988224 slave.cpp:169] Slave started on 
2)@192.168.178.20:55703
I1104 22:21:37.428176 502378496 master.cpp:1258] The newly elected leader is 
master@192.168.178.20:55703 with id 20141104-222137-347252928-55703-8935
I1104 22:21:37.428205 502378496 master.cpp:1271] Elected as the leading master!
I1104 22:21:37.428220 502378496 master.cpp:1089] Recovering from registrar
I1104 22:21:37.428267 503988224 slave.cpp:289] Slave resources: cpus(*):2; 
mem(*):10240; disk(*):470808; ports(*):[31000-32000]
I1104 22:21:37.428318 502915072 registrar.cpp:313] Recovering registrar
I1104 22:21:37.428598 505061376 log.cpp:656] Attempting to start the writer
I1104 22:21:37.428805 112590848 containerizer.cpp:100] Using isolation: 
posix/cpu,posix/mem
I1104 22:21:37.428889 503988224 slave.cpp:318] Slave hostname: 
lobomacpro2.fritz.box
I1104 22:21:37.428892 504524800 slave.cpp:318] Slave hostname: 
lobomacpro2.fritz.box
I1104 22:21:37.428917 503988224 slave.cpp:319] Slave checkpoint: true
I1104 22:21:37.428927 504524800 slave.cpp:319] Slave checkpoint: true
I1104 22:21:37.429457 506134528 state.cpp:33] Recovering state from 
'/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos-XX.3EkfQ7TT/1/meta'
I1104 22:21:37.429478 505061376 state.cpp:33] Recovering state from 
'/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos-XX.3EkfQ7TT/0/meta

[jira] [Commented] (MESOS-2043) framework auth fail with timeout error and never get authenticated

2014-11-04 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196927#comment-14196927
 ] 

Vinod Kone commented on MESOS-2043:
---

Hey Bhuvan. Can you include more information in the logs? Right now the timings 
in the master and scheduler logs do not match. Also, were there failovers in 
the midst of this?

 framework auth fail with timeout error and never get authenticated
 --

 Key: MESOS-2043
 URL: https://issues.apache.org/jira/browse/MESOS-2043
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.21.0
Reporter: Bhuvan Arumugam






[jira] [Commented] (MESOS-1956) Add IPv6 & ICMPv6 libnl traffic control U32 filters

2014-11-04 Thread Cong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196948#comment-14196948
 ] 

Cong Wang commented on MESOS-1956:
--

I am not sure how much sense this makes in the real world: using IPv6 usually 
means we have enough IP addresses, so we probably don't need these filters to 
do port-range-based routing. Instead, we could just assign each container a 
different IPv6 address.

 Add IPv6 & ICMPv6 libnl traffic control U32 filters
 ---

 Key: MESOS-1956
 URL: https://issues.apache.org/jira/browse/MESOS-1956
 Project: Mesos
  Issue Type: Task
  Components: isolation
Reporter: Evelina Dumitrescu
Assignee: Evelina Dumitrescu

 For IPv6, the filtering should be done by source and destination ports, 
 destination IP, and destination MAC.
 For ICMPv6, the filtering should be done by protocol and destination IP.
 IPv6 and IPv4 traffic could be told apart by the source/destination IP type 
 from the classifier.
 IPv4 packets with options in the header are currently ignored due to a bug in 
 libnl. We should investigate whether the same problem occurs with IPv6.





[jira] [Updated] (MESOS-1143) Add a TASK_ERROR task status.

2014-11-04 Thread Dominic Hamon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Hamon updated MESOS-1143:
-
Target Version/s: 0.22.0

 Add a TASK_ERROR task status.
 -

 Key: MESOS-1143
 URL: https://issues.apache.org/jira/browse/MESOS-1143
 Project: Mesos
  Issue Type: Improvement
  Components: framework, master
Reporter: Benjamin Hindman
Assignee: Dominic Hamon

 During task validation we drop tasks that have errors and send TASK_LOST 
 status updates. In most circumstances a framework will want to relaunch a 
 task that has gone lost, and in the event the task is actually malformed 
 (thus invalid) this will result in an infinite loop of sending a task and 
 having it go lost.
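
To make the relaunch loop concrete, here is a minimal C++ sketch under stated assumptions. The TaskState enum is a local stand-in rather than the Mesos protobuf type, and shouldRelaunch and launchAttempts are hypothetical names describing one possible framework-side policy, not existing Mesos code.

```cpp
#include <cassert>

// Local stand-in for a few task states. TASK_ERROR is the status this
// issue proposes to add.
enum class TaskState { TASK_RUNNING, TASK_FINISHED, TASK_LOST, TASK_ERROR };

// Hypothetical policy: a lost task is worth relaunching, a task reported
// as malformed (TASK_ERROR) is not, since relaunching cannot fix it.
inline bool shouldRelaunch(TaskState state) {
  switch (state) {
    case TaskState::TASK_LOST:
      return true;
    case TaskState::TASK_ERROR:
    case TaskState::TASK_RUNNING:
    case TaskState::TASK_FINISHED:
      return false;
  }
  return false;
}

// Counts the launch attempts made for a task that validation rejects every
// time with the given status, capped at maxAttempts so the loop terminates.
inline int launchAttempts(TaskState rejection, int maxAttempts) {
  assert(maxAttempts > 0);
  int attempts = 0;
  while (attempts < maxAttempts) {
    ++attempts;
    if (!shouldRelaunch(rejection)) {
      break;
    }
  }
  return attempts;
}
```

With this policy, a malformed task reported as TASK_LOST is retried until the cap is reached, while the same task reported as TASK_ERROR is attempted only once.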





[jira] [Commented] (MESOS-1143) Add a TASK_ERROR task status.

2014-11-04 Thread Dominic Hamon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196959#comment-14196959
 ] 

Dominic Hamon commented on MESOS-1143:
--

Added status as part of

commit ca13594054b179bd36ce14323de3111f92bae3cd (HEAD, origin/master, 
origin/HEAD, master, MESOS-1830.task_lost_source)
Author: Dominic Hamon dha...@twitter.com
Commit: Dominic Hamon dha...@twitter.com

Add source and reason to TaskStatus.

Review: https://reviews.apache.org/r/26382/

but not in use until 0.22.0

 Add a TASK_ERROR task status.
 -

 Key: MESOS-1143
 URL: https://issues.apache.org/jira/browse/MESOS-1143
 Project: Mesos
  Issue Type: Improvement
  Components: framework, master
Reporter: Benjamin Hindman
Assignee: Dominic Hamon

 During task validation we drop tasks that have errors and send TASK_LOST 
 status updates. In most circumstances a framework will want to relaunch a 
 task that has gone lost, and in the event the task is actually malformed 
 (thus invalid) this will result in an infinite loop of sending a task and 
 having it go lost.





[jira] [Commented] (MESOS-1830) Expose master stats differentiating between master-generated and slave-generated LOST tasks

2014-11-04 Thread Dominic Hamon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196961#comment-14196961
 ] 

Dominic Hamon commented on MESOS-1830:
--

https://reviews.apache.org/r/27531/

 Expose master stats differentiating between master-generated and 
 slave-generated LOST tasks
 ---

 Key: MESOS-1830
 URL: https://issues.apache.org/jira/browse/MESOS-1830
 Project: Mesos
  Issue Type: Story
  Components: master
Reporter: Bill Farner
Assignee: Dominic Hamon
Priority: Minor

 The master exports a monotonically-increasing counter of tasks transitioned 
 to TASK_LOST.  This loses fidelity of the source of the lost task.  A first 
 step in exposing the source of lost tasks might be to just differentiate 
 between TASK_LOST transitions initiated by the master vs the slave (and maybe 
 bad input from the scheduler).
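
One way to picture the proposed split is a counter keyed by the origin of the transition. The following C++ sketch is only an illustration: LostSource and LostTaskCounters are hypothetical names, and nothing here reflects how the master's metrics are actually implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical origins of a TASK_LOST transition, following the three
// sources mentioned in the description.
enum class LostSource { MASTER, SLAVE, SCHEDULER };

// Keeps the existing monotonically increasing total and adds a
// per-source breakdown next to it.
class LostTaskCounters {
 public:
  // Records one TASK_LOST transition attributed to the given source.
  void record(LostSource source) {
    ++bySource_[source];
    ++total_;
  }

  // Total number of TASK_LOST transitions, regardless of source.
  std::uint64_t total() const { return total_; }

  // Number of transitions attributed to one source (0 if none recorded).
  std::uint64_t count(LostSource source) const {
    const auto it = bySource_.find(source);
    return it == bySource_.end() ? 0 : it->second;
  }

 private:
  std::map<LostSource, std::uint64_t> bySource_;
  std::uint64_t total_ = 0;
};
```

Keeping the total alongside the breakdown means existing consumers of the single counter would see the same value as before.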





[jira] [Commented] (MESOS-1919) Create IP address abstraction

2014-11-04 Thread Cong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196970#comment-14196970
 ] 

Cong Wang commented on MESOS-1919:
--

I don't think we need to worry about u32 filters for IPv4, because they never 
work for IPv6 (IPv6 packets are just ignored), nor does it make much sense to 
do port-range-based routing for IPv6. We should really use a 
one-IP-address-per-container solution when using IPv6.

 Create IP address abstraction
 -

 Key: MESOS-1919
 URL: https://issues.apache.org/jira/browse/MESOS-1919
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Dominic Hamon
Assignee: Evelina Dumitrescu
Priority: Minor

 In the code, many functions need only the IP address to be passed as a 
 parameter. I don't think it would be desirable to use a struct 
 SockaddrStorage (MESOS-1916).
 Consider using a {{std::vector<unsigned char>}} (see {{typedef 
 std::vector<unsigned char> IPAddressNumber;}} in the Chromium project).
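
As a rough illustration of the byte-vector idea, here is a minimal C++ sketch. Only the IPAddressNumber typedef comes from the description; the helpers isIPv4, isIPv6 and toDottedDecimal are hypothetical names invented for this example and are not existing Mesos or Chromium functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The typedef quoted in the description: an address is just its raw bytes.
typedef std::vector<unsigned char> IPAddressNumber;

// Hypothetical helpers: the address family follows from the byte count,
// 4 bytes for IPv4 and 16 bytes for IPv6.
inline bool isIPv4(const IPAddressNumber& ip) { return ip.size() == 4; }
inline bool isIPv6(const IPAddressNumber& ip) { return ip.size() == 16; }

// Hypothetical helper: formats a 4-byte address in dotted-decimal notation.
// The caller must pass an IPv4 address.
inline std::string toDottedDecimal(const IPAddressNumber& ip) {
  assert(isIPv4(ip));
  std::string out;
  for (std::size_t i = 0; i < ip.size(); ++i) {
    if (i > 0) {
      out += '.';
    }
    out += std::to_string(static_cast<unsigned int>(ip[i]));
  }
  return out;
}
```

A function that needs only an address could then take a const IPAddressNumber& and stay independent of any socket address structure.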





[jira] [Updated] (MESOS-2043) framework auth fail with timeout error and never get authenticated

2014-11-04 Thread Bhuvan Arumugam (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhuvan Arumugam updated MESOS-2043:
---
Attachment: aurora-scheduler.20141104-1606-1706.log
mesos-master.20141104-1606-1706.log

[~vinodkone] It wasn't a failover, but the master was restarted after an upgrade.
I've attached the logs from both the master and the scheduler, covering the first hour.

master log snippet:
{code}
I1104 16:06:39.019181 35273 master.cpp:3874] Authenticating 
scheduler-8160bf27-7799-4b8c-921c-b2e87869475b@AURORA_IP:8083
I1104 16:06:39.019480 35273 master.cpp:3885] Using default CRAM-MD5 
authenticator
I1104 16:06:39.020884 35290 authenticator.hpp:107] Initializing server SASL 
I1104 16:06:39.022680 35290 authenticator.hpp:169] Creating new server SASL 
connection
W1104 16:06:44.022080 35275 master.cpp:3953] Authentication timed out   
{code}

scheduler log snippet:
{code}
I1104 16:06:34.006535 23272 detector.cpp:138] Detected a new leader: (id='115') 
I1104 16:06:34.007257 23270 group.cpp:659] Trying to get 
'/mesos/info_000115' in ZooKeeper
I1104 16:06:34.008654 23270 detector.cpp:433] A new leading master 
(UPID=master@MASTER_IP:PORT) is detected
W1104 16:06:34.009 THREAD3393 
org.apache.aurora.scheduler.MesosSchedulerImpl.disconnected: Framework 
disconnected.
I1104 16:06:34.010 THREAD3393 
org.apache.aurora.scheduler.async.OfferQueue$OfferQueueImpl.driverDisconnected: 
Clearing stale offers since the driver is disconnected.
I1104 16:06:34.010766 23281 sched.cpp:233] New master detected at 
master@MASTER_IP:PORT
I1104 16:06:34.010834 23281 sched.cpp:283] Authenticating with master 
master@MASTER_IP:PORT
I1104 16:06:34.011281 23263 authenticatee.hpp:133] Creating new client SASL 
connection
W1104 16:06:39.016166 23274 sched.cpp:378] Authentication timed out 
I1104 16:06:39.016585 23263 sched.cpp:338] Failed to authenticate with master 
master@MASTER_IP:PORT: Authentication discarded
I1104 16:06:39.016669 23263 sched.cpp:283] Authenticating with master 
master@MASTER_IP:PORT
I1104 16:06:39.017057 23282 authenticatee.hpp:133] Creating new client SASL 
connection
I1104 16:06:39.023083 23279 authenticatee.hpp:224] Received SASL authentication 
mechanisms: CRAM-MD5
I1104 16:06:39.023138 23279 authenticatee.hpp:250] Attempting to authenticate 
with mechanism 'CRAM-MD5'
W1104 16:06:44.022470 23268 sched.cpp:378] Authentication timed out 
{code}


 framework auth fail with timeout error and never get authenticated
 --

 Key: MESOS-2043
 URL: https://issues.apache.org/jira/browse/MESOS-2043
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.21.0
Reporter: Bhuvan Arumugam
 Attachments: aurora-scheduler.20141104-1606-1706.log, 
 mesos-master.20141104-1606-1706.log



[jira] [Closed] (MESOS-1909) Network monitoring and isolation using macvlan.

2014-11-04 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu closed MESOS-1909.
-
Resolution: Duplicate

 Network monitoring and isolation using macvlan.
 ---

 Key: MESOS-1909
 URL: https://issues.apache.org/jira/browse/MESOS-1909
 Project: Mesos
  Issue Type: Story
Reporter: Jie Yu

 Right now, the port mapping network isolator used by the Mesos containerizer 
 is based on the assumption that there are not enough public IPs (a unique IP 
 per container). That introduces a lot of complexity and limits the number of 
 ports that can be used by a container.
 If we have enough public IPs, we can use macvlan to bridge containers. That's 
 much cleaner and more efficient.





[jira] [Commented] (MESOS-1806) Substituting etcd or ReplicatedLog for Zookeeper

2014-11-04 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197411#comment-14197411
 ] 

Timothy St. Clair commented on MESOS-1806:
--

[~nnielsen] [~tnachen] - We should wrap the interface and create a mesos-module 
here.  

 Substituting etcd or ReplicatedLog for Zookeeper
 

 Key: MESOS-1806
 URL: https://issues.apache.org/jira/browse/MESOS-1806
 Project: Mesos
  Issue Type: Task
Reporter: Ed Ropple
Priority: Minor

 <adam_mesos> eropple: Could you also file a new JIRA for Mesos to drop ZK 
 in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
 that one.
 --
 Consider it filed. =)





[jira] [Commented] (MESOS-1806) Substituting etcd or ReplicatedLog for Zookeeper

2014-11-04 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197695#comment-14197695
 ] 

Adam B commented on MESOS-1806:
---

+1. A module should be as easy as MasterDetector + MasterContender, which are 
already very simple interfaces.
We might need to make the existing zkstring more generic, possibly adding more 
config than a URL for some alternate implementations.
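
One possible reading of "more generic" is to choose the backend from the scheme of the connection string. The C++ sketch below is a guess at what that could look like, not a proposal for the real interface: the Backend enum and both functions are hypothetical, and only the zk:// form exists today, so the "etcd" and "log" schemes are invented for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical set of coordination backends discussed in this issue.
enum class Backend { ZOOKEEPER, ETCD, REPLICATED_LOG, UNKNOWN };

// Returns the text before "://", or an empty string if there is no scheme.
inline std::string schemeOf(const std::string& url) {
  const std::string::size_type pos = url.find("://");
  return pos == std::string::npos ? std::string() : url.substr(0, pos);
}

// Maps a connection string to a backend. "zk" matches the existing zk://
// strings; "etcd" and "log" are assumptions made up for this sketch.
inline Backend backendFor(const std::string& url) {
  const std::string scheme = schemeOf(url);
  if (scheme == "zk") {
    return Backend::ZOOKEEPER;
  }
  if (scheme == "etcd") {
    return Backend::ETCD;
  }
  if (scheme == "log") {
    return Backend::REPLICATED_LOG;
  }
  return Backend::UNKNOWN;
}
```

Implementations that need more configuration than a URL can carry would still need an additional mechanism, as noted above.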

 Substituting etcd or ReplicatedLog for Zookeeper
 

 Key: MESOS-1806
 URL: https://issues.apache.org/jira/browse/MESOS-1806
 Project: Mesos
  Issue Type: Task
Reporter: Ed Ropple
Priority: Minor

 <adam_mesos> eropple: Could you also file a new JIRA for Mesos to drop ZK 
 in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
 that one.
 --
 Consider it filed. =)


