[jira] [Commented] (MESOS-1930) Expose TASK_KILLED reason.
[ https://issues.apache.org/jira/browse/MESOS-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195985#comment-14195985 ] Alexander Rukletsov commented on MESOS-1930: "Reason" may not be the precise term for what I mean. If a task is killed, I would like to get feedback on how it terminated: gracefully, or forcibly after a timeout. In the case of the CommandExecutor, this is supported via signal escalation. If an executor doesn't distinguish between a soft and a hard kill, it can simply ignore the second-tier state. How about introducing something like {{REASON_KILL_TIMEOUT}}? Expose TASK_KILLED reason. -- Key: MESOS-1930 URL: https://issues.apache.org/jira/browse/MESOS-1930 Project: Mesos Issue Type: Story Reporter: Alexander Rukletsov Assignee: Dominic Hamon Priority: Minor A task process may be killed by a SIGTERM or a SIGKILL. The only way to check how the task process exited is to examine the message: {{status.message().find("Terminated")}}. However, a task may not run in its own process, so the executor may not be able to provide an exit status. What we actually want is an artificial task exit status rendered by the executor. This may be resolved by adding second-tier states or state explanations. Here is a link to a discussion: https://reviews.apache.org/r/26382/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
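The signal escalation mentioned above can be sketched in C++ as follows. The sketch assumes a POSIX host and a task that runs as a direct child process of the executor. `KillReason` and `killWithEscalation` are hypothetical names that show how an executor could report a `REASON_KILL_TIMEOUT`-style second-tier state. This is not the CommandExecutor's actual code.

```cpp
#include <cassert>
#include <chrono>
#include <csignal>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Hypothetical second-tier explanation for TASK_KILLED, modeled on the
// REASON_KILL_TIMEOUT idea above. This is not an existing Mesos enum.
enum class KillReason {
  GRACEFUL,      // The process exited after SIGTERM, before escalation.
  KILL_TIMEOUT,  // The grace period elapsed and SIGKILL ended the process.
  UNKNOWN        // The process could not be signaled or reaped by us.
};

// Sends SIGTERM and polls for exit until `grace` elapses, then escalates to
// SIGKILL. `pid` must be a child of the calling process so it can be reaped.
KillReason killWithEscalation(pid_t pid, std::chrono::milliseconds grace) {
  if (::kill(pid, SIGTERM) != 0) {
    return KillReason::UNKNOWN;
  }

  int status = 0;
  const auto deadline = std::chrono::steady_clock::now() + grace;
  while (std::chrono::steady_clock::now() < deadline) {
    const pid_t result = ::waitpid(pid, &status, WNOHANG);
    if (result == pid) {
      return KillReason::GRACEFUL;
    }
    if (result < 0) {
      return KillReason::UNKNOWN;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }

  // Grace period expired: escalate, then block until the child is reaped.
  ::kill(pid, SIGKILL);
  if (::waitpid(pid, &status, 0) != pid) {
    return KillReason::UNKNOWN;
  }

  // The child may have exited on its own just before SIGKILL was sent.
  if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) {
    return KillReason::KILL_TIMEOUT;
  }
  return KillReason::GRACEFUL;
}
```

An executor whose tasks do not run in their own processes could still fill in such a reason artificially. That is the point the description makes about executor-rendered exit statuses.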
[jira] [Commented] (MESOS-2038) Remove dead code in Slave::_runTask
[ https://issues.apache.org/jira/browse/MESOS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195993#comment-14195993 ] Bernd Mathiske commented on MESOS-2038: --- https://reviews.apache.org/r/27567/ Remove dead code in Slave::_runTask --- Key: MESOS-2038 URL: https://issues.apache.org/jira/browse/MESOS-2038 Project: Mesos Issue Type: Improvement Components: slave Affects Versions: 0.21.0 Reporter: Bernd Mathiske Assignee: Bernd Mathiske Priority: Trivial Labels: newbie Fix For: 0.22.0 Original Estimate: 1h Remaining Estimate: 1h In the course of fixing MESOS-947, I overlooked that the code in question had become dead code. Coverity caught this. At the top of _runTask(), there is now a test for whether framework is NULL. In every case where it is NULL, the method exits locally before reaching the code in question (see below). So we should be able to remove this code without effect:
{noformat}
-  if (framework == NULL) {
-    framework = new Framework(this, frameworkId, frameworkInfo, pid);
-    frameworks[frameworkId] = framework;
-  }
-
-  CHECK_NOTNULL(framework);
-
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
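The control-flow argument can be illustrated with a minimal C++ sketch. `Framework`, `RunResult`, and `runTask` are hypothetical stand-ins, and the sketch assumes a single early-return guard. The real `Slave::_runTask()` has more exit paths.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified stand-ins. This mirrors only the control
// flow described above, not the real Slave::_runTask().
struct Framework {
  std::string id;
};

enum class RunResult { DROPPED, LAUNCHED };

RunResult runTask(const std::map<std::string, Framework*>& frameworks,
                  const std::string& frameworkId) {
  const auto it = frameworks.find(frameworkId);
  Framework* framework = (it == frameworks.end()) ? nullptr : it->second;

  // Guard near the top of the method: every NULL path exits here.
  if (framework == nullptr) {
    return RunResult::DROPPED;
  }

  // The removed block re-tested `framework == NULL` at this point and then
  // called CHECK_NOTNULL(framework). Because of the guard above, that branch
  // can never be taken. This is the kind of dead code Coverity flagged.
  return RunResult::LAUNCHED;
}
```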
[jira] [Updated] (MESOS-1935) Replace hard-coded reap interval with a constant
[ https://issues.apache.org/jira/browse/MESOS-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-1935: --- Description: With https://issues.apache.org/jira/browse/MESOS-1846 implemented, replace the hard-coded value for the maximal reap interval (1s) with the constant from {{reap.hpp}}. This will mostly affect tests. (was: With https://issues.apache.org/jira/browse/MESOS-1846 implemented, replace the hard-coded value for the maximal reap interval (1s) with the constant from {{reap.hpp}}.) Replace hard-coded reap interval with a constant Key: MESOS-1935 URL: https://issues.apache.org/jira/browse/MESOS-1935 Project: Mesos Issue Type: Task Components: test Reporter: Alexander Rukletsov Priority: Trivial Labels: newbie With https://issues.apache.org/jira/browse/MESOS-1846 implemented, replace the hard-coded value for the maximal reap interval (1s) with the constant from {{reap.hpp}}. This will mostly affect tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
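Here is a minimal sketch of the intended change, assuming the shared constant can be expressed as a `std::chrono` duration. `kMaxReapInterval` and `reapTimeout` are hypothetical names and do not reflect the actual contents of `reap.hpp`.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the constant exported by reap.hpp. The real
// name and type in that header may differ. The 1s value is the hard-coded
// maximal reap interval mentioned in the issue.
constexpr std::chrono::seconds kMaxReapInterval{1};

// Derives a test's wait bound from the shared constant instead of repeating
// a literal 1s, so tests follow the reaper if the interval ever changes.
std::chrono::milliseconds reapTimeout(std::chrono::milliseconds slack) {
  return std::chrono::milliseconds(kMaxReapInterval) + slack;
}
```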
[jira] [Issue Comment Deleted] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy St. Clair updated MESOS-1807: - Comment: was deleted (was: Was the original goal of cpu+memory only offers to enable resizing? The behavior change had caused other frameworks to fail, namely Spark. Ideally I would love to have some integration testing on mods like this. [~tnachen] ^ ) Disallow executors with cpu only or memory only resources - Key: MESOS-1807 URL: https://issues.apache.org/jira/browse/MESOS-1807 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone Labels: newbie Currently the master allows executors to be launched with either only cpus or only memory, but we shouldn't allow that. This is because an executor is an actual Unix process that is launched by the slave. If an executor doesn't specify cpus, what should the cpu limits be for that executor when there are no tasks running on it? If no cpu limits are set, it might starve other executors/tasks on the slave, violating isolation guarantees. The same goes for memory. Moreover, the current containerizer/isolator code will throw failures when using such an executor, e.g., when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196173#comment-14196173 ] Timothy St. Clair commented on MESOS-1807: -- Was the original goal of cpu+memory only offers to enable resizing? The behavior change had caused other frameworks to fail, namely Spark. Ideally I would love to have some integration testing on mods like this. [~tnachen] ^ Disallow executors with cpu only or memory only resources - Key: MESOS-1807 URL: https://issues.apache.org/jira/browse/MESOS-1807 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone Labels: newbie Currently the master allows executors to be launched with either only cpus or only memory, but we shouldn't allow that. This is because an executor is an actual Unix process that is launched by the slave. If an executor doesn't specify cpus, what should the cpu limits be for that executor when there are no tasks running on it? If no cpu limits are set, it might starve other executors/tasks on the slave, violating isolation guarantees. The same goes for memory. Moreover, the current containerizer/isolator code will throw failures when using such an executor, e.g., when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196174#comment-14196174 ] Timothy St. Clair commented on MESOS-1807: -- Was the original goal of cpu+memory only offers to enable resizing? The behavior change had caused other frameworks to fail, namely Spark. Ideally I would love to have some integration testing on mods like this. [~tnachen] ^ Disallow executors with cpu only or memory only resources - Key: MESOS-1807 URL: https://issues.apache.org/jira/browse/MESOS-1807 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Vinod Kone Labels: newbie Currently the master allows executors to be launched with either only cpus or only memory, but we shouldn't allow that. This is because an executor is an actual Unix process that is launched by the slave. If an executor doesn't specify cpus, what should the cpu limits be for that executor when there are no tasks running on it? If no cpu limits are set, it might starve other executors/tasks on the slave, violating isolation guarantees. The same goes for memory. Moreover, the current containerizer/isolator code will throw failures when using such an executor, e.g., when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
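Here is a sketch of the proposed master-side check, assuming executor resources can be flattened to named scalars. `ScalarResources` and `validateExecutorResources` are hypothetical and are not the master's validation code. The check rejects any executor that lacks a positive `cpus` or `mem` amount, which covers the cpu-only and memory-only cases the issue targets.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical flattened view of an executor's resources: name -> scalar.
using ScalarResources = std::map<std::string, double>;

// Returns an error message if the executor would run without both a cpu and
// a memory allocation, or an empty string if it is acceptable.
std::string validateExecutorResources(const ScalarResources& resources) {
  auto amount = [&resources](const std::string& name) {
    const auto it = resources.find(name);
    return it == resources.end() ? 0.0 : it->second;
  };

  if (amount("cpus") <= 0.0) {
    return "Executor must specify a positive amount of 'cpus'";
  }
  if (amount("mem") <= 0.0) {
    return "Executor must specify a positive amount of 'mem'";
  }
  return "";
}
```

Rejecting such executors at launch time means the containerizer never sees an `update()` that would leave an executor with 0 cpus or 0 mem.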
[jira] [Updated] (MESOS-2037) Update docs/configuration.md
[ https://issues.apache.org/jira/browse/MESOS-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-2037: -- Sprint: Mesosphere Q4 Sprint 2 Update docs/configuration.md Key: MESOS-2037 URL: https://issues.apache.org/jira/browse/MESOS-2037 Project: Mesos Issue Type: Documentation Reporter: Kapil Arya Assignee: Kapil Arya Priority: Blocker Update documentation for configuration flags (docs/configuration.md) to reflect the current state. https://reviews.apache.org/r/27556/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1974) Refactor the C++ 'Resources' abstraction.
[ https://issues.apache.org/jira/browse/MESOS-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196425#comment-14196425 ] Jie Yu commented on MESOS-1974: --- https://reviews.apache.org/r/27555 Refactor the C++ 'Resources' abstraction. - Key: MESOS-1974 URL: https://issues.apache.org/jira/browse/MESOS-1974 Project: Mesos Issue Type: Improvement Reporter: Jie Yu Assignee: Jie Yu The existing C++ 'Resources' interfaces are poorly designed. Some of them are confusing and unintuitive. Some of them are overloaded with too much functionality. For instance,
{noformat}
bool operator <= (const Resource& left, const Resource& right);
{noformat}
This interface is non-intuitive because A <= B doesn't imply !(B <= A).
{noformat}
Resource operator + (const Resource& left, const Resource& right);
{noformat}
This one is also non-intuitive because if 'left' is not compatible with 'right', the result is 'left' (why not 'right'?). The same applies to operator '-'.
{noformat}
Option<Resource> Resources::get(const Resource& r) const;
{noformat}
This one assumes Resources is flattened, but it might not be. As we start to introduce persistent disk resources (MESOS-1554), things will get more complicated. For example, one may want two kinds of 'disk()' functions: one that returns the ephemeral disk bytes (with no disk info), and one that returns the total disk bytes (including those that have disk info). We may want to introduce a concept for Resource that indicates that a resource cannot be merged or split (e.g., atomic?). Since we need to change this class anyway, I want to take this chance to refactor it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
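The ordering complaint can be made concrete with a simplified, scalar-only `Resource`. This sketch assumes, as a simplification of the behavior criticized above, that `<=` returns false for incompatible resources (different name or role). Under that assumption `<=` is only a partial order: for incompatible A and B, neither `A <= B` nor `B <= A` holds, unlike the ordering of plain numbers. The `Resource` struct here is hypothetical and is not the protobuf message.

```cpp
#include <cassert>
#include <string>

// Hypothetical, scalar-only stand-in for the Resource protobuf.
struct Resource {
  std::string name;
  std::string role;
  double value;
};

// Compatible resources (same name and role) compare by value. Incompatible
// resources are assumed to compare false in both directions.
bool operator<=(const Resource& left, const Resource& right) {
  if (left.name != right.name || left.role != right.role) {
    return false;
  }
  return left.value <= right.value;
}
```

Any refactored interface that keeps such a comparison might name it `contains()` or similar, so readers do not expect total-order behavior.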
[jira] [Commented] (MESOS-1768) Provide default image settings for containerizers
[ https://issues.apache.org/jira/browse/MESOS-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196519#comment-14196519 ] Timothy Chen commented on MESOS-1768: - [~idownes], this is already merged, right? Provide default image settings for containerizers - Key: MESOS-1768 URL: https://issues.apache.org/jira/browse/MESOS-1768 Project: Mesos Issue Type: Story Reporter: Timothy Chen Assignee: Ian Downes We want to be able to specify a default image setting for all the mesos containerizers (external, mesos, docker, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1886) Allow docker pull on each run to be configurable
[ https://issues.apache.org/jira/browse/MESOS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-1886: Summary: Allow docker pull on each run to be configurable (was: Always `docker pull` if explicit :latest tag is present) Allow docker pull on each run to be configurable Key: MESOS-1886 URL: https://issues.apache.org/jira/browse/MESOS-1886 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 0.20.1 Reporter: Chris Heller Assignee: Timothy Chen Priority: Minor Labels: docker With 0.20.1, the behavior of a docker container has changed (see MESOS-1762). This change brings the docker behavior more in line with that of {{docker run}}. I propose that if the image given explicitly has the :latest tag, this should signify to Mesos that an unconditional `docker pull` should be done on the image, and if the pull fails for any reason (e.g., the registry is unavailable), we fall back to the current behavior. This would break slightly with the semantics of the docker command line, but the alternative is to require explicit tags on every release (a hindrance when developing a new image) or to log in to each node and run an explicit `docker pull`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
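The pull decision proposed in the original description can be sketched as follows. `explicitTag`, `shouldPullUnconditionally`, and the `forcePull` flag are hypothetical. The retitled issue asks for a configurable option, but its actual name and form are not given here.

```cpp
#include <cassert>
#include <string>

// Returns the tag of an image reference such as "repo/app:latest", or ""
// if no tag is given. A ':' before the last '/' belongs to a registry
// host:port (e.g. "registry:5000/app") and is not treated as a tag.
// Digest references ("@sha256:...") are not handled by this sketch.
std::string explicitTag(const std::string& image) {
  const std::size_t slash = image.rfind('/');
  const std::size_t colon = image.rfind(':');
  if (colon == std::string::npos ||
      (slash != std::string::npos && colon < slash)) {
    return "";
  }
  return image.substr(colon + 1);
}

// Proposed rule: an explicit ":latest" tag, or a hypothetical force-pull
// setting, triggers an unconditional `docker pull`. An image without a tag
// (implicitly "latest") keeps the current behavior.
bool shouldPullUnconditionally(const std::string& image, bool forcePull) {
  return forcePull || explicitTag(image) == "latest";
}
```

If the unconditional pull fails (for example, because the registry is unreachable), the proposal is to fall back to the current behavior rather than fail the task.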
[jira] [Assigned] (MESOS-1886) Always `docker pull` if explicit :latest tag is present
[ https://issues.apache.org/jira/browse/MESOS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen reassigned MESOS-1886: --- Assignee: Timothy Chen Always `docker pull` if explicit :latest tag is present - Key: MESOS-1886 URL: https://issues.apache.org/jira/browse/MESOS-1886 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 0.20.1 Reporter: Chris Heller Assignee: Timothy Chen Priority: Minor Labels: docker With 0.20.1, the behavior of a docker container has changed (see MESOS-1762). This change brings the docker behavior more in line with that of {{docker run}}. I propose that if the image given explicitly has the :latest tag, this should signify to Mesos that an unconditional `docker pull` should be done on the image, and if the pull fails for any reason (e.g., the registry is unavailable), we fall back to the current behavior. This would break slightly with the semantics of the docker command line, but the alternative is to require explicit tags on every release (a hindrance when developing a new image) or to log in to each node and run an explicit `docker pull`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2018) Dynamic Reservations
[ https://issues.apache.org/jira/browse/MESOS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-2018: Epic Name: Dynamic Reservations (was: Dynamic Resource Reservations) Summary: Dynamic Reservations (was: Dynamic Resource Reservations) Dynamic Reservations Key: MESOS-2018 URL: https://issues.apache.org/jira/browse/MESOS-2018 Project: Mesos Issue Type: Epic Components: allocation, framework, master, slave Reporter: Adam B Assignee: Michael Park Labels: offer, persistence, reservations, resource, stateful, storage This is a feature to provide better support for running stateful services on Mesos such as HDFS (Distributed Filesystem), Cassandra (Distributed Database), or MySQL (Local Database). Current resource reservations (henceforth called static reservations) are statically determined by the slave operator at slave start time, and individual frameworks have no authority to reserve resources themselves. Dynamic reservations allow a framework to dynamically/lazily reserve offered resources at task launch time, such that when that task completes, those resources will only be re-offered to the same framework (or other frameworks with the same role). This is especially useful if the framework's task stored some state on the slave, and needs a guaranteed set of resources reserved so that it can re-launch a task on the same slave to recover that state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
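The re-offer rule described above can be sketched with a scalar-only resource model. Everything here is hypothetical. The sketch assumes that `"*"` denotes unreserved resources, as the Mesos default role does, and it illustrates the rule only, not the epic's eventual design.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, scalar-only model of resources on a slave. "*" marks
// unreserved resources; any other role value is a reservation.
struct Resource {
  std::string name;
  double amount;
  std::string role;
};

// Resources that may be offered to a framework in `frameworkRole`:
// everything unreserved plus anything reserved for that role.
std::vector<Resource> offerable(const std::vector<Resource>& available,
                                const std::string& frameworkRole) {
  std::vector<Resource> result;
  for (const Resource& r : available) {
    if (r.role == "*" || r.role == frameworkRole) {
      result.push_back(r);
    }
  }
  return result;
}

// Lazy, launch-time reservation: the resources a task uses are tagged with
// the framework's role, so after the task completes they are re-offered
// only to frameworks in that role.
void reserveForRole(std::vector<Resource>& used, const std::string& role) {
  for (Resource& r : used) {
    r.role = role;
  }
}
```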
[jira] [Updated] (MESOS-1886) Allow docker pull on each run to be configurable
[ https://issues.apache.org/jira/browse/MESOS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-1886: Sprint: Mesosphere Q4 Sprint 2 - 11/14 Allow docker pull on each run to be configurable Key: MESOS-1886 URL: https://issues.apache.org/jira/browse/MESOS-1886 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 0.20.1 Reporter: Chris Heller Assignee: Timothy Chen Priority: Minor Labels: docker With 0.20.1, the behavior of a docker container has changed (see MESOS-1762). This change brings the docker behavior more in line with that of {{docker run}}. I propose that if the image given explicitly has the :latest tag, this should signify to Mesos that an unconditional `docker pull` should be done on the image, and if the pull fails for any reason (e.g., the registry is unavailable), we fall back to the current behavior. This would break slightly with the semantics of the docker command line, but the alternative is to require explicit tags on every release (a hindrance when developing a new image) or to log in to each node and run an explicit `docker pull`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2002) Module loading within frameworks
[ https://issues.apache.org/jira/browse/MESOS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-2002: -- Sprint: Mesosphere Q4 Sprint 2 - 11/14 Module loading within frameworks - Key: MESOS-2002 URL: https://issues.apache.org/jira/browse/MESOS-2002 Project: Mesos Issue Type: Improvement Components: framework, modules Reporter: Till Toenshoff Assignee: Kapil Arya Priority: Blocker Frameworks should be granted the capability to load modules. h4. Motivation Allowing a modularized Authenticatee to cover framework authentication against the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2001) Authenticatee modules similar to Authenticator modules
[ https://issues.apache.org/jira/browse/MESOS-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff reassigned MESOS-2001: - Assignee: Till Toenshoff Authenticatee modules similar to Authenticator modules -- Key: MESOS-2001 URL: https://issues.apache.org/jira/browse/MESOS-2001 Project: Mesos Issue Type: Epic Components: modules Reporter: Till Toenshoff Assignee: Till Toenshoff Labels: authentication, module To cover complete module-based authentication, we will need to allow for authenticatee modules, just as we do for authenticator modules. h4. Motivation Allow third parties to quickly develop and plug in new authentication methods. The modularized Authenticatee API will lower the barrier for the community to provide new methods to Mesos. An example of such an additional, next-step module could be PAM-backed (LDAP, MySQL, NIS, UNIX) authentication. cyrus-sasl2 itself already offers more than half a dozen mechanisms via its standard plugins, and these could be triggered by additional Authenticator / Authenticatee modules. cyrus-sasl2 supports even more mechanisms when custom built (about a full dozen), but we do not want to bundle cyrus-sasl2 to enforce custom builds. Alternative authentication methods (especially non-SASL-based ones) may bring in new dependencies that we don't want to force on all of our users. Mesos users may be required to use custom authentication techniques due to strict security policies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1886) Allow docker pull on each run to be configurable
[ https://issues.apache.org/jira/browse/MESOS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-1886: Shepherd: Benjamin Hindman Allow docker pull on each run to be configurable Key: MESOS-1886 URL: https://issues.apache.org/jira/browse/MESOS-1886 Project: Mesos Issue Type: Improvement Components: containerization Affects Versions: 0.20.1 Reporter: Chris Heller Assignee: Timothy Chen Priority: Minor Labels: docker With 0.20.1, the behavior of a docker container has changed (see MESOS-1762). This change brings the docker behavior more in line with that of {{docker run}}. I propose that if the image given explicitly has the :latest tag, this should signify to Mesos that an unconditional `docker pull` should be done on the image, and if the pull fails for any reason (e.g., the registry is unavailable), we fall back to the current behavior. This would break slightly with the semantics of the docker command line, but the alternative is to require explicit tags on every release (a hindrance when developing a new image) or to log in to each node and run an explicit `docker pull`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2016) docker_name_prefix is too generic
[ https://issues.apache.org/jira/browse/MESOS-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Chen updated MESOS-2016: Shepherd: Benjamin Hindman docker_name_prefix is too generic - Key: MESOS-2016 URL: https://issues.apache.org/jira/browse/MESOS-2016 Project: Mesos Issue Type: Bug Reporter: Jay Buffington Assignee: Timothy Chen From docker.hpp and docker.cpp:
{quote}
// Prefix used to name Docker containers in order to distinguish those
// created by Mesos from those created manually.
extern std::string DOCKER_NAME_PREFIX;

// TODO(benh): At some point to run multiple slaves we'll need to make
// the Docker container name creation include the slave ID.
string DOCKER_NAME_PREFIX = "mesos-";
{quote}
This name is too generic. A common pattern in Docker land is to run everything in a container and use volume mounts to share sockets for RPC between containers. CoreOS has popularized this technique. Inevitably, people start a container named "mesos-slave", which runs the Docker containerizer recovery code, which removes all containers whose names start with "mesos-". And then they ask, "Huh, why did my mesos-slave Docker container die? I don't see any error messages..." Ideally, we should do what Ben suggested and add the slave ID to the name prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
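Here is a sketch of the suggested fix, assuming the slave ID is embedded in every container name. `containerName` and `ownedBySlave` are hypothetical helpers. The `mesos-<slaveId>.<containerId>` format is also an assumption, since the issue only says the slave ID should be added to the prefix.

```cpp
#include <cassert>
#include <string>

// Mirrors the existing prefix quoted above.
const std::string DOCKER_NAME_PREFIX = "mesos-";

// Hypothetical naming scheme that includes the slave ID.
std::string containerName(const std::string& slaveId,
                          const std::string& containerId) {
  return DOCKER_NAME_PREFIX + slaveId + "." + containerId;
}

// During recovery, only containers carrying this slave's full prefix are
// candidates for cleanup. Other slaves' containers and user containers such
// as "mesos-slave" no longer match.
bool ownedBySlave(const std::string& name, const std::string& slaveId) {
  const std::string prefix = DOCKER_NAME_PREFIX + slaveId + ".";
  return name.compare(0, prefix.size(), prefix) == 0;
}
```

Under this scheme, a user container named `mesos-slave` no longer matches the recovery filter.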
[jira] [Created] (MESOS-2039) Create a Design Doc
Michael Park created MESOS-2039: --- Summary: Create a Design Doc Key: MESOS-2039 URL: https://issues.apache.org/jira/browse/MESOS-2039 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Michael Park Assignee: Michael Park A design doc to be shared with the community for the Dynamic Reservation epic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1291) Use clang-format to automatically format code to style
[ https://issues.apache.org/jira/browse/MESOS-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Pampuch updated MESOS-1291: Sprint: Mesosphere Q4 Sprint 2 - 11/14 Use clang-format to automatically format code to style -- Key: MESOS-1291 URL: https://issues.apache.org/jira/browse/MESOS-1291 Project: Mesos Issue Type: Improvement Components: technical debt Reporter: Dominic Hamon Assignee: Michael Park Labels: style Instead of relying on a script to check and report style errors, we should move to a workflow that allows people to write code how they feel comfortable and then automatically format it to conform to our style guide. The Chromium style from clang-format (http://clang.llvm.org/docs/ClangFormat.html) is very close to our style except for the dropped braces on class, struct, and function definitions, and two lines of whitespace between method definitions outside a class. As such, we should consider adopting clang-format and patching it to include a Mesos style variant. It can be run as part of post-reviews or as a git commit hook, or manually from within editors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2002) Module loading within frameworks
[ https://issues.apache.org/jira/browse/MESOS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B updated MESOS-2002: -- Target Version/s: 0.22.0 Shepherd: Adam B Module loading within frameworks - Key: MESOS-2002 URL: https://issues.apache.org/jira/browse/MESOS-2002 Project: Mesos Issue Type: Improvement Components: framework, modules Reporter: Till Toenshoff Assignee: Kapil Arya Priority: Blocker Frameworks should be granted the capability to load modules. h4. Motivation Allowing a modularized Authenticatee to cover framework authentication against the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-1839) Modify configure.ac to fix --with-sasl
[ https://issues.apache.org/jira/browse/MESOS-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park resolved MESOS-1839. - Resolution: Fixed Fix Version/s: 0.22.0 Target Version/s: 0.22.0 Modify configure.ac to fix --with-sasl -- Key: MESOS-1839 URL: https://issues.apache.org/jira/browse/MESOS-1839 Project: Mesos Issue Type: Bug Components: build Reporter: Michael Park Assignee: Michael Park Priority: Minor Fix For: 0.22.0 Specifying a custom {{libcurl}} directory via {{\-\-with-curl}} works well, but {{--with-sasl}} doesn't work. {{libcurl}} is installed at {{$HOME/libcurl}} and {{libsasl2}} is installed at {{$HOME/libsasl}}. Ran:
{{../configure --with-curl=$HOME/libcurl --with-sasl=$HOME/libsasl}}
{quote}
checking for curl_global_init in -lcurl... yes
checking for sasl_done in -lsasl2... no
configure: error: cannot find libsasl2
---
We need libsasl2 for authentication!
---
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-1181) Improve cpplint rule coverage
[ https://issues.apache.org/jira/browse/MESOS-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B resolved MESOS-1181. --- Resolution: Fixed Fix Version/s: 0.21.0 Closing now, since we have made significant progress in style-checking in the past few releases. We can open new tickets for individual changes we'd like to make, new rules to add, etc. Improve cpplint rule coverage - Key: MESOS-1181 URL: https://issues.apache.org/jira/browse/MESOS-1181 Project: Mesos Issue Type: Improvement Reporter: Adam B Assignee: Adam B Priority: Minor Labels: lint, style Fix For: 0.21.0 ReviewBot is checking our patches' style for us, and we can check it ourselves with support/mesos-style.py, but there are only a few rules enabled right now. I plan to enable more rules, fixing lint errors as needed. I'd also like to add new style rules for things like single/double line spacing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (MESOS-1630) Remove framework from completedFrameworks if framework re-registers.
[ https://issues.apache.org/jira/browse/MESOS-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B closed MESOS-1630. - Resolution: Won't Fix Target Version/s: (was: 0.20.0) Closing in favor of MESOS-1219 and MESOS-1719. Remove framework from completedFrameworks if framework re-registers. Key: MESOS-1630 URL: https://issues.apache.org/jira/browse/MESOS-1630 Project: Mesos Issue Type: Bug Components: master Affects Versions: 0.14.0, 0.14.1, 0.14.2, 0.17.0, 0.16.0, 0.15.0, 0.18.0, 0.18.1, 0.18.2, 0.19.0, 0.19.1 Reporter: Benjamin Hindman Assignee: Bernd Mathiske Priority: Critical If a framework gets removed (for example, because it unregisters with the master, i.e., due to MESOS-1550), but the same framework ID is then reused when a framework re-registers (which we currently allow), we should remove the framework from Master::frameworks.completed. Otherwise, when a slave re-registers, Master::reconcile will notice that the slave is running tasks from a completed framework and tell the slave to shut down that framework, thus shutting down all of its tasks. This should be easy to fix by removing the framework from frameworks.completed when a framework re-registers with the same ID as a completed framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
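The fix proposed in the description, before the issue was closed in favor of MESOS-1219 and MESOS-1719, can be sketched as follows. `Frameworks`, `reregisterFramework`, and `shouldShutdownOnReconcile` are hypothetical simplifications and are not the `Master`'s real members.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified master bookkeeping keyed by framework ID.
struct Frameworks {
  std::set<std::string> registered;
  std::set<std::string> completed;
};

// Proposed behavior: re-registering with the ID of a completed framework
// removes that ID from the completed set.
void reregisterFramework(Frameworks& frameworks, const std::string& id) {
  frameworks.completed.erase(id);
  frameworks.registered.insert(id);
}

// What reconciliation effectively checks when a slave re-registers with
// tasks of framework `id`: a completed framework gets shut down.
bool shouldShutdownOnReconcile(const Frameworks& frameworks,
                               const std::string& id) {
  return frameworks.completed.count(id) > 0;
}
```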
[jira] [Updated] (MESOS-1839) Modify configure.ac to fix --with-sasl
[ https://issues.apache.org/jira/browse/MESOS-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-1839: Shepherd: Benjamin Hindman Modify configure.ac to fix --with-sasl -- Key: MESOS-1839 URL: https://issues.apache.org/jira/browse/MESOS-1839 Project: Mesos Issue Type: Bug Components: build Reporter: Michael Park Assignee: Michael Park Priority: Minor Fix For: 0.22.0 Specifying a custom {{libcurl}} directory via {{\-\-with-curl}} works well, but {{--with-sasl}} doesn't work. {{libcurl}} is installed at {{$HOME/libcurl}} and {{libsasl2}} is installed at {{$HOME/libsasl}}. Ran:
{{../configure --with-curl=$HOME/libcurl --with-sasl=$HOME/libsasl}}
{quote}
checking for curl_global_init in -lcurl... yes
checking for sasl_done in -lsasl2... no
configure: error: cannot find libsasl2
---
We need libsasl2 for authentication!
---
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2039) Create a Design Doc for Dynamic Reservations
[ https://issues.apache.org/jira/browse/MESOS-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-2039: Summary: Create a Design Doc for Dynamic Reservations (was: Create a Design Doc) Create a Design Doc for Dynamic Reservations Key: MESOS-2039 URL: https://issues.apache.org/jira/browse/MESOS-2039 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Michael Park Assignee: Michael Park A design doc to be shared with the community for the Dynamic Reservation epic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2040) Authenticatee Module: Integrate authenticatee module in slave
Till Toenshoff created MESOS-2040: - Summary: Authenticatee Module: Integrate authenticatee module in slave Key: MESOS-2040 URL: https://issues.apache.org/jira/browse/MESOS-2040 Project: Mesos Issue Type: Bug Reporter: Till Toenshoff Assignee: Till Toenshoff Allow for slave authentication via authenticatee module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2041) Authenticatee Module: Integrate authenticatee module in tests
Till Toenshoff created MESOS-2041: - Summary: Authenticatee Module: Integrate authenticatee module in tests Key: MESOS-2041 URL: https://issues.apache.org/jira/browse/MESOS-2041 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff Assignee: Till Toenshoff Make the authenticatee module testable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2040) Authenticatee Module: Integrate authenticatee module in slave
[ https://issues.apache.org/jira/browse/MESOS-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Toenshoff updated MESOS-2040: -- Issue Type: Improvement (was: Bug) Authenticatee Module: Integrate authenticatee module in slave - Key: MESOS-2040 URL: https://issues.apache.org/jira/browse/MESOS-2040 Project: Mesos Issue Type: Improvement Reporter: Till Toenshoff Assignee: Till Toenshoff Allow for slave authentication via authenticatee module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2042) Authenticatee Module: Integrate authenticatee module in scheduler
Till Toenshoff created MESOS-2042: - Summary: Authenticatee Module: Integrate authenticatee module in scheduler Key: MESOS-2042 URL: https://issues.apache.org/jira/browse/MESOS-2042 Project: Mesos Issue Type: Improvement Components: modules Reporter: Till Toenshoff Assignee: Till Toenshoff Allow for frameworks to use the authenticatee module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2034) Documentation for isolator namespaces/pid.
[ https://issues.apache.org/jira/browse/MESOS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196658#comment-14196658 ] Ian Downes commented on MESOS-2034: --- https://reviews.apache.org/r/27585/ Documentation for isolator namespaces/pid. -- Key: MESOS-2034 URL: https://issues.apache.org/jira/browse/MESOS-2034 Project: Mesos Issue Type: Documentation Affects Versions: 0.21.0 Reporter: Ian Downes Assignee: Ian Downes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2033) Documentation for isolator filesystem/shared.
[ https://issues.apache.org/jira/browse/MESOS-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196657#comment-14196657 ] Ian Downes commented on MESOS-2033: --- https://reviews.apache.org/r/27584/ Documentation for isolator filesystem/shared. - Key: MESOS-2033 URL: https://issues.apache.org/jira/browse/MESOS-2033 Project: Mesos Issue Type: Documentation Affects Versions: 0.21.0 Reporter: Ian Downes Assignee: Ian Downes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2043) framework auth fail with timeout error and never get authenticated
Bhuvan Arumugam created MESOS-2043: -- Summary: framework auth fail with timeout error and never get authenticated Key: MESOS-2043 URL: https://issues.apache.org/jira/browse/MESOS-2043 Project: Mesos Issue Type: Bug Components: master Affects Versions: 0.21.0 Reporter: Bhuvan Arumugam I'm facing this issue in master as of https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4 As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm running 1 master and 1 scheduler (Aurora). Framework authentication fails due to a timeout. Error on the Mesos master: {code} I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 authenticator I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL connection W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: Authentication discarded {code} Scheduler error: {code} I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL connection I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master master@MASTER_IP:PORT: Authentication discarded {code} It looks like 2 instances ({{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} and {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}}) of the same framework are trying to authenticate and failing. 
{code} W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to communicate with authenticatee I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 because authentication is still in progress {code} Restarting the master and the scheduler didn't fix it. This particular issue happens with 1 master and 1 scheduler, after MESOS-1866 was fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-830) ExamplesTest.JavaFramework is flaky
[ https://issues.apache.org/jira/browse/MESOS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196845#comment-14196845 ] Till Toenshoff commented on MESOS-830: -- I also see this one failing a lot on OSX. Just ran a gtest_repeat=20 and got 19 failures, 1 pass. {noformat} [ RUN ] ExamplesTest.JavaFramework Using temporary directory '/tmp/ExamplesTest_JavaFramework_DCnIxN' Enabling authentication for the framework I1104 22:21:37.416721 112590848 leveldb.cpp:176] Opened db in 2428us I1104 22:21:37.417207 112590848 leveldb.cpp:183] Compacted db in 454us I1104 22:21:37.417244 112590848 leveldb.cpp:198] Created db iterator in 15us I1104 22:21:37.417258 112590848 leveldb.cpp:204] Seeked to beginning of db in 7us I1104 22:21:37.417268 112590848 leveldb.cpp:273] Iterated through 0 keys in the db in 8us I1104 22:21:37.417317 112590848 replica.cpp:741] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I1104 22:21:37.417966 503451648 recover.cpp:437] Starting replica recovery I1104 22:21:37.418251 503451648 recover.cpp:463] Replica is in EMPTY status I1104 22:21:37.419044 502378496 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request I1104 22:21:37.419242 506134528 recover.cpp:188] Received a recover response from a replica in EMPTY status I1104 22:21:37.419445 504524800 recover.cpp:554] Updating replica status to STARTING I1104 22:21:37.419777 505597952 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 231us I1104 22:21:37.419802 505597952 replica.cpp:320] Persisted replica status to STARTING I1104 22:21:37.419909 503988224 recover.cpp:463] Replica is in STARTING status I1104 22:21:37.420393 502378496 replica.cpp:638] Replica in STARTING status received a broadcasted recover request I1104 22:21:37.420555 503988224 recover.cpp:188] Received a recover response from a replica in STARTING status I1104 22:21:37.420811 502915072 recover.cpp:554] Updating replica status to VOTING 
I1104 22:21:37.421128 505597952 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 193us I1104 22:21:37.421161 505597952 replica.cpp:320] Persisted replica status to VOTING I1104 22:21:37.421190 504524800 recover.cpp:568] Successfully joined the Paxos group I1104 22:21:37.421301 504524800 recover.cpp:452] Recover process terminated I1104 22:21:37.425765 502378496 master.cpp:318] Master 20141104-222137-347252928-55703-8935 (lobomacpro2.fritz.box) started on 192.168.178.20:55703 I1104 22:21:37.425830 502378496 master.cpp:364] Master only allowing authenticated frameworks to register I1104 22:21:37.425843 502378496 master.cpp:371] Master allowing unauthenticated slaves to register I1104 22:21:37.425854 502378496 credentials.hpp:36] Loading credentials for authentication from '/tmp/ExamplesTest_JavaFramework_DCnIxN/credentials' W1104 22:21:37.425889 502378496 credentials.hpp:51] Permissions on credentials file '/tmp/ExamplesTest_JavaFramework_DCnIxN/credentials' are too open. It is recommended that your credentials file is NOT accessible by others. I1104 22:21:37.425921 502378496 master.cpp:408] Authorization enabled I1104 22:21:37.426417 112590848 containerizer.cpp:100] Using isolation: posix/cpu,posix/mem I1104 22:21:37.427026 504524800 slave.cpp:169] Slave started on 1)@192.168.178.20:55703 I1104 22:21:37.427248 504524800 slave.cpp:289] Slave resources: cpus(*):2; mem(*):10240; disk(*):470808; ports(*):[31000-32000] I1104 22:21:37.427533 112590848 containerizer.cpp:100] Using isolation: posix/cpu,posix/mem I1104 22:21:37.428071 503988224 slave.cpp:169] Slave started on 2)@192.168.178.20:55703 I1104 22:21:37.428176 502378496 master.cpp:1258] The newly elected leader is master@192.168.178.20:55703 with id 20141104-222137-347252928-55703-8935 I1104 22:21:37.428205 502378496 master.cpp:1271] Elected as the leading master! 
I1104 22:21:37.428220 502378496 master.cpp:1089] Recovering from registrar I1104 22:21:37.428267 503988224 slave.cpp:289] Slave resources: cpus(*):2; mem(*):10240; disk(*):470808; ports(*):[31000-32000] I1104 22:21:37.428318 502915072 registrar.cpp:313] Recovering registrar I1104 22:21:37.428598 505061376 log.cpp:656] Attempting to start the writer I1104 22:21:37.428805 112590848 containerizer.cpp:100] Using isolation: posix/cpu,posix/mem I1104 22:21:37.428889 503988224 slave.cpp:318] Slave hostname: lobomacpro2.fritz.box I1104 22:21:37.428892 504524800 slave.cpp:318] Slave hostname: lobomacpro2.fritz.box I1104 22:21:37.428917 503988224 slave.cpp:319] Slave checkpoint: true I1104 22:21:37.428927 504524800 slave.cpp:319] Slave checkpoint: true I1104 22:21:37.429457 506134528 state.cpp:33] Recovering state from '/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos-XX.3EkfQ7TT/1/meta' I1104 22:21:37.429478 505061376 state.cpp:33] Recovering state from '/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos-XX.3EkfQ7TT/0/meta
[jira] [Commented] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196927#comment-14196927 ] Vinod Kone commented on MESOS-2043: --- Hey Bhuvan. Can you include more information in the logs? Right now the timings in the master and scheduler logs do not match. Also, were there failovers in the midst of this? framework auth fail with timeout error and never get authenticated -- Key: MESOS-2043 URL: https://issues.apache.org/jira/browse/MESOS-2043 Project: Mesos Issue Type: Bug Components: master Affects Versions: 0.21.0 Reporter: Bhuvan Arumugam -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1956) Add IPv6 ICMPv6 libnl traffic control U32 filters
[ https://issues.apache.org/jira/browse/MESOS-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196948#comment-14196948 ] Cong Wang commented on MESOS-1956: -- I am not sure how much sense this makes in the real world: using IPv6 usually means we have enough IP addresses, so we probably don't need these filters to do port-range-based routing. Instead, we could just assign each container a different IPv6 address. Add IPv6 ICMPv6 libnl traffic control U32 filters --- Key: MESOS-1956 URL: https://issues.apache.org/jira/browse/MESOS-1956 Project: Mesos Issue Type: Task Components: isolation Reporter: Evelina Dumitrescu Assignee: Evelina Dumitrescu For IPv6, the filtering should be done by source and destination ports, destination IP, and destination MAC. For ICMPv6, the filtering should be done by protocol and destination IP. IPv6 and IPv4 traffic could be distinguished by the source/destination IP type in the classifier. IPv4 packets with options in the header are currently ignored due to a bug in libnl. It should be investigated whether the problem also occurs with IPv6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
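A minimal sketch of the classification step described in MESOS-1956, assuming that "IP type" means the version nibble at the start of the IP header. The enum, constants, and key strings are hypothetical illustrations, not libnl or Mesos API. It also assumes no IPv6 extension headers, which would move the upper-layer protocol away from the fixed offset, a concern similar in spirit to the IPv4 header-options problem mentioned in the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical filter categories for illustration (not libnl API).
enum class FilterKind { IPv4, IPv6, ICMPv6, Unknown };

// IPv6 "Next Header" value for ICMPv6 (protocol number 58).
constexpr uint8_t kNextHeaderICMPv6 = 58;

// Size of the fixed IPv6 header in bytes.
constexpr std::size_t kIPv6HeaderSize = 40;

// Tells IPv4 from IPv6 by the version nibble (top 4 bits of byte 0), then
// separates ICMPv6 from other IPv6 traffic via the Next Header byte.
// Assumption: no IPv6 extension headers between the fixed header and ICMPv6.
FilterKind classify(const std::vector<uint8_t>& packet) {
  if (packet.empty()) {
    return FilterKind::Unknown;
  }
  switch (packet[0] >> 4) {
    case 4:
      return FilterKind::IPv4;
    case 6:
      if (packet.size() < kIPv6HeaderSize) {
        return FilterKind::Unknown;
      }
      return packet[6] == kNextHeaderICMPv6 ? FilterKind::ICMPv6
                                            : FilterKind::IPv6;
    default:
      return FilterKind::Unknown;
  }
}

// Match keys per category, as listed in the issue description.
// IPv4 keys are out of scope for this sketch and return an empty list.
std::vector<std::string> matchKeys(FilterKind kind) {
  switch (kind) {
    case FilterKind::IPv6:
      return {"src_port", "dst_port", "dst_ip", "dst_mac"};
    case FilterKind::ICMPv6:
      return {"protocol", "dst_ip"};
    default:
      return {};
  }
}
```

For example, a 40-byte buffer starting with 0x60 and carrying 58 at offset 6 classifies as ICMPv6 and would be matched only on protocol and destination IP.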
[jira] [Updated] (MESOS-1143) Add a TASK_ERROR task status.
[ https://issues.apache.org/jira/browse/MESOS-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Hamon updated MESOS-1143: - Target Version/s: 0.22.0 Add a TASK_ERROR task status. - Key: MESOS-1143 URL: https://issues.apache.org/jira/browse/MESOS-1143 Project: Mesos Issue Type: Improvement Components: framework, master Reporter: Benjamin Hindman Assignee: Dominic Hamon During task validation we drop tasks that have errors and send TASK_LOST status updates. In most circumstances a framework will want to relaunch a task that has gone lost, and in the event the task is actually malformed (thus invalid) this will result in an infinite loop of sending a task and having it go lost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
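To illustrate the loop MESOS-1143 describes, here is a minimal sketch of a framework-side relaunch policy. The trimmed-down enum and the `shouldRelaunch` / `launchesUntilGiveUp` helpers are hypothetical; the real states come from `mesos::TaskState`. The point is that a distinct TASK_ERROR lets the framework stop immediately instead of relying on an arbitrary retry cap.

```cpp
#include <cassert>

// Hypothetical subset of task states for illustration; the real enum is
// mesos::TaskState, generated from mesos.proto.
enum class TaskState {
  TASK_RUNNING,
  TASK_FINISHED,
  TASK_FAILED,
  TASK_KILLED,
  TASK_LOST,
  TASK_ERROR
};

// Hypothetical relaunch policy:
// - TASK_LOST may be transient, so relaunch (up to a cap).
// - TASK_ERROR means the task was rejected as invalid; relaunching the
//   same task would only be rejected again.
// - Other states are not relaunched by this sketch.
bool shouldRelaunch(TaskState state, int attempts, int maxAttempts) {
  if (attempts >= maxAttempts) {
    return false;  // safety cap against relaunch loops
  }
  return state == TaskState::TASK_LOST;
}

// Counts launches of a task whose every attempt ends in `outcome`.
int launchesUntilGiveUp(TaskState outcome, int maxAttempts) {
  int attempts = 0;
  do {
    ++attempts;  // launch the task; it ends in `outcome`
  } while (shouldRelaunch(outcome, attempts, maxAttempts));
  return attempts;
}
```

A malformed task reported as TASK_LOST is launched until the cap is reached (5 times with `maxAttempts = 5`), while the same task reported as TASK_ERROR is launched once.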
[jira] [Commented] (MESOS-1143) Add a TASK_ERROR task status.
[ https://issues.apache.org/jira/browse/MESOS-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196959#comment-14196959 ] Dominic Hamon commented on MESOS-1143: -- Added status as part of commit ca13594054b179bd36ce14323de3111f92bae3cd (HEAD, origin/master, origin/HEAD, master, MESOS-1830.task_lost_source) Author: Dominic Hamon dha...@twitter.com Commit: Dominic Hamon dha...@twitter.com Add source and reason to TaskStatus. Review: https://reviews.apache.org/r/26382/ but not in use until 0.22.0 Add a TASK_ERROR task status. - Key: MESOS-1143 URL: https://issues.apache.org/jira/browse/MESOS-1143 Project: Mesos Issue Type: Improvement Components: framework, master Reporter: Benjamin Hindman Assignee: Dominic Hamon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1830) Expose master stats differentiating between master-generated and slave-generated LOST tasks
[ https://issues.apache.org/jira/browse/MESOS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196961#comment-14196961 ] Dominic Hamon commented on MESOS-1830: -- https://reviews.apache.org/r/27531/ Expose master stats differentiating between master-generated and slave-generated LOST tasks --- Key: MESOS-1830 URL: https://issues.apache.org/jira/browse/MESOS-1830 Project: Mesos Issue Type: Story Components: master Reporter: Bill Farner Assignee: Dominic Hamon Priority: Minor The master exports a monotonically-increasing counter of tasks transitioned to TASK_LOST. This loses fidelity of the source of the lost task. A first step in exposing the source of lost tasks might be to just differentiate between TASK_LOST transitions initiated by the master vs the slave (and maybe bad input from the scheduler). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
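A minimal sketch of the first step proposed in MESOS-1830: keep the existing aggregate TASK_LOST counter while also counting per source. The `LostSource` enum and `LostTaskCounters` class are hypothetical and do not reflect the metric names or code that the linked review introduces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sources of a TASK_LOST transition, following the issue:
// master vs. slave, and possibly bad input from the scheduler.
enum class LostSource { MASTER, SLAVE, SCHEDULER };

// Counts TASK_LOST transitions per source and in aggregate.
class LostTaskCounters {
 public:
  void record(LostSource source) {
    ++counts_[source];
    ++total_;
  }

  uint64_t count(LostSource source) const {
    auto it = counts_.find(source);
    return it == counts_.end() ? 0 : it->second;
  }

  // Equivalent to the current single, monotonically increasing counter.
  uint64_t total() const { return total_; }

 private:
  std::map<LostSource, uint64_t> counts_;
  uint64_t total_ = 0;
};
```

Keeping the total alongside the per-source counts means existing consumers of the aggregate counter are unaffected.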
[jira] [Commented] (MESOS-1919) Create IP address abstraction
[ https://issues.apache.org/jira/browse/MESOS-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196970#comment-14196970 ] Cong Wang commented on MESOS-1919: -- I don't think we need to worry about u32 filters for IPv4, because they never work for IPv6 (IPv6 packets are just ignored), nor does it make much sense to do port-range-based routing for IPv6. We should really use a one-IP-address-per-container solution when using IPv6. Create IP address abstraction - Key: MESOS-1919 URL: https://issues.apache.org/jira/browse/MESOS-1919 Project: Mesos Issue Type: Task Components: libprocess Reporter: Dominic Hamon Assignee: Evelina Dumitrescu Priority: Minor In the code, many functions need only the IP address to be passed as a parameter. I don't think it would be desirable to use a struct SockaddrStorage (MESOS-1916). Consider using a {{std::vector<unsigned char>}} (see {{typedef std::vector<unsigned char> IPAddressNumber;}} in the Chromium project) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
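A minimal sketch of the byte-vector representation suggested in MESOS-1919, modeled on the cited Chromium typedef: an address is just its bytes in network order, and the family follows from the length. The constant names and the helpers `isLoopback` and `toDottedQuad` are hypothetical, chosen to show functions that need only the address rather than a full sockaddr.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Address bytes in network order: 4 for IPv4, 16 for IPv6.
typedef std::vector<unsigned char> IPAddressNumber;

// Hypothetical constant names for this sketch.
const std::size_t kIPv4Size = 4;
const std::size_t kIPv6Size = 16;

bool isIPv4(const IPAddressNumber& ip) { return ip.size() == kIPv4Size; }
bool isIPv6(const IPAddressNumber& ip) { return ip.size() == kIPv6Size; }

// Needs only the address: 127.0.0.0/8 for IPv4, ::1 for IPv6.
bool isLoopback(const IPAddressNumber& ip) {
  if (isIPv4(ip)) {
    return ip[0] == 127;
  }
  if (isIPv6(ip)) {
    for (std::size_t i = 0; i + 1 < kIPv6Size; ++i) {
      if (ip[i] != 0) return false;
    }
    return ip[kIPv6Size - 1] == 1;
  }
  return false;
}

// Renders an IPv4 address in dotted-quad form; returns "" otherwise.
std::string toDottedQuad(const IPAddressNumber& ip) {
  if (!isIPv4(ip)) return "";
  std::ostringstream out;
  for (std::size_t i = 0; i < ip.size(); ++i) {
    if (i > 0) out << '.';
    out << static_cast<int>(ip[i]);
  }
  return out.str();
}
```

The trade-off is that the family is implicit in the length, so callers must validate the size. In return, the same function signatures work for both IPv4 and IPv6.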
[jira] [Updated] (MESOS-2043) framework auth fail with timeout error and never get authenticated
[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhuvan Arumugam updated MESOS-2043: --- Attachment: aurora-scheduler.20141104-1606-1706.log mesos-master.20141104-1606-1706.log [~vinodkone] It wasn't a failover, but the master was restarted after an upgrade. I've attached the logs of both the master and the scheduler for the first hour. Master log snippet: {code} I1104 16:06:39.019181 35273 master.cpp:3874] Authenticating scheduler-8160bf27-7799-4b8c-921c-b2e87869475b@AURORA_IP:8083 I1104 16:06:39.019480 35273 master.cpp:3885] Using default CRAM-MD5 authenticator I1104 16:06:39.020884 35290 authenticator.hpp:107] Initializing server SASL I1104 16:06:39.022680 35290 authenticator.hpp:169] Creating new server SASL connection W1104 16:06:44.022080 35275 master.cpp:3953] Authentication timed out {code} Scheduler log snippet: {code} I1104 16:06:34.006535 23272 detector.cpp:138] Detected a new leader: (id='115') I1104 16:06:34.007257 23270 group.cpp:659] Trying to get '/mesos/info_000115' in ZooKeeper I1104 16:06:34.008654 23270 detector.cpp:433] A new leading master (UPID=master@MASTER_IP:PORT) is detected W1104 16:06:34.009 THREAD3393 org.apache.aurora.scheduler.MesosSchedulerImpl.disconnected: Framework disconnected. I1104 16:06:34.010 THREAD3393 org.apache.aurora.scheduler.async.OfferQueue$OfferQueueImpl.driverDisconnected: Clearing stale offers since the driver is disconnected. 
I1104 16:06:34.010766 23281 sched.cpp:233] New master detected at master@MASTER_IP:PORT I1104 16:06:34.010834 23281 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT I1104 16:06:34.011281 23263 authenticatee.hpp:133] Creating new client SASL connection W1104 16:06:39.016166 23274 sched.cpp:378] Authentication timed out I1104 16:06:39.016585 23263 sched.cpp:338] Failed to authenticate with master master@MASTER_IP:PORT: Authentication discarded I1104 16:06:39.016669 23263 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT I1104 16:06:39.017057 23282 authenticatee.hpp:133] Creating new client SASL connection I1104 16:06:39.023083 23279 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5 I1104 16:06:39.023138 23279 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5' W1104 16:06:44.022470 23268 sched.cpp:378] Authentication timed out {code} framework auth fail with timeout error and never get authenticated -- Key: MESOS-2043 URL: https://issues.apache.org/jira/browse/MESOS-2043 Project: Mesos Issue Type: Bug Components: master Affects Versions: 0.21.0 Reporter: Bhuvan Arumugam Attachments: aurora-scheduler.20141104-1606-1706.log, mesos-master.20141104-1606-1706.log
[jira] [Closed] (MESOS-1909) Network monitoring and isolation using macvlan.
[ https://issues.apache.org/jira/browse/MESOS-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu closed MESOS-1909. - Resolution: Duplicate Network monitoring and isolation using macvlan. --- Key: MESOS-1909 URL: https://issues.apache.org/jira/browse/MESOS-1909 Project: Mesos Issue Type: Story Reporter: Jie Yu Right now, the port mapping network isolator used by the Mesos containerizer is based on the assumption that there are not enough public IPs to give each container a unique IP. That introduces a lot of complexity and limits the number of ports that can be used by a container. If we have enough public IPs, we can use macvlan to bridge containers, which is much cleaner and more efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1806) Substituting etcd or ReplicatedLog for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197411#comment-14197411 ] Timothy St. Clair commented on MESOS-1806: -- [~nnielsen] [~tnachen] - We should wrap the interface and create a mesos-module here. Substituting etcd or ReplicatedLog for Zookeeper Key: MESOS-1806 URL: https://issues.apache.org/jira/browse/MESOS-1806 Project: Mesos Issue Type: Task Reporter: Ed Ropple Priority: Minor adam_mesos eropple: Could you also file a new JIRA for Mesos to drop ZK in favor of etcd or ReplicatedLog? Would love to get some momentum going on that one. -- Consider it filed. =) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1806) Substituting etcd or ReplicatedLog for Zookeeper
[ https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197695#comment-14197695 ] Adam B commented on MESOS-1806: --- +1 Module should be as easy as MasterDetector + MasterContender, which are already very simple interfaces. Might need to make the existing zkstring more generic, possibly adding more config than a URL for some alternate implementations. Substituting etcd or ReplicatedLog for Zookeeper Key: MESOS-1806 URL: https://issues.apache.org/jira/browse/MESOS-1806 Project: Mesos Issue Type: Task Reporter: Ed Ropple Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
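To make the module idea in MESOS-1806 concrete, here is a heavily simplified sketch of a leader-election backend behind detector/contender-style interfaces. The `Detector`, `Contender`, and `InMemoryElection` names and their synchronous signatures are hypothetical. The real Mesos `MasterDetector` and `MasterContender` are asynchronous (libprocess futures) and carry `MasterInfo` protobufs, so this only illustrates how a ZooKeeper replacement could sit behind two small interfaces.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, synchronous stand-in for a master detector.
class Detector {
 public:
  virtual ~Detector() = default;
  // Returns the current leader, if any.
  virtual std::optional<std::string> detect() = 0;
};

// Hypothetical, synchronous stand-in for a master contender.
class Contender {
 public:
  virtual ~Contender() = default;
  // Enters the election; returns true if `candidate` is the leader.
  virtual bool contend(const std::string& candidate) = 0;
};

// Toy in-memory backend standing in for ZooKeeper, etcd, or a
// replicated log: the first contender wins and stays leader.
class InMemoryElection : public Detector, public Contender {
 public:
  std::optional<std::string> detect() override { return leader_; }

  bool contend(const std::string& candidate) override {
    if (!leader_) {
      leader_ = candidate;
    }
    return *leader_ == candidate;
  }

 private:
  std::optional<std::string> leader_;
};
```

A real backend would also need leader loss and re-election, which is where the asynchronous interfaces and backend-specific configuration (beyond a single URL, as noted in the comment) come in. This sketch requires C++17 for `std::optional`.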