[jira] [Created] (MESOS-7653) Support launching slave using unprivileged user.
Jie Yu created MESOS-7653: - Summary: Support launching slave using unprivileged user. Key: MESOS-7653 URL: https://issues.apache.org/jira/browse/MESOS-7653 Project: Mesos Issue Type: Improvement Reporter: Jie Yu Priority: Minor This ticket captures the work needed to support launching the agent as an unprivileged user. 1) The agent binary needs to have file capabilities set. Given that the agent needs to manipulate cgroups (if using the linux launcher or cgroups isolator) and clone namespaces (if using the linux launcher), the CAP_SYS_ADMIN capability must be in the agent process's effective set. Either the "Effective" bit should be set on the agent binary, so that the permitted capabilities gained on exec'ing the binary are put into the effective set of the agent process automatically, or the agent must raise the capability itself, which works as long as the capability is in its permitted set. 2) The launch of the user task is done by the `mesos-containerizer` binary. Either the agent raises ambient capabilities (using prctl PR_CAP_AMBIENT_RAISE), or we need to make sure the `mesos-containerizer` binary has file capabilities set, so that it is able to do things like `setuid` after the agent has exec'ed the helper. That means the agent process should have those required capabilities in its inheritable set (at least), and also in its permitted set if the ambient capabilities route is chosen. 3) If the linux capabilities isolator is enabled, in order for frameworks to gain any capabilities they like, the process launching the agent process should have all capabilities in its inheritable set and its bounding set so that those capabilities can be regained later. http://man7.org/linux/man-pages/man7/capabilities.7.html -- This message was sent by Atlassian JIRA (v6.3.15#6346)
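The three points above all follow from the rules that execve() applies to capability sets, as documented in capabilities(7). The following is a minimal sketch of those rules as a pure bitmask model for a non-root process; it makes no system calls, and the type and function names (ProcessCaps, FileCaps, execTransform) are hypothetical and not part of Mesos.

```cpp
#include <cassert>
#include <cstdint>

// Capability numbers as defined in <linux/capability.h>.
constexpr int kCapSetuid = 7;
constexpr int kCapSysAdmin = 21;

constexpr std::uint64_t bit(int cap) { return std::uint64_t{1} << cap; }

// Capability sets of a process, as bitmasks (hypothetical model type).
struct ProcessCaps {
  std::uint64_t permitted = 0;
  std::uint64_t inheritable = 0;
  std::uint64_t effective = 0;
  std::uint64_t bounding = 0;
  std::uint64_t ambient = 0;
};

// File capabilities attached to a binary (hypothetical model type).
struct FileCaps {
  std::uint64_t permitted = 0;
  std::uint64_t inheritable = 0;
  bool effective = false;   // The single "Effective" bit.
  bool privileged = false;  // File has capabilities or set-user-ID/set-group-ID bits.
};

// Models the capability transformation during execve() for a non-root
// user, following the formulas in capabilities(7).
inline ProcessCaps execTransform(const ProcessCaps& p, const FileCaps& f) {
  ProcessCaps next;
  next.ambient = f.privileged ? 0 : p.ambient;
  next.permitted = (p.inheritable & f.inheritable) |
                   (f.permitted & p.bounding) |
                   next.ambient;
  next.effective = f.effective ? next.permitted : next.ambient;
  next.inheritable = p.inheritable;
  next.bounding = p.bounding;
  return next;
}
```

In this model, an agent binary with CAP_SYS_ADMIN in its file permitted set only ends up with the capability in its effective set when the "Effective" bit is set; otherwise the capability lands in the permitted set and the process has to raise it itself. Likewise, a helper binary without file capabilities keeps exactly the ambient set of the process that exec'ed it.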
[jira] [Created] (MESOS-7652) docker image not working with universal containerizer
michael beisiegel created MESOS-7652: Summary: docker image not working with universal containerizer Key: MESOS-7652 URL: https://issues.apache.org/jira/browse/MESOS-7652 Project: Mesos Issue Type: Bug Components: containerization Affects Versions: 1.2.1 Reporter: michael beisiegel Priority: Minor Hello, I recently used the following docker image: quay.io/spinnaker/front50:master https://quay.io/repository/spinnaker/front50 Here is the link to the Dockerfile: https://github.com/spinnaker/front50/blob/master/Dockerfile The image works fine with the docker containerizer, but the universal containerizer shows the following in stderr: "Failed to chdir into current working directory '/workdir': No such file or directory" The problem comes from the fact that the Dockerfile creates a WORKDIR but then later removes the created dir as part of a RUN. The docker containerizer has no problem with it: if you run docker run -ti --rm quay.io/spinnaker/front50:master bash you get into the working dir, but the universal containerizer fails with the error. Thanks for your help, Michael -- This message was sent by Atlassian JIRA (v6.3.15#6346)
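One conceivable way to tolerate an image whose WORKDIR was removed is to create the directory before changing into it. The sketch below only illustrates that idea with plain POSIX calls; the helper name enterWorkingDirectory is hypothetical, it creates a single directory level only, and it is not the Mesos containerizer code or a confirmed fix for this ticket.

```cpp
#include <cerrno>
#include <string>

#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: enter `workdir`, creating it first when it is
// missing. Returns false if the directory can neither be created nor
// entered. Only the last path component is created.
inline bool enterWorkingDirectory(const std::string& workdir) {
  if (::mkdir(workdir.c_str(), 0755) != 0 && errno != EEXIST) {
    return false;
  }
  return ::chdir(workdir.c_str()) == 0;
}
```

Whether silently creating the directory is the right behaviour is a design question for the containerizer; failing with a clearer error message would be an equally reasonable choice.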
[jira] [Commented] (MESOS-7651) Consider a more explicit way to bind reservations / volumes to a framework.
[ https://issues.apache.org/jira/browse/MESOS-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045268#comment-16045268 ] Benjamin Mahler commented on MESOS-7651: [~xujyan] Updated the description to mention lifecycle. > Consider a more explicit way to bind reservations / volumes to a framework. > --- > > Key: MESOS-7651 > URL: https://issues.apache.org/jira/browse/MESOS-7651 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler > > Currently, when a framework creates a reservation or a persistent volume, and > it wants exclusive access to this volume or reservation, it must take a few > steps: > * Ensure that no other frameworks are running within the reservation role (or > the other frameworks are co-operative). > * With hierarchical roles, frameworks must also ensure that the role is a > leaf so that no descendant roles will have access to the reservation/volume. > This could be done by generating a role (e.g. eng/kafka/). > It's not easy for the framework to ensure these things, since role ACLs are > controlled by the operator. > We should consider a more direct way for a framework to ensure that their > reservation/volume cannot be shared. E.g. by binding it to their framework id > (perhaps re-using roles for this rather than introducing something new?) > We should also consider binding the reservation / volumes, much like other > objects (tasks, executors), to the framework's lifecycle. So that if the > framework is removed, the reservations / volumes it left behind are cleaned > up. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7651) Consider a more explicit way to bind reservations / volumes to a framework.
[ https://issues.apache.org/jira/browse/MESOS-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7651: --- Description: Currently, when a framework creates a reservation or a persistent volume, and it wants exclusive access to this volume or reservation, it must take a few steps: * Ensure that no other frameworks are running within the reservation role (or the other frameworks are co-operative). * With hierarchical roles, frameworks must also ensure that the role is a leaf so that no descendant roles will have access to the reservation/volume. This could be done by generating a role (e.g. eng/kafka/). It's not easy for the framework to ensure these things, since role ACLs are controlled by the operator. We should consider a more direct way for a framework to ensure that their reservation/volume cannot be shared. E.g. by binding it to their framework id (perhaps re-using roles for this rather than introducing something new?) We should also consider binding the reservation / volumes, much like other objects (tasks, executors), to the framework's lifecycle. So that if the framework is removed, the reservations / volumes it left behind are cleaned up. was: Currently, when a framework creates a reservation or a persistent volume, and it wants exclusive access to this volume or reservation, it must take a few steps: * Ensure that no other frameworks are running within the reservation role (or the other frameworks are co-operative). * With hierarchical roles, frameworks must also ensure that the role is a leaf so that no descendant roles will have access to the reservation/volume. This could be done by generating a role (e.g. eng/kafka/). It's not easy for the framework to ensure these things, since role ACLs are controlled by the operator. We should consider a more direct way for a framework to ensure that their reservation/volume cannot be shared. E.g. 
by binding it to their framework id (perhaps re-using roles for this rather than introducing something new?) > Consider a more explicit way to bind reservations / volumes to a framework. > --- > > Key: MESOS-7651 > URL: https://issues.apache.org/jira/browse/MESOS-7651 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler > > Currently, when a framework creates a reservation or a persistent volume, and > it wants exclusive access to this volume or reservation, it must take a few > steps: > * Ensure that no other frameworks are running within the reservation role (or > the other frameworks are co-operative). > * With hierarchical roles, frameworks must also ensure that the role is a > leaf so that no descendant roles will have access to the reservation/volume. > This could be done by generating a role (e.g. eng/kafka/). > It's not easy for the framework to ensure these things, since role ACLs are > controlled by the operator. > We should consider a more direct way for a framework to ensure that their > reservation/volume cannot be shared. E.g. by binding it to their framework id > (perhaps re-using roles for this rather than introducing something new?) > We should also consider binding the reservation / volumes, much like other > objects (tasks, executors), to the framework's lifecycle. So that if the > framework is removed, the reservations / volumes it left behind are cleaned > up. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem
[ https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045210#comment-16045210 ] Jason Lai commented on MESOS-6162: -- I had a long time diff that I didn't get to submit yet. Now rebased to the master and squashed into one commit at: https://reviews.apache.org/r/59960/ [~gilbert] [~jieyu] > Add support for cgroups blkio subsystem > --- > > Key: MESOS-6162 > URL: https://issues.apache.org/jira/browse/MESOS-6162 > Project: Mesos > Issue Type: Task >Reporter: haosdent >Assignee: Jason Lai > > Noted that cgroups blkio subsystem may have performance issue, refer to > https://github.com/opencontainers/runc/issues/861 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (MESOS-7524) Basic fetcher success metrics
[ https://issues.apache.org/jira/browse/MESOS-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039578#comment-16039578 ] James Peach edited comment on MESOS-7524 at 6/9/17 6:58 PM: | [r/59952|https://reviews.apache.org/r/59952] | Split FetcherProcess into its own source files. | | [r/59467|https://reviews.apache.org/r/59467] | Document new Fetcher metrics. | | [r/59466|https://reviews.apache.org/r/59466] | Add metrics check to Fetcher tests. | | [r/59464|https://reviews.apache.org/r/59464] | Add Fetcher task total and failed fetch metrics. | | [r/59855|https://reviews.apache.org/r/59855] | Set the fetcher cache size at construction time. | | [r/59854|https://reviews.apache.org/r/59854] | Make additional Fetcher and FetcherProcess methods const. | was (Author: jamespeach): | [r/59467|https://reviews.apache.org/r/59467] | Document new Fetcher metrics. | | [r/59464|https://reviews.apache.org/r/59464] | Add Fetcher task total and failed fetch metrics. | | [r/59466|https://reviews.apache.org/r/59466] | Add metrics check to Fetcher tests. | | [r/59855|https://reviews.apache.org/r/59855] | Set the fetcher cache size at construction time. | | [r/59854|https://reviews.apache.org/r/59854] | Make additional Fetcher and FetcherProcess methods const. | > Basic fetcher success metrics > - > > Key: MESOS-7524 > URL: https://issues.apache.org/jira/browse/MESOS-7524 > Project: Mesos > Issue Type: Bug > Components: fetcher >Reporter: James Peach >Assignee: James Peach > > There are no metrics for the fetcher. At a minimum we should have counters for: > * successful fetcher invocations > * failed fetcher invocations > It would also be useful to know the fetch time, though that could be highly > variable depending on the cluster usage. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
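The two counters requested above can be pictured with a very small sketch. It follows the "total and failed" split named in the review titles and derives the success count from the two; the struct and member names are hypothetical and this is not the libprocess metrics API used by the actual patches.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counters for fetcher invocations. Successes are derived
// as total minus failed, so only two counters are stored.
struct FetcherCounters {
  std::atomic<std::uint64_t> total{0};
  std::atomic<std::uint64_t> failed{0};

  // Records the outcome of one fetcher invocation.
  void record(bool success) {
    ++total;
    if (!success) {
      ++failed;
    }
  }

  std::uint64_t succeeded() const { return total.load() - failed.load(); }
};
```

Fetch time, which the ticket also mentions, is deliberately left out of this sketch because a plain counter does not describe a highly variable duration well.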
[jira] [Commented] (MESOS-7651) Consider a more explicit way to bind reservations / volumes to a framework.
[ https://issues.apache.org/jira/browse/MESOS-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044848#comment-16044848 ] Yan Xu commented on MESOS-7651: --- +1. Related to this is the headaches around the lifecycle of reservations and volumes. Not sure what you meant by "perhaps re-using roles for this" above but I think as part of this we should bind the lifecycle of reservations to the lifecycle of the framework the same way tasks are bound to the lifecycle of the framework. > Consider a more explicit way to bind reservations / volumes to a framework. > --- > > Key: MESOS-7651 > URL: https://issues.apache.org/jira/browse/MESOS-7651 > Project: Mesos > Issue Type: Improvement >Reporter: Benjamin Mahler > > Currently, when a framework creates a reservation or a persistent volume, and > it wants exclusive access to this volume or reservation, it must take a few > steps: > * Ensure that no other frameworks are running within the reservation role (or > the other frameworks are co-operative). > * With hierarchical roles, frameworks must also ensure that the role is a > leaf so that no descendant roles will have access to the reservation/volume. > This could be done by generating a role (e.g. eng/kafka/). > It's not easy for the framework to ensure these things, since role ACLs are > controlled by the operator. > We should consider a more direct way for a framework to ensure that their > reservation/volume cannot be shared. E.g. by binding it to their framework id > (perhaps re-using roles for this rather than introducing something new?) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-3826) Add an optional unique identifier for resource reservations
[ https://issues.apache.org/jira/browse/MESOS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044833#comment-16044833 ] Benjamin Mahler commented on MESOS-3826: Filed a related issue: https://issues.apache.org/jira/browse/MESOS-7651 > Add an optional unique identifier for resource reservations > --- > > Key: MESOS-3826 > URL: https://issues.apache.org/jira/browse/MESOS-3826 > Project: Mesos > Issue Type: Improvement >Reporter: Sargun Dhillon > Labels: mesosphere, reservations > > Thanks to the resource reservation primitives, frameworks can reserve > resources. These reservations are per role, which means multiple frameworks > can share reservations. This can get very hairy, as multiple reservations can > occur on each agent. > It would be nice to be able to optionally, uniquely identify reservations by > ID, much like persistent volumes are today. This could be done by adding a > new protobuf field, such as Resource.ReservationInfo.id, that if set upon > reservation time, would come back when the reservation is advertised. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7651) Consider a more explicit way to bind reservations / volumes to a framework.
Benjamin Mahler created MESOS-7651: -- Summary: Consider a more explicit way to bind reservations / volumes to a framework. Key: MESOS-7651 URL: https://issues.apache.org/jira/browse/MESOS-7651 Project: Mesos Issue Type: Improvement Reporter: Benjamin Mahler Currently, when a framework creates a reservation or a persistent volume, and it wants exclusive access to this volume or reservation, it must take a few steps: * Ensure that no other frameworks are running within the reservation role (or the other frameworks are co-operative). * With hierarchical roles, frameworks must also ensure that the role is a leaf so that no descendant roles will have access to the reservation/volume. This could be done by generating a role (e.g. eng/kafka/). It's not easy for the framework to ensure these things, since role ACLs are controlled by the operator. We should consider a more direct way for a framework to ensure that their reservation/volume cannot be shared. E.g. by binding it to their framework id (perhaps re-using roles for this rather than introducing something new?) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
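The sharing rule that makes exclusivity hard can be written down compactly: a reservation made for a role is reachable from that role and from every role beneath it. The sketch below encodes just that rule for '/'-separated role names; the function names and the example roles used with it are hypothetical, and it is not the Mesos allocator logic.

```cpp
#include <cassert>
#include <string>

// Returns true if `role` lies strictly beneath `ancestor` in a
// '/'-separated role hierarchy, e.g. "eng/kafka" beneath "eng".
inline bool isStrictDescendant(const std::string& role,
                               const std::string& ancestor) {
  return role.size() > ancestor.size() + 1 &&
         role.compare(0, ancestor.size(), ancestor) == 0 &&
         role[ancestor.size()] == '/';
}

// Under the rule described in the ticket, a reservation for
// `reservationRole` is accessible to that role and to its descendants.
inline bool canAccessReservation(const std::string& frameworkRole,
                                 const std::string& reservationRole) {
  return frameworkRole == reservationRole ||
         isStrictDescendant(frameworkRole, reservationRole);
}
```

With this rule, a framework reserving under "eng/kafka" shares the reservation with any framework that later registers in a role below "eng/kafka", which is why the ticket asks that the reserving role be a leaf or that reservations be bound to the framework directly.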
[jira] [Updated] (MESOS-7630) Add simple filtering to unversioned operator API
[ https://issues.apache.org/jira/browse/MESOS-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quinn updated MESOS-7630: - Epic Name: Operator API filtering Issue Type: Improvement (was: Epic) > Add simple filtering to unversioned operator API > > > Key: MESOS-7630 > URL: https://issues.apache.org/jira/browse/MESOS-7630 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Quinn >Assignee: Quinn > Labels: agent, api, http, master, mesosphere > > Add filtering for the following endpoints: > - {{/frameworks}} > - {{/slaves}} > - {{/tasks}} > - {{/containers}} > We should investigate whether we should use RESTful style or query string to > filter the specific resource. We should also figure out whether it's > necessary to filter a list of resources. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7630) Add simple filtering to unversioned operator API
[ https://issues.apache.org/jira/browse/MESOS-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quinn updated MESOS-7630: - Issue Type: Epic (was: Improvement) > Add simple filtering to unversioned operator API > > > Key: MESOS-7630 > URL: https://issues.apache.org/jira/browse/MESOS-7630 > Project: Mesos > Issue Type: Epic > Components: agent, master >Reporter: Quinn >Assignee: Quinn > Labels: agent, api, http, master, mesosphere > > Add filtering for the following endpoints: > - {{/frameworks}} > - {{/slaves}} > - {{/tasks}} > - {{/containers}} > We should investigate whether we should use RESTful style or query string to > filter the specific resource. We should also figure out whether it's > necessary to filter a list of resources. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (MESOS-7630) Add simple filtering to unversioned operator API
[ https://issues.apache.org/jira/browse/MESOS-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quinn reassigned MESOS-7630: Assignee: Quinn > Add simple filtering to unversioned operator API > > > Key: MESOS-7630 > URL: https://issues.apache.org/jira/browse/MESOS-7630 > Project: Mesos > Issue Type: Improvement > Components: agent, master >Reporter: Quinn >Assignee: Quinn > Labels: agent, api, http, master, mesosphere > > Add filtering for the following endpoints: > - {{/frameworks}} > - {{/slaves}} > - {{/tasks}} > - {{/containers}} > We should investigate whether we should use RESTful style or query string to > filter the specific resource. We should also figure out whether it's > necessary to filter a list of resources. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7033) Update documentation for hierarchical roles.
[ https://issues.apache.org/jira/browse/MESOS-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7033: --- Description: A few things to be sure to cover: * How to ensure that a volume is not shared with other frameworks. Previously, this meant running only 1 framework in the role and using ACLs to prevent other frameworks from running in the role. With hierarchical roles, this now also includes using ACLs to prevent any child roles from being created beneath the role (as these children would be able to obtain the reserved resources). We've been advising frameworks to generate a role (e.g. eng/kafka/) to ensure that they own their reservations (but the dynamic nature of this makes setting up ACLs difficult). Longer term, we may need a more explicit way to bind reservations or volumes to frameworks. > Update documentation for hierarchical roles. > > > Key: MESOS-7033 > URL: https://issues.apache.org/jira/browse/MESOS-7033 > Project: Mesos > Issue Type: Task > Components: documentation >Reporter: Neil Conway >Assignee: Neil Conway > Labels: mesosphere > > A few things to be sure to cover: > * How to ensure that a volume is not shared with other frameworks. > Previously, this meant running only 1 framework in the role and using ACLs to > prevent other frameworks from running in the role. With hierarchical roles, > this now also includes using ACLs to prevent any child roles from being > created beneath the role (as these children would be able to obtain the > reserved resources). We've been advising frameworks to generate a role (e.g. > eng/kafka/) to ensure that they own their reservations (but the > dynamic nature of this makes setting up ACLs difficult). Longer term, we may > need a more explicit way to bind reservations or volumes to frameworks. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7650) Timer::cancel doesn't completely prevent spurious agent reregister loops
[ https://issues.apache.org/jira/browse/MESOS-7650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-7650: -- Affects Version/s: 1.3.0 1.2.0 > Timer::cancel doesn't completely prevent spurious agent reregister loops > > > Key: MESOS-7650 > URL: https://issues.apache.org/jira/browse/MESOS-7650 > Project: Mesos > Issue Type: Bug > Components: agent >Affects Versions: 1.2.0, 1.3.0 >Reporter: Yan Xu > > See MESOS-6803 for the previous attempt to address this issue, but Timer > cancellation does not prevent the already dispatched {{doReliableRegistration}} > event from being executed and thus creating spurious agent reregister loops. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7650) Timer::cancel doesn't completely prevent spurious agent reregister loops
Yan Xu created MESOS-7650: - Summary: Timer::cancel doesn't completely prevent spurious agent reregister loops Key: MESOS-7650 URL: https://issues.apache.org/jira/browse/MESOS-7650 Project: Mesos Issue Type: Bug Components: agent Reporter: Yan Xu See MESOS-6803 for the previous attempt to address this issue, but Timer cancellation does not prevent the already dispatched {{doReliableRegistration}} event from being executed and thus creating spurious agent reregister loops. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
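The race described above can be reproduced in a toy, single-threaded model: once a timer has expired and its callback sits in the event queue, cancelling the timer no longer has anything to cancel. The sketch below is not libprocess; ToyLoop, ToyAgent and the epoch guard are hypothetical, and the guard is only one conceivable mitigation, not necessarily the fix adopted for this ticket.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy event loop that only mimics the ordering problem.
struct ToyLoop {
  std::deque<std::function<void()>> queue;  // Events already dispatched.
  std::function<void()> timerCallback;
  bool timerArmed = false;

  void arm(std::function<void()> f) {
    timerCallback = std::move(f);
    timerArmed = true;
  }

  // The timer expires: its callback is dispatched onto the event queue.
  void expire() {
    if (timerArmed) {
      queue.push_back(timerCallback);
      timerArmed = false;
    }
  }

  // Cancelling only disarms a pending timer. Returns whether a pending
  // timer was actually cancelled; queued events are left untouched.
  bool cancel() {
    bool wasArmed = timerArmed;
    timerArmed = false;
    return wasArmed;
  }

  void drain() {
    while (!queue.empty()) {
      std::function<void()> f = std::move(queue.front());
      queue.pop_front();
      f();
    }
  }
};

struct ToyAgent {
  int epoch = 0;          // Bumped whenever an attempt is superseded.
  int registrations = 0;  // How often the handler really did its work.

  // Unguarded handler: runs even if "cancelled" after dispatch.
  void doRegistration() { ++registrations; }

  // Guarded handler: ignores events that belong to an older epoch.
  void doRegistrationGuarded(int eventEpoch) {
    if (eventEpoch == epoch) {
      ++registrations;
    }
  }
};
```

The guard trades a small amount of state (the epoch captured when the timer is armed) for the ability to discard stale events at execution time rather than at cancellation time.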
[jira] [Updated] (MESOS-7649) GPF in mesos-executor
[ https://issues.apache.org/jira/browse/MESOS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated MESOS-7649: - Description: We are running mesos 1.2.0 on a CoreOS system and having the following gpf show up: {code} [57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a sp:7ffdafce3500 error:0 [57807.648470] in libstdc++.so.6.0.20[7f4bdfc2+155000] {code} Stack trace: {code} #0 0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /media/root/lib64/libstdc++.so.6 #1 0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () from /media/root/lib64/libmesos-1.2.0.so #2 0x7f59c403e623 in process::SocketManager::close(int) () from /media/root/lib64/libmesos-1.2.0.so #3 0x7f59c403f904 in process::SocketManager::finalize() () from /media/root/lib64/libmesos-1.2.0.so #4 0x7f59c403fc59 in process::finalize(bool) () from /media/root/lib64/libmesos-1.2.0.so #5 0x55c02473c1bd in ?? () #6 0x7f59c172b93c in __libc_start_main () from /media/root/lib64/libc.so.6 #7 0x55c02473c789 in ?? () {code} was: We are running mesos 1.2.0 on a CoreOS system and having the following gpf show up: {code} [57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a sp:7ffdafce3500 error:0 [57807.648470] in libstdc++.so.6.0.20[7f4bdfc2+155000] {code} I have the core dumps and am working on getting more info. 
> GPF in mesos-executor > - > > Key: MESOS-7649 > URL: https://issues.apache.org/jira/browse/MESOS-7649 > Project: Mesos > Issue Type: Bug > Components: executor >Affects Versions: 1.2.0 >Reporter: Charles Allen > > We are running mesos 1.2.0 on a CoreOS system and having the following gpf > show up: > {code} > [57807.639274] traps: mesos-executor[63400] general protection > ip:7f4bdfd1b05a sp:7ffdafce3500 error:0 > [57807.648470] in libstdc++.so.6.0.20[7f4bdfc2+155000] > {code} > Stack trace: > {code} > #0 0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, > std::allocator<char> >::basic_string(std::string const&) () from > /media/root/lib64/libstdc++.so.6 > #1 0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () > from /media/root/lib64/libmesos-1.2.0.so > #2 0x7f59c403e623 in process::SocketManager::close(int) () from > /media/root/lib64/libmesos-1.2.0.so > #3 0x7f59c403f904 in process::SocketManager::finalize() () from > /media/root/lib64/libmesos-1.2.0.so > #4 0x7f59c403fc59 in process::finalize(bool) () from > /media/root/lib64/libmesos-1.2.0.so > #5 0x55c02473c1bd in ?? () > #6 0x7f59c172b93c in __libc_start_main () from > /media/root/lib64/libc.so.6 > #7 0x55c02473c789 in ?? () > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (MESOS-7649) GPF in mesos-executor
[ https://issues.apache.org/jira/browse/MESOS-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Allen updated MESOS-7649: - Description: We are running mesos 1.2.0 on a CoreOS system and having the following gpf show up on occasion: {code} [57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a sp:7ffdafce3500 error:0 [57807.648470] in libstdc++.so.6.0.20[7f4bdfc2+155000] {code} Stack trace: {code} #0 0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /media/root/lib64/libstdc++.so.6 #1 0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () from /media/root/lib64/libmesos-1.2.0.so #2 0x7f59c403e623 in process::SocketManager::close(int) () from /media/root/lib64/libmesos-1.2.0.so #3 0x7f59c403f904 in process::SocketManager::finalize() () from /media/root/lib64/libmesos-1.2.0.so #4 0x7f59c403fc59 in process::finalize(bool) () from /media/root/lib64/libmesos-1.2.0.so #5 0x55c02473c1bd in ?? () #6 0x7f59c172b93c in __libc_start_main () from /media/root/lib64/libc.so.6 #7 0x55c02473c789 in ?? () {code} was: We are running mesos 1.2.0 on a CoreOS system and having the following gpf show up: {code} [57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a sp:7ffdafce3500 error:0 [57807.648470] in libstdc++.so.6.0.20[7f4bdfc2+155000] {code} Stack trace: {code} #0 0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /media/root/lib64/libstdc++.so.6 #1 0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () from /media/root/lib64/libmesos-1.2.0.so #2 0x7f59c403e623 in process::SocketManager::close(int) () from /media/root/lib64/libmesos-1.2.0.so #3 0x7f59c403f904 in process::SocketManager::finalize() () from /media/root/lib64/libmesos-1.2.0.so #4 0x7f59c403fc59 in process::finalize(bool) () from /media/root/lib64/libmesos-1.2.0.so #5 0x55c02473c1bd in ?? 
() #6 0x7f59c172b93c in __libc_start_main () from /media/root/lib64/libc.so.6 #7 0x55c02473c789 in ?? () {code} > GPF in mesos-executor > - > > Key: MESOS-7649 > URL: https://issues.apache.org/jira/browse/MESOS-7649 > Project: Mesos > Issue Type: Bug > Components: executor >Affects Versions: 1.2.0 >Reporter: Charles Allen > > We are running mesos 1.2.0 on a CoreOS system and having the following gpf > show up on occasion: > {code} > [57807.639274] traps: mesos-executor[63400] general protection > ip:7f4bdfd1b05a sp:7ffdafce3500 error:0 > [57807.648470] in libstdc++.so.6.0.20[7f4bdfc2+155000] > {code} > Stack trace: > {code} > #0 0x7f59c20cd054 in std::basic_string<char, std::char_traits<char>, > std::allocator<char> >::basic_string(std::string const&) () from > /media/root/lib64/libstdc++.so.6 > #1 0x7f59c401150d in process::UPID::UPID(process::ProcessBase const&) () > from /media/root/lib64/libmesos-1.2.0.so > #2 0x7f59c403e623 in process::SocketManager::close(int) () from > /media/root/lib64/libmesos-1.2.0.so > #3 0x7f59c403f904 in process::SocketManager::finalize() () from > /media/root/lib64/libmesos-1.2.0.so > #4 0x7f59c403fc59 in process::finalize(bool) () from > /media/root/lib64/libmesos-1.2.0.so > #5 0x55c02473c1bd in ?? () > #6 0x7f59c172b93c in __libc_start_main () from > /media/root/lib64/libc.so.6 > #7 0x55c02473c789 in ?? () > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (MESOS-7342) Port Docker tests
[ https://issues.apache.org/jira/browse/MESOS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044673#comment-16044673 ] Andrew Schwartzmeyer commented on MESOS-7342: - This is odd. I have two separate builds of Mesos on Windows right now, and in one of them, these tests try to run: {{.\src\mesos-tests.exe --gtest_filter="ROOT_DOCKER*"}} {noformat} [==] Running 8 tests from 1 test case. [--] Global test environment set-up. [--] 8 tests from ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest [ RUN ] ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskRunning/0 C:\Users\andschwa\src\mesos\3rdparty\libprocess\include\process/gmock.hpp(209): ERROR: this mock object (used in test RO OT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskRunning/0) should be deleted but never is. Its address is @02024AEB5888. C:\Users\andschwa\src\mesos\src\tests\default_executor_tests.cpp(131): ERROR: this mock object (used in test ROOT_DOCKER _DockerAndMesosContainerizers/DefaultExecutorTest.TaskRunning/0) should be deleted but never is. Its address is @020 24CBAF220. C:\Users\andschwa\src\mesos\src\tests\mock_registrar.cpp(54): ERROR: this mock object (used in test ROOT_DOCKER_DockerAn dMesosContainerizers/DefaultExecutorTest.TaskRunning/0) should be deleted but never is. Its address is @02024D6B4E70 . ERROR: 3 leaked mock objects found at program exit. {noformat} And in the other: {noformat} [==] Running 0 tests from 0 test cases. [==] 0 tests from 0 test cases ran. (16 ms total) [ PASSED ] 0 tests. {noformat} I'm trying to identify the difference between the two builds that is causing this. 
> Port Docker tests > - > > Key: MESOS-7342 > URL: https://issues.apache.org/jira/browse/MESOS-7342 > Project: Mesos > Issue Type: Bug > Components: docker > Environment: Windows 10 >Reporter: Andrew Schwartzmeyer >Assignee: John Kordich > Labels: microsoft, windows > > While one of Daniel Pravat's last acts was introducing the Docker > containerizer for Windows, we don't have tests. We need to port > `docker_tests.cpp` and `docker_containerizer_tests.cpp` to Windows. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7649) GPF in mesos-executor
Charles Allen created MESOS-7649: Summary: GPF in mesos-executor Key: MESOS-7649 URL: https://issues.apache.org/jira/browse/MESOS-7649 Project: Mesos Issue Type: Bug Components: executor Affects Versions: 1.2.0 Reporter: Charles Allen We are running mesos 1.2.0 on a CoreOS system and having the following gpf show up: {code} [57807.639274] traps: mesos-executor[63400] general protection ip:7f4bdfd1b05a sp:7ffdafce3500 error:0 [57807.648470] in libstdc++.so.6.0.20[7f4bdfc2+155000] {code} I have the core dumps and am working on getting more info. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7648) Mesos master should not return `/state` before finishing recovering agents from registry
Zhitao Li created MESOS-7648:

Summary: Mesos master should not return `/state` before finishing recovering agents from registry
Key: MESOS-7648
URL: https://issues.apache.org/jira/browse/MESOS-7648
Project: Mesos
Issue Type: Bug
Reporter: Zhitao Li

Our work on MESOS-6177 relies on {{recovered_agents}}. However, we discovered that the master can start responding to the {{/state.json}} endpoint before it finishes processing the result of registry::recover.

The sequence seems to be: registry recovered -> /state query comes in -> agents recovered from registry.

See the following logs:

{noformat}
I0608 22:29:57.147212 6407 master.cpp:2124] Elected as the leading master!
I0608 22:29:57.147274 6407 master.cpp:1646] Recovering from registrar
I0608 22:29:57.148114 6412 log.cpp:553] Attempting to start the writer
I0608 22:29:57.149339 6411 replica.cpp:495] Replica received implicit promise request from __req_res__(2)@10.162.9.54:5050 with proposal 105
I0608 22:29:57.149860 6411 replica.cpp:344] Persisted promised to 105
I0608 22:29:57.151495 6410 coordinator.cpp:238] Coordinator attempting to fill missing positions
I0608 22:29:57.151595 6412 log.cpp:569] Writer started with ending position 36816
I0608 22:29:58.111565 6423 registrar.cpp:362] Successfully fetched the registry (1200222B) in 934048us
I0608 22:29:58.214422 6423 registrar.cpp:461] Applied 1 operations in 25.893664ms; attempting to update the registry
I0608 22:29:58.300578 6421 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 36817
I0608 22:29:58.307567 6410 replica.cpp:539] Replica received write request for position 36817 from __req_res__(7)@10.162.9.54:5050
I0608 22:29:58.344857 6421 replica.cpp:693] Replica received learned notice for position 36817 from @0.0.0.0:0
I0608 22:29:58.378731 6408 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 36818
I0608 22:29:58.382043 6416 replica.cpp:539] Replica received write request for position 36818 from __req_res__(12)@10.162.9.54:5050
I0608 22:29:58.384946 6410 replica.cpp:693] Replica received learned notice for position 36818 from @0.0.0.0:0
I0608 22:29:59.507297 6423 registrar.cpp:506] Successfully updated the registry in 1.282937088secs
I0608 22:29:59.580960 6423 registrar.cpp:392] Successfully recovered registrar
I0608 22:29:59.940066 6415 http.cpp:420] HTTP GET for /master/state from 10.67.139.161:57197 with User-Agent='mesos-uns-bridge'
I0608 22:30:00.342932 6425 master.cpp:1762] Recovered 3549 agents from the registry (1200220B); allowing 15mins for agents to re-register
{noformat}

The request corresponding to the second-to-last log line above returned 0 registered or recovered agents, which incorrectly led its client to conclude that the cluster was empty.

[~anandmazumdar] [~vinodkone]
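One way to picture the behavior requested in MESOS-7648 is a handler that refuses to serve state until agent recovery has completed. The sketch below is a simplified, hypothetical model: `ToyMaster`, `finishRecovery`, `state`, and the choice of a 503 response are illustrative assumptions, not Mesos APIs and not necessarily the fix chosen for this ticket.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical response type; not a libprocess or Mesos class.
struct Response
{
  int status;        // HTTP status code.
  std::string body;  // Simplified response payload.
};

// Hypothetical, heavily simplified stand-in for the master.
class ToyMaster
{
public:
  // Called once the agents have been loaded from the registry.
  void finishRecovery(std::size_t agents)
  {
    assert(!recovered.load());  // Recovery is expected to finish only once.
    recoveredAgents = agents;
    recovered.store(true);
  }

  // Refuses to answer until recovery has finished, so a client cannot
  // mistake a still-recovering master for an empty cluster.
  Response state() const
  {
    if (!recovered.load()) {
      return {503, "Master has not finished recovery"};
    }
    return {200, "recovered_agents=" + std::to_string(recoveredAgents)};
  }

private:
  std::atomic<bool> recovered{false};
  std::size_t recoveredAgents = 0;
};
```

In this model, a `/state` query arriving between "Successfully recovered registrar" and "Recovered 3549 agents from the registry" would receive the 503 response rather than a report of zero agents.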
[jira] [Updated] (MESOS-6916) Improve health checks validation.
[ https://issues.apache.org/jira/browse/MESOS-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov updated MESOS-6916:

Summary: Improve health checks validation. (was: Improve health checks validation)

> Improve health checks validation.
>
> Key: MESOS-6916
> URL: https://issues.apache.org/jira/browse/MESOS-6916
> Project: Mesos
> Issue Type: Bug
> Reporter: Gastón Kleiman
> Assignee: Gastón Kleiman
> Labels: health-check, mesosphere
>
> The "general" fields should also be validated (e.g., `timeout_seconds`),
> similar to what's done in https://reviews.apache.org/r/55458/
[jira] [Assigned] (MESOS-5886) FUTURE_DISPATCH may react on irrelevant dispatch.
[ https://issues.apache.org/jira/browse/MESOS-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrei Budnik reassigned MESOS-5886:

Assignee: Andrei Budnik

> FUTURE_DISPATCH may react on irrelevant dispatch.
>
> Key: MESOS-5886
> URL: https://issues.apache.org/jira/browse/MESOS-5886
> Project: Mesos
> Issue Type: Bug
> Reporter: Alexander Rukletsov
> Assignee: Andrei Budnik
> Labels: mesosphere, tech-debt, tech-debt-test
>
> [{{FUTURE_DISPATCH}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/gmock.hpp#L50]
> uses
> [{{DispatchMatcher}}|https://github.com/apache/mesos/blob/e8ebbe5fe4189ef7ab046da2276a6abee41deeb2/3rdparty/libprocess/include/process/gmock.hpp#L350]
> to figure out whether a processed {{DispatchEvent}} is the one the user is
> waiting for. However, comparing the {{std::type_info}} of function pointers is
> not enough: different class methods with the same signature will be matched.
> Here is a test that demonstrates this:
> {noformat}
> class DispatchProcess : public Process<DispatchProcess>
> {
> public:
>   MOCK_METHOD0(func0, void());
>   MOCK_METHOD1(func1, bool(bool));
>   MOCK_METHOD1(func1_same_but_different, bool(bool));
>   MOCK_METHOD1(func2, Future<bool>(bool));
>   MOCK_METHOD1(func3, int(int));
>   MOCK_METHOD2(func4, Future<bool>(bool, int));
> };
> {noformat}
> {noformat}
> TEST(ProcessTest, DispatchMatch)
> {
>   DispatchProcess process;
>   PID<DispatchProcess> pid = spawn(&process);
>
>   // Wait for a dispatch to `func1_same_but_different`...
>   Future<Nothing> future = FUTURE_DISPATCH(
>       pid,
>       &DispatchProcess::func1_same_but_different);
>
>   EXPECT_CALL(process, func1(_))
>     .WillOnce(ReturnArg<0>());
>
>   // ...but dispatch to `func1`, which has the same signature.
>   dispatch(pid, &DispatchProcess::func1, true);
>
>   // The future becomes ready even though the awaited method was never dispatched.
>   AWAIT_READY(future);
>
>   terminate(pid);
>   wait(pid);
> }
> {noformat}
> The test passes:
> {noformat}
> [ RUN      ] ProcessTest.DispatchMatch
> [       OK ] ProcessTest.DispatchMatch (1 ms)
> {noformat}
> This change was introduced in https://reviews.apache.org/r/28052/.
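The matching problem described in MESOS-5886 can be reproduced without libprocess. The following is a minimal standalone sketch, assuming plain C++11: `Example`, `sameType`, and `sameMethod` are hypothetical names introduced for illustration and are not part of `DispatchMatcher`. It shows that two distinct methods with the same signature share one `std::type_info`, while comparing the member function pointers themselves tells them apart.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-in for a process class; not Mesos code.
struct Example
{
  bool func1(bool b) { return b; }
  bool func1_same_but_different(bool b) { return !b; }
  int func3(int i) { return i; }
};

// Weak check: compares only the static types of the method pointers,
// which is the kind of comparison the ticket says is insufficient.
template <typename M1, typename M2>
bool sameType(M1, M2)
{
  return typeid(M1) == typeid(M2);
}

// Stricter check, same-type case: the pointers must also compare equal.
// (Equality of pointers to virtual member functions is unspecified,
// so this sketch uses non-virtual methods only.)
template <typename M>
bool sameMethod(M a, M b)
{
  return a == b;
}

// Stricter check, different-type case: can never be the same method.
template <typename M1, typename M2>
bool sameMethod(M1, M2)
{
  return false;
}

// Spells out the claims made in the surrounding text.
void demonstrate()
{
  // Same signature, so the type-only check cannot distinguish them.
  assert(sameType(&Example::func1, &Example::func1_same_but_different));

  // The pointer comparison does distinguish them.
  assert(!sameMethod(&Example::func1, &Example::func1_same_but_different));
  assert(sameMethod(&Example::func1, &Example::func1));
}
```

Whether a stricter comparison of this kind fits `DispatchMatcher` depends on how the dispatched method is stored in `DispatchEvent`, which this sketch does not model.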