[jira] [Commented] (MESOS-8933) Stop sending offers from agents in draining mode
[ https://issues.apache.org/jira/browse/MESOS-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639645#comment-16639645 ]

Matthew Mead-Briggs commented on MESOS-8933:

Any updated thoughts on this? Since [~sagar8192] first posted this, we've decided to move forward with this option, and it's been running safely on several clusters at Yelp for some time now.

In practice, it seems it's going to be a long time before all the frameworks support inverse offers. So, in the short term at least, the best strategy for us is to have the master not send offers for hosts that are draining. It is then up to us to know how this affects currently running tasks for each framework.

It turns out that we do need to recover the resources from the allocator, so the patch ended up looking more like this: [https://gist.github.com/mattmb/88859c4a40b655d8be8bbd2d59204cf5]

I agree that the best thing would be for frameworks to update and support maintenance natively, but I think it would be worth having this option upstream behind a config flag, as suggested.

> Stop sending offers from agents in draining mode
>
> Key: MESOS-8933
> URL: https://issues.apache.org/jira/browse/MESOS-8933
> Project: Mesos
> Issue Type: Improvement
> Reporter: Sagar Sadashiv Patwardhan
> Priority: Minor
>
> *Background:*
> At Yelp, we use Mesos to run microservices (Marathon), batch jobs (Chronos and custom frameworks), Spark (Spark Mesos framework), etc. We also autoscale the number of agents in our cluster based on the current demand and some other metrics. We use Mesos maintenance primitives to gracefully shut down Mesos agents.
>
> *Problem:*
> When we want to shut down an agent for some reason, we first move the agent into draining mode. This allows us to gracefully terminate the microservices and other tasks. But Mesos continues to send offers from that agent with unavailability set. Frameworks such as Marathon, Chronos, and Spark ignore the unavailability and schedule tasks on the agent. To prevent this from happening, we allocate all the available resources on that agent to a role that is not used by any framework. But this approach is not foolproof: there is still a race condition between when we move the agent into draining mode and when we allocate all the available resources on the agent to the maintenance role.
>
> *Proposal:*
> It would be nice if Mesos stopped sending offers from agents in draining mode. Something like this: [https://gist.github.com/sagar8192/0b9dbccc908818f8f9f5a18d1f634513]
> I don't know if this affects the allocator or not. We can put this behind a flag (something like --do-not-send-offers-from-agents-in-draining-mode) and make it optional.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
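The behaviour requested in MESOS-8933 above (skip agents in draining mode when building offers, behind an opt-in flag) can be illustrated with a small sketch. This is not the patch from either gist and it is not Mesos code: `Agent`, its `draining` field, `agents_to_offer`, and `skip_draining` are all hypothetical names used only to show the filtering idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Agent:
    agent_id: str
    cpus: float
    draining: bool = False  # hypothetical stand-in for the agent's maintenance state


def agents_to_offer(agents: List[Agent], skip_draining: bool) -> List[Agent]:
    """Return the agents whose resources may be offered to frameworks.

    skip_draining plays the role of the proposed opt-in flag: when it is
    False, the current behaviour (offer from every agent) is preserved.
    """
    if not skip_draining:
        return list(agents)
    return [agent for agent in agents if not agent.draining]


agents = [Agent("a1", 4.0), Agent("a2", 8.0, draining=True), Agent("a3", 2.0)]
print([a.agent_id for a in agents_to_offer(agents, skip_draining=True)])
```

Because the filter runs at the point where offers are built, there is no window between "agent marked draining" and "resources hidden", which is the race described in the issue.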
[jira] [Commented] (MESOS-7966) check for maintenance on agent causes fatal error
[ https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496309#comment-16496309 ]

Matthew Mead-Briggs commented on MESOS-7966:

This is great sleuthing! Probably of note here is that for PaaSTA we do use dynamic reservations via the API to try to prevent tasks from being scheduled on hosts that are under maintenance. I'm actually looking at a way to change how we do this, but the rough idea of how we do it now is:

* mark the host for maintenance
* reserve all the resources with a dummy role
* PaaSTA scales up the affected Marathon apps and kills off tasks on the affected host
* after each task is killed, we reserve the resources we've just freed up

I wasn't aware that Marathon had its own reasons for doing dynamic reservations. Do you have any details you can share on why it does, or a link to some code?

> check for maintenance on agent causes fatal error
>
> Key: MESOS-7966
> URL: https://issues.apache.org/jira/browse/MESOS-7966
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.1.0
> Reporter: Rob Johnson
> Assignee: Benno Evers
> Priority: Critical
> Labels: mesosphere, reliability
>
> We interact with the maintenance API frequently to orchestrate gracefully draining agents of tasks without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with the API. This happens relatively frequently, and impacts us when downstream frameworks (Marathon) react badly to leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're happy to provide any other logs you need - please let me know what would be useful for debugging.
> Thanks.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
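The drain procedure listed in the comment above can be sketched as a tiny resource-accounting simulation. It makes no calls to Mesos, Marathon, or PaaSTA; `drain_host`, `DUMMY_ROLE`, and the CPU-only accounting are assumptions made purely for illustration.

```python
from typing import Dict, List

DUMMY_ROLE = "maintenance"  # hypothetical name for a role no framework subscribes to


def drain_host(free_cpus: float, task_cpus: List[float]) -> Dict[str, float]:
    """Simulate draining one host and return its final CPU accounting."""
    reserved = 0.0

    # Step 1 (mark the host for maintenance) has no accounting effect here.

    # Step 2: reserve everything that is currently free under DUMMY_ROLE.
    reserved += free_cpus
    free_cpus = 0.0

    # Steps 3 and 4: kill each task, then reserve what it just freed.
    for cpus in task_cpus:
        free_cpus += cpus      # the killed task's resources are briefly unreserved
        reserved += free_cpus  # reserve them before anything else can claim them
        free_cpus = 0.0

    return {"reserved": reserved, "free": free_cpus}


print(drain_host(free_cpus=2.0, task_cpus=[1.0, 0.5]))
```

The two statements inside the loop correspond to separate API calls in practice, so the freed resources are unreserved for a moment between them.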
[jira] [Commented] (MESOS-7966) check for maintenance on agent causes fatal error
[ https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493528#comment-16493528 ]

Matthew Mead-Briggs commented on MESOS-7966:

Thanks for taking a look at this, [~bennoe]. I'll have a read of the code and see if I can follow what you describe. I think the logs I shared already contain those log lines, unless I've missed something? I've also dumped the unfiltered logs in a private Slack channel on the Mesosphere Slack, if you prefer to filter them yourself.

Also, we are running 1.4.1, although I don't expect that makes a lot of difference.

> check for maintenance on agent causes fatal error
>
> Key: MESOS-7966
> URL: https://issues.apache.org/jira/browse/MESOS-7966
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.1.0
> Reporter: Rob Johnson
> Assignee: Joseph Wu
> Priority: Critical
> Labels: mesosphere, reliability
[jira] [Commented] (MESOS-7966) check for maintenance on agent causes fatal error
[ https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470200#comment-16470200 ]

Matthew Mead-Briggs commented on MESOS-7966:

I've recently started looking at this again and managed to gather some logs this time. I'll post a filtered version here that might be helpful, and then I'll share the full master logs privately (just in case they contain something sensitive).

Filtered master logs: https://gist.github.com/mattmb/d2bb103b162da75c4e25c2dc0eadad4e
[jira] [Commented] (MESOS-7886) Add master hook for setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224609#comment-16224609 ]

Matthew Mead-Briggs commented on MESOS-7886:

[~vinodkone] here's the ticket that we discussed at MesosCon. I'm going to be rolling this approach out at Yelp pretty soon. The diff above is heavily inspired by the hook for setting task labels.

After your talk I was thinking about how I would get the volume-based secrets working too. I wonder if it makes sense to have a hook that just allows the user to override the whole TaskInfo object, rather than a hook for each thing within it? It's a longer-term goal anyway, as it'll only be easy if we move to the Mesos containerizer. Maybe you have some better ideas about how we could hook secret resolvers into the master side more properly?

> Add master hook for setting environment variables
>
> Key: MESOS-7886
> URL: https://issues.apache.org/jira/browse/MESOS-7886
> Project: Mesos
> Issue Type: Improvement
> Components: modules
> Reporter: Matthew Mead-Briggs
>
> At Yelp we're planning to integrate our secret store with our platform as a service which runs on Mesos.
> I was hoping to write a module to "inject" environment variables on the master side but the necessary hook doesn't currently exist. Such a hook already exists on the slave side. However, for this integration that would require me to give all the agents access to the secret store and I'd much prefer to limit this to the master side.
> There is already a hook for adding labels:
> https://github.com/apache/mesos/blob/72752fc6deb8ebcbfbd5448dc599ef3774339d31/include/mesos/hook.hpp#L44-L48
> So it seems it should be pretty easy to add one for setting environment variables too? I had a crack the other day, but although I got my code to compile, something was not working at runtime (note: I'm not a C++ dev). Is there any reason why we wouldn't want such a hook? If anyone can confirm that it's a sane thing to add then I'd be happy to spend some time trying to get it working (although I may need some help)!

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
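To make the idea in MESOS-7886 concrete, here is a language-neutral sketch of what a master-side environment decorator would do to a task before launch. It is not the Mesos hook API (which is C++) and not the diff from the gist: the dict-based `TaskInfo`, `apply_env_decorators`, `secret_decorator`, and the in-memory "secret store" are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List

TaskInfo = Dict[str, Any]                        # simplified stand-in for a task description
EnvDecorator = Callable[[TaskInfo], Dict[str, str]]


def apply_env_decorators(task: TaskInfo, decorators: List[EnvDecorator]) -> TaskInfo:
    """Return a copy of task with each decorator's variables merged into its environment."""
    merged = dict(task.get("environment", {}))
    for decorate in decorators:
        merged.update(decorate(task))  # later decorators win on name clashes
    result = dict(task)
    result["environment"] = merged
    return result


def secret_decorator(task: TaskInfo) -> Dict[str, str]:
    """Hypothetical hook body: look up secrets for the task by name."""
    fake_secret_store = {"web": {"DB_PASSWORD": "example-only"}}
    return fake_secret_store.get(task["name"], {})


task = {"name": "web", "environment": {"PORT": "8080"}}
print(apply_env_decorators(task, [secret_decorator])["environment"])
```

In this shape only the process that runs the decorators needs credentials for the secret store, which is the motivation given in the issue for doing the injection on the master rather than on every agent.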
[jira] [Comment Edited] (MESOS-7886) Add master hook for setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173437#comment-16173437 ]

Matthew Mead-Briggs edited comment on MESOS-7886 at 9/20/17 4:23 PM:

[~kaysoky] here's a sample diff to show what this might look like: https://gist.github.com/mattmb/a273801c43ce127fe1c311b9fdc8d8f2

I haven't written any tests, and really this is just copy-paste from the masterLaunchTaskLabelDecorator. What are your thoughts? I'm still convinced this is the way for us to go at Yelp. Do you think it's worth tidying this up and submitting a patch?

I have tried this out by compiling it locally and writing a simple hook to inject an env var, and it seems to work fine.

was (Author: mmb):

[~kaysoky] here's a sample diff to show what this might look like: https://gist.github.com/mattmb/a273801c43ce127fe1c311b9fdc8d8f2

I haven't written any tests and really this is just copy paste from the masterLaunchTaskLabelDecorator. What are your thoughts? I'm still convinced this is the way for us to go at Yelp. Do you think it's worth tidying this up and submitting a patch?
[jira] [Commented] (MESOS-7886) Add master hook for setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173437#comment-16173437 ]

Matthew Mead-Briggs commented on MESOS-7886:

[~kaysoky] here's a sample diff to show what this might look like: https://gist.github.com/mattmb/a273801c43ce127fe1c311b9fdc8d8f2

I haven't written any tests and really this is just copy paste from the masterLaunchTaskLabelDecorator. What are your thoughts? I'm still convinced this is the way for us to go at Yelp. Do you think it's worth tidying this up and submitting a patch?
[jira] [Commented] (MESOS-7886) Add master hook for setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145113#comment-16145113 ]

Matthew Mead-Briggs commented on MESOS-7886:

[~kaysoky] do you have any more thoughts? I'm hoping to find some more time this week to have another look at getting this working in my local environment. If you think this approach is a complete non-starter then please shout up.
[jira] [Commented] (MESOS-7886) Add master hook for setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127266#comment-16127266 ]

Matthew Mead-Briggs commented on MESOS-7886:

Ah yes, I had thought about the command line issue when I spotted that the docker executor just passes "-e blah" to the docker run command. Luckily, Linux gives us some options to hide the command line args: https://www.linux-dev.org/2012/09/hide-process-information-for-other-users/

That said, I think it isn't too bad for a compromised agent to give up all the secrets of the tasks running on it, compared to giving the agent permission to fetch/decrypt any secret for any task that it needs to start. That's the real reason I want to pursue the master-based decryption option.

Maintaining the hook is certainly a concern; I'm expecting to have to build and test it against each Mesos release. I guess you are also concerned about maintaining it in the Mesos code base itself, i.e. if the way TaskInfo/env vars are handled changes then you may have to update the hook code?
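The command-line concern above comes from secret values appearing in the arguments of `docker run`. One way to keep values out of the argument list, shown here as a sketch only, is to pass just the variable name to `-e` and supply the value through the docker client's own environment. This is an alternative to the process-hiding approach linked in the comment, not what the Mesos docker executor does, and `build_docker_run` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def build_docker_run(image: str, secrets: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for a `docker run` that keeps secret values out of argv."""
    argv = ["docker", "run"]
    for name in sorted(secrets):
        # Name only: with no "=value", docker takes the value from the client's environment.
        argv += ["-e", name]
    argv.append(image)
    # extra_env must be added to the environment of the process that runs argv.
    return argv, dict(secrets)


argv, extra_env = build_docker_run("example/image", {"DB_PASSWORD": "example-only"})
print(argv)
```

A caller would merge `extra_env` into the environment it passes when spawning `argv` (for example via the `env` argument of `subprocess.run`). The value is then still present in the client process's environment, so this narrows exposure rather than removing it.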
[jira] [Commented] (MESOS-7886) Add master hook for setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126141#comment-16126141 ]

Matthew Mead-Briggs commented on MESOS-7886:

Answering my own question by RTFM: http://mesos.apache.org/documentation/latest/ssl/

Looks like that would make things safe between the master and the agents!
[jira] [Commented] (MESOS-7886) Add master hook for setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126127#comment-16126127 ]

Matthew Mead-Briggs commented on MESOS-7886:

Another concern I have is that, from what I can tell, things aren't encrypted "over the wire". Am I right in saying that Mesos doesn't currently support any encryption between the masters and the agents?
[jira] [Commented] (MESOS-7886) Add master hook for setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126113#comment-16126113 ]

Matthew Mead-Briggs commented on MESOS-7886:

Thanks for the comments, [~kaysoky]. I'm glad to get some context before I decide for sure on the way to go :-)

Security-wise, is the logging your only concern? It's something we'd have to consider, but also something that we might be able to mitigate ourselves.

Regarding limiting business logic on the master: this would be adding a hook to allow the user to add some logic, right? I think I'm not fully grasping your concern. Is there a case where we break something because of different agent configurations? They all support environment variables, right?
[jira] [Created] (MESOS-7886) Add master hook for setting environment variables
Matthew Mead-Briggs created MESOS-7886:

Summary: Add master hook for setting environment variables
Key: MESOS-7886
URL: https://issues.apache.org/jira/browse/MESOS-7886
Project: Mesos
Issue Type: Improvement
Components: modules
Reporter: Matthew Mead-Briggs

At Yelp we're planning to integrate our secret store with our platform as a service which runs on Mesos.

I was hoping to write a module to "inject" environment variables on the master side but the necessary hook doesn't currently exist. Such a hook already exists on the slave side. However, for this integration that would require me to give all the agents access to the secret store and I'd much prefer to limit this to the master side.

There is already a hook for adding labels:
https://github.com/apache/mesos/blob/72752fc6deb8ebcbfbd5448dc599ef3774339d31/include/mesos/hook.hpp#L44-L48

So it seems it should be pretty easy to add one for setting environment variables too? I had a crack the other day, but although I got my code to compile, something was not working at runtime (note: I'm not a C++ dev). Is there any reason why we wouldn't want such a hook? If anyone can confirm that it's a sane thing to add then I'd be happy to spend some time trying to get it working (although I may need some help)!