Stephan and Joshua, I am working on design of an external component which will consume DiscoveryInfo from Mesos and announce it to Zookeeper. (You see why I added DiscoveryInfo to Mesos now :) )
The reason I like to delegate this component to a third party component (calling it a Mesos-ZK-Bridge) here: - it maintains loose coupling between deployment and service discovery, which are related but not the same thing; - it makes operations like reconfiguring zk path, rotating ACL and so on a bit easier; - this design enables us to support service discovery for other mesos frameworks in a multi-tenancy mesos cluster. On Tue, Apr 12, 2016 at 7:49 AM, Joshua Cohen <jco...@apache.org> wrote: > As things stand today, once a task is scheduled, the scheduler can die, or > be shut down for maintenance, etc. with no impact to running tasks. If the > scheduler were responsible for announcing and it maintained the current > practice of creating znodes as ephemeral, it would need to maintain a > persistent ZK connection to ensure all announceable tasks are discoverable. > This means if the scheduler went down for any reason (or just whenever it > failed over normally) all serversets would be lost. That's obviously not > acceptable, so the alternative, if we wanted the scheduler to manage > announcing tasks, would be to stop using ephemeral nodes and instead have > the scheduler create persistent znodes and manually tear them down when > tasks finish. While not as problematic as the above, this is still > potentially a degradation from the current behavior in that there can be a > long delay between a task exiting and the scheduler becoming aware of this > (e.g. task finishes during a netsplit). This might be equivalent in scope > to the issue of a zombie instance that's possible today (i.e. scheduler > thinks task has been killed and restarts it elsewhere, but Mesos fails to > clean up the cgroup for whatever reason leading to double registration for > some time). However, today, due to their ephemeral nature, cleaning up > duplicate znodes is guaranteed as soon as the executor exits. If the znodes > were persistent, we'd have to manually seek out and clean up duplicate > znodes (or build tooling to support this). > > It's certainly possible to transition responsibility for announcing from > the executor to the scheduler, but I think it would be a fairly significant > amount of work and add a fair amount of complexity. > > On Tue, Apr 12, 2016 at 7:35 AM, Erb, Stephan <stephan....@blue-yonder.com > > > wrote: > > > I have recently run into another issue due to Thermos running as root ( > > https://issues.apache.org/jira/browse/MESOS-5187). I would therefore > like > > to re-open the discussion. > > > > IIRC there was a JIRA ticket proposing that the scheduler should be > > responsible for announcing services in Zookeeper. > > > > Would that work? It looks like this could solve a couple of issues at > once: > > > > * no need for credentials on slaves, we could therefore run the executor > > as a normal user > > * the announcement would also work for tasks using alternative executors > > > > > > ________________________________________ > > From: John Sirois <j...@conductant.com> > > Sent: Tuesday, March 29, 2016 23:55 > > To: Bill Farner > > Cc: dev@aurora.apache.org; John Sirois > > Subject: Re: Looking for feedback - Setting CommandInfo.user by default > > when launching tasks. > > > > On Tue, Mar 29, 2016 at 3:50 PM, Bill Farner <wfar...@apache.org> wrote: > > > > > Aha, i think we have different notions of the proposal. I was under > the > > > impression that the executor itself would run as the target user (e.g. > > > steve), not as a system user (e.g. aurora). I find the former more > > > appealing, with the exception that it leaves us without a solution for > > > concealing the credentials file. > > > > > > > My sketch was nonsense anyhow, since thermos running as aurora couldn't > > launch a task as www-data. Fundamentally we need a priviledged executor > > (thermos) that can 1: read the creds and announce 2: run the task and > > health checks nnoth as the targeted un-privileged user. > > > > > > > > > > On Tue, Mar 29, 2016 at 2:39 PM, John Sirois <j...@conductant.com> > > wrote: > > > > > >> On Tue, Mar 29, 2016 at 3:26 PM, Bill Farner <wfar...@apache.org> > > wrote: > > >> > > >> > If i'm understanding you correctly, that doesn't address preventing > > >> users > > >> > from reading the credentials though. > > >> > > > >> > > >> It does: > > >> > > >> Say: > > >> /var/lib/aurora/creds 400 root > > >> > > >> And then if the CommandInfo has user: aurora (executor user as Steve > > >> suggested), it will get a copy of /var/lib/aurora/creds in its > sandbox > > >> chowned to 400 aurora > > >> > > >> Now aurora's executor (thermos), launches a task in role www-data > > >> announcing for it using the cred. The www-data user will not be able > to > > >> read the local sandbox 400 aurora creds. > > >> > > >> > > >> > On Tue, Mar 29, 2016 at 1:52 PM, John Sirois <jsir...@apache.org> > > >> wrote: > > >> > > > >> > > On Tue, Mar 29, 2016 at 2:31 PM, Steve Niemitz < > sniem...@apache.org > > > > > >> > > wrote: > > >> > > > > >> > > > So maybe we add it, but don't change the current default > behavior? > > >> > > > > > >> > > > > >> > > Could we use the CommandInfo.uris [1] to solve this? IE: the > > >> scheduler > > >> > > would need to learn the credential file path, and with that > > knowledge, > > >> > the > > >> > > local mesos (root) readable credential file could be specified as > a > > >> uir > > >> > > dependency for all launched executors (or bare commands). IIUC, > if > > >> the > > >> > URI > > >> > > if a file:// the local secured credentails file will be copied > into > > >> the > > >> > > sandbox where it can be read by the executor (as aurora). > > >> > > > > >> > > [1] > > >> > > > > >> > > > >> > > > https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L422 > > >> > > > > >> > > > > >> > > > > > >> > > > On Tue, Mar 29, 2016 at 4:26 PM, Bill Farner < > wfar...@apache.org> > > >> > wrote: > > >> > > > > > >> > > > > I'm in favor of moving forward. There's no requirement to use > > the > > >> > > > > Announcer, and a non-root executor seems like a useful option. > > >> > > > > > > >> > > > > On Tue, Mar 29, 2016 at 1:00 PM, Steve Niemitz < > > >> sniem...@apache.org> > > >> > > > > wrote: > > >> > > > > > > >> > > > > > Makes sense, I guess it can be up to the cluster operator > > which > > >> > model > > >> > > > to > > >> > > > > > choose. Is there any interest in the feature I proposed or > > >> should > > >> > I > > >> > > > just > > >> > > > > > drop it? It's not a lot of code, but also it's not a > > >> requirement > > >> > for > > >> > > > > > anything we're working on either (the docker stuff however, > > is). > > >> > > > > > > > >> > > > > > On Tue, Mar 29, 2016 at 1:39 PM, Bill Farner < > > >> wfar...@apache.org> > > >> > > > wrote: > > >> > > > > > > > >> > > > > > > That's correct - those credentials should require > privileged > > >> > > access. > > >> > > > > > > > > >> > > > > > > On Tue, Mar 29, 2016 at 10:25 AM, Steve Niemitz < > > >> > > > > > > sniem...@twitter.com.invalid> wrote: > > >> > > > > > > > > >> > > > > > > > Re: ZK credential files, thats an interesting issue, I > > >> assume > > >> > you > > >> > > > > don't > > >> > > > > > > > want the role user to be able to read it either, and > only > > >> root > > >> > or > > >> > > > > some > > >> > > > > > > > other privileged user? > > >> > > > > > > > > > >> > > > > > > > On Tue, Mar 29, 2016 at 12:14 PM, Erb, Stephan < > > >> > > > > > > > stephan....@blue-yonder.com> > > >> > > > > > > > wrote: > > >> > > > > > > > > > >> > > > > > > > > I am in favor of your proposal. We offer less attack > > >> surface > > >> > if > > >> > > > the > > >> > > > > > > > > executor is not running as root. > > >> > > > > > > > > > > >> > > > > > > > > Interesting though, this introduces another security > > >> problem: > > >> > > The > > >> > > > > > > > > credentials file in the incoming Zookeeper ACL patch > ( > > >> > > > > > > > > https://reviews.apache.org/r/45042/) will have to be > > >> > readable > > >> > > by > > >> > > > > > > > > everyone. That feels a little bit like being back to > > >> square > > >> > > one. > > >> > > > > > > > > ________________________________________ > > >> > > > > > > > > From: Steve Niemitz <sniem...@apache.org> > > >> > > > > > > > > Sent: Tuesday, March 29, 2016 17:34 > > >> > > > > > > > > To: dev@aurora.apache.org > > >> > > > > > > > > Subject: Looking for feedback - Setting > CommandInfo.user > > >> by > > >> > > > default > > >> > > > > > > when > > >> > > > > > > > > launching tasks. > > >> > > > > > > > > > > >> > > > > > > > > I've been working on some changes to how aurora > submits > > >> tasks > > >> > > to > > >> > > > > > mesos, > > >> > > > > > > > > specifically around Docker tasks, but I'd also like to > > see > > >> > how > > >> > > > > people > > >> > > > > > > > feel > > >> > > > > > > > > about making it more general. > > >> > > > > > > > > > > >> > > > > > > > > Currently, when Aurora submits a task to mesos, it > does > > >> NOT > > >> > set > > >> > > > > > > > > command.user on the ExecutorInfo, this means that > mesos > > >> > > > configures > > >> > > > > > the > > >> > > > > > > > > sandbox (mesos sandbox that is) as root, and launches > > the > > >> > > > executor > > >> > > > > > > > > (thermos_executor in our case) as root as well. > > >> > > > > > > > > > > >> > > > > > > > > What then happens is that the executor then chown()s > the > > >> > > sandbox > > >> > > > it > > >> > > > > > > > creates > > >> > > > > > > > > to the aurora role/user, and also setuid()s the > runners > > it > > >> > > forks > > >> > > > to > > >> > > > > > > that > > >> > > > > > > > > role/user. However, the executor itself is still > > running > > >> as > > >> > > > root. > > >> > > > > > > > > > > >> > > > > > > > > My proposal / change is to set command.user to the > > aurora > > >> > role > > >> > > by > > >> > > > > > > > default, > > >> > > > > > > > > which will cause the executor to run as that user. > I've > > >> > tested > > >> > > > > this > > >> > > > > > > > > already, and no changes are needed to the executor, it > > >> will > > >> > > still > > >> > > > > try > > >> > > > > > > to > > >> > > > > > > > > chown the sandbox (which is fine since it already owns > > >> it), > > >> > and > > >> > > > > > > setuid() > > >> > > > > > > > > the runners it forks (again, fine, since they're > already > > >> > > running > > >> > > > as > > >> > > > > > > that > > >> > > > > > > > > user). > > >> > > > > > > > > > > >> > > > > > > > > *The controversial part of this* however is I'd like > to > > >> > enable > > >> > > > this > > >> > > > > > > > > behavior BY DEFAULT, and allow disabling it (reverting > > to > > >> the > > >> > > > > current > > >> > > > > > > > > behavior now) via a flag to the scheduler. My > reasoning > > >> here > > >> > > is > > >> > > > > two > > >> > > > > > > > fold. > > >> > > > > > > > > 1) It's a more secure default, preventing a > compromised > > >> > > executor > > >> > > > > > from > > >> > > > > > > > > doing things it shouldn't, and 2) we already have a > lot > > of > > >> > > "flag > > >> > > > > > > bloat", > > >> > > > > > > > > and flags are hard enough to discover as they are. > > >> However, > > >> > I > > >> > > do > > >> > > > > > > believe > > >> > > > > > > > > this should be considered as a "breaking change", > > >> > particularly > > >> > > > > > because > > >> > > > > > > of > > >> > > > > > > > > finicky PEX extraction for the executor. > > >> > > > > > > > > > > >> > > > > > > > > I'd like to hear people's thoughts on this. > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > >> > > >> -- > > >> John Sirois > > >> 303-512-3301 > > >> > > > > > > > > > > > > -- > > John Sirois > > 303-512-3301 > > > -- Cheers, Zhitao Li