As things stand today, once a task is scheduled, the scheduler can die, or be shut down for maintenance, etc. with no impact to running tasks. If the scheduler were responsible for announcing and it maintained the current practice of creating znodes as ephemeral, it would need to maintain a persistent ZK connection to ensure all announceable tasks are discoverable. This means if the scheduler went down for any reason (or just whenever it failed over normally) all serversets would be lost. That's obviously not acceptable, so the alternative, if we wanted the scheduler to manage announcing tasks, would be to stop using ephemeral nodes and instead have the scheduler create persistent znodes and manually tear them down when tasks finish. While not as problematic as the above, this is still potentially a degradation from the current behavior in that there can be a long delay between a task exiting and the scheduler becoming aware of this (e.g. task finishes during a netsplit). This might be equivalent in scope to the issue of a zombie instance that's possible today (i.e. scheduler thinks task has been killed and restarts it elsewhere, but Mesos fails to clean up the cgroup for whatever reason leading to double registration for some time). However, today, due to their ephemeral nature, cleaning up duplicate znodes is guaranteed as soon as the executor exits. If the znodes were persistent, we'd have to manually seek out and clean up duplicate znodes (or build tooling to support this).
It's certainly possible to transition responsibility for announcing from the executor to the scheduler, but I think it would be a fairly significant amount of work and add a fair amount of complexity. On Tue, Apr 12, 2016 at 7:35 AM, Erb, Stephan <stephan....@blue-yonder.com> wrote: > I have recently run into another issue due to Thermos running as root ( > https://issues.apache.org/jira/browse/MESOS-5187). I would therefore like > to re-open the discussion. > > IIRC there was a JIRA ticket proposing that the scheduler should be > responsible for announcing services in Zookeeper. > > Would that work? It looks like this could solve a couple of issues at once: > > * no need for credentials on slaves, we could therefore run the executor > as a normal user > * the announcement would also work for tasks using alternative executors > > > ________________________________________ > From: John Sirois <j...@conductant.com> > Sent: Tuesday, March 29, 2016 23:55 > To: Bill Farner > Cc: dev@aurora.apache.org; John Sirois > Subject: Re: Looking for feedback - Setting CommandInfo.user by default > when launching tasks. > > On Tue, Mar 29, 2016 at 3:50 PM, Bill Farner <wfar...@apache.org> wrote: > > > Aha, i think we have different notions of the proposal. I was under the > > impression that the executor itself would run as the target user (e.g. > > steve), not as a system user (e.g. aurora). I find the former more > > appealing, with the exception that it leaves us without a solution for > > concealing the credentials file. > > > > My sketch was nonsense anyhow, since thermos running as aurora couldn't > launch a task as www-data. Fundamentally we need a priviledged executor > (thermos) that can 1: read the creds and announce 2: run the task and > health checks nnoth as the targeted un-privileged user. > > > > > > On Tue, Mar 29, 2016 at 2:39 PM, John Sirois <j...@conductant.com> > wrote: > > > >> On Tue, Mar 29, 2016 at 3:26 PM, Bill Farner <wfar...@apache.org> > wrote: > >> > >> > If i'm understanding you correctly, that doesn't address preventing > >> users > >> > from reading the credentials though. > >> > > >> > >> It does: > >> > >> Say: > >> /var/lib/aurora/creds 400 root > >> > >> And then if the CommandInfo has user: aurora (executor user as Steve > >> suggested), it will get a copy of /var/lib/aurora/creds in its sandbox > >> chowned to 400 aurora > >> > >> Now aurora's executor (thermos), launches a task in role www-data > >> announcing for it using the cred. The www-data user will not be able to > >> read the local sandbox 400 aurora creds. > >> > >> > >> > On Tue, Mar 29, 2016 at 1:52 PM, John Sirois <jsir...@apache.org> > >> wrote: > >> > > >> > > On Tue, Mar 29, 2016 at 2:31 PM, Steve Niemitz <sniem...@apache.org > > > >> > > wrote: > >> > > > >> > > > So maybe we add it, but don't change the current default behavior? > >> > > > > >> > > > >> > > Could we use the CommandInfo.uris [1] to solve this? IE: the > >> scheduler > >> > > would need to learn the credential file path, and with that > knowledge, > >> > the > >> > > local mesos (root) readable credential file could be specified as a > >> uir > >> > > dependency for all launched executors (or bare commands). IIUC, if > >> the > >> > URI > >> > > if a file:// the local secured credentails file will be copied into > >> the > >> > > sandbox where it can be read by the executor (as aurora). > >> > > > >> > > [1] > >> > > > >> > > >> > https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L422 > >> > > > >> > > > >> > > > > >> > > > On Tue, Mar 29, 2016 at 4:26 PM, Bill Farner <wfar...@apache.org> > >> > wrote: > >> > > > > >> > > > > I'm in favor of moving forward. There's no requirement to use > the > >> > > > > Announcer, and a non-root executor seems like a useful option. > >> > > > > > >> > > > > On Tue, Mar 29, 2016 at 1:00 PM, Steve Niemitz < > >> sniem...@apache.org> > >> > > > > wrote: > >> > > > > > >> > > > > > Makes sense, I guess it can be up to the cluster operator > which > >> > model > >> > > > to > >> > > > > > choose. Is there any interest in the feature I proposed or > >> should > >> > I > >> > > > just > >> > > > > > drop it? It's not a lot of code, but also it's not a > >> requirement > >> > for > >> > > > > > anything we're working on either (the docker stuff however, > is). > >> > > > > > > >> > > > > > On Tue, Mar 29, 2016 at 1:39 PM, Bill Farner < > >> wfar...@apache.org> > >> > > > wrote: > >> > > > > > > >> > > > > > > That's correct - those credentials should require privileged > >> > > access. > >> > > > > > > > >> > > > > > > On Tue, Mar 29, 2016 at 10:25 AM, Steve Niemitz < > >> > > > > > > sniem...@twitter.com.invalid> wrote: > >> > > > > > > > >> > > > > > > > Re: ZK credential files, thats an interesting issue, I > >> assume > >> > you > >> > > > > don't > >> > > > > > > > want the role user to be able to read it either, and only > >> root > >> > or > >> > > > > some > >> > > > > > > > other privileged user? > >> > > > > > > > > >> > > > > > > > On Tue, Mar 29, 2016 at 12:14 PM, Erb, Stephan < > >> > > > > > > > stephan....@blue-yonder.com> > >> > > > > > > > wrote: > >> > > > > > > > > >> > > > > > > > > I am in favor of your proposal. We offer less attack > >> surface > >> > if > >> > > > the > >> > > > > > > > > executor is not running as root. > >> > > > > > > > > > >> > > > > > > > > Interesting though, this introduces another security > >> problem: > >> > > The > >> > > > > > > > > credentials file in the incoming Zookeeper ACL patch ( > >> > > > > > > > > https://reviews.apache.org/r/45042/) will have to be > >> > readable > >> > > by > >> > > > > > > > > everyone. That feels a little bit like being back to > >> square > >> > > one. > >> > > > > > > > > ________________________________________ > >> > > > > > > > > From: Steve Niemitz <sniem...@apache.org> > >> > > > > > > > > Sent: Tuesday, March 29, 2016 17:34 > >> > > > > > > > > To: dev@aurora.apache.org > >> > > > > > > > > Subject: Looking for feedback - Setting CommandInfo.user > >> by > >> > > > default > >> > > > > > > when > >> > > > > > > > > launching tasks. > >> > > > > > > > > > >> > > > > > > > > I've been working on some changes to how aurora submits > >> tasks > >> > > to > >> > > > > > mesos, > >> > > > > > > > > specifically around Docker tasks, but I'd also like to > see > >> > how > >> > > > > people > >> > > > > > > > feel > >> > > > > > > > > about making it more general. > >> > > > > > > > > > >> > > > > > > > > Currently, when Aurora submits a task to mesos, it does > >> NOT > >> > set > >> > > > > > > > > command.user on the ExecutorInfo, this means that mesos > >> > > > configures > >> > > > > > the > >> > > > > > > > > sandbox (mesos sandbox that is) as root, and launches > the > >> > > > executor > >> > > > > > > > > (thermos_executor in our case) as root as well. > >> > > > > > > > > > >> > > > > > > > > What then happens is that the executor then chown()s the > >> > > sandbox > >> > > > it > >> > > > > > > > creates > >> > > > > > > > > to the aurora role/user, and also setuid()s the runners > it > >> > > forks > >> > > > to > >> > > > > > > that > >> > > > > > > > > role/user. However, the executor itself is still > running > >> as > >> > > > root. > >> > > > > > > > > > >> > > > > > > > > My proposal / change is to set command.user to the > aurora > >> > role > >> > > by > >> > > > > > > > default, > >> > > > > > > > > which will cause the executor to run as that user. I've > >> > tested > >> > > > > this > >> > > > > > > > > already, and no changes are needed to the executor, it > >> will > >> > > still > >> > > > > try > >> > > > > > > to > >> > > > > > > > > chown the sandbox (which is fine since it already owns > >> it), > >> > and > >> > > > > > > setuid() > >> > > > > > > > > the runners it forks (again, fine, since they're already > >> > > running > >> > > > as > >> > > > > > > that > >> > > > > > > > > user). > >> > > > > > > > > > >> > > > > > > > > *The controversial part of this* however is I'd like to > >> > enable > >> > > > this > >> > > > > > > > > behavior BY DEFAULT, and allow disabling it (reverting > to > >> the > >> > > > > current > >> > > > > > > > > behavior now) via a flag to the scheduler. My reasoning > >> here > >> > > is > >> > > > > two > >> > > > > > > > fold. > >> > > > > > > > > 1) It's a more secure default, preventing a compromised > >> > > executor > >> > > > > > from > >> > > > > > > > > doing things it shouldn't, and 2) we already have a lot > of > >> > > "flag > >> > > > > > > bloat", > >> > > > > > > > > and flags are hard enough to discover as they are. > >> However, > >> > I > >> > > do > >> > > > > > > believe > >> > > > > > > > > this should be considered as a "breaking change", > >> > particularly > >> > > > > > because > >> > > > > > > of > >> > > > > > > > > finicky PEX extraction for the executor. > >> > > > > > > > > > >> > > > > > > > > I'd like to hear people's thoughts on this. > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> > >> > >> -- > >> John Sirois > >> 303-512-3301 > >> > > > > > > > -- > John Sirois > 303-512-3301 >