Stephan and Joshua,

I am working on design of an external component which will consume
DiscoveryInfo from Mesos and announce it to Zookeeper. (You see why I added
DiscoveryInfo to Mesos now :) )

The reason I like to delegate this component to a third party component
(calling it a Mesos-ZK-Bridge) here:
- it maintains loose coupling between deployment and service discovery,
which are related but not the same thing;
- it makes operations like reconfiguring zk path, rotating ACL and so on a
bit easier;
- this design enables us to support service discovery for other mesos
frameworks in a multi-tenancy mesos cluster.


On Tue, Apr 12, 2016 at 7:49 AM, Joshua Cohen <jco...@apache.org> wrote:

> As things stand today, once a task is scheduled, the scheduler can die, or
> be shut down for maintenance, etc. with no impact to running tasks. If the
> scheduler were responsible for announcing and it maintained the current
> practice of creating znodes as ephemeral, it would need to maintain a
> persistent ZK connection to ensure all announceable tasks are discoverable.
> This means if the scheduler went down for any reason (or just whenever it
> failed over normally) all serversets would be lost. That's obviously not
> acceptable, so the alternative, if we wanted the scheduler to manage
> announcing tasks, would be to stop using ephemeral nodes and instead have
> the scheduler create persistent znodes and manually tear them down when
> tasks finish. While not as problematic as the above, this is still
> potentially a degradation from the current behavior in that there can be a
> long delay between a task exiting and the scheduler becoming aware of this
> (e.g. task finishes during a netsplit). This might be equivalent in scope
> to the issue of a zombie instance that's possible today (i.e. scheduler
> thinks task has been killed and restarts it elsewhere, but Mesos fails to
> clean up the cgroup for whatever reason leading to double registration for
> some time). However, today, due to their ephemeral nature, cleaning up
> duplicate znodes is guaranteed as soon as the executor exits. If the znodes
> were persistent, we'd have to manually seek out and clean up duplicate
> znodes (or build tooling to support this).
>
> It's certainly possible to transition responsibility for announcing from
> the executor to the scheduler, but I think it would be a fairly significant
> amount of work and add a fair amount of complexity.
>
> On Tue, Apr 12, 2016 at 7:35 AM, Erb, Stephan <stephan....@blue-yonder.com
> >
> wrote:
>
> > I have recently run into another issue due to Thermos running as root (
> > https://issues.apache.org/jira/browse/MESOS-5187). I would therefore
> like
> > to re-open the discussion.
> >
> > IIRC there was a JIRA ticket proposing that the scheduler should be
> > responsible for announcing services in Zookeeper.
> >
> > Would that work? It looks like this could solve a couple of issues at
> once:
> >
> > * no need for credentials on slaves, we could therefore run the executor
> > as a normal user
> > * the announcement would also work for tasks using alternative executors
> >
> >
> > ________________________________________
> > From: John Sirois <j...@conductant.com>
> > Sent: Tuesday, March 29, 2016 23:55
> > To: Bill Farner
> > Cc: dev@aurora.apache.org; John Sirois
> > Subject: Re: Looking for feedback - Setting CommandInfo.user by default
> > when launching tasks.
> >
> > On Tue, Mar 29, 2016 at 3:50 PM, Bill Farner <wfar...@apache.org> wrote:
> >
> > > Aha, i think we have different notions of the proposal.  I was under
> the
> > > impression that the executor itself would run as the target user (e.g.
> > > steve), not as a system user (e.g. aurora).  I find the former more
> > > appealing, with the exception that it leaves us without a solution for
> > > concealing the credentials file.
> > >
> >
> > My sketch was nonsense anyhow, since thermos running as aurora couldn't
> > launch a task as www-data.  Fundamentally we need a priviledged executor
> > (thermos) that can 1: read the creds and announce 2: run the task and
> > health checks nnoth as the targeted un-privileged user.
> >
> >
> > >
> > > On Tue, Mar 29, 2016 at 2:39 PM, John Sirois <j...@conductant.com>
> > wrote:
> > >
> > >> On Tue, Mar 29, 2016 at 3:26 PM, Bill Farner <wfar...@apache.org>
> > wrote:
> > >>
> > >> > If i'm understanding you correctly, that doesn't address preventing
> > >> users
> > >> > from reading the credentials though.
> > >> >
> > >>
> > >> It does:
> > >>
> > >> Say:
> > >> /var/lib/aurora/creds 400 root
> > >>
> > >> And then if the CommandInfo has user: aurora (executor user as Steve
> > >> suggested), it will get a copy of /var/lib/aurora/creds  in its
> sandbox
> > >> chowned to 400 aurora
> > >>
> > >> Now aurora's executor (thermos), launches a task in role www-data
> > >> announcing for it using the cred.  The www-data user will not be able
> to
> > >> read the local sandbox 400 aurora creds.
> > >>
> > >>
> > >> > On Tue, Mar 29, 2016 at 1:52 PM, John Sirois <jsir...@apache.org>
> > >> wrote:
> > >> >
> > >> > > On Tue, Mar 29, 2016 at 2:31 PM, Steve Niemitz <
> sniem...@apache.org
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > So maybe we add it, but don't change the current default
> behavior?
> > >> > > >
> > >> > >
> > >> > > Could we use the CommandInfo.uris [1] to solve this?  IE: the
> > >> scheduler
> > >> > > would need to learn the credential file path, and with that
> > knowledge,
> > >> > the
> > >> > > local mesos (root) readable credential file could be specified as
> a
> > >> uir
> > >> > > dependency for all launched executors (or bare commands).  IIUC,
> if
> > >> the
> > >> > URI
> > >> > > if a file:// the local secured credentails file will be copied
> into
> > >> the
> > >> > > sandbox where it can be read by the executor (as aurora).
> > >> > >
> > >> > > [1]
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L422
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > On Tue, Mar 29, 2016 at 4:26 PM, Bill Farner <
> wfar...@apache.org>
> > >> > wrote:
> > >> > > >
> > >> > > > > I'm in favor of moving forward.  There's no requirement to use
> > the
> > >> > > > > Announcer, and a non-root executor seems like a useful option.
> > >> > > > >
> > >> > > > > On Tue, Mar 29, 2016 at 1:00 PM, Steve Niemitz <
> > >> sniem...@apache.org>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Makes sense, I guess it can be up to the cluster operator
> > which
> > >> > model
> > >> > > > to
> > >> > > > > > choose.  Is there any interest in the feature I proposed or
> > >> should
> > >> > I
> > >> > > > just
> > >> > > > > > drop it?  It's not a lot of code, but also it's not a
> > >> requirement
> > >> > for
> > >> > > > > > anything we're working on either (the docker stuff however,
> > is).
> > >> > > > > >
> > >> > > > > > On Tue, Mar 29, 2016 at 1:39 PM, Bill Farner <
> > >> wfar...@apache.org>
> > >> > > > wrote:
> > >> > > > > >
> > >> > > > > > > That's correct - those credentials should require
> privileged
> > >> > > access.
> > >> > > > > > >
> > >> > > > > > > On Tue, Mar 29, 2016 at 10:25 AM, Steve Niemitz <
> > >> > > > > > > sniem...@twitter.com.invalid> wrote:
> > >> > > > > > >
> > >> > > > > > > > Re: ZK credential files, thats an interesting issue, I
> > >> assume
> > >> > you
> > >> > > > > don't
> > >> > > > > > > > want the role user to be able to read it either, and
> only
> > >> root
> > >> > or
> > >> > > > > some
> > >> > > > > > > > other privileged user?
> > >> > > > > > > >
> > >> > > > > > > > On Tue, Mar 29, 2016 at 12:14 PM, Erb, Stephan <
> > >> > > > > > > > stephan....@blue-yonder.com>
> > >> > > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > I am in favor of your proposal. We offer less attack
> > >> surface
> > >> > if
> > >> > > > the
> > >> > > > > > > > > executor is not running as root.
> > >> > > > > > > > >
> > >> > > > > > > > > Interesting though, this introduces another security
> > >> problem:
> > >> > > The
> > >> > > > > > > > > credentials file in the incoming Zookeeper  ACL patch
> (
> > >> > > > > > > > > https://reviews.apache.org/r/45042/) will have to be
> > >> > readable
> > >> > > by
> > >> > > > > > > > > everyone. That feels a little bit like being back to
> > >> square
> > >> > > one.
> > >> > > > > > > > > ________________________________________
> > >> > > > > > > > > From: Steve Niemitz <sniem...@apache.org>
> > >> > > > > > > > > Sent: Tuesday, March 29, 2016 17:34
> > >> > > > > > > > > To: dev@aurora.apache.org
> > >> > > > > > > > > Subject: Looking for feedback - Setting
> CommandInfo.user
> > >> by
> > >> > > > default
> > >> > > > > > > when
> > >> > > > > > > > > launching tasks.
> > >> > > > > > > > >
> > >> > > > > > > > > I've been working on some changes to how aurora
> submits
> > >> tasks
> > >> > > to
> > >> > > > > > mesos,
> > >> > > > > > > > > specifically around Docker tasks, but I'd also like to
> > see
> > >> > how
> > >> > > > > people
> > >> > > > > > > > feel
> > >> > > > > > > > > about making it more general.
> > >> > > > > > > > >
> > >> > > > > > > > > Currently, when Aurora submits a task to mesos, it
> does
> > >> NOT
> > >> > set
> > >> > > > > > > > > command.user on the ExecutorInfo, this means that
> mesos
> > >> > > > configures
> > >> > > > > > the
> > >> > > > > > > > > sandbox (mesos sandbox that is) as root, and launches
> > the
> > >> > > > executor
> > >> > > > > > > > > (thermos_executor in our case) as root as well.
> > >> > > > > > > > >
> > >> > > > > > > > > What then happens is that the executor then chown()s
> the
> > >> > > sandbox
> > >> > > > it
> > >> > > > > > > > creates
> > >> > > > > > > > > to the aurora role/user, and also setuid()s the
> runners
> > it
> > >> > > forks
> > >> > > > to
> > >> > > > > > > that
> > >> > > > > > > > > role/user.  However, the executor itself is still
> > running
> > >> as
> > >> > > > root.
> > >> > > > > > > > >
> > >> > > > > > > > > My proposal / change is to set command.user to the
> > aurora
> > >> > role
> > >> > > by
> > >> > > > > > > > default,
> > >> > > > > > > > > which will cause the executor to run as that user.
> I've
> > >> > tested
> > >> > > > > this
> > >> > > > > > > > > already, and no changes are needed to the executor, it
> > >> will
> > >> > > still
> > >> > > > > try
> > >> > > > > > > to
> > >> > > > > > > > > chown the sandbox (which is fine since it already owns
> > >> it),
> > >> > and
> > >> > > > > > > setuid()
> > >> > > > > > > > > the runners it forks (again, fine, since they're
> already
> > >> > > running
> > >> > > > as
> > >> > > > > > > that
> > >> > > > > > > > > user).
> > >> > > > > > > > >
> > >> > > > > > > > > *The controversial part of this* however is I'd like
> to
> > >> > enable
> > >> > > > this
> > >> > > > > > > > > behavior BY DEFAULT, and allow disabling it (reverting
> > to
> > >> the
> > >> > > > > current
> > >> > > > > > > > > behavior now) via a flag to the scheduler.  My
> reasoning
> > >> here
> > >> > > is
> > >> > > > > two
> > >> > > > > > > > fold.
> > >> > > > > > > > >  1) It's a more secure default, preventing a
> compromised
> > >> > > executor
> > >> > > > > > from
> > >> > > > > > > > > doing things it shouldn't, and 2) we already have a
> lot
> > of
> > >> > > "flag
> > >> > > > > > > bloat",
> > >> > > > > > > > > and flags are hard enough to discover as they are.
> > >> However,
> > >> > I
> > >> > > do
> > >> > > > > > > believe
> > >> > > > > > > > > this should be considered as a "breaking change",
> > >> > particularly
> > >> > > > > > because
> > >> > > > > > > of
> > >> > > > > > > > > finicky PEX extraction for the executor.
> > >> > > > > > > > >
> > >> > > > > > > > > I'd like to hear people's thoughts on this.
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> John Sirois
> > >> 303-512-3301
> > >>
> > >
> > >
> >
> >
> > --
> > John Sirois
> > 303-512-3301
> >
>



-- 
Cheers,

Zhitao Li

Reply via email to