I like the idea of a cluster event stream very much! With this feature
implemented, we will be able to gather cluster health status and various
statistics for on-the-fly or offline analysis. AFAIK, the only way to
gather some stats now is to parse master and slave logs. As I see it,
the event stream will make Mesos friendlier to its users: framework
writers, SREs and so on.

Though an event stream may use hooks internally, I would still distinguish
between these features. While an event stream is a read-only interface with
performance overhead close to 0, hooks can modify cluster state (e.g. by
altering TaskInfo messages as they are passed along) and may significantly
impact performance or add to master / slave complexity, as Ben pointed out
in (B).
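To make the distinction concrete, here is a minimal C++ sketch. It assumes a simplified stand-in `TaskInfo` struct (not the real protobuf) and hypothetical `EventListener`, `Hook` and `launchTask` names that are not part of Mesos. A listener only receives a const reference, so it can observe the task but not alter it. A hook returns a possibly modified copy that the launch path keeps using.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the TaskInfo protobuf.
struct TaskInfo {
  std::string taskId;
  std::map<std::string, std::string> labels;
};

// Read-only observer: gets a const reference and cannot change the task.
class EventListener {
public:
  virtual ~EventListener() = default;
  virtual void onTaskLaunched(const TaskInfo& task) = 0;
};

// Hook: may return a modified copy, which the caller then uses.
class Hook {
public:
  virtual ~Hook() = default;
  virtual TaskInfo preLaunchTask(TaskInfo task) = 0;
};

// Example listener that only records the task IDs it has seen.
class RecordingListener : public EventListener {
public:
  void onTaskLaunched(const TaskInfo& task) override {
    seen.push_back(task.taskId);
  }
  std::vector<std::string> seen;
};

// Example hook that attaches a (hypothetical) label to every task.
class LabelHook : public Hook {
public:
  TaskInfo preLaunchTask(TaskInfo task) override {
    task.labels["owner"] = "security-infra";
    return task;
  }
};

// Simplified launch path: hooks run first and may rewrite the task;
// listeners are then notified with the final version via a const reference.
TaskInfo launchTask(TaskInfo task,
                    const std::vector<Hook*>& hooks,
                    const std::vector<EventListener*>& listeners) {
  for (Hook* hook : hooks) {
    task = hook->preLaunchTask(task);
  }
  for (EventListener* listener : listeners) {
    listener->onTaskLaunched(task);
  }
  return task;
}
```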

> (B) I assume this also means that there is a side-effect inducing "action"
> that is performed, in addition to the transformation. I wouldn't be able to
> do any expensive or asynchronous work through these, unless we made them
> return Futures. At which point, we would need some additional semantics
> (e.g. ordering), and we'd be adding complexity to the Master.

I would propose making hooks synchronous in order to keep the code simple.
To protect against performance issues introduced by heavy hooks, we can
enable/disable hooks at compile time (similar to assert), i.e. if the
code is compiled without -DENABLE_HOOKS, no code related to hooks support
will land in the object files and modules with a hook payload will not be
loaded. Thus we delegate responsibility for heavy hooks to the Mesos users
who rely on them in their workflows.
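Here is a minimal sketch of the compile-time switch. It assumes a hypothetical `RUN_PRE_LAUNCH_HOOKS` macro and a placeholder `runPreLaunchHooks` function. In a real build the hook bodies would come from loaded modules rather than being hard-coded. Without `-DENABLE_HOOKS` the macro expands to a no-op, so the hook code is never compiled, much like `assert()` under `NDEBUG`. With the flag, the hooks run synchronously in the caller's thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the TaskInfo protobuf.
struct TaskInfo {
  std::string taskId;
  std::vector<std::string> labels;
};

#ifdef ENABLE_HOOKS
// Only compiled when building with -DENABLE_HOOKS. Placeholder body:
// real hooks would be provided by modules.
TaskInfo runPreLaunchHooks(TaskInfo task) {
  task.labels.push_back("hooked");
  return task;
}
#define RUN_PRE_LAUNCH_HOOKS(task) ((task) = runPreLaunchHooks(task))
#else
// Without -DENABLE_HOOKS the call site expands to a no-op and no hook
// code ends up in the object files.
#define RUN_PRE_LAUNCH_HOOKS(task) ((void)0)
#endif

TaskInfo launchTask(TaskInfo task) {
  RUN_PRE_LAUNCH_HOOKS(task);  // synchronous: runs in the caller's thread
  // ... the normal launch logic would continue here ...
  return task;
}

// Reports whether this translation unit was built with hooks enabled.
bool hooksCompiledIn() {
#ifdef ENABLE_HOOKS
  return true;
#else
  return false;
#endif
}
```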


On Sat, Nov 22, 2014 at 1:56 AM, Niklas Nielsen <[email protected]> wrote:

> First off, thanks for all the comments!
> I really appreciate it and am excited about where we get with this effort.
> Let me see if I can answer your questions (best-effort inlined).
>
> On 21 November 2014 01:11, Tom Arnfeld <[email protected]> wrote:
>
> > This all sounds really great, and opens up some interesting opportunities
> > for automated service discovery (well, the announcement side) for a
> > cluster, which is what we've been looking into for a while.
> >
> > Correct me if I'm wrong, but would it be possible to make use of the
> > master log to achieve an event stream? I'm not entirely sure what's stored
> > in the shared master transaction log, but I assume it holds some state
> > about tasks etc.? If there were to be a stream of events, it'd be great to
> > support rewinding and replaying for some period of time to better allow
> > for HA stream consumers.
> >
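Rewinding and replaying could work roughly as sketched below. The sketch rests on assumptions that the thread does not state. Events carry a monotonically increasing sequence number. The stream keeps only the most recent N events in memory. A consumer that fails over reconnects with the last sequence number it processed. `ReplayableEventStream` and `Event` are hypothetical names. They are not part of the master log or any Mesos API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

struct Event {
  uint64_t sequence;
  std::string description;
};

// Retains the most recent `capacity` events so that a consumer that
// reconnects (e.g. after failing over) can replay what it missed.
class ReplayableEventStream {
public:
  explicit ReplayableEventStream(std::size_t capacity) : capacity_(capacity) {}

  uint64_t publish(const std::string& description) {
    Event event{nextSequence_++, description};
    buffer_.push_back(event);
    if (buffer_.size() > capacity_) {
      buffer_.pop_front();  // oldest event falls out of the retention window
    }
    return event.sequence;
  }

  // Returns every retained event with a sequence number greater than
  // `lastSeen`. Events older than the retention window are gone, so a
  // consumer that was away too long must resynchronize some other way.
  std::vector<Event> replaySince(uint64_t lastSeen) const {
    std::vector<Event> result;
    for (const Event& event : buffer_) {
      if (event.sequence > lastSeen) {
        result.push_back(event);
      }
    }
    return result;
  }

private:
  std::size_t capacity_;
  uint64_t nextSequence_ = 1;
  std::deque<Event> buffer_;
};
```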
>
> See comment below. But yes - service discovery systems could definitely
> leverage hooks.
>
>
> >
> > Either way, hooks would be a welcome feature for us!
> >
> > --
> > Tom Arnfeld
> > Developer // DueDil
> > (+44) 7525940046
> > 25 Christopher Street, London, EC2A 2BS
> >
> > On Fri, Nov 21, 2014 at 6:44 AM, Vinod Kone <[email protected]> wrote:
> >
> > > Good points Ben.
> > > Also, I've recently been thinking about an events endpoint (not to be
> > > confused with the Event/Call API) that could stream all kinds of events
> > > happening in the cluster (master events, allocator events, gc events,
> > > slave events, containerizer events etc). In fact this could probably be
> > > exposed by libprocess very easily. I was mainly thinking about this in
> > > terms of auditing. Having such an endpoint would allow external tooling
> > > to "hook" into that endpoint and consume the event stream. The tooling
> > > could then perform arbitrary actions *without interfering* with Mesos
> > > control flow. I think such an architecture would be powerful because it
> > > is generic and non-invasive. Have you considered that approach?
> >
>
> Ben, Vinod: A cluster event stream sounds like an awesome idea!
> I have previously hacked together post-mortem log analysis to determine
> workload profiles. That could be done online (!)
> That aside, our use case involves attaching metadata to the task via
> labels, which we cannot do with an event stream alone.
> The metadata we need is produced by a third-party security infrastructure,
> which we invoke and use when setting up the executor environment in the
> slave.
> We actually only need the pre hook / filter mechanism to do this, but
> wanted to come up with a generalized solution.
>
> In my mind, the ideas of hooks and event streams are not mutually exclusive.
> The event stream could use all the insertion points of hooks (and vice-versa).
>
>
>
> > > On Thu, Nov 20, 2014 at 10:24 PM, Benjamin Mahler <[email protected]> wrote:
> > >> Thanks for sending this Nik!
> > >>
> > >> The general idea of hooks sounds good. I think the question for hooks is
> > >> about which extensibility points make sense, and I think we'll have to
> > >> assess that with the introduction of each hook.
> > >>
> > >> (1) Is the idea behind hooks about actions, as you initially mentioned? Or
> > >> is it about data transformation, which is what is shown in the API example?
> > >> Or both?
> >
>
> Both.
>
> To Tom's point: service discovery systems with hooks could both 1) be
> notified when tasks are launched in a push-like fashion and 2) read from
> and alter the task info (for example with labels).
>
> We wanted to aim for flexibility. Similar to web server hooks, they can
> purposely change the behavior of request handling.
> If it cannot interact with or influence the task sequence, it isn't a hook
> but rather a probe (similar to DTrace probes).
>
>
> > >>
> > >> (2) Is external tooling meant to describe hooks? Or is it meant to describe
> > >> external tools that can leverage the hooks? This part is a bit fuzzy to me.
> > >>
> >
>
> Hooks are defined by us, and implementations can be provided by module
> writers.
>
> This is similar to DTrace probes, where kernel developers chose interesting
> insertion points: some specific, others generic (where filters can be
> applied).
>
>
>
> > >> (3) Is instrumentation meant to allow us to gain visibility into things
> > >> like performance? If so, hooks might not be the most maintainable approach
> > >> for that. Ideally we could add instrumentation into libprocess. Are there
> > >> other forms of instrumentation in mind?
> >
>
> Instrumentation in libprocess is one thing (being able to analyze
> bandwidth/latency and message throughput/distribution - which would be
> pretty awesome).
> There should be plenty of non-libprocess code which gives insight into the
> task/status update life-cycle.
>
> Hooks would allow local aggregation of high-frequency events where you want
> to run user-defined code.
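As an illustration of local aggregation, here is a minimal sketch of a hypothetical status-update hook. `StatusUpdateCounter` is an invented name and `StatusUpdate` is a simplified stand-in. The hook counts updates per framework and state in memory, so only a periodic summary needs to leave the machine. The sketch leaves open how and when `flush()` is called.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a status update seen by a hook.
struct StatusUpdate {
  std::string frameworkId;
  std::string state;  // e.g. "TASK_RUNNING", "TASK_FINISHED"
};

// Aggregates high-frequency updates locally instead of shipping each one.
class StatusUpdateCounter {
public:
  void onStatusUpdate(const StatusUpdate& update) {
    ++counts_[update.frameworkId + "/" + update.state];
  }

  // Returns the counts gathered since the last flush and resets them.
  std::map<std::string, int> flush() {
    std::map<std::string, int> snapshot;
    snapshot.swap(counts_);
    return snapshot;
  }

private:
  std::map<std::string, int> counts_;
};
```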
>
>
> > >>
> > >> Let's take the hook example you showed:
> > >>
> > >>  // Performs an action and/or transforms the TaskInfo.
> > >>  virtual TaskInfo preMasterLaunchTask(const TaskInfo& task) = 0;
> > >>  virtual TaskInfo postMasterLaunchTask(const TaskInfo& task) = 0;
> > >>  virtual TaskInfo preSlaveLaunchTask(const TaskInfo& task) = 0;
> > >>  virtual TaskInfo postSlaveLaunchTask(const TaskInfo& task) = 0;
> > >>
> > >> Comment mine. This interface suggests synchronous transformation of
> > >> TaskInfo objects:
> > >>
> > >> (A) A transformation of TaskInfo seems a bit surprising to me, how can one
> > >> do this generically? Is the idea that this would be customized per
> > >> framework within the hook? How would one differentiate the frameworks? Via
> > >> role? This part seems fuzzy to me.
> >
>
> That was an oversimplified API. The arguments could/should match the
> parameters passed to Master::launchTask(), for example. The hook runs in
> the same thread and context as its caller, so we can share state with the
> calling environment.
> The return value could be a tuple of all the incoming parameter types,
> given that these are usually const.
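Here is a minimal sketch of that signature shape. It assumes simplified stand-in `FrameworkInfo` and `TaskInfo` structs and a hypothetical `LaunchTaskHook` interface. The real `Master::launchTask()` takes different parameters. The hook receives copies of the launch arguments and returns all of them as a `std::tuple`, which the call site then unpacks.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical, simplified stand-ins for the real protobufs.
struct FrameworkInfo { std::string name; };
struct TaskInfo { std::string taskId; std::string command; };

using LaunchArgs = std::tuple<FrameworkInfo, TaskInfo>;

class LaunchTaskHook {
public:
  virtual ~LaunchTaskHook() = default;
  // Receives copies of the launch parameters and returns all of them,
  // possibly modified, because the caller's originals are const.
  virtual LaunchArgs preMasterLaunchTask(FrameworkInfo framework,
                                         TaskInfo task) = 0;
};

// Example hook with a hypothetical transformation of the command.
class WrapCommandHook : public LaunchTaskHook {
public:
  LaunchArgs preMasterLaunchTask(FrameworkInfo framework,
                                 TaskInfo task) override {
    task.command = "wrapper " + task.command;
    return std::make_tuple(framework, task);
  }
};

TaskInfo launchTask(const FrameworkInfo& framework,
                    const TaskInfo& task,
                    LaunchTaskHook& hook) {
  // Unpack the hook's result into local copies that the rest of the
  // launch path would use instead of the const originals.
  auto [hookedFramework, hookedTask] =
      hook.preMasterLaunchTask(framework, task);
  (void)hookedFramework;  // a real call site would keep using this too
  return hookedTask;
}
```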
>
>
> >
> >
> > >>
> > >> (B) I assume this also means that there is a side-effect inducing "action"
> > >> that is performed, in addition to the transformation. I wouldn't be able to
> > >> do any expensive or asynchronous work through these, unless we made them
> > >> return Futures. At which point, we would need some additional semantics
> > >> (e.g. ordering), and we'd be adding complexity to the Master.
> >
>
> Maybe having only entry points, so that hooks effectively act as
> before-filters, makes sense. That avoids the complexity of post actions
> being executed in arbitrary places and/or on scope exit (which could be
> one of many places and hard to reason about).
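Here is a minimal sketch of entry-point-only hooks that act as filters. It assumes a simplified `TaskInfo` and hypothetical `PreLaunchFilter` / `applyPreLaunchFilters` names. `std::optional` stands in for the stout `Option` type that the original proposal lists as a possible return type. Each filter returns the (possibly modified) task, or `std::nullopt` to reject it. All of this happens in one place, before the launch proceeds.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the TaskInfo protobuf.
struct TaskInfo {
  std::string taskId;
  std::string role;
};

// A pre-launch filter: returns the (possibly modified) task,
// or std::nullopt to reject it.
using PreLaunchFilter = std::function<std::optional<TaskInfo>(TaskInfo)>;

// All filters run once, at the entry of the launch path, in registration
// order. Nothing runs on scope exit, so there is a single place to reason
// about what the hooks did.
std::optional<TaskInfo> applyPreLaunchFilters(
    TaskInfo task, const std::vector<PreLaunchFilter>& filters) {
  for (const PreLaunchFilter& filter : filters) {
    std::optional<TaskInfo> result = filter(task);
    if (!result) {
      return std::nullopt;  // a filter rejected the task; stop here
    }
    task = *result;
  }
  return task;
}
```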
>
>
> > >>
> > >> (C) What differentiates pre and post in this case? Sending the message?
> > >> Let's consider that these are responsible for performing "actions". Then
> > >> differentiating pre and post seems a bit arbitrary, since the sending of a
> > >> message is asynchronous. This means that the "action" occurs after the
> > >> message has been handed to libprocess, but not before it is sent to the
> > >> socket, not before it is sent over the wire, not before it is received by
> > >> the slave, etc. Seems like an odd distinction, no?
> >
>
> See comment above.
>
>
> > >>
> > >> Looking forward to hearing more, thanks Nik!
> > >>
> > >> FYI I'm about to go on vacation, so I'm going to be slow at email. :)
> > >>
> > >> On Thu, Nov 20, 2014 at 10:07 AM, Dominic Hamon <[email protected]> wrote:
> > >>
> > >> > Do you have specific use cases in mind? I.e., specific actions that
> > >> > might take place pre and post launch?
> > >> >
> > >> > On Thu, Nov 20, 2014 at 9:37 AM, Niklas Nielsen <[email protected]> wrote:
> > >> >
> > >> > > Hi everyone,
> > >> > >
> > >> > >
> > >> > > As a part of our current sprint at Mesosphere, we are striving to work
> > >> > > on and land an extension to the modules subsystem which we (per
> > >> > > https://issues.apache.org/jira/browse/MESOS-2060) have referred to as
> > >> > > ‘hooks’. We wanted to give some background to this feature and will be
> > >> > > asking for input on the proposal.
> > >> > >
> > >> > > The term is inspired by Apache Web Server hooks
> > >> > > (http://httpd.apache.org/docs/2.2/developer/hooks.html), which allow
> > >> > > modules to tie into the request processing life-cycle. It is different
> > >> > > from the existing modules capability in that the usual request
> > >> > > processing remains untouched (rather than being fully replaced, as it
> > >> > > would be by a regular module).
> > >> > >
> > >> > > In our case, we are interested in being able to tie into the life-cycle
> > >> > > of tasks to run pre- and post-actions during task launch in the master
> > >> > > and slave processes. In general, it adds capability for all sorts of
> > >> > > external tooling and instrumentation.
> > >> > > The main idea is to enable modules to register themselves as hook
> > >> > > providers, for example through a new flag:
> > >> > > --hooks="module_name1, module_name2, ..."
> > >> > >
> > >> > > A new ‘HookManager’ will query each module and get back an object of
> > >> > > type ‘Hooks’, whose virtual member functions point to the desired
> > >> > > callbacks in the module.
> > >> > >
> > >> > >
> > >> > > For example,
> > >> > >
> > >> > > class Hooks {
> > >> > > public:
> > >> > >   virtual TaskInfo preMasterLaunchTask(TaskInfo task) = 0;
> > >> > >   virtual TaskInfo postMasterLaunchTask(TaskInfo task) = 0;
> > >> > >   virtual TaskInfo preSlaveLaunchTask(TaskInfo task) = 0;
> > >> > >   virtual TaskInfo postSlaveLaunchTask(TaskInfo task) = 0;
> > >> > >   // ...
> > >> > > };
> > >> > >
> > >> > > An example of the call site in Mesos could be:
> > >> > >
> > >> > > Master::launchTask(..., TaskInfo task, ...)
> > >> > > {
> > >> > >   task = HookManager::preMasterLaunchTask(task);
> > >> > >   ...
> > >> > >   task = HookManager::postMasterLaunchTask(task);
> > >> > > }
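The proposed `HookManager` could be sketched roughly as follows. This is a minimal, self-contained sketch with several assumptions. `TaskInfo` is a simplified stand-in and only one callback is shown. The manager is an instance, whereas the call-site example above uses static calls. Explicit `registerHooks()` calls replace module loading from `--hooks`. Registered `Hooks` objects run in registration order, and each one receives the previous one's output.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the TaskInfo protobuf.
struct TaskInfo {
  std::string taskId;
  std::vector<std::string> labels;
};

// Mirrors the proposed interface; only one callback is shown.
class Hooks {
public:
  virtual ~Hooks() = default;
  virtual TaskInfo preMasterLaunchTask(TaskInfo task) = 0;
};

// Hypothetical manager: holds one Hooks object per module and runs them
// in order, feeding each one's output into the next.
class HookManager {
public:
  void registerHooks(std::unique_ptr<Hooks> hooks) {
    hooks_.push_back(std::move(hooks));
  }

  TaskInfo preMasterLaunchTask(TaskInfo task) const {
    for (const std::unique_ptr<Hooks>& hooks : hooks_) {
      task = hooks->preMasterLaunchTask(std::move(task));
    }
    return task;
  }

private:
  std::vector<std::unique_ptr<Hooks>> hooks_;
};

// Example of a module-provided implementation that appends a label.
class LabelHooks : public Hooks {
public:
  explicit LabelHooks(std::string label) : label_(std::move(label)) {}

  TaskInfo preMasterLaunchTask(TaskInfo task) override {
    task.labels.push_back(label_);
    return task;
  }

private:
  std::string label_;
};
```

Chaining in registration order keeps the result deterministic when several modules touch the same field.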
> > >> > >
> > >> > > We are not tied at all to how the hooks will be named, where they live
> > >> > > (they could potentially live in Master, Slave, Allocator, ...
> > >> > > subclasses), whether they return Try, Option or Result to indicate
> > >> > > failure, and so on.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Introducing the hook functionality is similar to what we’ve done in the
> > >> > > past with Isolators for the MesosContainerizer, which enable people to
> > >> > > provide new functionality for launching containers. In the same way, we
> > >> > > want people to be able to provide new functionality with respect to
> > >> > > launching tasks without changing the existing task flow.
> > >> > >
> > >> > >
> > >> > > We’d love to get people’s feedback so we can move forward!
> > >> > >
> > >> > >
> > >> > > Thanks,
> > >> > > Niklas
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Dominic Hamon | @mrdo | Twitter
> > >> > *There are no bad ideas; only good ideas that go horribly wrong.*
> > >> >
> > >>
> >
>
> Let's keep the discussion going :-)
>
