Are you suggesting that for every new operation we'll introduce a new
capability?
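
To make the question concrete, here is a rough sketch of what a capability-based check on the master side might look like. Every name in it (`OperationType`, `AgentCapability`, `Agent`, `agentSupports`) is a hypothetical stand-in for illustration, not the actual Mesos types or API. Whether a capability would gate a single operation or a group of related ones, as assumed here, is exactly the open point.

```cpp
#include <set>

// Hypothetical stand-ins for illustration only; not the real Mesos types.
enum class OperationType { RESERVE, CREATE, GROW_VOLUME, SHRINK_VOLUME };

// Hypothetical capability names advertised by an agent at registration.
enum class AgentCapability { MULTI_ROLE, RESIZE_VOLUME };

struct Agent {
  std::set<AgentCapability> capabilities;
};

// Returns true if the master may forward an operation of this type to the
// agent; the master would drop the operation otherwise.
bool agentSupports(const Agent& agent, OperationType type) {
  switch (type) {
    case OperationType::GROW_VOLUME:
    case OperationType::SHRINK_VOLUME:
      // Assumption: one capability gates both resize operations.
      return agent.capabilities.count(AgentCapability::RESIZE_VOLUME) > 0;
    case OperationType::RESERVE:
    case OperationType::CREATE:
      // Operations that every agent already understands need no capability.
      return true;
  }
  return false;  // Unknown operation types are rejected.
}
```

With this shape, an agent that advertises no capabilities would have GROW_VOLUME and SHRINK_VOLUME rejected at the master while still accepting RESERVE and CREATE.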

On Mon, Apr 16, 2018 at 2:14 PM, Vinod Kone <vinodk...@apache.org> wrote:

> Crashing the agent is definitely not a viable option IMO.
>
> Why can't we use agent capabilities instead of agent version and reject
> such operations at master? This is one of the main reasons we introduced
> the concept of framework, master, agent capabilities.
>
> On Mon, Apr 16, 2018 at 2:04 PM, Chun-Hung Hsiao <chhs...@apache.org> wrote:
>
> > Hi all,
> >
> > As some of you may already know, we are currently working on patches to
> > implement the new GROW_VOLUME and SHRINK_VOLUME operations [1].
> >
> > One problem that has surfaced is that, since the new operations are not
> > supported in Mesos 1.5, they will lead to an agent crash during the
> > operation application cycle if a Mesos 1.6 master sends these operations
> > to a Mesos 1.5 agent [2].
> >
> > We are now considering two possibilities to address this compatibility
> > problem:
> >
> > 1) The Mesos 1.6 master should check the agent's Mesos version in
> > `Master::accept` [3]. Moving forward, if we add new operations in future
> > Mesos releases, we would have code like the following:
> >
> > ```
> > // Get the Mesos version of the slave of the offer.
> > Version slaveVersion = ...;
> > switch (operation.type()) {
> >   ...
> >   case SOME_NEW_OPERATION: {
> >     if (slaveVersion < minVersionForSomeNewOperation) {
> >       ... // Drop the operation.
> >     }
> >     break;
> >   }
> >   ...
> > }
> > ```
> >
> > Pros and cons:
> > + The new operation won't go into the operation application cycle since
> >   it is rejected at the very beginning. This means no resource metadata
> >   is touched.
> > - Explicit slave version checks on the master side make the code less
> >   clean, and we will need to update this list every time we add a new
> >   operation.
> >
> > 2) Treat this issue as an agent crash bug. The Mesos master would forward
> > the operation to the agent, regardless of the agent's Mesos version. In
> > the agent, we deploy and backport the following logic in
> > `Slave::applyOperation` [4]:
> >
> > ```
> > if (message.operation_info().type() == OPERATION_UNKNOWN) {
> >   ... // Drop the operation and trigger a re-registration or send an
> >       // `UpdateSlaveMessage` to force the master to update the total
> >       // resources of the slave.
> > }
> > ```
> >
> > Pros and cons:
> > + Easier to add new operations since no new logic needs to be added for
> >   backward compatibility.
> > - Since the old agent won't know whether the new operations are
> >   speculative or not, a re-registration or an `UpdateSlaveMessage` is
> >   required.
> > - Mesos 1.5.0 agents will still have the bug and crash when a new master
> >   sends a new operation to them.
> >
> > Since both options are viable and there seems to be no clear winner, we'd
> > like to check with the community to see which convention is preferable.
> > Please let us know what you think. Thanks!
> >
> > Best,
> > Chun-Hung
> >
> >
> > [1] https://issues.apache.org/jira/browse/MESOS-4965
> > [2] https://github.com/apache/mesos/blob/1.5.x/src/common/protobuf_utils.cpp#L851
> > [3] https://github.com/apache/mesos/blob/master/src/master/master.cpp#L3899
> > [4] https://github.com/apache/mesos/blob/1.5.x/src/slave/slave.cpp#L4359
> >
>
