Are you suggesting that for every new operation we'll introduce a new capability?
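
To make the question concrete, here is a rough sketch of what a capability-gated check in the master might look like if one capability covered a group of related operations rather than each operation getting its own. All names here (the `RESIZE_VOLUME` capability, the `Agent` struct, `agentSupports`) are hypothetical stand-ins for illustration, not existing Mesos code:

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-ins for the real Mesos types; every name in this
// sketch is an assumption made for illustration only.
enum class OperationType { RESERVE, CREATE, GROW_VOLUME, SHRINK_VOLUME };

// Assumed capability name: one capability gating both resize operations.
enum class AgentCapability { RESIZE_VOLUME };

struct Agent {
  // Capabilities the agent advertised when it registered.
  std::set<AgentCapability> capabilities;
};

// Returns true if the master may forward an operation of `type` to `agent`.
bool agentSupports(const Agent& agent, OperationType type) {
  switch (type) {
    case OperationType::GROW_VOLUME:
    case OperationType::SHRINK_VOLUME:
      // Both new operations share a single capability check.
      return agent.capabilities.count(AgentCapability::RESIZE_VOLUME) > 0;
    case OperationType::RESERVE:
    case OperationType::CREATE:
      // Pre-existing operations need no capability.
      return true;
  }
  // Unknown operation types are rejected conservatively.
  return false;
}

// Small self-check showing the intended behaviour.
inline void selfCheck() {
  Agent oldAgent{};                                  // advertises nothing
  Agent newAgent{{AgentCapability::RESIZE_VOLUME}};  // advertises the capability
  assert(!agentSupports(oldAgent, OperationType::GROW_VOLUME));
  assert(agentSupports(newAgent, OperationType::GROW_VOLUME));
  assert(agentSupports(oldAgent, OperationType::CREATE));
}
```

With a grouping like this, the master would only need a new capability when a new operation does not fit an existing group; whether that is acceptable is the open question.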

On Mon, Apr 16, 2018 at 2:14 PM, Vinod Kone <vinodk...@apache.org> wrote:

> Crashing the agent is definitely not a viable option IMO.
>
> Why can't we use agent capabilities instead of agent version and reject
> such operations at master? This is one of the main reasons we introduced
> the concept of framework, master, agent capabilities.
>
> On Mon, Apr 16, 2018 at 2:04 PM, Chun-Hung Hsiao <chhs...@apache.org>
> wrote:
>
> > Hi all,
> >
> > As some might have already known, we are currently working on patches to
> > implement the new GROW_VOLUME and SHRINK_VOLUME operations [1].
> >
> > One problem that surfaces is that, since the new operations are not
> > supported in Mesos 1.5, they will lead to an agent crash during the
> > operation application cycle if a Mesos 1.6 master sends these operations
> > to a Mesos 1.5 agent [2].
> >
> > We are now considering two possibilities to address this compatibility
> > problem:
> >
> > 1) The Mesos 1.6 master should check the agent's Mesos version in
> > `Master::accept` [3]. Moving forward, if we add new operations in future
> > Mesos releases, we would have code like the following:
> >
> > ```
> > Version slaveVersion = ...; // Get the Mesos version of the slave of the offer.
> > switch (operation.type()) {
> >   ...
> >   case SOME_NEW_OPERATION: {
> >     if (slaveVersion < minVersionForSomeNewOperation) {
> >       ... // Drop the operation.
> >     }
> >     break;
> >   }
> >   ...
> > }
> > ```
> >
> > Pros and cons:
> > + The new operation won't go into the operation application cycle since
> >   it is rejected at the very beginning. This means no resource metadata
> >   is touched.
> > - Explicit slave version checks on the master side make the code less
> >   clean, and we will need to update this list every time we add a new
> >   operation.
> >
> > 2) Treat this issue as an agent crash bug. The Mesos master would forward
> > the operation to the agent, regardless of the agent's Mesos version. In
> > the agent, we deploy and backport the following logic in
> > `Slave::applyOperation` [4]:
> >
> > ```
> > if (message.operation_info().type() == OPERATION_UNKNOWN) {
> >   ... // Drop the operation and trigger a re-registration or send an
> >       // `UpdateSlaveMessage` to force the master to update the total
> >       // resource of the slave.
> > }
> > ```
> >
> > Pros and cons:
> > + Easier to add new operations since no new logic needs to be added for
> >   backward compatibility.
> > - Since the old agent won't know whether the new operations are
> >   speculative or not, a re-registration or an `UpdateSlaveMessage` is
> >   required.
> > - Mesos 1.5.0 agents will still have the bug and crash when a new master
> >   sends a new operation to them.
> >
> > Since both options are viable and there seems to be no clear winner, we'd
> > like to check with the community to see which convention is preferable.
> > Please let us know what you think. Thanks!
> >
> > Best,
> > Chun-Hung
> >
> > [1] https://issues.apache.org/jira/browse/MESOS-4965
> > [2] https://github.com/apache/mesos/blob/1.5.x/src/common/protobuf_utils.cpp#L851
> > [3] https://github.com/apache/mesos/blob/master/src/master/master.cpp#L3899
> > [4] https://github.com/apache/mesos/blob/1.5.x/src/slave/slave.cpp#L4359