[ 
https://issues.apache.org/jira/browse/MESOS-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-7487:
--------------------------------
    Description: 
Before 1.3.0, the master did not send a {{FrameworkInfo}} in the 
{{UpdateFrameworkMessage}}.
In general, this means that a pre-1.3.0 agent will not have its 
{{FrameworkInfo}} updated when
a framework changes its {{FrameworkInfo}}. Specifically, if a framework 
upgrades to add
the {{PARTITION_AWARE}} capability, 1.1.x and 1.2.x agents will not be aware 
of the update,
and will incorrectly report {{TASK_LOST}} in some cases.

Note that the run task path is unaffected, since the master sends the new 
{{FrameworkInfo}} along with it.
The code paths that are incorrect perform the following check:

{code}
      if (!protobuf::frameworkHasCapability(
              framework->info,  // This is the one in agent memory!
              FrameworkInfo::Capability::PARTITION_AWARE))
{code}

One solution is to backport the changes to {{UpdateFrameworkMessage}} to 1.1.x 
and 1.2.x,
but only update the capabilities portion of the {{FrameworkInfo}}.
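
To illustrate the capabilities-only update, here is a minimal, self-contained sketch. It does not use the real Mesos types: the {{FrameworkInfo}} struct, the {{Capability}} enum, and the {{updateCapabilitiesOnly}} helper are hypothetical stand-ins, and {{frameworkHasCapability}} only mirrors the shape of the check quoted above. The actual backport would operate on the protobuf messages.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Mesos protobuf types.
// The real FrameworkInfo is a generated protobuf message with more fields.
enum class Capability { REVOCABLE_RESOURCES, PARTITION_AWARE, MULTI_ROLE };

struct FrameworkInfo {
  std::string name;
  std::vector<std::string> roles;
  std::vector<Capability> capabilities;
};

// Mirrors the shape of the check in the agent: it consults whatever copy of
// FrameworkInfo it is given, so a stale copy yields a stale answer.
bool frameworkHasCapability(const FrameworkInfo& info, Capability capability) {
  return std::find(info.capabilities.begin(),
                   info.capabilities.end(),
                   capability) != info.capabilities.end();
}

// Hypothetical helper: apply an update from the master to the agent's
// in-memory copy, but copy over only the capabilities. Roles (and every
// other field) are deliberately left untouched, since an older agent may
// not know how to handle a change to them.
void updateCapabilitiesOnly(FrameworkInfo& agentCopy,
                            const FrameworkInfo& fromMaster) {
  agentCopy.capabilities = fromMaster.capabilities;
}
```

With this approach, an agent copy that lacked {{PARTITION_AWARE}} would pass the capability check after the update, while its view of the framework's roles stays exactly as it was.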

If we update the entire {{FrameworkInfo}}, a 1.1.x agent will run into an issue: 
it does not know
how to handle changes to {{FrameworkInfo.roles}}. Frameworks changing their 
roles is a 1.3.x feature.
Note that a 1.2.x agent can handle role changes correctly because of 
{{Resource.allocation_info}},
which was introduced with multi-role support in 1.2.x.

Refer to MESOS-7460 for the potential issue with backporting to 1.1.x.


> A framework upgrading into PARTITION_AWARE capability will continue to 
> receive TASK_LOST on old agents.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-7487
>                 URL: https://issues.apache.org/jira/browse/MESOS-7487
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Michael Park



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
