[ 
https://issues.apache.org/jira/browse/MESOS-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825261#comment-16825261
 ] 

Joseph Wu commented on MESOS-9740:
----------------------------------

Yes.  We expect the upgrade to work for most people.  However, our test cluster 
had a relatively wide variety of tasks, and just a single bad framework 
launching one or more tasks on each agent could cripple the upgrade.

I should clarify that this affects 1.8.x **masters**.  A 1.7.x agent _might_ 
have trouble registering with a 1.8.x master due to this bug.

> Invalid protobuf unions in ExecutorInfo::ContainerInfo will prevent agents 
> from reregistering with 1.8+ masters
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9740
>                 URL: https://issues.apache.org/jira/browse/MESOS-9740
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Joseph Wu
>            Assignee: Benno Evers
>            Priority: Blocker
>              Labels: foundations, mesosphere
>
> As part of MESOS-6874, the master now validates protobuf unions passed as 
> part of an {{ExecutorInfo::ContainerInfo}}.  This prevents a task from 
> specifying, for example, a {{ContainerInfo::MESOS}}, but filling out the 
> {{docker}} field (which is then ignored by the agent).
> However, if a task was already launched with an invalid protobuf union, the 
> same validation will happen when the agent tries to reregister with the 
> master.  In this case, if the master is upgraded to validate protobuf unions, 
> the agent reregistration will be rejected.
> {code}
> master.cpp:7201] Dropping re-registration of agent at 
> slave(1)@172.31.47.126:5051 because it sent an invalid re-registration: 
> Protobuf union `mesos.ContainerInfo` with `Type == MESOS` should not have the 
> field `docker` set.
> {code}
> This bug was found when upgrading a 1.7.x test cluster to 1.8.0.  When 
> MESOS-6874 was committed, I had assumed the invalid protobufs would be rare.  
> However, on the test cluster, 13 of 17 agents had at least one invalid 
> ContainerInfo when reregistering.
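
The union check described above can be illustrated with a minimal sketch. It is not the Mesos implementation (the real validation lives in the C++ master and operates on actual protobuf messages). The `ContainerInfo` dataclass, `UNION_FIELDS`, `validate_union`, and `accept_reregistration` are hypothetical names introduced for illustration only. The error text mirrors the log line quoted in the description.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-in for mesos.ContainerInfo; only the fields needed here.
@dataclass
class ContainerInfo:
    type: str                      # "MESOS" or "DOCKER"
    mesos: Optional[dict] = None   # union member for Type == MESOS
    docker: Optional[dict] = None  # union member for Type == DOCKER


# Assumed mapping from each union type to the one member field it permits.
UNION_FIELDS = {"MESOS": "mesos", "DOCKER": "docker"}


def validate_union(info: ContainerInfo) -> Optional[str]:
    """Return an error string if a member not matching `type` is set, else None."""
    for type_name, field_name in UNION_FIELDS.items():
        if type_name != info.type and getattr(info, field_name) is not None:
            return (
                "Protobuf union `mesos.ContainerInfo` with `Type == %s` "
                "should not have the field `%s` set." % (info.type, field_name)
            )
    return None


def accept_reregistration(containers: List[ContainerInfo]) -> bool:
    """Reject the whole agent if any one ContainerInfo fails validation."""
    return all(validate_union(c) is None for c in containers)


# One task launched with Type == MESOS but a filled-in `docker` field.
bad = ContainerInfo(type="MESOS", docker={"image": "example"})
good = ContainerInfo(type="MESOS", mesos={})

print(validate_union(bad))
print(accept_reregistration([good, bad]))
```

In this sketch a single invalid `ContainerInfo` causes the whole agent to be rejected, which matches the behavior reported in this issue: one bad task per agent is enough to block reregistration.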



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
