[ https://issues.apache.org/jira/browse/MESOS-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16825261#comment-16825261 ]
Joseph Wu commented on MESOS-9740:
----------------------------------

Yes. We expect the upgrade to work for most people. However, our test cluster had a relatively wide variety of tasks, and just a single bad framework, launching at least one task on each agent, could cripple the upgrade.

I should clarify that this affects 1.8.x **masters**. A 1.7.x agent _might_ have trouble registering with a 1.8.x master due to this bug.

> Invalid protobuf unions in ExecutorInfo::ContainerInfo will prevent agents from reregistering with 1.8+ masters
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9740
>                 URL: https://issues.apache.org/jira/browse/MESOS-9740
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Joseph Wu
>            Assignee: Benno Evers
>            Priority: Blocker
>              Labels: foundations, mesosphere
>
> As part of MESOS-6874, the master now validates protobuf unions passed as
> part of an {{ExecutorInfo::ContainerInfo}}. This prevents a task from
> specifying, for example, a {{ContainerInfo::MESOS}}, but filling out the
> {{docker}} field (which is then ignored by the agent).
>
> However, if a task was already launched with an invalid protobuf union, the
> same validation will happen when the agent tries to reregister with the
> master. In this case, if the master is upgraded to validate protobuf unions,
> the agent reregistration will be rejected.
>
> {code}
> master.cpp:7201] Dropping re-registration of agent at slave(1)@172.31.47.126:5051 because it sent an invalid re-registration: Protobuf union `mesos.ContainerInfo` with `Type == MESOS` should not have the field `docker` set.
> {code}
>
> This bug was found when upgrading a 1.7.x test cluster to 1.8.0. When
> MESOS-6874 was committed, I had assumed the invalid protobufs would be rare.
> However, on the test cluster, 13/17 agents had at least one invalid
> ContainerInfo when reregistering.
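
For illustration, the union check described above might look roughly like the following. This is a minimal Python sketch and not the Mesos implementation (which is C++): the `ContainerInfo` dataclass, `validate_union`, and `check_reregistration` are hypothetical stand-ins, and only the error text is modeled on the log line quoted in the issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ContainerType(Enum):
    # The enum value doubles as the name of the matching union field.
    DOCKER = "docker"
    MESOS = "mesos"


@dataclass
class ContainerInfo:
    """Hypothetical stand-in for the mesos.ContainerInfo protobuf union."""
    type: ContainerType
    docker: Optional[dict] = None
    mesos: Optional[dict] = None


def validate_union(info: ContainerInfo) -> Optional[str]:
    """Return an error if a union field that does not match `type` is set."""
    for member in ContainerType:
        if member is info.type:
            continue
        if getattr(info, member.value) is not None:
            return (
                f"Protobuf union `mesos.ContainerInfo` with "
                f"`Type == {info.type.name}` should not have the "
                f"field `{member.value}` set."
            )
    return None


def check_reregistration(containers: List[ContainerInfo]) -> Optional[str]:
    """Assumed master-side behavior: one invalid union rejects the agent."""
    for info in containers:
        error = validate_union(info)
        if error is not None:
            return error
    return None


# A well-formed MESOS container, and one that also fills out `docker`.
valid = ContainerInfo(type=ContainerType.MESOS, mesos={})
invalid = ContainerInfo(type=ContainerType.MESOS, docker={"image": "busybox"})
```

In this sketch, `check_reregistration([valid, invalid])` returns an error even though only one container is malformed, which mirrors how a single bad task on an agent is enough for the whole re-registration to be dropped.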