[ 
https://issues.apache.org/jira/browse/MESOS-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990141#comment-13990141
 ] 

Till Toenshoff edited comment on MESOS-1243 at 5/6/14 12:38 AM:
----------------------------------------------------------------

Recovery:
Right now {{recover}} is not specific to a container or executor, so it 
shouldn't fail just because a single container could not be recovered.

Let me sketch a failure scenario from the ExternalContainerizer's (EC) point 
of view:
The slave invokes {{launch}} and the EC tries to pass this on to the ECP. Now 
assume the slave dies before the ECP is actually able to launch anything. 
After a {{recover}}, the slave assumes that the ECP will be able to {{wait}} 
on that container. The ECP, however, never launched that container, so it 
cannot {{wait}} on it and therefore cannot return a {{Termination}}.

So the underlying problem is that the ECP and the slave may disagree about a 
container's state.

The quick way out of this is to make that {{Termination}} optional. 
Alternatively, could we make sure that the container is only checkpointed 
after its launch has fully completed?
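
A minimal sketch of the optional-{{Termination}} idea, assuming C++17 
{{std::optional}} as a stand-in for an {{Option}} type. {{FakeEcp}}, the 
{{Termination}} fields and the {{recover}} signature below are made up for 
illustration and are not the Mesos API:

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in; not the Mesos Termination message.
struct Termination {
  int status;           // exit status reported for the container
  std::string message;  // human-readable reason
};

// Hypothetical model of the external containerizer program (ECP): it only
// knows about containers whose launch actually reached it.
class FakeEcp {
public:
  void launch(const std::string& containerId) {
    assert(!containerId.empty());  // assumption: container IDs are non-empty
    launched_.insert(containerId);
  }

  // Returns std::nullopt for a container the ECP never launched, instead of
  // failing. In this toy model every launched container is treated as
  // already exited, so nothing blocks here.
  std::optional<Termination> wait(const std::string& containerId) const {
    if (launched_.count(containerId) == 0) {
      return std::nullopt;
    }
    return Termination{0, "exited"};
  }

private:
  std::set<std::string> launched_;
};

// Walks all checkpointed containers. A container the ECP does not know is
// collected and returned rather than aborting the whole recovery.
std::vector<std::string> recover(
    const FakeEcp& ecp, const std::vector<std::string>& checkpointed) {
  std::vector<std::string> unknown;
  for (const std::string& id : checkpointed) {
    if (!ecp.wait(id).has_value()) {
      unknown.push_back(id);
    }
  }
  return unknown;
}
```

With this shape, a container that was checkpointed by the slave but never 
launched by the ECP shows up in the list returned by {{recover}}, and the 
remaining containers are still recovered.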


> Containerizer::wait return type should be Option<Termination>
> -------------------------------------------------------------
>
>                 Key: MESOS-1243
>                 URL: https://issues.apache.org/jira/browse/MESOS-1243
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Till Toenshoff
>            Priority: Minor
>              Labels: containerizer, external-containerizer, isolation, mesos, 
> mesos-containerizer
>
> The containerizer {{wait}} should return an {{Option<Termination>}} to 
> distinguish the case when it doesn't know about a {{ContainerID}}.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
