[jira] [Commented] (MESOS-9223) Storage local provider does not sufficiently handle container launch failures or errors

Chun-Hung Hsiao (JIRA) Mon, 14 Jan 2019 15:46:32 -0800


    [ https://issues.apache.org/jira/browse/MESOS-9223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742591#comment-16742591 ]


Chun-Hung Hsiao commented on MESOS-9223:
----------------------------------------

For surfacing the error:
We could pass the failure to {{LocalResourceProviderManager}} through the 
interface I proposed above, then expose that in the 
{{LIST_RESOURCE_PROVIDER_CONFIGS}} API through MESOS-8745.
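
A minimal sketch of that flow, assuming a simple registry model: providers report a launch failure to a central object, and a config-listing call returns each config together with its last recorded error. The interface proposed earlier in the thread is not reproduced here, so `ProviderFailureRegistry`, `ProviderConfigStatus`, `report_failure` and `list_configs` are all hypothetical names, not Mesos APIs; the provider type string is used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ProviderConfigStatus:
    """Hypothetical listing entry: a provider config plus its last launch error."""
    type: str
    name: str
    error: Optional[str] = None


class ProviderFailureRegistry:
    """Hypothetical stand-in for the manager that providers report failures to.

    Providers call report_failure() when their standalone container fails to
    launch; a config-listing API (in the spirit of LIST_RESOURCE_PROVIDER_CONFIGS)
    reads the recorded errors back through list_configs().
    """

    def __init__(self) -> None:
        self._configs: Dict[Tuple[str, str], ProviderConfigStatus] = {}

    def add_config(self, type_: str, name: str) -> None:
        self._configs[(type_, name)] = ProviderConfigStatus(type_, name)

    def report_failure(self, type_: str, name: str, message: str) -> None:
        # Only record failures for configs the registry already knows about.
        status = self._configs.get((type_, name))
        if status is not None:
            status.error = message

    def list_configs(self) -> List[ProviderConfigStatus]:
        # Dicts preserve insertion order, so configs are listed as added.
        return list(self._configs.values())


registry = ProviderFailureRegistry()
registry.add_config("org.apache.mesos.rp.local.storage", "lvm")
registry.add_config("org.apache.mesos.rp.local.storage", "zfs")
registry.report_failure("org.apache.mesos.rp.local.storage", "lvm",
                        "plugin container exited during launch")

for status in registry.list_configs():
    print(f"{status.type}/{status.name}: {status.error or 'OK'}")
```

The point of the design is that the error becomes part of queryable state rather than only a log line, so an operator can see which config is degraded without inspecting containers by hand.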

> Storage local provider does not sufficiently handle container launch failures 
> or errors
> ---------------------------------------------------------------------------------------
>
>                 Key: MESOS-9223
>                 URL: https://issues.apache.org/jira/browse/MESOS-9223
>             Project: Mesos
>          Issue Type: Improvement
>          Components: agent, storage
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Bannier
>            Priority: Critical
>
> The storage local resource provider as currently implemented does not handle 
> launch failures or task errors of its standalone containers well enough. If, 
> e.g., an RP container fails to come up during node start, a warning is 
> logged, but an operator still needs to detect the degraded functionality, 
> manually check the state of containers with {{GET_CONTAINERS}}, and decide 
> whether the agent needs restarting; I suspect they do not always have 
> enough context for this decision. It would be better if the provider either 
> enforced a restart by failing over the whole agent or retried the 
> operation (optionally up to some maximum number of retries).
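
The two remedies suggested above can be combined: retry the launch a bounded number of times, and fail over the agent once retries are exhausted. The sketch below assumes that shape. `launch_with_retries`, `LaunchFailed` and `AgentFailover` are hypothetical names, not Mesos APIs, and the exponential backoff is an added assumption that the issue does not mention.

```python
import time
from typing import Callable, Optional


class LaunchFailed(Exception):
    """Hypothetical error raised when a standalone container fails to launch."""


class AgentFailover(Exception):
    """Hypothetical signal that the whole agent should fail over and restart."""


def launch_with_retries(launch: Callable[[], str],
                        max_retries: int = 3,
                        backoff_seconds: float = 0.0) -> str:
    """Call launch() once, plus up to max_retries retries.

    Returns the launched container's ID on success. If every attempt fails,
    raises AgentFailover instead of only logging a warning.
    """
    last_error: Optional[LaunchFailed] = None
    for attempt in range(max_retries + 1):
        try:
            return launch()
        except LaunchFailed as error:
            last_error = error
            if attempt < max_retries:
                # Exponential backoff between attempts (0 disables sleeping).
                time.sleep(backoff_seconds * (2 ** attempt))
    raise AgentFailover(
        f"container launch failed after {max_retries + 1} attempts: {last_error}")


# Usage: a launch that fails twice and then succeeds.
attempts = []


def flaky_launch() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise LaunchFailed("plugin container exited")
    return "container-123"


result = launch_with_retries(flaky_launch, max_retries=3)
print(result)
```

Raising rather than logging is the key choice here: whatever supervises the agent observes the failure directly, so operators do not need to notice the degraded state on their own.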



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
