[
https://issues.apache.org/jira/browse/MESOS-9223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chun-Hung Hsiao reassigned MESOS-9223:
--------------------------------------
Assignee: (was: Chun-Hung Hsiao)
> Storage local provider does not sufficiently handle container launch failures
> or errors
> ---------------------------------------------------------------------------------------
>
> Key: MESOS-9223
> URL: https://issues.apache.org/jira/browse/MESOS-9223
> Project: Mesos
> Issue Type: Improvement
> Components: agent, storage
> Reporter: Benjamin Bannier
> Priority: Critical
>
> The storage local resource provider as currently implemented does not handle
> launch failures or task errors of its standalone containers well enough. If,
> e.g., an RP container fails to come up during node start, a warning is
> logged, but an operator still needs to detect the degraded functionality,
> manually check the state of containers with {{GET_CONTAINERS}}, and decide
> whether the agent needs restarting; I suspect they do not always have
> enough context for this decision. It would be better if the provider
> either enforced a restart by failing over the whole agent or retried the
> operation (optionally up to some maximum number of retries).
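
The bounded-retry behavior proposed in the description could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from the Mesos tree: `LaunchOutcome`, `launchWithRetries`, and the `launch` callback are hypothetical names, and the callback stands in for whatever actually starts the provider's standalone container.

```cpp
#include <functional>

// Hypothetical result of trying to bring up a provider container.
enum class LaunchOutcome { Launched, FailOverAgent };

// Makes one initial attempt plus up to `maxRetries` retries.
// `launch` is an assumed callback that returns true once the
// container is up; it is not an existing Mesos API.
LaunchOutcome launchWithRetries(
    const std::function<bool()>& launch,
    unsigned int maxRetries)
{
  for (unsigned int attempt = 0; attempt <= maxRetries; ++attempt) {
    if (launch()) {
      return LaunchOutcome::Launched;
    }
  }

  // Retries exhausted: report that the caller should fail over the
  // whole agent instead of only logging a warning.
  return LaunchOutcome::FailOverAgent;
}
```

With a result like this, the decision to restart no longer rests on an operator noticing a warning and inspecting {{GET_CONTAINERS}} output; the caller can act on `FailOverAgent` directly. A real implementation would likely also want a delay or backoff between attempts, which is left out here.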