[ https://issues.apache.org/jira/browse/MESOS-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-5763:
----------------------------------
    Target Version/s:   (was: 0.28.3)
       Fix Version/s: 0.28.3

Backport for 0.28.x branch
{noformat}
commit 52a0b0a41482da35dc736ec2fd445b6099e7a4e7
Author: Anand Mazumdar <an...@apache.org>
Date:   Tue Nov 22 20:38:43 2016 -0800

    Added MESOS-5763 to 0.28.3 CHANGELOG.

commit 2d61bde81e3d6fb7400ec5f7078ceedd8d2bb802
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 18:12:01 2016 -0700

    Made Mesos containerizer error messages more consistent.

    We've been using slightly different wordings of the same condition in
    multiple places in Mesos containerizer but they don't provide
    additional information about where this failure is thrown in a long
    continuation chain. Since failures don't capture the location in the
    code we'd better distinguish them in a more meaningful way to assist
    debugging.

    Review: https://reviews.apache.org/r/49653

commit d7f8b8558974ee8739d460d53faf54a52832b754
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 18:11:29 2016 -0700

    Improved Mesos containerizer invariant checking.

    One of the reasons for MESOS-5763 is the lack of invariant
    checking. Mesos containerizer transitions the container state in
    particular ways so when continuation chains could potentially be
    interleaved with other actions we should verify the state transitions.

    Review: https://reviews.apache.org/r/49652

commit 008e04433026aaec49779197c4a7b6655d5bb693
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 15:25:54 2016 -0700

    Improved Mesos containerizer logging and documentation.

    Review: https://reviews.apache.org/r/49651

commit 90b5be8e95c5868ea9142625b97050a75d0664f5
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Wed Jul 6 13:48:34 2016 -0700

    Fail container launch if it's destroyed during logger->prepare().

    Review: https://reviews.apache.org/r/49725

commit 56b4c561e08a8cc36e5cbc3a786981412bf226dd
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 15:27:37 2016 -0700

    Fixed Mesos containerizer to set container FETCHING state.

    If the container state is not properly set to FETCHING, Mesos agent
    cannot detect the terminated executor when the fetcher times out.

    Review: https://reviews.apache.org/r/49650
{noformat}
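
The invariant checking and error-message commits above can be illustrated with a minimal sketch. This is not the Mesos containerizer code: only the FETCHING state is named in the commit log, so the other state names, their ordering, and the helpers {{allowed}} and {{transition}} are assumptions made up for illustration. The idea is that each step of a continuation chain verifies the state it expects before advancing, and a failure names the step where it was raised.

```cpp
#include <cassert>
#include <string>

// Hypothetical container states. FETCHING appears in the commit log above;
// the other names and their ordering are assumptions for this sketch.
enum class State { PREPARING, ISOLATING, FETCHING, RUNNING, DESTROYING };

struct Container {
  State state = State::PREPARING;
};

// Returns true if moving from `from` to `to` is allowed in this sketch.
// Launch steps advance one stage at a time; destroy may begin from any
// state except DESTROYING itself.
bool allowed(State from, State to) {
  if (to == State::DESTROYING) {
    return from != State::DESTROYING;
  }
  switch (from) {
    case State::PREPARING: return to == State::ISOLATING;
    case State::ISOLATING: return to == State::FETCHING;
    case State::FETCHING:  return to == State::RUNNING;
    default:               return false;  // RUNNING and DESTROYING are terminal here
  }
}

// Applies the transition and returns an empty string on success. On failure
// the container is left unchanged and the returned message names the step,
// so the failure can be located within a long continuation chain.
std::string transition(Container& c, State to, const std::string& step) {
  if (!allowed(c.state, to)) {
    return "Invalid container state transition during " + step;
  }
  c.state = to;
  return "";
}
```

In this sketch, a launch continuation that runs after a destroy has started gets a non-empty error from {{transition}} instead of silently continuing, which is the kind of interleaving the commits above guard against.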

> Task stuck in fetching is not cleaned up after 
> --executor_registration_timeout.
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-5763
>                 URL: https://issues.apache.org/jira/browse/MESOS-5763
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.0, 1.0.0
>            Reporter: Yan Xu
>            Assignee: Yan Xu
>            Priority: Blocker
>             Fix For: 0.28.3, 1.0.0
>
>
> When the fetching process hangs forever (for example, because of HDFS issues), 
> the Mesos containerizer attempts to destroy the container and kill the 
> executor after {{--executor_registration_timeout}}. However, this reliably 
> fails for us: the launcher destroy kills the executor and the container is 
> destroyed, but the agent never finds out that the executor has terminated, 
> leaving the task in the STAGING state forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
