[ https://issues.apache.org/jira/browse/MESOS-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anand Mazumdar updated MESOS-5763:
----------------------------------
    Target Version/s:   (was: 0.28.3)
       Fix Version/s: 0.28.3

Backport for 0.28.x branch

{noformat}
commit 52a0b0a41482da35dc736ec2fd445b6099e7a4e7
Author: Anand Mazumdar <an...@apache.org>
Date:   Tue Nov 22 20:38:43 2016 -0800

    Added MESOS-5763 to 0.28.3 CHANGELOG.

commit 2d61bde81e3d6fb7400ec5f7078ceedd8d2bb802
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 18:12:01 2016 -0700

    Made Mesos containerizer error messages more consistent.

    We've been using slightly different wordings of the same condition in
    multiple places in Mesos containerizer but they don't provide additional
    information about where this failure is thrown in a long continuation
    chain. Since failures don't capture the location in the code we'd better
    distinguish them in a more meaningful way to assist debugging.

    Review: https://reviews.apache.org/r/49653

commit d7f8b8558974ee8739d460d53faf54a52832b754
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 18:11:29 2016 -0700

    Improved Mesos containerizer invariant checking.

    One of the reasons for MESOS-5763 is due to the lack invariant checking.
    Mesos containerizer transitions the container state in particular ways so
    when continuation chains could potentially be interleaved with other
    actions we should verify the state transitions.

    Review: https://reviews.apache.org/r/49652

commit 008e04433026aaec49779197c4a7b6655d5bb693
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 15:25:54 2016 -0700

    Improved Mesos containerizer logging and documentation.

    Review: https://reviews.apache.org/r/49651

commit 90b5be8e95c5868ea9142625b97050a75d0664f5
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Wed Jul 6 13:48:34 2016 -0700

    Fail container launch if it's destroyed during logger->prepare().

    Review: https://reviews.apache.org/r/49725

commit 56b4c561e08a8cc36e5cbc3a786981412bf226dd
Author: Jiang Yan Xu <xuj...@apple.com>
Date:   Fri Jul 1 15:27:37 2016 -0700

    Fixed Mesos containerizer to set container FETCHING state.

    If the container state is not properly set to FETCHING, Mesos agent cannot
    detect the terminated executor when the fetcher times out.

    Review: https://reviews.apache.org/r/49650
{noformat}

> Task stuck in fetching is not cleaned up after
> --executor_registration_timeout.
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-5763
>                 URL: https://issues.apache.org/jira/browse/MESOS-5763
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.0, 1.0.0
>            Reporter: Yan Xu
>            Assignee: Yan Xu
>            Priority: Blocker
>             Fix For: 0.28.3, 1.0.0
>
>
> When the fetching process hangs forever due to reasons such as HDFS issues,
> Mesos containerizer would attempt to destroy the container and kill the
> executor after {{--executor_registration_timeout}}. However this reliably
> fails for us: the executor would be killed by the launcher destroy and the
> container would be destroyed but the agent would never find out that the
> executor is terminated thus leaving the task in the STAGING state forever.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
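
The invariant checking that the commit messages describe (verifying each container state transition because a destroy can interleave with a launch continuation chain) can be illustrated with a minimal, language-neutral sketch. This is not Mesos code: apart from FETCHING, which the commit log names, the state names, the ALLOWED table, the Container class and the fetch_continuation helper are all hypothetical and exist only for illustration.

```python
from enum import Enum, auto


class State(Enum):
    # Only FETCHING is named in the commit log; the other states are
    # illustrative assumptions, not the actual Mesos enum.
    PREPARING = auto()
    FETCHING = auto()
    RUNNING = auto()
    DESTROYING = auto()


# Hypothetical transition table: each state maps to the states it may enter.
ALLOWED = {
    State.PREPARING: {State.FETCHING, State.DESTROYING},
    State.FETCHING: {State.RUNNING, State.DESTROYING},
    State.RUNNING: {State.DESTROYING},
    State.DESTROYING: set(),
}


class InvalidTransition(Exception):
    """Raised when a continuation attempts a transition that is not allowed."""


class Container:
    def __init__(self):
        self.state = State.PREPARING

    def transition(self, target, during):
        # Check the invariant before mutating the state, and name the step
        # in the message so the failure can be located in a long chain.
        if target not in ALLOWED[self.state]:
            raise InvalidTransition(
                f"Container cannot enter {target.name} from "
                f"{self.state.name} during {during}"
            )
        self.state = target


def fetch_continuation(container):
    # A continuation that runs after preparation. If a destroy was
    # interleaved before it ran, the transition check fails the launch.
    container.transition(State.FETCHING, during="fetch")
```

In this sketch, a container that was moved to DESTROYING before fetch_continuation runs raises InvalidTransition instead of silently continuing the launch, and the error message names the step at which the check failed.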