[ https://issues.apache.org/jira/browse/MESOS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041812#comment-15041812 ]
Joris Van Remoortere commented on MESOS-4059:
---------------------------------------------

{code}
commit fe4be25fa6011787751547b06f70676fd79bb87b
Author: Neil Conway <neil.con...@gmail.com>
Date: Fri Dec 4 11:54:18 2015 -0500

Fixed flakiness in MasterMaintenanceTest.InverseOffersFilters.

There were two problems:

(1) After launching two tasks, we assumed that we would see TASK_RUNNING
updates for the tasks in the same order they were launched. This is not
guaranteed, so adjust the test to handle TASK_RUNNING updates in the
order they are received.

(2) The test used this pattern:

    Mesos m;
    Call c;
    m.send(c);

    Clock::settle();

    // Trigger a new batch allocation that reflects the call
    Clock::advance();

However, this is actually unsafe (see MESOS-3760): the send() call
might not have reached the master by the time `Clock::settle()`
happens. This was fixed by blocking using `FUTURE_DISPATCH` on the
downstream logic in the allocator that is invoked to handle the
delivered event.

Review: https://reviews.apache.org/r/40935
{code}

> Investigate remaining flakiness in MasterMaintenanceTest.InverseOffersFilters
> -----------------------------------------------------------------------------
>
> Key: MESOS-4059
> URL: https://issues.apache.org/jira/browse/MESOS-4059
> Project: Mesos
> Issue Type: Bug
> Reporter: Neil Conway
> Assignee: Neil Conway
> Priority: Minor
> Labels: flaky-test, mesosphere
>
> Per comments in MESOS-3916, the fix for that issue decreased the degree of
> flakiness, but it seems that some intermittent test failures do occur --
> should be investigated.
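Point (2) of the commit message, blocking on a signal raised by the downstream allocator handler rather than relying on `Clock::settle()`, can be illustrated with a small plain-C++ sketch. This is not libprocess: the `Actor` class below is a hypothetical stand-in for a libprocess process (it runs queued callbacks in FIFO order on its own thread), and `std::promise`/`std::future` stand in for what `FUTURE_DISPATCH` plus an await-with-timeout provide in the real test. The function name `sendAndAwaitAllocator` is likewise illustrative only.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical actor: runs queued callbacks in FIFO order on its own thread.
// This is NOT the libprocess API; it only models the asynchrony involved.
class Actor {
 public:
  Actor() : worker_([this] { run(); }) {}

  ~Actor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains any remaining callbacks first
  }

  // Enqueue a callback and return immediately, before it runs.
  void dispatch(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(fn));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stopping and fully drained
      std::function<void()> fn = std::move(queue_.front());
      queue_.pop();
      lock.unlock();
      fn();  // run the handler without holding the lock
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stopping_ = false;
  std::thread worker_;  // declared last: started after the members above exist
};

// Sends a "call" that hops master -> allocator, then blocks on a future that
// the allocator-side handler fulfills (the role FUTURE_DISPATCH plays in the
// real test). Returns how many times that handler ran, or -1 on timeout.
int sendAndAwaitAllocator() {
  std::promise<void> handled;
  std::future<void> handledFuture = handled.get_future();
  int allocatorRuns = 0;

  // Declared after the promise and counter so the actors are joined first.
  Actor allocator;
  Actor master;

  master.dispatch([&] {
    // The master forwards the event to the allocator asynchronously.
    allocator.dispatch([&] {
      ++allocatorRuns;
      handled.set_value();
    });
  });

  // Wait, with a timeout, for the downstream handler itself to signal.
  // This does not depend on where the call is in transit when we start.
  bool ready = handledFuture.wait_for(std::chrono::seconds(5)) ==
               std::future_status::ready;
  assert(ready);
  return ready ? allocatorRuns : -1;
}
```

The design point is that the test waits on an event produced by the code it actually cares about (the allocator handling the call), so it no longer has to reason about whether the call has been delivered yet.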
>
> *Flakiness in task acknowledgment*
> {code}
> I1203 18:25:04.609817 28732 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 6afd012e-8e88-41b2-8239-a9b852d07ca1) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000
> W1203 18:25:04.610076 28732 status_update_manager.cpp:762] Unexpected status update acknowledgement (received 6afd012e-8e88-41b2-8239-a9b852d07ca1, expecting 82fc7a7b-e64a-4f4d-ab74-76abac42b4e6) for update TASK_RUNNING (UUID: 82fc7a7b-e64a-4f4d-ab74-76abac42b4e6) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000
> E1203 18:25:04.610339 28736 slave.cpp:2339] Failed to handle status update acknowledgement (UUID: 6afd012e-8e88-41b2-8239-a9b852d07ca1) for task 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000: Duplicate acknowledgemen
> {code}
> This is a race between [launching and acknowledging two
> tasks|https://github.com/apache/mesos/blob/75aaaacb89fa961b249c9ab7fa0f45dfa9d415a5/src/tests/master_maintenance_tests.cpp#L1486-L1517].
> The status updates for each task are not necessarily received in the same
> order as launching the tasks.
>
> *Flakiness in first inverse offer filter*
> See [this comment in
> MESOS-3916|https://issues.apache.org/jira/browse/MESOS-3916?focusedCommentId=15027478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15027478]
> for the explanation. The related logs are above the comment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
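The order-independent handling described in point (1) of the commit, and needed to avoid the acknowledgement race in the log above, can be sketched as follows. In the log, task `26305fdd-...` receives an acknowledgement whose UUID does not match its pending TASK_RUNNING update, which looks consistent with the test pairing updates to tasks by launch order. The sketch instead matches each update to its task by ID, in arrival order, and acknowledges it with that update's own UUID. `StatusUpdate`, `Ack` and `acknowledgeRunning` are hypothetical simplifications, not the Mesos protobuf types or test helpers.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for a status update and its acknowledgement;
// these are NOT the Mesos protobuf types.
struct StatusUpdate {
  std::string taskId;
  std::string state;  // e.g. "TASK_RUNNING"
  std::string uuid;   // an acknowledgement must echo this UUID
};

struct Ack {
  std::string taskId;
  std::string uuid;
};

// Matches TASK_RUNNING updates to launched tasks by task ID, in whatever
// order the updates arrive, and acknowledges each one with its own UUID.
// Returns false on an unexpected state, an unknown or repeated task, or
// if some launched task never reported TASK_RUNNING.
bool acknowledgeRunning(const std::vector<std::string>& launched,
                        const std::vector<StatusUpdate>& received,
                        std::vector<Ack>* acks) {
  assert(acks != nullptr);
  std::set<std::string> pending(launched.begin(), launched.end());
  for (const StatusUpdate& update : received) {
    if (update.state != "TASK_RUNNING") return false;
    // erase() returns 0 if the task was never launched or was already seen.
    if (pending.erase(update.taskId) == 0) return false;
    acks->push_back({update.taskId, update.uuid});
  }
  return pending.empty();
}
```

For example, if `task2`'s update arrives before `task1`'s, the acknowledgements come out as (`task2`, its UUID) then (`task1`, its UUID), so no task is acknowledged with another task's UUID.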