[jira] [Commented] (MESOS-4059) Investigate remaining flakiness in MasterMaintenanceTest.InverseOffersFilters

Joris Van Remoortere (JIRA) Fri, 04 Dec 2015 09:33:11 -0800

    [ https://issues.apache.org/jira/browse/MESOS-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041812#comment-15041812 ]


Joris Van Remoortere commented on MESOS-4059:
---------------------------------------------

{code}
commit fe4be25fa6011787751547b06f70676fd79bb87b
Author: Neil Conway <neil.con...@gmail.com>
Date:   Fri Dec 4 11:54:18 2015 -0500

    Fixed flakiness in MasterMaintenanceTest.InverseOffersFilters.
    
    There were two problems:
    
    (1) After launching two tasks, we assumed that we would see TASK_RUNNING
        updates for the tasks in the same order they were launched. This is
        not guaranteed, so adjust the test to handle TASK_RUNNING updates in
        the order they are received.
    
    (2) The test used this pattern:
    
            Mesos m;
            Call c;
    
            m.send(c);
            Clock::settle();
            // Trigger a new batch allocation that reflects the call
            Clock::advance();
    
        However, this is actually unsafe (see MESOS-3760): the send() call
        might not have reached the master by the time `Clock::settle()`
        happens. This was fixed by blocking using `FUTURE_DISPATCH` on the
        downstream logic in the allocator that is invoked to handle the
        delivered event.
    
    Review: https://reviews.apache.org/r/40935
{code}
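
The unsafe pattern in (2) can be illustrated outside Mesos with a minimal C++ sketch under toy assumptions. `FakeAllocator`, `sendAsync`, and `sendAndWaitForHandler` are hypothetical names, and `std::promise`/`std::future` stand in for libprocess futures and `FUTURE_DISPATCH`. This is not the actual test code. Because the call is delivered on another thread after a delay, nothing done right after `sendAsync` returns is guaranteed to observe it. Arming a future before sending and blocking on it removes that assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the allocator (NOT the Mesos allocator API).
// It records each call that the "master" forwards to it.
class FakeAllocator {
 public:
  // Arms a one-shot future that is satisfied when the next call is handled.
  // This mimics the idea behind FUTURE_DISPATCH: register the expectation
  // before sending, then block on the downstream handler.
  std::future<int> expectNextCall() {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(!armed_ && "only one outstanding expectation in this sketch");
    promise_ = std::promise<int>();
    armed_ = true;
    return promise_.get_future();
  }

  // Runs on the sender's thread once the call "arrives".
  void handle(int call) {
    std::lock_guard<std::mutex> lock(mutex_);
    handled_.push_back(call);
    if (armed_) {
      promise_.set_value(call);
      armed_ = false;
    }
  }

  std::size_t handledCount() {
    std::lock_guard<std::mutex> lock(mutex_);
    return handled_.size();
  }

 private:
  std::mutex mutex_;
  std::promise<int> promise_;
  bool armed_ = false;
  std::vector<int> handled_;
};

// Simulates send(): the call is delivered on another thread after a delay,
// so returning from this function says nothing about delivery.
std::thread sendAsync(FakeAllocator& allocator, int call) {
  return std::thread([&allocator, call] {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    allocator.handle(call);
  });
}

// Safe pattern: arm the expectation, send, then block until the handler ran.
// Returns the number of calls the allocator has handled at that point.
std::size_t sendAndWaitForHandler(FakeAllocator& allocator, int call) {
  std::future<int> handled = allocator.expectNextCall();
  std::thread sender = sendAsync(allocator, call);
  handled.wait();  // cannot miss a call that is still in flight
  std::size_t count = allocator.handledCount();
  sender.join();
  return count;
}
```

The ordering matters in the same way it does in the commit: the expectation is registered before the message is sent, so the handler cannot run before something is waiting on it.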

> Investigate remaining flakiness in MasterMaintenanceTest.InverseOffersFilters
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-4059
>                 URL: https://issues.apache.org/jira/browse/MESOS-4059
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>            Priority: Minor
>              Labels: flaky-test, mesosphere
>
> Per comments in MESOS-3916, the fix for that issue reduced the degree of
> flakiness, but it seems that some intermittent test failures still occur;
> these should be investigated.
> *Flakiness in task acknowledgment*
> {code}
> I1203 18:25:04.609817 28732 status_update_manager.cpp:392] Received status 
> update acknowledgement (UUID: 6afd012e-8e88-41b2-8239-a9b852d07ca1) for task 
> 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework 
> c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000
> W1203 18:25:04.610076 28732 status_update_manager.cpp:762] Unexpected status 
> update acknowledgement (received 6afd012e-8e88-41b2-8239-a9b852d07ca1, 
> expecting 82fc7a7b-e64a-4f4d-ab74-76abac42b4e6) for update TASK_RUNNING 
> (UUID: 82fc7a7b-e64a-4f4d-ab74-76abac42b4e6) for task 
> 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework 
> c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000
> E1203 18:25:04.610339 28736 slave.cpp:2339] Failed to handle status update 
> acknowledgement (UUID: 6afd012e-8e88-41b2-8239-a9b852d07ca1) for task 
> 26305fdd-edb0-4764-8b8a-2558f2b2d81b of framework 
> c7900911-cc7a-4dde-92e7-48fe82cddd9e-0000: Duplicate acknowledgemen
> {code}
> This is a race between [launching and acknowledging two tasks|https://github.com/apache/mesos/blob/75aaaacb89fa961b249c9ab7fa0f45dfa9d415a5/src/tests/master_maintenance_tests.cpp#L1486-L1517].
> The status updates for the two tasks are not necessarily received in the
> same order in which the tasks were launched.
> *Flakiness in first inverse offer filter*
> See [this comment in MESOS-3916|https://issues.apache.org/jira/browse/MESOS-3916?focusedCommentId=15027478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15027478]
> for the explanation; the related logs appear just above that comment.
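
The acknowledgement race can be modeled with a small, self-contained C++ sketch. `StatusUpdate`, `FakeStatusUpdateManager`, `ackByLaunchOrder`, and `ackByReceivedUpdate` are hypothetical names, not Mesos types. The sketch assumes, as the log above suggests, that the agent rejects an acknowledgement whose UUID does not match the update it is waiting on for that task. Pairing task IDs in launch order with updates in arrival order fails whenever the updates arrive reordered. Acknowledging each update with its own task ID and UUID does not depend on arrival order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a status update (NOT the Mesos protobuf).
struct StatusUpdate {
  std::string taskId;
  std::string uuid;  // identifies this particular update
};

// Hypothetical model of the agent's status update manager: for each task it
// remembers the UUID of the update it expects to be acknowledged next.
class FakeStatusUpdateManager {
 public:
  void expect(const StatusUpdate& update) {
    pending_[update.taskId] = update.uuid;
  }

  // Returns false on a mismatched UUID, analogous to the
  // "Unexpected status update acknowledgement" warning in the log.
  bool acknowledge(const std::string& taskId, const std::string& uuid) {
    auto it = pending_.find(taskId);
    if (it == pending_.end() || it->second != uuid) {
      return false;
    }
    pending_.erase(it);
    return true;
  }

 private:
  std::map<std::string, std::string> pending_;
};

// Racy: pairs task IDs in launch order with updates in arrival order.
// Returns the number of rejected acknowledgements.
int ackByLaunchOrder(FakeStatusUpdateManager& manager,
                     const std::vector<std::string>& launched,
                     const std::vector<StatusUpdate>& received) {
  assert(launched.size() == received.size());
  int failures = 0;
  for (std::size_t i = 0; i < launched.size(); ++i) {
    if (!manager.acknowledge(launched[i], received[i].uuid)) ++failures;
  }
  return failures;
}

// Order-independent: acknowledges each update with its own task ID and UUID.
int ackByReceivedUpdate(FakeStatusUpdateManager& manager,
                        const std::vector<StatusUpdate>& received) {
  int failures = 0;
  for (const StatusUpdate& update : received) {
    if (!manager.acknowledge(update.taskId, update.uuid)) ++failures;
  }
  return failures;
}
```

For example, with tasks launched as `task1` then `task2` and the update for `task2` received first, `ackByLaunchOrder` reports two rejected acknowledgements while `ackByReceivedUpdate` reports none; when the updates happen to arrive in launch order, both report none, which is why the failure is intermittent.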



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
