[ 
https://issues.apache.org/jira/browse/MESOS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041306#comment-15041306
 ] 

Michael Park commented on MESOS-4067:
-------------------------------------

I was able to figure out one issue (I'm not sure whether there are more issues, 
or whether the subsequent failures all stem from this one):

{code}
  // Attempt to unreserve an invalid set of resources (not dynamically
  // reserved), reserve the second set, and launch a task.
  driver.acceptOffers({offer.id()},
      {UNRESERVE(unreserved1),
       RESERVE(dynamicallyReserved2),
       LAUNCH({taskInfo})},
      filters);

  // Wait for TASK_FINISHED update ack.
  AWAIT_READY(statusUpdateAcknowledgement);
  EXPECT_EQ(TASK_FINISHED, statusUpdateAcknowledgement.get().state());

  // In the next offer, expect to find both sets of reserved
  // resources, since the Unreserve operation should fail.
  AWAIT_READY(offers);

  ASSERT_EQ(1u, offers.get().size());
  offer = offers.get()[0];

  EXPECT_TRUE(
      Resources(offer.resources()).contains(
          dynamicallyReserved1 +
          dynamicallyReserved2 +
          unreserved2));
{code}

The intention here seems to be: perform an {{acceptOffers}} with a sequence of 
operations that includes launching a task, wait until the task has finished and 
its resources have therefore been recovered, and then expect all of the available 
resources to be offered in a single offer.

The issue is that with an {{allocation_interval}} of 50ms, the allocator can make 
an offer with the currently available resources while the task is still being 
launched, running, etc. This premature offer is picked up by our {{EXPECT_CALL}} 
for {{resourceOffers}}, so the offer we end up inspecting is not the one we 
intended, and the expectation of an offer containing 
{{dynamicallyReserved1 + dynamicallyReserved2 + unreserved2}} is not met.
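To make the race concrete, here is a toy timeline model in C++. It is not Mesos code: {{firstCapturedOffer}}, {{containsAll}}, the resource names, which resources the task holds, and the task's lifetime are all made-up assumptions. At t = 0 the framework accepts the offer; the resources held by the task come back only when the task is done, while the rest return to the pool immediately. The first allocation event (a periodic tick, or an explicit revive) offers whatever is in the pool, and, like {{FutureArg<1>(offers)}}, the test keeps only that first offer.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for a set of resources.
using ResourceSet = std::set<std::string>;

// Toy model (hypothetical, not Mesos code). Returns the first offer the test
// would capture, or nullopt if no allocation happens before `timeoutMs`.
// Assumes `idle` is non-empty, so the first allocation event always
// produces an offer.
std::optional<ResourceSet> firstCapturedOffer(const ResourceSet& taskHeld,
                                              const ResourceSet& idle,
                                              std::int64_t taskDoneMs,
                                              std::int64_t intervalMs,
                                              std::optional<std::int64_t> reviveMs,
                                              std::int64_t timeoutMs) {
  assert(intervalMs > 0 && !idle.empty());

  // First allocation event: the first periodic tick, or an earlier revive.
  std::int64_t t = intervalMs;
  if (reviveMs && *reviveMs < t) t = *reviveMs;
  if (t > timeoutMs) return std::nullopt;

  // Idle resources are always in the pool; the task's resources only
  // after the task has finished and they have been recovered.
  ResourceSet pool = idle;
  if (t >= taskDoneMs) pool.insert(taskHeld.begin(), taskHeld.end());
  return pool;
}

// True if every expected resource appears in the offer.
bool containsAll(const ResourceSet& offer, const ResourceSet& expected) {
  for (const auto& r : expected) {
    if (offer.count(r) == 0) return false;
  }
  return true;
}
```

With an assumed 200ms task lifetime, a 50ms interval makes the first offer at t = 50, before the task's resources are back, so {{containsAll}} is false. With a 1000s interval and a revive at t = 250 (after the ack), the first offer comes from the revive and contains everything.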

A few possible approaches, in order of preference:
# We may not need all of these moving parts; we could possibly use just one set 
of resources instead of three. Refer to 
{{ReservationTest.ReserveAndLaunchThenUnreserve}} for an example.
# Turn allocation off ({{allocation_interval=1000s}}) and use {{reviveOffers}} to 
control the offers manually. Refer to 
{{ReservationEndpointsTest.ReserveAvailableAndOfferedResources}} for an example.
# Instead of a simple {{FutureArg<1>(offers)}} as the action for the 
{{EXPECT_CALL}} of {{resourceOffers}}, we could aggregate the offers. This one 
feels like it could get pretty tricky.
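For the third approach, the core bookkeeping might look like the sketch below. Everything in it is hypothetical: {{OfferAccumulator}} is not a Mesos or gmock API, and resources are simplified to a name-to-scalar map rather than Mesos {{Resources}}. In a real test it would presumably be fed from every {{resourceOffers}} call (for example through a gmock {{Invoke}} action in {{WillRepeatedly}}) instead of keeping only the first offer; the tricky parts (satisfying a future when the total is reached, declining or tracking outstanding offers) are left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical simplification of Mesos Resources: name -> scalar amount.
using Quantities = std::map<std::string, double>;

// Hypothetical helper (not a Mesos or gmock API): sums every offer it sees
// and reports once the running total covers the expected resources.
class OfferAccumulator {
 public:
  explicit OfferAccumulator(Quantities expected)
      : expected_(std::move(expected)) {
    // An empty expectation would be trivially satisfied.
    assert(!expected_.empty());
  }

  // Call once per received offer. Returns true once the accumulated
  // resources contain everything expected.
  bool add(const Quantities& offer) {
    for (const auto& [name, amount] : offer) total_[name] += amount;
    return satisfied();
  }

  // True if, for every expected resource, at least that amount was offered.
  bool satisfied() const {
    for (const auto& [name, amount] : expected_) {
      auto it = total_.find(name);
      if (it == total_.end() || it->second < amount) return false;
    }
    return true;
  }

  const Quantities& total() const { return total_; }

 private:
  Quantities expected_;
  Quantities total_;
};
```

The premature offer and the later offer with the recovered resources together satisfy the expectation, even though neither does on its own.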

[~greggomann], [~jieyu] What are your thoughts?

> ReservationTest.ACLMultipleOperations is flaky
> ----------------------------------------------
>
>                 Key: MESOS-4067
>                 URL: https://issues.apache.org/jira/browse/MESOS-4067
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Michael Park
>              Labels: flaky, mesosphere
>
> Observed from the CI: 
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,OS=ubuntu%3A14.04,label_exp=docker%7C%7CHadoop/1319/changes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
