[
https://issues.apache.org/jira/browse/MESOS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gavin updated MESOS-8828:
-------------------------
Comment: was deleted
(was: www.rtat.net)
> Clock::advance can race with process::delay in tests.
> -----------------------------------------------------
>
> Key: MESOS-8828
> URL: https://issues.apache.org/jira/browse/MESOS-8828
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Budnik
> Priority: Major
> Labels: flaky, integration, mesosphere
> Attachments: failed_tests.txt
>
>
> There are lots of tests that use the following pattern:
> 1) [Pause
> clocks|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1108]
> 2) [Start an
> agent|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1122]
> 3) [Advance clocks to trigger an
> event|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1125]
> 4) [Wait for the
> event|https://github.com/apache/mesos/blob/c662048ae365630e3249b51102c9f7f962cc24d3/src/tests/persistent_volume_tests.cpp#L1127]
> If an event is scheduled via `process::delay()` after the clocks have been advanced,
> then the test hangs, waiting forever for an event that is never triggered,
> because the libprocess clocks are paused. For example, the
> `DiskResource/PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy/0`
> test hangs at step 4 because the clocks at step 3 have already been advanced
> before the agent scheduled a call to the
> [Slave::authenticate()|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1301]
> method. After a successful authentication with a master, the agent sends a
> [UpdateSlaveMessage|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1546-L1550].
> But the authentication process never finishes because
> [Slave::authenticate()|https://github.com/apache/mesos/blob/ebe92c9b39933136968e4ba3a52527e52b361d22/src/slave/slave.cpp#L1301]
> is never called.
> A list of tests that might be affected by the issue is attached to this ticket.
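
The ordering problem described above can be reproduced with a toy model of a paused clock. This is only a sketch: `PausedClock` and `eventFires` are hypothetical names made up for illustration, the model is single-threaded, and it is not the libprocess implementation of `Clock::advance` or `process::delay`. It only shows why a timer registered after the advance never becomes due while the clock stays paused.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Toy stand-in for a paused test clock (hypothetical, not the libprocess API).
// Virtual time moves only when advance() is called.
class PausedClock {
public:
  // Schedule `callback` to run once `duration` ticks have elapsed from now.
  void delay(std::int64_t duration, std::function<void()> callback) {
    timers_.emplace(now_ + duration, std::move(callback));
  }

  // Move virtual time forward and fire every timer that is now due.
  void advance(std::int64_t duration) {
    now_ += duration;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      auto callback = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      callback();
    }
  }

private:
  std::int64_t now_ = 0;
  std::multimap<std::int64_t, std::function<void()>> timers_;
};

// Returns true if the delayed event fired for the given ordering.
bool eventFires(bool delayBeforeAdvance) {
  PausedClock clock;
  bool fired = false;
  auto event = [&fired]() { fired = true; };

  if (delayBeforeAdvance) {
    // Ordering the test expects: the timer exists when time moves.
    clock.delay(10, event);
    clock.advance(10);
  } else {
    // Racy ordering: the advance happens first, so the timer is due at
    // tick 20 and nothing moves the paused clock there.
    clock.advance(10);
    clock.delay(10, event);
  }
  return fired;
}
```

In this model `eventFires(true)` reports that the event fired, while `eventFires(false)` leaves it pending forever, which corresponds to the hang at step 4.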
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)