[ https://issues.apache.org/jira/browse/YUNIKORN-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064546#comment-17064546 ]
Wilfred Spiegelenburg commented on YUNIKORN-30:
-----------------------------------------------

I have found the issue and finally have proof of it:
{code}
2020-03-23T14:52:59.893+1100 DEBUG scheduler/scheduling_application.go:529 app reservation check {"allocationKey": "alloc-3", "createTime": "2020-03-23T14:52:59.880+1100", "askAge": "13.39217ms", "reservation delay": "10ms"}
{code}
This was logged in a failure case for {{TestBasicScheduler}}. This test does not set the reservation delay, so the delay should still be the standard 2s. Instead, it seems to have picked up the 10ms setting from a previous run. I have attached a full log showing the whole run.

If the test had set the delay, we would have seen a line like this in the log:
{code}
2020-03-23T14:52:57.772+1100 DEBUG scheduler/scheduling_application.go:65 Test override reservation delay {"delay": "10ms"}
{code}
That line is nowhere in the logs.

> flaky tests cause build failures on PRs
> ---------------------------------------
>
>                 Key: YUNIKORN-30
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-30
>             Project: Apache YuniKorn
>          Issue Type: Test
>          Components: test - smoke
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Blocker
>         Attachments: TestBasicScheduler_github_fail.log
>
>
> Smoke tests have been failing on PR-triggered builds.
> Failures are inconsistent and linked to multiple test cases. Failures in the same tests can even happen at different lines of code in different runs without any changes:
> {code}
> 2020-03-11T04:39:40.8332236Z --- FAIL: TestSchedulerRecovery (3.07s)
> 2020-03-11T04:39:40.8340886Z ##[error] mock_rm_callback.go:175: Failed to wait for allocations, expected 4, actual 3, called from: TestSchedulerRecovery in scheduler_recovery_test.go:213
> {code}
> {code}
> 2020-03-11T04:39:40.9102758Z --- FAIL: TestBasicScheduler (1.11s)
> 2020-03-11T04:39:40.9103549Z ##[error] mock_rm_callback.go:175: Failed to wait for allocations, expected 4, actual 3, called from: TestBasicScheduler in scheduler_smoke_test.go:341
> {code}
> {code}
> 2020-03-06T07:17:50.4567697Z --- FAIL: TestReservationForTwoQueues (3.10s)
> 2020-03-06T07:17:50.4574239Z ##[error] scheduler_reservation_test.go:276: partition reservations are missing
> {code}
> {code}
> 2020-03-06T08:08:21.8912443Z --- FAIL: TestRemoveReservedNode (1.05s)
> 2020-03-06T08:08:21.8917559Z ##[error] scheduler_utils.go:79: Failed to wait for pending resource, expected 80, actual 60, called from: TestRemoveReservedNode in scheduler_reservation_test.go:356
> {code}
> {code}
> 2020-03-04T10:42:16.5788872Z --- FAIL: TestRemoveReservedNode (0.07s)
> 2020-03-04T10:42:16.5789359Z ##[error] scheduler_reservation_test.go:357: assertion failed: 2 (int) != 1 (int): reservations missing from app
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)