[ 
https://issues.apache.org/jira/browse/YUNIKORN-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064546#comment-17064546
 ] 

Wilfred Spiegelenburg commented on YUNIKORN-30:
-----------------------------------------------

I have found the issue and finally have proof of it:
{code}
2020-03-23T14:52:59.893+1100    DEBUG   scheduler/scheduling_application.go:529 
app reservation check   {"allocationKey": "alloc-3", "createTime": 
"2020-03-23T14:52:59.880+1100", "askAge": "13.39217ms", "reservation delay": 
"10ms"}
{code}
This was logged in a failure case for {{TestBasicScheduler}}.

This test does not set the reservation delay, so the delay should still be the 
standard 2s. Instead, it seems to have picked up the 10ms setting from a previous run.

I have attached a full log to show the whole run. If the test had set the 
delay, we would have seen a line like this in the log:
{code}
2020-03-23T14:52:57.772+1100    DEBUG   scheduler/scheduling_application.go:65  
Test override reservation delay {"delay": "10ms"}
{code}
That line is nowhere in the logs.

> flaky tests cause build failures on PRs
> ---------------------------------------
>
>                 Key: YUNIKORN-30
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-30
>             Project: Apache YuniKorn
>          Issue Type: Test
>          Components: test - smoke
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Blocker
>         Attachments: TestBasicScheduler_github_fail.log
>
>
> Smoke tests have been failing on PR triggered builds.
> Failures are inconsistent and linked to multiple test cases, failures in the 
> same tests can even happen in different lines of code in different runs 
> without changes:
> {code}
> 2020-03-11T04:39:40.8332236Z --- FAIL: TestSchedulerRecovery (3.07s)
> 2020-03-11T04:39:40.8340886Z ##[error]    mock_rm_callback.go:175: Failed to 
> wait for allocations, expected 4, actual 3, called from: 
> TestSchedulerRecovery in scheduler_recovery_test.go:213
> {code}
> {code}
> 2020-03-11T04:39:40.9102758Z --- FAIL: TestBasicScheduler (1.11s)
> 2020-03-11T04:39:40.9103549Z ##[error]    mock_rm_callback.go:175: Failed to 
> wait for allocations, expected 4, actual 3, called from: TestBasicScheduler 
> in scheduler_smoke_test.go:341
> {code}
> {code}
> 2020-03-06T07:17:50.4567697Z --- FAIL: TestReservationForTwoQueues (3.10s)
> 2020-03-06T07:17:50.4574239Z ##[error]    scheduler_reservation_test.go:276: 
> partition reservations are missing
> {code}
> {code}
> 2020-03-06T08:08:21.8912443Z --- FAIL: TestRemoveReservedNode (1.05s)
> 2020-03-06T08:08:21.8917559Z ##[error]    scheduler_utils.go:79: Failed to 
> wait for pending resource, expected 80, actual 60, called from: 
> TestRemoveReservedNode in scheduler_reservation_test.go:356
> {code}
> {code}
> 2020-03-04T10:42:16.5788872Z --- FAIL: TestRemoveReservedNode (0.07s)
> 2020-03-04T10:42:16.5789359Z ##[error]    scheduler_reservation_test.go:357: 
> assertion failed: 2 (int) != 1 (int): reservations missing from app
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

