Martijn Visser created FLINK-39917:
--------------------------------------
Summary: JobMasterTriggerSavepointITCase.testDoNotCancelJobIfSavepointFails: "Disconnect job manager" log assertion races the async JM->RM disconnect
Key: FLINK-39917
URL: https://issues.apache.org/jira/browse/FLINK-39917
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination, Tests
Reporter: Martijn Visser
Assignee: Martijn Visser
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=75865&view=results
(leg: test_cron_azure tests)
{code}
06:17:51.991 [ERROR] org.apache.flink.runtime.jobmaster.JobMasterTriggerSavepointITCase.testDoNotCancelJobIfSavepointFails(ClusterClient) -- Time elapsed: 0.394 s <<< FAILURE!
java.lang.AssertionError:
[not all expected events logged by org.apache.flink.runtime.resourcemanager.StandaloneResourceManager, logged: [... Message=Registering job manager ..., ... Message=Registered job manager ...]]
Expecting empty but was: [Disconnect job manager .*]
    at org.apache.flink.util.JobIDLoggingUtil.assertKeyPresent(JobIDLoggingUtil.java:98)
    at org.apache.flink.runtime.jobmaster.JobMasterTriggerSavepointITCase.verifyJobIdIsLogged(JobMasterTriggerSavepointITCase.java:280)
{code}
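The failure text suggests that the check collects every expected message pattern that matched none of the captured {{StandaloneResourceManager}} events and asserts that this collection is empty. A minimal sketch of that logic, assuming the patterns are full-match Java regexes; {{ExpectedEventsCheck}}, {{unmatched}} and the sample log messages are hypothetical and not taken from {{JobIDLoggingUtil}}:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of an "all expected events were logged" check. */
final class ExpectedEventsCheck {

    /** Returns the expected regexes that fully matched none of the logged messages. */
    static List<String> unmatched(List<String> expectedRegexes, List<String> loggedMessages) {
        List<String> missing = new ArrayList<>();
        for (String regex : expectedRegexes) {
            Pattern pattern = Pattern.compile(regex);
            boolean seen = loggedMessages.stream().anyMatch(m -> pattern.matcher(m).matches());
            if (!seen) {
                missing.add(regex);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> expected = List.of(
                "Registering job manager .*",
                "Registered job manager .*",
                "Disconnect job manager .*");
        // Hypothetical snapshot taken before the asynchronous disconnect was logged.
        List<String> logged = List.of(
                "Registering job manager jm@host for job 1234",
                "Registered job manager jm@host for job 1234");
        List<String> missing = unmatched(expected, logged);
        if (!missing.isEmpty()) {
            System.out.println("Expecting empty but was: " + missing);
        }
    }
}
{code}

Run against a snapshot taken before the disconnect is logged, it prints {{Expecting empty but was: [Disconnect job manager .*]}}, the same shape as the CI failure.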
Root cause: {{waitForDisconnect}} cancels the job and waits for the
client-visible {{CANCELED}} status, then {{verifyJobIdIsLogged}} asserts that
{{StandaloneResourceManager}} logged "Disconnect job manager ...". The
JobMaster disconnects from the ResourceManager asynchronously during shutdown,
*after* the job reports CANCELED. The run logs confirm the race window: the job
reached CANCELED at 06:17:51,115 and the JobMaster began stopping at
06:17:51,136; the assertion ran in between and captured only the
"Registering/Registered job manager" events.
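The ordering problem can be reproduced deterministically in a small model. In this sketch, which is hypothetical and not Flink code, a {{CompletableFuture}} stands in for the client-visible job status, a {{CopyOnWriteArrayList}} stands in for the captured RM log events, and a {{CountDownLatch}} forces the unlucky interleaving that the CI run hit by chance (a gap of about 21 ms between 06:17:51,115 and 06:17:51,136):

{code:java}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

/** Hypothetical model: CANCELED becomes visible before the JM->RM disconnect is logged. */
final class CancelDisconnectRace {

    static boolean disconnectLogged(List<String> rmLog) {
        return rmLog.stream().anyMatch(m -> m.startsWith("Disconnect job manager"));
    }

    /** Returns {seenWhenCanceled, seenAfterShutdown}. */
    static boolean[] observe() throws Exception {
        List<String> rmLog = new CopyOnWriteArrayList<>(); // stand-in for captured RM log events
        CompletableFuture<String> jobStatus = new CompletableFuture<>();
        CountDownLatch shutdownMayProceed = new CountDownLatch(1);

        Thread jobMaster = new Thread(() -> {
            jobStatus.complete("CANCELED"); // the client can now observe CANCELED
            try {
                shutdownMayProceed.await(); // models the gap before JobMaster shutdown starts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            rmLog.add("Disconnect job manager jm@host for job 1234"); // hypothetical message text
        });
        jobMaster.start();

        jobStatus.get(); // the test's wait for CANCELED
        boolean seenWhenCanceled = disconnectLogged(rmLog); // the racy assertion point

        shutdownMayProceed.countDown(); // let the asynchronous shutdown run
        jobMaster.join();
        return new boolean[] {seenWhenCanceled, disconnectLogged(rmLog)};
    }

    public static void main(String[] args) throws Exception {
        boolean[] seen = observe();
        System.out.println("at CANCELED: " + seen[0] + ", after shutdown: " + seen[1]);
    }
}
{code}

{{observe()}} reports {{false}} for the check made right after CANCELED and {{true}} once the shutdown path has run, which mirrors why an assertion placed directly after the CANCELED wait can miss the event.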
This is not the same failure as FLINK-37821 (closed), which addressed a
different signal in this test.
Proposed fix: in {{waitForDisconnect}}, after the CANCELED wait, additionally
wait until the RM has actually logged the disconnect event before returning.
The assertion itself stays unchanged.
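One way to express the proposed extra wait is a polling helper that re-checks the captured RM events until the disconnect message appears. A minimal sketch, assuming the captured events are available as a thread-safe list of message strings; {{WaitForDisconnectSketch}}, {{waitUntil}} and {{disconnectLogged}} are hypothetical names, and an existing wait helper in the Flink test utilities, if one fits, would be the natural replacement:

{code:java}
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of waiting for the RM disconnect event after CANCELED. */
final class WaitForDisconnectSketch {

    /** Polls {@code condition} until it holds, or throws once {@code timeout} has elapsed. */
    static void waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) { // overflow-safe nanoTime comparison
                throw new TimeoutException("condition not met within " + timeout);
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    static boolean disconnectLogged(List<String> rmLog) {
        return rmLog.stream().anyMatch(m -> m.startsWith("Disconnect job manager"));
    }

    public static void main(String[] args) throws Exception {
        List<String> rmLog = new CopyOnWriteArrayList<>();
        // Simulated asynchronous JM->RM disconnect, logged shortly after CANCELED.
        Thread shutdown = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            rmLog.add("Disconnect job manager jm@host for job 1234"); // hypothetical message text
        });
        shutdown.start();

        // After the CANCELED wait: block until the RM has actually logged the disconnect.
        waitUntil(() -> disconnectLogged(rmLog), Duration.ofSeconds(10), Duration.ofMillis(10));
        System.out.println("disconnect logged: " + disconnectLogged(rmLog));
        shutdown.join();
    }
}
{code}

The bounded timeout only keeps the sketch from hanging; the actual test may follow the project's own conventions for waiting. Because the assertion runs only after the wait returns, the existing check needs no change, in line with the proposal.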
--
This message was sent by Atlassian Jira
(v8.20.10#820010)