[
https://issues.apache.org/jira/browse/FLINK-39845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martijn Visser updated FLINK-39845:
-----------------------------------
Description:
SavepointITCase.testStopWithSavepointFailsOverToSavepoint fails
deterministically in the test_cron_adaptive_scheduler tests leg on master only.
It has been red every nightly since 2026-03-21, the first nightly after the
JUnit5 migration FLINK-39124 rewrote this file. It passes under the default
scheduler, and passes under the adaptive scheduler on release-2.0/2.1
(pre-migration assertion).
{code:java}
java.lang.AssertionError:
Expecting a throwable with cause being an instance of:
org.apache.flink.runtime.scheduler.stopwithsavepoint.StopWithSavepointStoppingException
but was an instance of:
org.apache.flink.util.FlinkException
...
at org.apache.flink.test.checkpointing.SavepointITCase.testStopWithSavepointFailsOverToSavepoint(SavepointITCase.java:324)
{code}
Root cause: the migration replaced a cause-chain search
(ExceptionUtils.assertThrowable → findThrowable) with a direct-cause check
(assertThatThrownBy(...).hasCauseInstanceOf(StopWithSavepointStoppingException.class)).
Under the AdaptiveScheduler, StopWithSavepoint.onLeave() wraps the expected
StopWithSavepointStoppingException inside a FlinkException ("Stop with
savepoint operation could not be completed."), so it is no longer the direct
cause. The runtime is unchanged (StopWithSavepoint.java is byte-identical).
Fix: restore the chain search via FlinkAssertions.anyCauseMatches, so the test
passes under both schedulers.
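The difference between the two checks can be illustrated with a minimal, JDK-only sketch. It is not the Flink test code: StoppingException and WrapperException are hypothetical stand-ins for StopWithSavepointStoppingException and FlinkException, the outer RuntimeException and the exact chain shapes are assumptions, and findInChain only approximates what a cause-chain search such as findThrowable or FlinkAssertions.anyCauseMatches is described as doing above.

```java
import java.util.Optional;

public class CauseChainDemo {

    /** Hypothetical stand-in for StopWithSavepointStoppingException. */
    public static class StoppingException extends Exception {
        public StoppingException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the wrapping FlinkException. */
    public static class WrapperException extends Exception {
        public WrapperException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Direct-cause check: looks only at top.getCause(), one level deep. */
    public static boolean hasDirectCause(Throwable top, Class<? extends Throwable> type) {
        return type.isInstance(top.getCause());
    }

    /** Chain search: follows getCause() links (assumes an acyclic chain). */
    public static <T extends Throwable> Optional<T> findInChain(Throwable top, Class<T> type) {
        for (Throwable t = top; t != null; t = t.getCause()) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
        }
        return Optional.empty();
    }

    /** Assumed shape when the expected exception is the direct cause. */
    public static Throwable directShape() {
        return new RuntimeException("outer", new StoppingException("stopping"));
    }

    /** Assumed shape when the expected exception is wrapped one level deeper. */
    public static Throwable wrappedShape() {
        return new RuntimeException(
                "outer",
                new WrapperException(
                        "Stop with savepoint operation could not be completed.",
                        new StoppingException("stopping")));
    }

    public static void main(String[] args) {
        Class<StoppingException> expected = StoppingException.class;
        System.out.println("direct shape, direct-cause check: "
                + hasDirectCause(directShape(), expected));
        System.out.println("wrapped shape, direct-cause check: "
                + hasDirectCause(wrappedShape(), expected));
        System.out.println("wrapped shape, chain search: "
                + findInChain(wrappedShape(), expected).isPresent());
    }
}
```

In this sketch the direct-cause check accepts only the first shape, while the chain search accepts both, which is why a chain search is insensitive to the extra wrapping layer.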
Build:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=75618
(leg: test_cron_adaptive_scheduler tests)
> SavepointITCase.testStopWithSavepointFailsOverToSavepoint fails on
> AdaptiveScheduler after JUnit5 migration
> -----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-39845
> URL: https://issues.apache.org/jira/browse/FLINK-39845
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.3.0
> Reporter: Martijn Visser
> Assignee: Martijn Visser
> Priority: Major
> Labels: test-stability
>
> SavepointITCase.testStopWithSavepointFailsOverToSavepoint fails
> deterministically in the test_cron_adaptive_scheduler tests leg on master
> only. It has been red every nightly since 2026-03-21, the first nightly after
> the JUnit5 migration FLINK-39124 rewrote this file. It passes under the
> default scheduler, and passes under the adaptive scheduler on release-2.0/2.1
> (pre-migration assertion).
> {code:java}
> java.lang.AssertionError:
> Expecting a throwable with cause being an instance of:
> org.apache.flink.runtime.scheduler.stopwithsavepoint.StopWithSavepointStoppingException
> but was an instance of:
> org.apache.flink.util.FlinkException
> ...
> at org.apache.flink.test.checkpointing.SavepointITCase.testStopWithSavepointFailsOverToSavepoint(SavepointITCase.java:324)
> {code}
> Root cause: the migration replaced a cause-chain search
> (ExceptionUtils.assertThrowable → findThrowable) with a direct-cause check
> (assertThatThrownBy(...).hasCauseInstanceOf(StopWithSavepointStoppingException.class)).
> Under the AdaptiveScheduler, StopWithSavepoint.onLeave() wraps the expected
> StopWithSavepointStoppingException inside a FlinkException ("Stop with
> savepoint operation could not be completed."), so it is no longer the direct
> cause. The runtime is unchanged (StopWithSavepoint.java is byte-identical).
> Fix: restore the chain search via FlinkAssertions.anyCauseMatches, so the
> test passes under both schedulers.
> Build:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=75618
> (leg: test_cron_adaptive_scheduler tests)