[ 
https://issues.apache.org/jira/browse/FLINK-39845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser updated FLINK-39845:
-----------------------------------
    Description: 
SavepointITCase.testStopWithSavepointFailsOverToSavepoint fails 
deterministically in the test_cron_adaptive_scheduler tests leg, on master only. 
It has been red in every nightly since 2026-03-21, the first nightly after the 
JUnit5 migration (FLINK-39124) rewrote this file. It passes under the default 
scheduler, and it passes under the adaptive scheduler on release-2.0/2.1 
(pre-migration assertion).

{code:java}
java.lang.AssertionError:
Expecting a throwable with cause being an instance of:
  org.apache.flink.runtime.scheduler.stopwithsavepoint.StopWithSavepointStoppingException
but was an instance of:
  org.apache.flink.util.FlinkException
...
  at org.apache.flink.test.checkpointing.SavepointITCase.testStopWithSavepointFailsOverToSavepoint(SavepointITCase.java:324)
{code}

Root cause: the migration replaced a cause-chain search 
(ExceptionUtils.assertThrowable → findThrowable) with a direct-cause check 
(assertThatThrownBy(...).hasCauseInstanceOf(StopWithSavepointStoppingException.class)). 
Under the AdaptiveScheduler, StopWithSavepoint.onLeave() wraps the expected 
StopWithSavepointStoppingException inside a FlinkException ("Stop with 
savepoint operation could not be completed."), so it is no longer the direct 
cause of the thrown exception and the new assertion fails. The runtime is 
unchanged (StopWithSavepoint.java is byte-identical).

Fix: restore the chain search via FlinkAssertions.anyCauseMatches, so the test 
passes under both schedulers.
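
The difference between the two assertion styles can be sketched in plain Java. This is only an illustration, not the test code: StoppingException and WrapperException are hypothetical stand-ins for StopWithSavepointStoppingException and FlinkException, and the use of ExecutionException as the outermost throwable is an assumption made for the example.

```java
import java.util.concurrent.ExecutionException;

public class CauseChainDemo {

    // Hypothetical stand-in for StopWithSavepointStoppingException.
    static class StoppingException extends Exception {
        StoppingException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for the wrapping FlinkException.
    static class WrapperException extends Exception {
        WrapperException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Direct-cause check: inspects only top.getCause(). */
    static boolean hasDirectCause(Throwable top, Class<? extends Throwable> type) {
        return type.isInstance(top.getCause());
    }

    /** Chain search: inspects top and every throwable reachable via getCause(). */
    static boolean chainContains(Throwable top, Class<? extends Throwable> type) {
        Throwable current = top;
        // The depth bound guards against a cyclic cause chain.
        for (int depth = 0; current != null && depth < 64; depth++) {
            if (type.isInstance(current)) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable expected = new StoppingException("savepoint stopping");

        // Expected exception is the direct cause of the thrown one.
        Throwable direct = new ExecutionException(expected);

        // One extra wrapper sits between the thrown exception and the expected one.
        Throwable wrapped = new ExecutionException(
                new WrapperException(
                        "Stop with savepoint operation could not be completed.", expected));

        System.out.println(hasDirectCause(direct, StoppingException.class));   // true
        System.out.println(chainContains(direct, StoppingException.class));    // true
        System.out.println(hasDirectCause(wrapped, StoppingException.class));  // false
        System.out.println(chainContains(wrapped, StoppingException.class));   // true
    }
}
```

Only the chain search gives the same answer for both shapes, which is why an assertion of that kind is insensitive to the extra wrapping layer.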

Build: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=75618 
(leg: test_cron_adaptive_scheduler tests)

  was:
SavepointITCase.testStopWithSavepointFailsOverToSavepoint fails 
deterministically in the test_cron_adaptive_scheduler tests leg, on master only. 
It has been red in every nightly since 2026-03-21, the first nightly after the 
JUnit5 migration (FLINK-39124) rewrote this file. It passes under the default 
scheduler, and it passes under the adaptive scheduler on release-2.0/2.1 
(pre-migration assertion).

{code:java}
java.lang.AssertionError:
Expecting a throwable with cause being an instance of:
  org.apache.flink.runtime.scheduler.stopwithsavepoint.StopWithSavepointStoppingException
but was an instance of:
  org.apache.flink.util.FlinkException
...
  at org.apache.flink.test.checkpointing.SavepointITCase.testStopWithSavepointFailsOverToSavepoint(SavepointITCase.java:324)
{code}

Root cause: the migration replaced a cause-chain search 
(ExceptionUtils.assertThrowable → findThrowable) with a direct-cause check 
(assertThatThrownBy(...).hasCauseInstanceOf(StopWithSavepointStoppingException.class)). 
Under the AdaptiveScheduler, StopWithSavepoint.onLeave() wraps the expected 
StopWithSavepointStoppingException inside a FlinkException ("Stop with 
savepoint operation could not be completed."), so it is no longer the direct 
cause. The runtime is unchanged (StopWithSavepoint.java is byte-identical).

Fix: restore the chain search via FlinkAssertions.anyCauseMatches, so the test 
passes under both schedulers.


> SavepointITCase.testStopWithSavepointFailsOverToSavepoint fails on 
> AdaptiveScheduler after JUnit5 migration
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39845
>                 URL: https://issues.apache.org/jira/browse/FLINK-39845
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 2.3.0
>            Reporter: Martijn Visser
>            Assignee: Martijn Visser
>            Priority: Major
>              Labels: test-stability



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
