Matthias Pohl created FLINK-26391:
-------------------------------------

             Summary: Release Testing: Application Mode recovery does not re-trigger a job which failed during cleanup (FLINK-11813)
                 Key: FLINK-26391
                 URL: https://issues.apache.org/jira/browse/FLINK-26391
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Coordination
    Affects Versions: 1.15.0
            Reporter: Matthias Pohl
             Fix For: 1.15.0

FLINK-11813 is about not being able to determine whether a job has been terminated globally before a failover happened.

Testing this behavior can be achieved by running a job in HA mode, which enables the file-based {{JobResultStore}} (JRS). You can set [job-result-store.storage-path|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#job-result-store-storage-path] to point to a directory which you can access. [job-result-store.delete-on-commit|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#job-result-store-delete-on-commit] can be set to {{false}} to prevent the JRS artifacts from being deleted after a job has finished.

Let a job finish to generate the JRS artifact for this job in the specified directory. Renaming the generated file from {{<job-id>.json}} to {{<job-id>_DIRTY.json}} simulates the job not having been cleaned up properly.

Starting the job in application mode once more (by specifying the corresponding job ID) should not start the job again (you might want to enable {{debug}} logging to verify this in the logs), i.e.:
* Cleanup should be performed.
* No JobMaster-related logs should appear in the Flink logs.
* Cleanup-related logs should appear in the Flink logs.
* At the end, the JRS artifact should have been renamed back from {{<job-id>_DIRTY.json}} to {{<job-id>.json}}.
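
The renaming step and the final check described above could be scripted roughly as follows. This is only a minimal sketch, not part of Flink: the helper names {{mark_dirty}} and {{is_clean}} are hypothetical, the directory is assumed to be the one configured via {{job-result-store.storage-path}}, and the check assumes {{job-result-store.delete-on-commit}} is {{false}} so that the clean artifact stays on disk.

```python
from pathlib import Path


def mark_dirty(jrs_dir: str, job_id: str) -> Path:
    """Rename <job-id>.json to <job-id>_DIRTY.json to simulate a missed cleanup.

    Hypothetical helper: jrs_dir is assumed to be the directory configured
    as job-result-store.storage-path.
    """
    clean = Path(jrs_dir) / f"{job_id}.json"
    dirty = Path(jrs_dir) / f"{job_id}_DIRTY.json"
    if not clean.is_file():
        raise FileNotFoundError(f"no JRS artifact found at {clean}")
    clean.rename(dirty)
    return dirty


def is_clean(jrs_dir: str, job_id: str) -> bool:
    """Return True if only <job-id>.json exists, i.e. no dirty entry is left.

    Assumes job-result-store.delete-on-commit is false, so the clean
    artifact is kept instead of being deleted.
    """
    base = Path(jrs_dir)
    has_clean = (base / f"{job_id}.json").is_file()
    has_dirty = (base / f"{job_id}_DIRTY.json").exists()
    return has_clean and not has_dirty
```

A test run would call {{mark_dirty}} after the first job run has finished, restart the application cluster with the same job ID, and then expect {{is_clean}} to return {{True}} once the cleanup has completed.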