[PR] Clear stale savepointTriggerId on B/G transition abort [flink-kubernetes-operator]

via GitHub Wed, 18 Mar 2026 23:36:38 -0700


jennifer-xiong25 opened a new pull request, #1071:
URL: https://github.com/apache/flink-kubernetes-operator/pull/1071


   ## Summary
   - Clear `savepointTriggerId` from the blue/green (B/G) CR status in 
`abortDeployment()` so that subsequent transition attempts trigger a fresh 
savepoint instead of reusing a stale triggerId
   - The triggerId is tracked by the Flink CheckpointCoordinator for a specific 
job execution. If the job restarts (e.g. TM OOM), the new execution's 
CheckpointCoordinator has no record of it, so `configureInitialSavepoint()` 
fails with "Could not fetch savepoint with triggerId"
   - Add assertion in `verifyFailureDuringTransition` test confirming triggerId 
is null after abort
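
The failure mode and the fix described above can be illustrated with a small, self-contained sketch. This is not the operator's code: `BlueGreenStatus` and `JobExecution` are hypothetical stand-ins, and the bodies of `abortDeployment()` and `configureInitialSavepoint()` are simplified assumptions that only model the one behavior discussed here, namely that a triggerId is known only to the job execution that issued it.

```java
import java.util.HashSet;
import java.util.Set;

public class StaleTriggerIdSketch {

    /** Hypothetical stand-in for the B/G CR status; only the field discussed here. */
    public static class BlueGreenStatus {
        public String savepointTriggerId;
    }

    /** Hypothetical stand-in for one job execution and the triggerIds it knows about. */
    public static class JobExecution {
        private final Set<String> knownTriggerIds = new HashSet<>();
        private final String name;
        private int counter = 0;

        public JobExecution(String name) {
            this.name = name;
        }

        /** Starts a savepoint and returns a triggerId valid only for this execution. */
        public String triggerSavepoint() {
            String id = name + "-trigger-" + (++counter);
            knownTriggerIds.add(id);
            return id;
        }

        /** Fails for any triggerId that this execution did not issue. */
        public String fetchSavepoint(String triggerId) {
            if (!knownTriggerIds.contains(triggerId)) {
                throw new IllegalStateException(
                        "Could not fetch savepoint with triggerId: " + triggerId);
            }
            return "savepoint-for-" + triggerId;
        }
    }

    /** Simplified abort: the part that matters is dropping the stored triggerId. */
    public static void abortDeployment(BlueGreenStatus status) {
        status.savepointTriggerId = null;
    }

    /** Simplified transition step: reuse a stored triggerId, else trigger a new savepoint. */
    public static String configureInitialSavepoint(BlueGreenStatus status, JobExecution current) {
        if (status.savepointTriggerId == null) {
            status.savepointTriggerId = current.triggerSavepoint();
        }
        return current.fetchSavepoint(status.savepointTriggerId);
    }

    public static void main(String[] args) {
        BlueGreenStatus status = new BlueGreenStatus();

        // First attempt: a savepoint is triggered against the original execution.
        JobExecution beforeRestart = new JobExecution("exec1");
        status.savepointTriggerId = beforeRestart.triggerSavepoint();

        // The job restarts; the new execution has never seen that triggerId.
        JobExecution afterRestart = new JobExecution("exec2");

        // With the abort clearing the field, the retry triggers a fresh savepoint.
        abortDeployment(status);
        System.out.println(configureInitialSavepoint(status, afterRestart));
    }
}
```

If the `abortDeployment(status)` call in `main` is removed, the retry reuses the triggerId issued by `exec1` and `fetchSavepoint` throws, which mirrors the error quoted in the summary.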
   
   ## Context
   Slack thread: https://shopify.enterprise.slack.com/archives/C0ALAR1TLP8
   
   ## Test plan
   - [x] `verifyFailureDuringTransition` passes with new assertion
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


