ricky2129 commented on issue #10675:
URL: https://github.com/apache/seatunnel/issues/10675#issuecomment-4169001414
@dybyte Thanks for the clarification — agreed this should be a separate
issue.
The two symptoms have separate causes:
- Job showing RUNNING in the UI: a zombie entry remains in
runningJobInfoIMap (the IMap cleanup race, covered by this fix).
- Worker running for 14 days with no checkpoints: CancelTaskOperation was
never delivered because the coordinator died before cleanJob() completed.
You're right that our proposed fix direction has a gap: canceling all task
groups when the deploying coordinator departs would also fire during a normal
master failover and incorrectly cancel healthy jobs. We hadn't fully
considered that case.
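
For anyone following along, here is a toy sketch of that gap. It is plain
Java, not SeaTunnel code: `TaskGroup`, `naiveCancel`, `wronglyCancelled`, the
`jobStillWanted` flag, and the member name `master-A` are all made up for
illustration, and the real engine state is certainly more involved.

```java
import java.util.ArrayList;
import java.util.List;

public class OrphanCancelSketch {
    // Hypothetical stand-in for a deployed task group; not a SeaTunnel class.
    public static final class TaskGroup {
        final long jobId;
        final String deployedBy;      // coordinator member that deployed it
        final boolean jobStillWanted; // ground truth the worker cannot see

        public TaskGroup(long jobId, String deployedBy, boolean jobStillWanted) {
            this.jobId = jobId;
            this.deployedBy = deployedBy;
            this.jobStillWanted = jobStillWanted;
        }
    }

    // Naive rule: cancel every task group whose deploying coordinator departed.
    public static List<Long> naiveCancel(List<TaskGroup> groups, String departedMember) {
        List<Long> cancelled = new ArrayList<>();
        for (TaskGroup g : groups) {
            if (g.deployedBy.equals(departedMember)) {
                cancelled.add(g.jobId);
            }
        }
        return cancelled;
    }

    // Jobs that were cancelled even though they were healthy.
    public static List<Long> wronglyCancelled(List<TaskGroup> groups, List<Long> cancelled) {
        List<Long> wrong = new ArrayList<>();
        for (TaskGroup g : groups) {
            if (g.jobStillWanted && cancelled.contains(g.jobId)) {
                wrong.add(g.jobId);
            }
        }
        return wrong;
    }

    // Two task groups deployed by the same (now departed) coordinator.
    public static List<TaskGroup> sampleGroups() {
        List<TaskGroup> groups = new ArrayList<>();
        groups.add(new TaskGroup(1L, "master-A", true));  // healthy job, survives failover
        groups.add(new TaskGroup(2L, "master-A", false)); // orphan, cleanup was interrupted
        return groups;
    }

    public static void main(String[] args) {
        List<TaskGroup> groups = sampleGroups();
        List<Long> cancelled = naiveCancel(groups, "master-A");
        System.out.println("cancelled: " + cancelled);
        System.out.println("wrongly cancelled: " + wronglyCancelled(groups, cancelled));
    }
}
```

The naive rule cancels both jobs, although only job 2 is the orphan. Whatever
replaces `jobStillWanted` in a real fix has to be a signal the worker can
actually observe.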
Does #10506 cover the orphan scenario specifically, or is it primarily
fixing notification delivery after the fact? We'll open a separate issue for
the orphan problem and think through a smarter fix approach.