ricky2129 commented on issue #10675:
URL: https://github.com/apache/seatunnel/issues/10675#issuecomment-4169001414

   @dybyte Thanks for the clarification. Agreed, this should be a separate 
issue.
   
   To clarify the two symptoms separately:
    - Job showing RUNNING in UI = zombie entry remaining in runningJobInfoIMap
       (the IMap cleanup race, covered by this fix)
    - Worker running 14 days with no checkpoints = CancelTaskOperation never
       delivered because coordinator died before cleanJob() completed
   
   You're right that our proposed fix direction has a gap: canceling all task 
groups when the deploying coordinator departs would also fire during normal 
master failover, incorrectly canceling healthy jobs. We hadn't fully considered 
that case.
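
   To make the gap concrete, here is a toy sketch. It is not SeaTunnel code: 
the class, the enum, and both methods are hypothetical names made up for 
illustration. It only shows that a rule keyed on "the deploying coordinator 
departed" cannot tell a normal master failover apart from the orphan case, so 
it cancels in both.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types or methods exist in SeaTunnel.
class DepartureCancelSketch {

    /** Hypothetical label for why the deploying coordinator went away. */
    enum Scenario {
        MASTER_FAILOVER,      // a new master takes over; the job is healthy
        ORPHANED_BY_CLEANUP   // coordinator died mid-cleanup; the worker is orphaned
    }

    /**
     * Naive rule: cancel every task group as soon as the deploying
     * coordinator departs. The scenario is passed in only to show that
     * the rule has no way to use it: the worker sees a departure and
     * nothing else.
     */
    static List<String> cancelOnDeparture(List<String> taskGroups, Scenario ignored) {
        return new ArrayList<>(taskGroups);
    }

    /** What we would actually want canceled in each scenario. */
    static List<String> shouldCancel(List<String> taskGroups, Scenario scenario) {
        return scenario == Scenario.ORPHANED_BY_CLEANUP
                ? new ArrayList<>(taskGroups)
                : new ArrayList<>();
    }

    public static void main(String[] args) {
        List<String> groups = List.of("tg-1", "tg-2");
        for (Scenario s : Scenario.values()) {
            List<String> canceled = cancelOnDeparture(groups, s);
            List<String> wanted = shouldCancel(groups, s);
            System.out.println(s + ": canceled=" + canceled
                    + ", wanted=" + wanted
                    + ", misfire=" + !canceled.equals(wanted));
        }
    }
}
```

   In this model the naive rule is only correct for the orphan case. Any real 
fix would need some extra signal (which one is still an open question) to 
separate the two before canceling.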
   
   Does #10506 cover the orphan scenario specifically, or is it primarily 
fixing notification delivery after the fact? We'll open a separate issue for 
the orphan problem and think through a smarter fix approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
