kfaraz commented on PR #14898:
URL: https://github.com/apache/druid/pull/14898#issuecomment-1691008040

   Thanks for the explanation, @YongGang.
   
   > This was because, when SeekableStreamSupervisor started, it looped through 
the tasks from TaskStorage and queried TaskMaster for leader status. The query 
returned false because the process of becoming leader had not finished yet. That 
created confusion: some tasks got the right TaskRunner and some didn't, 
depending on whether leader election had finished.
   
   Okay, so IIUC, the supervisor has started on a node that was recently 
elected leader but is not yet fully ready.
   
   I think the fix should be more along the lines of the supervisor waiting for 
leader election to complete before starting its duties. Alternatively, 
`SupervisorManager` itself should become active only after leader election is 
complete. This might have the following impact:
   - While leader election is in progress, we cannot perform any supervisor 
CRUD, which makes sense
   - SupervisorManager initialization would be delayed a little
   - Others?
   
   In my opinion, these side effects are only to be expected. That is better than 
having an Overlord start doing things before it has properly become the leader 
and then inevitably fail.
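   As a rough illustration of the second option, here is a minimal Java sketch of a supervisor manager gated on leader election. It starts supervisors only once leadership is fully established and rejects supervisor CRUD while election is still in progress. Several things in it are assumptions rather than Druid's actual API: the class name `LeaderGatedSupervisorManager`, the hooks `onBecomeLeaderComplete`/`onStopBeingLeader`, and the in-memory map that stands in for the metadata store.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, NOT Druid's SupervisorManager: supervisors start only
 * after leadership is fully established, and CRUD is rejected before that.
 */
class LeaderGatedSupervisorManager {
  // Stands in for the persisted supervisor specs in the metadata store (assumption).
  private final Map<String, String> specStore;
  // IDs of supervisors currently running on this node.
  private final Set<String> running = new HashSet<>();
  // True only between "leadership fully acquired" and "leadership lost".
  private boolean active = false;

  LeaderGatedSupervisorManager(Map<String, String> specStore) {
    this.specStore = specStore;
  }

  LeaderGatedSupervisorManager() {
    this(new HashMap<>());
  }

  /** Hypothetical hook, called only once the node has completely become leader. */
  synchronized void onBecomeLeaderComplete() {
    if (active) {
      return;
    }
    active = true;
    // Supervisors are started here, not at construction time, so none of them
    // runs while the node is still only partially leader.
    running.addAll(specStore.keySet());
  }

  /** Hypothetical hook for losing leadership: stops supervisors, keeps specs. */
  synchronized void onStopBeingLeader() {
    active = false;
    running.clear();
  }

  /** Supervisor CRUD: fails fast while leader election is in progress. */
  synchronized void createOrUpdate(String id, String spec) {
    requireActive();
    specStore.put(id, spec);
    running.add(id);
  }

  synchronized boolean isRunning(String id) {
    return running.contains(id);
  }

  private void requireActive() {
    if (!active) {
      throw new IllegalStateException(
          "Not yet fully leader; supervisor CRUD is unavailable during leader election");
    }
  }
}
```

   With this design, the impacts listed above fall out directly. Calls made before `onBecomeLeaderComplete()` throw, which matches "no supervisor CRUD during election". Supervisor startup is also delayed until that hook fires.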
   
   Let me know what you think.

