shrihari7396 commented on issue #17917:
URL: 
https://github.com/apache/dolphinscheduler/issues/17917#issuecomment-3995639912

   Hi everyone,
   
   I was going through the alert module code to understand how the current 
AlertServer works.
   
   From what I see in `AlertServer` and `AlertBootstrapService`, alert 
processing only starts once the node becomes ACTIVE through `AlertHAServer`. 
After that, `AlertBootstrapService.start()` starts the main components.
   
   Here `AlertEventFetcher` keeps fetching pending alerts from the `t_ds_alert` 
table and pushes them into an in-memory queue, and then `AlertEventLoop` 
consumes from that queue and executes the sending logic using 
`AlertPluginManager`.
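   
   To make sure I am reading the flow correctly, here is a minimal sketch of 
   the fetch -> queue -> consume pipeline as I understand it. This is not the 
   real code: `AlertPipelineSketch`, `fetchOnce` and `consumeOnce` are 
   hypothetical names, the supplier stands in for the query on `t_ds_alert`, 
   and the consumer stands in for the sending done through `AlertPluginManager`.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch only; these are not the real DolphinScheduler classes.
final class AlertPipelineSketch {
    // In-memory queue between the fetcher step and the event-loop step.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Assumption: stands in for "select pending alerts from t_ds_alert".
    private final Supplier<List<String>> pendingAlertSource;
    // Assumption: stands in for sending through the alert plugins.
    private final Consumer<String> sender;

    AlertPipelineSketch(Supplier<List<String>> pendingAlertSource, Consumer<String> sender) {
        this.pendingAlertSource = pendingAlertSource;
        this.sender = sender;
    }

    // Fetcher step: load one batch of pending alerts and enqueue them.
    int fetchOnce() {
        List<String> batch = pendingAlertSource.get();
        queue.addAll(batch);
        return batch.size();
    }

    // Event-loop step: take one alert from the queue and send it.
    // Returns false when the queue is currently empty.
    boolean consumeOnce() {
        String alert = queue.poll();
        if (alert == null) {
            return false;
        }
        sender.accept(alert);
        return true;
    }
}
```

   In the real module the two steps run continuously on their own threads; the 
   sketch only shows one iteration of each so the hand-off through the queue 
   is visible.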
   
   Because only the ACTIVE node runs this pipeline, AlertServer is not 
completely stateless: it relies on the HA state to decide which node 
processes alerts.
   
   If we move this functionality inside the API Server and run multiple API 
instances, then we need some coordination so that multiple API nodes do not 
fetch and process the same alerts.
   
   One idea could be to reuse the existing `AlertHAServer` mechanism inside the 
API server. Then only the ACTIVE API instance will start 
`AlertBootstrapService`, while the other API nodes stay in standby.
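   
   A rough sketch of that gating, under the assumption that the HA layer can 
   notify us when the node becomes ACTIVE or goes back to standby. I have not 
   checked the exact listener API of `AlertHAServer`, so `HaGatedAlertService`, 
   `onActive` and `onStandby` below are hypothetical names, and the two 
   `Runnable`s stand in for starting and stopping `AlertBootstrapService`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; the callback names are assumptions, not the AlertHAServer API.
final class HaGatedAlertService {
    private final AtomicBoolean running = new AtomicBoolean(false);
    // Assumption: stands in for AlertBootstrapService.start().
    private final Runnable startPipeline;
    // Assumption: stands in for shutting the alert pipeline down again.
    private final Runnable stopPipeline;

    HaGatedAlertService(Runnable startPipeline, Runnable stopPipeline) {
        this.startPipeline = startPipeline;
        this.stopPipeline = stopPipeline;
    }

    // Called when this API instance becomes ACTIVE; repeated calls are no-ops.
    void onActive() {
        if (running.compareAndSet(false, true)) {
            startPipeline.run();
        }
    }

    // Called when this API instance falls back to standby; repeated calls are no-ops.
    void onStandby() {
        if (running.compareAndSet(true, false)) {
            stopPipeline.run();
        }
    }

    boolean isRunning() {
        return running.get();
    }
}
```

   The `compareAndSet` guard is there so duplicate state-change notifications 
   do not start the pipeline twice on the same node.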
   
   Also, I think we should keep separate thread pools for API requests and alert 
processing. The API thread pool will handle REST requests (load-balanced 
traffic), and the alert thread pool can run the alert event loop and HA-related 
work. This way, slow alert sending (such as SMTP or webhook calls) will not 
block the API request threads.
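   
   A small sketch of the isolation I have in mind. `SeparatePoolsSketch` and 
   its methods are hypothetical names, and the pool sizes are arbitrary. In the 
   real API server the REST threads belong to the web container, so the "API 
   pool" here only stands in for those threads to show that a blocked alert 
   worker does not delay a request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch; not taken from the DolphinScheduler code base.
final class SeparatePoolsSketch {
    // Assumption: stands in for the web container's request threads.
    private final ExecutorService apiPool;
    // Dedicated pool for the alert event loop and slow sends (SMTP, webhook).
    private final ExecutorService alertPool;

    SeparatePoolsSketch(int apiThreads, int alertThreads) {
        this.apiPool = newDaemonPool("api-worker", apiThreads);
        this.alertPool = newDaemonPool("alert-worker", alertThreads);
    }

    // Daemon threads so the sketch never keeps the JVM alive by itself.
    private static ExecutorService newDaemonPool(String name, int size) {
        return Executors.newFixedThreadPool(size, runnable -> {
            Thread thread = new Thread(runnable, name);
            thread.setDaemon(true);
            return thread;
        });
    }

    // Runs a request handler on the API pool.
    <T> CompletableFuture<T> handleApi(Supplier<T> request) {
        return CompletableFuture.supplyAsync(request, apiPool);
    }

    // Runs an alert send on the alert pool; a slow send only occupies alert workers.
    CompletableFuture<Void> sendAlert(Runnable send) {
        return CompletableFuture.runAsync(send, alertPool);
    }

    void shutdown() {
        apiPool.shutdownNow();
        alertPool.shutdownNow();
    }
}
```

   With one alert worker stuck on a slow send, `handleApi(...)` still completes 
   because it never waits on the alert pool.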
   
   I am still exploring this module, so please let me know if this direction 
makes sense.
   
   Thanks.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
