shrihari7396 commented on issue #17917: URL: https://github.com/apache/dolphinscheduler/issues/17917#issuecomment-3995639912
Hi everyone, I was going through the alert module code to understand how the current AlertServer works.

From what I see in `AlertServer` and `AlertBootstrapService`, alert processing only starts once the node becomes ACTIVE through `AlertHAServer`. After that, `AlertBootstrapService.start()` starts the main components: `AlertEventFetcher` keeps fetching pending alerts from the `t_ds_alert` table and pushes them into an in-memory queue, and `AlertEventLoop` consumes from that queue and executes the sending logic using `AlertPluginManager`.

Because only the ACTIVE node runs this pipeline, AlertServer is not completely stateless. If we move this functionality into the API Server and run multiple API instances, we need some coordination so that multiple API nodes do not fetch and process the same alerts.

One idea could be to reuse the existing `AlertHAServer` mechanism inside the API server. Then only the ACTIVE API instance would start `AlertBootstrapService`, while the other API nodes stay in standby.

I also think we should keep separate thread pools for API requests and alert processing. The API thread pool would handle REST requests (load-balanced traffic), and the alert thread pool would run the alert event loop and HA-related work. This way, slow alert sending (such as SMTP or webhook calls) will not block the API request threads.

I am still exploring this module, so please let me know if this direction makes sense. Thanks.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
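
To make the separate-pool idea in the comment more concrete, here is a minimal sketch in plain Java. It is not DolphinScheduler code: the class `AlertPipelineSketch`, the method `run`, the pool sizes, and the integer alert ids are all hypothetical, and the `active` flag only stands in for whatever state `AlertHAServer` would report. The sketch assumes a fetcher task that fills an in-memory queue and a single event-loop task that drains it, both on a dedicated alert pool, while an API-style request is served from its own pool.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AlertPipelineSketch {

    /**
     * Returns the alert ids "sent" by this node.
     * A standby node (active == false) never starts the pipeline and returns an empty list.
     */
    public static List<Integer> run(List<Integer> pendingIds, boolean active) throws Exception {
        List<Integer> sent = new CopyOnWriteArrayList<>();
        if (!active) {
            return sent; // standby: only the ACTIVE instance processes alerts
        }

        // Two independent pools (sizes are arbitrary assumptions).
        ExecutorService apiPool = Executors.newFixedThreadPool(2);
        ExecutorService alertPool = Executors.newFixedThreadPool(2);
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        CountDownLatch done = new CountDownLatch(pendingIds.size());

        try {
            // Stand-in for the fetcher: push pending alerts into the in-memory queue.
            alertPool.execute(() -> queue.addAll(pendingIds));

            // Stand-in for the event loop: consume from the queue and "send" each alert.
            alertPool.execute(() -> {
                try {
                    while (done.getCount() > 0) {
                        Integer id = queue.poll(100, TimeUnit.MILLISECONDS);
                        if (id == null) {
                            continue; // nothing fetched yet, poll again
                        }
                        Thread.sleep(20); // simulated slow SMTP/webhook call
                        sent.add(id);
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // An API-style request runs on its own pool, so it does not wait for alert sending.
            String response = apiPool.submit(() -> "ok").get(1, TimeUnit.SECONDS);
            if (!"ok".equals(response)) {
                throw new IllegalStateException("unexpected API response: " + response);
            }

            // Wait (bounded) until every pending alert has been handled.
            done.await(5, TimeUnit.SECONDS);
            return sent;
        } finally {
            apiPool.shutdownNow();
            alertPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("active node sent:  " + run(List.of(1, 2, 3), true));
        System.out.println("standby node sent: " + run(List.of(1, 2, 3), false));
    }
}
```

The sketch deliberately leaves out the part that actually matters for correctness across multiple API instances, namely how a node learns that it is ACTIVE. It only shows that once that decision is made, the fetch/consume work can live on a pool that API request threads never touch.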
