weixiaonan1 commented on issue #14832: URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1704043702
> > @ruanwenjun @zixi0825
> >
> > 1. Do we need **Global Alarm Types**? That means the alert instance accepts all events by default, without specifying a specific workflow.
> >    1. Overhead: adding a global alarm type will incur extra overhead. We need to query the database to see whether there is a global alert instance before creating events. For example, when a workflow starts, we query the database first and, if there are any global alert instances, create a WorkflowStart event and save it in the database.
>
> I am not clear why we need to `query if there exist xx event before workflow start`. If we want to send the workflow start event, we just write an event record when the workflow starts. Do you mean that when we start a workflow twice we only send one event? That is unreasonable. In addition, this kind of work is done by the alert server, so it will not add overhead.

We query the database to check whether there exist **global alert instances**, rather than `query if there exist xx events`, before creating events. Events are generated only when global alert instances actually exist.

Take a workflowEnd event as an example. In the current design, when a workflow is initiated, we bind an alert group to the workflow instance. Therefore, when the workflow execution ends, we can find the alert strategy and alert group directly from the `WorkflowInstance`; if the alert strategy meets the requirement and an alert group is bound to the instance, we create the workflowEndEvent.

With the addition of global alarm types, global alert instances are not bound to workflow instances. So when workflow execution ends (or begins, or in other scenarios that may generate events), we must query the database to check whether global alert instances exist. Of course, this process can be optimized: for instance, we can query for the global alert instances only once, when constructing the `WorkflowExecuteRunnable`, rather than querying the database each time an event might be generated.
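The query-once optimization described above can be sketched roughly as below. `GlobalAlertCheck`, `shouldCreateEvent`, and the query supplier are hypothetical names for illustration under the assumptions in this thread, not the actual DolphinScheduler API:

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: look up global alert instances once, at
// WorkflowExecuteRunnable construction time, and reuse the cached
// result instead of hitting the database on every potential event.
final class GlobalAlertCheck {

    // Stand-in for a one-time "are there global alert instances?" query.
    private final boolean hasGlobalAlertInstances;

    GlobalAlertCheck(Supplier<List<String>> globalInstanceQuery) {
        // Query once; the result is reused for the runnable's lifetime.
        this.hasGlobalAlertInstances = !globalInstanceQuery.get().isEmpty();
    }

    /** Decide whether a workflow lifecycle event should be persisted. */
    boolean shouldCreateEvent(boolean boundAlertGroupExists) {
        // An event is needed if an alert group is bound to the workflow
        // instance, or if any global alert instance exists.
        return boundAlertGroupExists || hasGlobalAlertInstances;
    }
}
```

The trade-off is staleness: a global alert instance added mid-run would not be seen until the next workflow starts.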
> > 2. Do we need **more Alarm event types**? Such as workflowAddEvent/workflowUpdateEvent/workflowDeleteEvent/workflowStartEvent, etc.
> >
> > 3. **Flexibility:** In the current alert module, the title and content are determined when the master server creates an alert, so the message format is the same for all alert plugins. Do we need more flexibility? Similar to KafkaListener, plugins could generate messages in different formats and perform different processing logic for different event types.
>
> In fact the AlertRecord only needs to contain some metadata information; the content/title or anything else should be generated by the alert sender (kafka/email...).

You are right. The current design of the sender is flexible enough.

> > 4. **Failed Alert Messages:** Do we consider resending alert messages after they fail to send?
> >    1. If we resend the failed messages: to ensure that messages sent by the same alert instance stay in order (e.g., workflowStartEvent should precede workflowEndEvent) and do not affect other instances, the current message processing method needs to change. Each alert instance should process its events in chronological order.
>
> Right now, the alert server uses one loop thread to loop over the events, so this is guaranteed.

The Alert Server only processes PendingAlerts and does not handle FailedAlerts, which ensures the order of events. But if we want to **retry FailedAlerts**, things become more complex:

- Out-of-order messages: for alert instance `instanceA`, if the `process1` start event fails to send but the `process1` end event succeeds, then a later successful retry of the `process1` start event results in out-of-order events.
- Availability: if messages for alert instance `instanceA` continuously fail to send and a large number of events accumulate, other alert instances can be blocked from sending their messages.
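One possible shape for order-preserving retry without cross-instance interference is a FIFO queue per alert instance, where a failed head event blocks only its own queue and is retried in place. This is a minimal sketch with invented names (`PerInstanceDispatcher`, plain string events), not the alert server's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: each alert instance owns a FIFO queue, so a
// failing event delays only that instance's later events (preserving
// start-before-end order) and never other instances.
final class PerInstanceDispatcher {

    private final Map<String, Deque<String>> queues = new HashMap<>();

    /** Enqueue an event for one alert instance, in arrival order. */
    void submit(String instanceId, String event) {
        queues.computeIfAbsent(instanceId, k -> new ArrayDeque<>()).addLast(event);
    }

    /**
     * One dispatch round: for each instance, send from the head of its
     * queue until a send fails; the failed event stays at the head for
     * the next round's retry, so in-instance ordering is kept.
     * Returns the number of events sent successfully this round.
     */
    int dispatchOnce(Predicate<String> sender) {
        int sent = 0;
        for (Deque<String> queue : queues.values()) {
            while (!queue.isEmpty() && sender.test(queue.peekFirst())) {
                queue.pollFirst();
                sent++;
            }
        }
        return sent;
    }

    /** Events still waiting for this instance (0 if none). */
    int pending(String instanceId) {
        Deque<String> q = queues.get(instanceId);
        return q == null ? 0 : q.size();
    }
}
```

With this shape, the `instanceA` failure scenario above cannot reorder events (the end event waits behind the blocked start event), and an accumulating backlog stays confined to `instanceA`'s queue.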
