weixiaonan1 commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1704043702

   > > @ruanwenjun @zixi0825
   > > 1. Do we need **Global Alarm Types**? That means the alert instance accepts all events by default, without specifying a specific workflow.
   > >
   > >    1. Overhead: adding a global alarm type will incur extra overhead. We need to query the database to see whether there is a global alert instance before creating events. For example, when a workflow starts, we query the database first and, if there are any global alert instances, create a WorkflowStart event and save it in the database.
   >
   > I am not clear why we need to `query if there exist xx event before workflow start`. If we want to send the workflow start event, we just write an event record when the workflow starts. Do you mean that when we start a workflow twice we only send one event? That would be unreasonable. Besides, this kind of work is done by the alert server, so it will not add overhead.
   
   We query the database to check whether there exist **global alert instances**, not to `query if there exist xx events`, before creating events. Events are generated only when global alert instances actually exist.
   Take the workflowEnd event as an example. In the current design, when a workflow is initiated, we bind an alert group to the workflow instance. Therefore, when the workflow execution ends, we can find the alert strategy and alert group directly from the `WorkflowInstance`; if the alert strategy meets the requirement and there is an alert group bound to the instance, we create the workflowEndEvent. With the addition of global alarm types, global alert instances are not bound to workflow instances, so when workflow execution ends (or begins, or in any other scenario that may generate an event), it is necessary to query the database to check whether global alert instances exist. Of course, this process can be optimized: for instance, we can query for global alert instances only once, when constructing the `WorkflowExecuteRunnable`, rather than querying the database each time an event might be generated.
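   To make that optimization concrete, here is a minimal sketch (all names, such as `GlobalAlertInstanceDao`, are hypothetical and not the actual DolphinScheduler API): the global alert instances are queried once, when the runnable is constructed, and the cached result is reused whenever an event might be generated.

   ```java
   import java.util.List;

   // Minimal sketch with hypothetical names (not the real DolphinScheduler API):
   // query for global alert instances once, at construction time, instead of
   // once per potential event.
   public class WorkflowExecuteSketch {

       interface GlobalAlertInstanceDao {
           List<String> queryGlobalAlertInstances();
       }

       static class WorkflowInstance {
           Integer alertGroupId;    // null if no alert group is bound
           boolean strategyMatches; // e.g. alert on SUCCESS / FAILURE
       }

       private final WorkflowInstance instance;
       private final boolean hasGlobalAlertInstances; // cached at construction

       WorkflowExecuteSketch(WorkflowInstance instance, GlobalAlertInstanceDao dao) {
           this.instance = instance;
           // Single database round trip for the lifetime of the runnable.
           this.hasGlobalAlertInstances = !dao.queryGlobalAlertInstances().isEmpty();
       }

       void onWorkflowEnd() {
           boolean boundGroupWantsEvent =
                   instance.alertGroupId != null && instance.strategyMatches;
           if (boundGroupWantsEvent || hasGlobalAlertInstances) {
               saveWorkflowEndEvent(); // persist the event for the alert server
           }
       }

       private void saveWorkflowEndEvent() {
           System.out.println("workflowEndEvent saved");
       }
   }
   ```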
   
   > 
   > > 2. Do we need **more alarm event types**? Such as workflowAddEvent/workflowUpdateEvent/workflowDeleteEvent/workflowStartEvent, etc.
   > > 3. **Flexibility:** In the current alert module, the title and content are determined when the master server creates an alert, so the format of messages is the same for all alert plugins. Do we need some more flexibility? Similar to KafkaListener, plugins could generate messages in different formats and perform different processing logic for different event types.
   > 
   > In fact the AlertRecord only needs to contain some metadata; the content/title and anything else should be generated by the alert sender (kafka/email...).
   
   You are right. The current design of the sender is flexible enough.
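   For reference, a minimal sketch of that design (the event and sender names below are hypothetical, not the existing plugin interfaces): the persisted record carries only metadata, and each sender renders its own title/content per event type.

   ```java
   // Hypothetical sketch: the record carries only metadata; each sender
   // decides how to render it, per channel and per event type.
   enum EventType { WORKFLOW_START, WORKFLOW_END }

   record AlertRecord(EventType eventType, String workflowName, long timestamp) {}

   interface AlertSender {
       void send(AlertRecord record);
   }

   class EmailAlertSender implements AlertSender {
       @Override
       public void send(AlertRecord record) {
           // Email-specific formatting, built from metadata alone.
           String title = "[DolphinScheduler] " + record.eventType()
                   + ": " + record.workflowName();
           System.out.println("email -> " + title);
       }
   }

   class KafkaAlertSender implements AlertSender {
       @Override
       public void send(AlertRecord record) {
           // Kafka-specific formatting, e.g. one JSON payload per event type.
           String payload = String.format(
                   "{\"event\":\"%s\",\"workflow\":\"%s\",\"ts\":%d}",
                   record.eventType(), record.workflowName(), record.timestamp());
           System.out.println("kafka -> " + payload);
       }
   }
   ```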
   > 
   > > 4. **Failed Alert Messages:** Do we consider resending alert messages 
after they failed to send?
   > >    
   > >    1. If we resend the failed messages: To ensure that messages sent by 
the same alert instance are in order (e.g., workflowStartEvent should precede 
workflowEndEvent) and do not affect other instances, the current message 
processing method needs to be changed. Each alert instance should process its 
events in chronological order.
   > 
   > Right now, the alert server uses one loop thread to poll the events, so this is guaranteed.
   > 
   
   The Alert Server only processes PendingAlerts and does not handle FailedAlerts, which is what ensures the order of events. But if we want to **retry FailedAlerts**, things become more complex; see the sketch after this list.
   - Out-of-order messages: for alert instance `instanceA`, if the `process1` start event fails to send while the `process1` end event succeeds, then retrying the start event and succeeding later results in out-of-order events.
   - Availability: if the messages for alert instance `instanceA` continuously fail to send and a large number of events accumulate, this can block other alert instances from sending their messages.
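   One way to address both issues would be a queue per alert instance, sketched below (all names are hypothetical, assuming one FIFO queue and one worker per instance): a failed head-of-queue event is retried before any later event of the same instance is sent, and a stuck instance never blocks the others.

   ```java
   import java.util.Map;
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.LinkedBlockingQueue;

   // Hypothetical sketch: one FIFO queue + one worker per alert instance.
   // Within an instance, a failed head-of-queue event is retried before any
   // later event is sent (preserving order); other instances are unaffected
   // because each has its own worker.
   class PerInstanceAlertDispatcher {

       record AlertEvent(int alertInstanceId, String payload) {}

       private final Map<Integer, BlockingQueue<AlertEvent>> queues = new ConcurrentHashMap<>();
       private final ExecutorService workers = Executors.newCachedThreadPool();

       void submit(AlertEvent event) {
           queues.computeIfAbsent(event.alertInstanceId(), id -> {
               LinkedBlockingQueue<AlertEvent> q = new LinkedBlockingQueue<>();
               workers.submit(() -> drain(q)); // dedicated worker per instance
               return q;
           }).add(event);
       }

       private void drain(BlockingQueue<AlertEvent> queue) {
           while (!Thread.currentThread().isInterrupted()) {
               try {
                   AlertEvent head = queue.take();
                   while (!trySend(head)) {
                       Thread.sleep(1000); // back off, then retry the same event
                   }
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
               }
           }
       }

       private boolean trySend(AlertEvent event) {
           // Placeholder for the real plugin send; return false on failure.
           System.out.println("send: " + event.payload());
           return true;
       }
   }
   ```

   The trade-off is head-of-line blocking within a single instance: ordering is preserved precisely because the worker does not advance past a failing event.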
   

