Yes, the points above have higher priority. The alert HA rework can be deferred until later, when we work on improving system stability.
[email protected]

From: [email protected]
Date: 2020-08-24 10:26
To: dev
Subject: Re: About the high availability implementation of the Alert service

Hi,

I strongly agree with Yichao Yang. DolphinScheduler is not an alarm system; scheduling is the core. Many points in the master's scheduling need to be optimized, for example:

Scheduling core
1. API communication with the master server
2. Master server state machine optimization
3. Master monitors task state without scanning the DB
4. Splitting the big DAG JSON (very important)
5. Alert server as a service
6. Master code optimization: abstract out the interfaces
7. Background threads to handle abnormal processes and tasks in scheduling

Features
1. Task parameter passing; default time parameter support for hour/minute complement data
2. SQL task problem: the SQL task does not support SQL very well
3. Task plugin

Thx
Best Regards

---------------
DolphinScheduler(Incubator) PPMC
Zhanwei Qiao 乔占卫
[email protected]
---------------

From: Yichao Yang
Date: 2020-08-23 10:43
To: dev
Subject: Re: About the high availability implementation of the Alert service

Hi,

I don't think HA for alert is necessary at present. This can be extended by users. We should focus on the current scheduling.

Best,
Yichao Yang

------------------ Original ------------------
From: JUN GAO <[email protected]>
Date: Sat, Aug 22, 2020 9:41 PM
To: dev <[email protected]>
Subject: Re: About the high availability implementation of the Alert service

I think the first one is better.

[email protected] <[email protected]> wrote on Sat, Aug 22, 2020 at 19:30:

> Hi all,
>
> I would like to make a suggestion. The Alert Module is not currently
> designed for high availability: when multiple alert services are
> started, alerts are sent repeatedly, and when the alert service goes
> down, DS alerting stops working.
> So far, I've come up with two architectures that address the problem of
> repeated alert messages while also making the Alert Module highly
> available.
>
> 1. The first is a master-slave relationship between the alert services,
> established through ZK. Only the master node is responsible for sending
> messages. After the master node goes down, a new master is elected and
> continues to provide the alert service.
> 2. The second is a decentralized design in which all alert services work
> simultaneously, coordinated through exclusive locks, so that alert
> messages are not sent repeatedly.
>
> If anyone has a better plan, we can discuss it together.
>
> Thx
>
> [email protected]
>

--
DolphinScheduler(Incubator) PPMC
Jun Gao 高俊
[email protected]
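
For illustration only, the second proposal in the original message (all alert services active, coordinated through an exclusive lock) can be sketched as below. This is a minimal in-process simulation, not DolphinScheduler code: `AlertStore`, `run_alert_service`, and `simulate` are hypothetical names, and a `threading.Lock` stands in for whatever distributed lock (for example one held in ZK or the database) a real deployment would use.

```python
import threading


class AlertStore:
    """In-memory stand-in for the shared table of pending alerts (hypothetical)."""

    def __init__(self, alert_ids):
        self._pending = list(alert_ids)
        # Assumption: this local lock plays the role of a distributed exclusive lock.
        self._lock = threading.Lock()

    def claim_next(self):
        """Atomically take one pending alert, or return None when none remain."""
        with self._lock:
            if not self._pending:
                return None
            return self._pending.pop(0)


def run_alert_service(name, store, sent_log, log_lock):
    """One alert service instance: keep claiming alerts until the store is empty."""
    while True:
        alert_id = store.claim_next()
        if alert_id is None:
            return
        # A real service would send the mail or webhook here.
        with log_lock:
            sent_log.append((name, alert_id))


def simulate(num_services=3, num_alerts=20):
    """Run several alert services concurrently and return what each one sent."""
    store = AlertStore(range(num_alerts))
    sent_log, log_lock = [], threading.Lock()
    workers = [
        threading.Thread(
            target=run_alert_service,
            args=(f"alert-{i}", store, sent_log, log_lock),
        )
        for i in range(num_services)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sent_log
```

Because an alert is removed from the pending list while the lock is held, no two services can claim the same alert, so each one is sent exactly once however many services run. The first proposal (a single elected master) would avoid duplicates differently, by letting only one instance send at a time.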
