Hi,

On Tue, 21 Mar 2023 09:33:04 +0100
Jérôme BECOT <jerome.be...@deveryware.com> wrote:
> We have several clusters running for different zabbix components. Some
> of these clusters consist of 2 zabbix proxies, where nodes run MySQL,
> zabbix-proxy server and a VIP, and a corosync-qdevice.

I'm not sure I understand your topology. The corosync-qdevice is not
supposed to run on a cluster node. It is supposed to run on a remote
host and provide quorum arbitration to one or more clusters without
setting up the whole pacemaker/corosync stack there.

> The MySQL servers are always up to replicate, and are configured in
> Master/Master (they both replicate from the other, but only one is
> supposed to be updated by the proxy running on the master node).

Why bother with Master/Master when a simple (I suppose; I'm not a MySQL
cluster guy) Primary/Secondary topology, or even shared storage, would
be enough and would keep your logic (writes on one node only) safe from
incidents, failures, errors, etc.? HA must be as simple as possible.
Remove useless parts when you can.

> One cluster is prone to frequent sync errors, with duplicate-entry
> errors in SQL. When I look at the logs, I can see "Mar 21 09:11:41
> zabbix-proxy-01 pacemaker-controld [948] (pcmk_cpg_membership)
> info: Group crmd event 89: zabbix-proxy-02 (node 2 pid 967) left via
> cluster exit", and within the next second, a rejoin. The same messages
> are in the other node's logs, suggesting a split brain, which should
> not happen, because there is a quorum device.

Could your SQL sync errors and the leave/join events be correlated,
both being symptoms of another failure? Look at your logs for some
explanation of why the node decided to leave the cluster.

> Can you help me troubleshoot this? I can provide any log/configuration
> required in the process, so let me know.
>
> I'd also like to ask if there is a bit of configuration that can be
> done to postpone service start on the other node for two or three
> seconds, as a quick workaround?

How would that be a workaround?
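For reference, here is a rough sketch of the qdevice layout I mean, with
the arbiter on a third machine outside the cluster. Hostnames are
placeholders, and I'm assuming a pcs-managed cluster on a RHEL-like
distribution; adapt the package manager and names to your setup:

```shell
# On a THIRD machine, outside the cluster (placeholder hostname: qnetd-host):
# install and start the qnetd daemon that arbitrates quorum for the cluster(s).
dnf install corosync-qnetd
systemctl enable --now corosync-qnetd

# On one cluster node: install the qdevice client and point the
# two-node cluster at the external arbiter.
dnf install corosync-qdevice
pcs quorum device add model net host=qnetd-host algorithm=ffsplit

# Verify from any cluster node that the qdevice provides a vote.
pcs quorum status
```

The point is that qnetd runs nowhere near the cluster nodes, so it can
still break ties when the two nodes lose sight of each other.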
Regards,

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/