Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
On 2/11/19 9:49 AM, Fulong Wang wrote:
> Thanks Yan,
>
> You gave me more valuable hints on the SBD operation!
> Now, I can see the verbose output after service restart.
>
>> Be aware since pacemaker integration (-P) is enabled by default, which
>> means despite the sbd failure, if the node itself was clean and
>> "healthy" from pacemaker's point of view and if it's in the cluster
>> partition with the quorum, it wouldn't self-fence -- meaning a node just
>> being unable to fence doesn't necessarily need to be fenced.
>>
>> As described in sbd man page, "this allows sbd to survive temporary
>> outages of the majority of devices. However, while the cluster is in
>> such a degraded state, it can neither successfully fence nor be shutdown
>> cleanly (as taking the cluster below the quorum threshold will
>> immediately cause all remaining nodes to self-fence). In short, it will
>> not tolerate any further faults. Please repair the system before
>> continuing."
>
> Yes, I can see the "pacemaker integration" was enabled in my sbd config
> file by default.
> So, you mean in some sbd failure cases, if the node was considered as
> "healthy" from pacemaker's point of view, it still wouldn't self-fence.
>
> Honestly speaking, I didn't get you at this point. I have
> "no-quorum-policy=ignore" setting in my setup and it's a two node cluster.

Not directly related to the behavior of sbd: starting from corosync-2, with a properly configured "quorum" service in corosync.conf, no-quorum-policy=ignore in pacemaker should be avoided, meaning pacemaker should follow the decisions on quorum made by corosync:

https://www.suse.com/documentation/sle-ha-12/book_sleha/data/sec_ha_config_basics_global.html#sec_ha_config_basics_corosync_2-node

> Can you show me a sample situation for this?

For example, if a node loses access to the sbd device but every node is still "clean" and online, there's no need to fence anyone at that point. The node will continue functioning in such a degraded state. But of course the administrator needs to fix the sbd issue as soon as possible.

Be aware that a 2-node cluster is a common but special use case. If we lose one node and meanwhile also lose access to sbd, the single online node will self-fence even if corosync's votequorum service considers it "quorate". This is the safest approach in case it's a split-brain. This already works correctly with the fix for 2-node clusters from Klaus.

Regards,
  Yan

> Many Thanks!!!
>
> Regards
> Fulong
>
> *From:* Gao,Yan
> *Sent:* Thursday, January 3, 2019 20:43
> *To:* Fulong Wang; Cluster Labs - All topics related to open-source clustering welcomed
> *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
>
> On 12/24/18 7:10 AM, Fulong Wang wrote:
>> Yan, Klaus and Everyone,
>>
>> Merry Christmas!!!
>>
>> Many thanks for your advice!
>> I added the "-v" param in "SBD_OPTS", but didn't see any apparent change
>> in the system message log, am I looking at a wrong place?
> Did you restart all cluster services, for example by "crm cluster stop"
> and then "crm cluster start"? Basically sbd.service needs to be
> restarted. Be aware "systemctl restart pacemaker" only restarts pacemaker.
>
> SBD daemons log into syslog. When an sbd watcher receives a "test"
> command, there should be a syslog message like this showing up:
>
> "servant: Received command test from ..."
>
> sbd won't actually do anything about a "test" command other than logging a
> message.
>
> If you are not running a recent version of sbd (maintenance update) yet, a
> single "-v" will already make sbd too verbose. But of course you could
> use grep.
>
>> By the way, we want to test when the disk access paths (multipath
>> devices) lost, the sbd can fence the node automatically.
> Be aware since pacemaker integration (-P) is enabled by default, which
> means despite the sbd failure, if the node itself was clean and
> "healthy" from pacemaker's point of view and if it's in the cluster
> partition with the quorum, it wouldn't self-fence -- meaning a node just
> being unable to fence doesn't necessarily need to be fenced.
>
> As described in sbd man page, "this allows sbd to survive temporary
> outages of the majority of devices. However, while the cluster is in
> such a degraded state, it can neither successfully fence nor be shutdown
> cleanly (as taking the cluster below the quorum threshold will
> immediately cause all remaining nodes to self-fence). In short, it will
> not tolerate any further faults. Please repair the system before
> continuing."
>
> Regards,
>   Yan
>
>> what's your recommendation for this scenario?
>>
>> The "crm node fence" did the work.
>>
>> Regards
>> Fulong

-------
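Yan's description of when a node self-fences after losing the sbd device can be summarized as a small decision function. This is only a toy model of the explanation in this message, not sbd's actual logic; the function name `should_self_fence` and its five 0/1 arguments are made up for illustration.

```shell
# Toy model (NOT sbd's implementation) of the behaviour described in the thread.
# All arguments are 1 (true) or 0 (false):
#   $1 disk_ok       node can still access the sbd device(s)
#   $2 pcmk_healthy  pacemaker considers the local node clean and healthy
#   $3 quorate       node is in the cluster partition with quorum
#   $4 two_node      corosync is configured as a 2-node cluster
#   $5 peer_seen     the other node is still online (only matters if two_node=1)
# Exit status 0 means "the node would self-fence".
should_self_fence() {
    disk_ok=$1 pcmk_healthy=$2 quorate=$3 two_node=$4 peer_seen=$5

    # Device access is fine: nothing in this scenario triggers a self-fence.
    [ "$disk_ok" -eq 1 ] && return 1

    # Device lost: the node survives only while pacemaker vouches for it
    # and it sits in the quorate partition.
    [ "$pcmk_healthy" -eq 1 ] || return 0
    [ "$quorate" -eq 1 ] || return 0

    # 2-node special case: "quorate" alone is not trusted once the peer is gone.
    if [ "$two_node" -eq 1 ] && [ "$peer_seen" -ne 1 ]; then
        return 0
    fi
    return 1
}
```

For instance, `should_self_fence 0 1 1 1 0` (device lost, peer lost, 2-node) succeeds, while `should_self_fence 0 1 1 1 1` (device lost, peer still online) does not.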
Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
On 2/12/19 3:38 AM, Fulong Wang wrote: Klaus, Thanks for the infor! Did you mean i should compile sbd from github source to include the fixs you mentioned by myself? The corosync, pacemaker and sbd version in my setup is as below: corosync: 2.3.6-9.13.1 pacemaker: 1.1.16-6.5.1 sbd: 1.3.1+20180507 I'm pretty sure this version has the fix in regard of 2-node cluster from Klaus. Regards, Yan Regards Fulong *From:* Klaus Wenninger *Sent:* Monday, February 11, 2019 18:51 *To:* Cluster Labs - All topics related to open-source clustering welcomed; Fulong Wang; Gao,Yan *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue On 02/11/2019 09:49 AM, Fulong Wang wrote: Thanks Yan, You gave me more valuable hints on the SBD operation! Now, i can see the verbose output after service restart. >Be aware since pacemaker integration (-P) is enabled by default, which >means despite the sbd failure, if the node itself was clean and >"healthy" from pacemaker's point of view and if it's in the cluster >partition with the quorum, it wouldn't self-fence -- meaning a node just >being unable to fence doesn't necessarily need to be fenced. >As described in sbd man page, "this allows sbd to survive temporary >outages of the majority of devices. However, while the cluster is in >such a degraded state, it can neither successfully fence nor be shutdown >cleanly (as taking the cluster below the quorum threshold will >immediately cause all remaining nodes to self-fence). In short, it will >not tolerate any further faults. Please repair the system before >continuing." Yes, I can see the "pacemaker integration" was enabled in my sbd config file by default. So, you mean in some sbd failure cases, if the node was considered as "healthy" from pacemaker's poinit of view, it still wouldn't sel-fence. Honestly speaking, i didn't get you at this point. I have "no-quorum-policy=ignore" setting in my setup and it's a two node cluster. Can you show me a sample situation for this? 
When using sbd with 2-node-clusters and pacemaker-integration you might check https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377 to be included in your sbd-version. This is relevant when 2-node is configured in corosync. Regards, Klaus Many Thanks!!! Reagards Fulong *From:* Gao,Yan <mailto:y...@suse.com> *Sent:* Thursday, January 3, 2019 20:43 *To:* Fulong Wang; Cluster Labs - All topics related to open-source clustering welcomed *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue On 12/24/18 7:10 AM, Fulong Wang wrote: > Yan, klaus and Everyone, > > > Merry Christmas!!! > > > > Many thanks for your advice! > I added the "-v" param in "SBD_OPTS", but didn't see any apparent change > in the system message log, am i looking at a wrong place? Did you restart all cluster services, for example by "crm cluster stop" and then "crm cluster start"? Basically sbd.service needs to be restarted. Be aware "systemctl restart pacemaker" only restarts pacemaker. SBD daemons log into syslog. When a sbd watcher receives a "test" command, there should be a syslog like this showing up: "servant: Received command test from ..." sbd won't actually do anything about a "test" command but logging a message. If you are not running a late version of sbd (maintenance update) yet, a single "-v" will make sbd too verbose already. But of course you could use grep. > > By the way, we want to test when the disk access paths (multipath > devices) lost, the sbd can fence the node automatically. Be aware since pacemaker integration (-P) is enabled by default, which means despite the sbd failure, if the node itself was clean and "healthy" from pacemaker's point of view and if it's in the cluster partition with the quorum, it wouldn't self-fence -- meaning a node just being unable to fence doesn't necessarily need to be fenced. As described in sbd man page, "this allows sbd to survive temporary outages of the majority of devices. 
However, while the cluster is in such a degraded state, it can neither successfully fence nor be shutdown cleanly (as taking the cluster below the quorum threshold will immediately cause all remaining nodes to self-fence). In short, it will not tolerate any further faults. Please repair the system before continuing." Regards, Yan > what's your recommendation for this scenario? > > The "crm node fence" did the work.
Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
Klaus, Thanks for the infor! Did you mean i should compile sbd from github source to include the fixs you mentioned by myself? The corosync, pacemaker and sbd version in my setup is as below: corosync: 2.3.6-9.13.1 pacemaker: 1.1.16-6.5.1 sbd: 1.3.1+20180507 Regards Fulong From: Klaus Wenninger Sent: Monday, February 11, 2019 18:51 To: Cluster Labs - All topics related to open-source clustering welcomed; Fulong Wang; Gao,Yan Subject: Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue On 02/11/2019 09:49 AM, Fulong Wang wrote: Thanks Yan, You gave me more valuable hints on the SBD operation! Now, i can see the verbose output after service restart. >Be aware since pacemaker integration (-P) is enabled by default, which >means despite the sbd failure, if the node itself was clean and >"healthy" from pacemaker's point of view and if it's in the cluster >partition with the quorum, it wouldn't self-fence -- meaning a node just >being unable to fence doesn't necessarily need to be fenced. >As described in sbd man page, "this allows sbd to survive temporary >outages of the majority of devices. However, while the cluster is in >such a degraded state, it can neither successfully fence nor be shutdown >cleanly (as taking the cluster below the quorum threshold will >immediately cause all remaining nodes to self-fence). In short, it will >not tolerate any further faults. Please repair the system before >continuing." Yes, I can see the "pacemaker integration" was enabled in my sbd config file by default. So, you mean in some sbd failure cases, if the node was considered as "healthy" from pacemaker's poinit of view, it still wouldn't sel-fence. Honestly speaking, i didn't get you at this point. I have "no-quorum-policy=ignore" setting in my setup and it's a two node cluster. Can you show me a sample situation for this? 
When using sbd with 2-node-clusters and pacemaker-integration you might check https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377 to be included in your sbd-version. This is relevant when 2-node is configured in corosync. Regards, Klaus Many Thanks!!! Reagards Fulong From: Gao,Yan <mailto:y...@suse.com> Sent: Thursday, January 3, 2019 20:43 To: Fulong Wang; Cluster Labs - All topics related to open-source clustering welcomed Subject: Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue On 12/24/18 7:10 AM, Fulong Wang wrote: > Yan, klaus and Everyone, > > > Merry Christmas!!! > > > > Many thanks for your advice! > I added the "-v" param in "SBD_OPTS", but didn't see any apparent change > in the system message log, am i looking at a wrong place? Did you restart all cluster services, for example by "crm cluster stop" and then "crm cluster start"? Basically sbd.service needs to be restarted. Be aware "systemctl restart pacemaker" only restarts pacemaker. SBD daemons log into syslog. When a sbd watcher receives a "test" command, there should be a syslog like this showing up: "servant: Received command test from ..." sbd won't actually do anything about a "test" command but logging a message. If you are not running a late version of sbd (maintenance update) yet, a single "-v" will make sbd too verbose already. But of course you could use grep. > > By the way, we want to test when the disk access paths (multipath > devices) lost, the sbd can fence the node automatically. Be aware since pacemaker integration (-P) is enabled by default, which means despite the sbd failure, if the node itself was clean and "healthy" from pacemaker's point of view and if it's in the cluster partition with the quorum, it wouldn't self-fence -- meaning a node just being unable to fence doesn't necessarily need to be fenced. As described in sbd man page, "this allows sbd to survive temporary outages of the majority of devices. 
However, while the cluster is in such a degraded state, it can neither successfully fence nor be shutdown cleanly (as taking the cluster below the quorum threshold will immediately cause all remaining nodes to self-fence). In short, it will not tolerate any further faults. Please repair the system before continuing." Regards, Yan > what's your recommendation for this scenario? > > > > > > > > The "crm node fence" did the work. > > > > > > > > > > > > > > Regards > Fulong > > > *From:* Gao,Yan <mailto:y...@suse.com> > *Sent:* Friday, December 21, 2018 20:43 >
Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
On 02/11/2019 09:49 AM, Fulong Wang wrote: > Thanks Yan, > > You gave me more valuable hints on the SBD operation! > Now, i can see the verbose output after service restart. > > > >Be aware since pacemaker integration (-P) is enabled by default, which > >means despite the sbd failure, if the node itself was clean and > >"healthy" from pacemaker's point of view and if it's in the cluster > >partition with the quorum, it wouldn't self-fence -- meaning a node just > >being unable to fence doesn't necessarily need to be fenced. > > >As described in sbd man page, "this allows sbd to survive temporary > >outages of the majority of devices. However, while the cluster is in > >such a degraded state, it can neither successfully fence nor be shutdown > >cleanly (as taking the cluster below the quorum threshold will > >immediately cause all remaining nodes to self-fence). In short, it will > >not tolerate any further faults. Please repair the system before > >continuing." > > Yes, I can see the "pacemaker integration" was enabled in my sbd > config file by default. > So, you mean in some sbd failure cases, if the node was considered as > "healthy" from pacemaker's poinit of view, it still wouldn't sel-fence. > > Honestly speaking, i didn't get you at this point. I have > "no-quorum-policy=ignore" setting in my setup and it's a two node > cluster. > Can you show me a sample situation for this? When using sbd with 2-node-clusters and pacemaker-integration you might check https://github.com/ClusterLabs/sbd/commit/4bd0a66da3ac9c9afaeb8a2468cdd3ed51ad3377 to be included in your sbd-version. This is relevant when 2-node is configured in corosync. Regards, Klaus > > Many Thanks!!! 
> > > > > Reagards > Fulong > > > > -------------------- > *From:* Gao,Yan > *Sent:* Thursday, January 3, 2019 20:43 > *To:* Fulong Wang; Cluster Labs - All topics related to open-source > clustering welcomed > *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue > > On 12/24/18 7:10 AM, Fulong Wang wrote: > > Yan, klaus and Everyone, > > > > > > Merry Christmas!!! > > > > > > > > Many thanks for your advice! > > I added the "-v" param in "SBD_OPTS", but didn't see any apparent > change > > in the system message log, am i looking at a wrong place? > Did you restart all cluster services, for example by "crm cluster stop" > and then "crm cluster start"? Basically sbd.service needs to be > restarted. Be aware "systemctl restart pacemaker" only restarts pacemaker. > > SBD daemons log into syslog. When a sbd watcher receives a "test" > command, there should be a syslog like this showing up: > > "servant: Received command test from ..." > > sbd won't actually do anything about a "test" command but logging a > message. > > If you are not running a late version of sbd (maintenance update) yet, a > single "-v" will make sbd too verbose already. But of course you could > use grep. > > > > > By the way, we want to test when the disk access paths (multipath > > devices) lost, the sbd can fence the node automatically. > Be aware since pacemaker integration (-P) is enabled by default, which > means despite the sbd failure, if the node itself was clean and > "healthy" from pacemaker's point of view and if it's in the cluster > partition with the quorum, it wouldn't self-fence -- meaning a node just > being unable to fence doesn't necessarily need to be fenced. > > As described in sbd man page, "this allows sbd to survive temporary > outages of the majority of devices. 
However, while the cluster is in > such a degraded state, it can neither successfully fence nor be shutdown > cleanly (as taking the cluster below the quorum threshold will > immediately cause all remaining nodes to self-fence). In short, it will > not tolerate any further faults. Please repair the system before > continuing." > > Regards, > Yan > > > > what's your recommendation for this scenario? > > > > > > > > > > > > > > > > The "crm node fence" did the work. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards > > Fulong > > > > > > *From:* Gao,Yan >
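Klaus notes that the referenced fix matters when 2-node mode is configured in corosync. A minimal sketch for checking that on a node, assuming the usual `two_node: 1` setting in the `quorum` section of corosync.conf; the helper name `corosync_two_node_enabled` and the default path are assumptions for illustration.

```shell
# Hypothetical helper: succeed if the given corosync.conf enables two_node.
# $1 = path to corosync.conf (assumed default: /etc/corosync/corosync.conf)
corosync_two_node_enabled() {
    conf="${1:-/etc/corosync/corosync.conf}"
    # Drop comment lines first, then look for an active "two_node: 1" entry.
    grep -v '^[[:space:]]*#' "$conf" \
        | grep -Eq '^[[:space:]]*two_node:[[:space:]]*1[[:space:]]*$'
}
```

Usage: `corosync_two_node_enabled && echo "two_node is set"`.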
Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
Thanks Yan, You gave me more valuable hints on the SBD operation! Now, i can see the verbose output after service restart. >Be aware since pacemaker integration (-P) is enabled by default, which >means despite the sbd failure, if the node itself was clean and >"healthy" from pacemaker's point of view and if it's in the cluster >partition with the quorum, it wouldn't self-fence -- meaning a node just >being unable to fence doesn't necessarily need to be fenced. >As described in sbd man page, "this allows sbd to survive temporary >outages of the majority of devices. However, while the cluster is in >such a degraded state, it can neither successfully fence nor be shutdown >cleanly (as taking the cluster below the quorum threshold will >immediately cause all remaining nodes to self-fence). In short, it will >not tolerate any further faults. Please repair the system before >continuing." Yes, I can see the "pacemaker integration" was enabled in my sbd config file by default. So, you mean in some sbd failure cases, if the node was considered as "healthy" from pacemaker's poinit of view, it still wouldn't sel-fence. Honestly speaking, i didn't get you at this point. I have "no-quorum-policy=ignore" setting in my setup and it's a two node cluster. Can you show me a sample situation for this? Many Thanks!!! Reagards Fulong From: Gao,Yan Sent: Thursday, January 3, 2019 20:43 To: Fulong Wang; Cluster Labs - All topics related to open-source clustering welcomed Subject: Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue On 12/24/18 7:10 AM, Fulong Wang wrote: > Yan, klaus and Everyone, > > > Merry Christmas!!! > > > > Many thanks for your advice! > I added the "-v" param in "SBD_OPTS", but didn't see any apparent change > in the system message log, am i looking at a wrong place? Did you restart all cluster services, for example by "crm cluster stop" and then "crm cluster start"? Basically sbd.service needs to be restarted. 
Be aware "systemctl restart pacemaker" only restarts pacemaker. SBD daemons log into syslog. When a sbd watcher receives a "test" command, there should be a syslog like this showing up: "servant: Received command test from ..." sbd won't actually do anything about a "test" command but logging a message. If you are not running a late version of sbd (maintenance update) yet, a single "-v" will make sbd too verbose already. But of course you could use grep. > > By the way, we want to test when the disk access paths (multipath > devices) lost, the sbd can fence the node automatically. Be aware since pacemaker integration (-P) is enabled by default, which means despite the sbd failure, if the node itself was clean and "healthy" from pacemaker's point of view and if it's in the cluster partition with the quorum, it wouldn't self-fence -- meaning a node just being unable to fence doesn't necessarily need to be fenced. As described in sbd man page, "this allows sbd to survive temporary outages of the majority of devices. However, while the cluster is in such a degraded state, it can neither successfully fence nor be shutdown cleanly (as taking the cluster below the quorum threshold will immediately cause all remaining nodes to self-fence). In short, it will not tolerate any further faults. Please repair the system before continuing." Regards, Yan > what's your recommendation for this scenario? > > > > > > > > The "crm node fence" did the work. > > > > > > > > > > > > > > Regards > Fulong > > > *From:* Gao,Yan > *Sent:* Friday, December 21, 2018 20:43 > *To:* kwenn...@redhat.com; Cluster Labs - All topics related to > open-source clustering welcomed; Fulong Wang > *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue > First thanks for your reply, Klaus! > > On 2018/12/21 10:09, Klaus Wenninger wrote: >> On 12/21/2018 08:15 AM, Fulong Wang wrote: >>> Hello Experts, >>> >>> I'm New to this mail lists. >>> Pls kindlyforgive me if this mail has disturb you! 
>>> >>> Our Company recently is evaluating the usage of the SuSE HAE on x86 >>> platform. >>> Wen simulating the storage disaster fail-over, i finally found that >>> the SBD communication functioned normal on SuSE11 SP4 but abnormal on >>> SuSE12 SP3. >> >> I have no experience with SBD on SLES but I know that handling of the >> logging verbosity-levels has changed
Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
On 12/22/18 5:27 AM, Andrei Borzenkov wrote:
> 21.12.2018 12:09, Klaus Wenninger wrote:
>> On 12/21/2018 08:15 AM, Fulong Wang wrote:
>>> Hello Experts,
>>>
>>> I'm new to this mailing list.
>>> Please kindly forgive me if this mail has disturbed you!
>>>
>>> Our company is currently evaluating the use of the SuSE HAE on the x86
>>> platform.
>>> When simulating the storage disaster fail-over, I found that
>>> the SBD communication functioned normally on SuSE11 SP4 but abnormally on
>>> SuSE12 SP3.
>>
>> I have no experience with SBD on SLES but I know that handling of the
>> logging verbosity-levels has changed recently in the upstream-repo.
>> Given that it was done by Yan Gao iirc I'd assume it went into SLES.
>> So changing the verbosity of the sbd-daemon might get you back
>> these logs.
>
> Do you mean
>
> commit 2dbdee29736fcbf0fe1d41c306959b22d05f72b0
> Author: Gao,Yan
> Date:   Mon Apr 30 18:02:04 2018 +0200
>
>     Log: upgrade important messages and downgrade unimportant ones
>
> ??
>
> This commit actually increased severity for message on target node:
>
> @@ -1180,7 +1180,7 @@ int servant(const char *diskname, int mode, const void* argp)
>                 }
>
>                 if (s_mbox->cmd > 0) {
> -                       cl_log(LOG_INFO,
> +                       cl_log(LOG_NOTICE,
>                                "Received command %s from %s on disk %s",
>                                char2cmd(s_mbox->cmd), s_mbox->from, diskname);
>
> and did not change severity for messages on source node (they are still
> INFO).

True. Not sure if any of them should belong to notice if everything works well... sbd commands that send messages can be supplied with -v as well of course.

Regards,
  Yan

>> And of course you can use the list command on the other node
>> to verify as well.
>>
>> Klaus
>>
>>> The SBD device was added during the initialization of the first
>>> cluster node.
>>>
>>> I have requested help from the SuSE guys, but they haven't given me any
>>> valuable feedback yet!
>>>
>>> Below are some screenshots to explain what I have encountered.
>>> ~~~
>>> On a SuSE11 SP4 HAE cluster, I ran the sbd test command as below:
>>> [screenshot not preserved]
>>> then some information showed up in the local system message log
>>> [screenshot not preserved]
>>> on the second node, we can find that the communication is normal by
>>> [screenshot not preserved]
>>> but when I turned to a SuSE12 SP3 HAE cluster and ran the same command as
>>> above:
>>> [screenshot not preserved]
>>> I didn't get any response in the system message log.
>>> "systemctl status sbd" also doesn't give me any clue on this.
>>> ~~
>>>
>>> What could be the reason for this abnormal behavior? Are there any
>>> problems with my setup?
>>> Any suggestions are appreciated!
>>>
>>> Thanks!
>>>
>>> Regards
>>> FuLong

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
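The test procedure discussed in this message (send a "test" message with -v, then check the slots with the list command on the other node) could look roughly like the sketch below. It is a dry-run helper that only prints the commands; the function name, the device path and the node name are placeholders, and the exact option order should be checked against the sbd man page of the installed version.

```shell
# Hypothetical dry-run helper: prints the commands instead of running them.
# $1 = path of the shared sbd device, $2 = name of the node to message.
sbd_test_commands() {
    dev="$1"
    node="$2"
    # On the sending node: write a "test" message into the target's slot, verbosely.
    printf 'sbd -v -d %s message %s test\n' "$dev" "$node"
    # On the other node: list the slots and messages on the shared device.
    printf 'sbd -d %s list\n' "$dev"
}

sbd_test_commands /dev/disk/by-id/EXAMPLE-DEVICE node2
```

As noted elsewhere in the thread, a "test" message in the slot may be overwritten by "clear" quickly when the sbd daemon is running, so the list output is only a snapshot.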
Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
On 12/24/18 7:10 AM, Fulong Wang wrote:
> Yan, Klaus and Everyone,
>
> Merry Christmas!!!
>
> Many thanks for your advice!
> I added the "-v" param in "SBD_OPTS", but didn't see any apparent change
> in the system message log, am I looking at a wrong place?

Did you restart all cluster services, for example by "crm cluster stop" and then "crm cluster start"? Basically sbd.service needs to be restarted. Be aware "systemctl restart pacemaker" only restarts pacemaker.

SBD daemons log into syslog. When an sbd watcher receives a "test" command, there should be a syslog message like this showing up:

"servant: Received command test from ..."

sbd won't actually do anything about a "test" command other than logging a message.

If you are not running a recent version of sbd (maintenance update) yet, a single "-v" will already make sbd too verbose. But of course you could use grep.

> By the way, we want to test when the disk access paths (multipath
> devices) lost, the sbd can fence the node automatically.

Be aware since pacemaker integration (-P) is enabled by default, which means despite the sbd failure, if the node itself was clean and "healthy" from pacemaker's point of view and if it's in the cluster partition with the quorum, it wouldn't self-fence -- meaning a node just being unable to fence doesn't necessarily need to be fenced.

As described in sbd man page, "this allows sbd to survive temporary outages of the majority of devices. However, while the cluster is in such a degraded state, it can neither successfully fence nor be shutdown cleanly (as taking the cluster below the quorum threshold will immediately cause all remaining nodes to self-fence). In short, it will not tolerate any further faults. Please repair the system before continuing."

Regards,
  Yan

> what's your recommendation for this scenario?
>
> The "crm node fence" did the work.
>
> Regards
> Fulong
>
> *From:* Gao,Yan
> *Sent:* Friday, December 21, 2018 20:43
> *To:* kwenn...@redhat.com; Cluster Labs - All topics related to open-source clustering welcomed; Fulong Wang
> *Subject:* Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
>
> First thanks for your reply, Klaus!
>
> On 2018/12/21 10:09, Klaus Wenninger wrote:
>> On 12/21/2018 08:15 AM, Fulong Wang wrote:
>>> Hello Experts,
>>>
>>> I'm new to this mailing list.
>>> Please kindly forgive me if this mail has disturbed you!
>>>
>>> Our company is currently evaluating the use of the SuSE HAE on the x86
>>> platform.
>>> When simulating the storage disaster fail-over, I found that
>>> the SBD communication functioned normally on SuSE11 SP4 but abnormally on
>>> SuSE12 SP3.
>>
>> I have no experience with SBD on SLES but I know that handling of the
>> logging verbosity-levels has changed recently in the upstream-repo.
>> Given that it was done by Yan Gao iirc I'd assume it went into SLES.
>> So changing the verbosity of the sbd-daemon might get you back
>> these logs.
> Yes, I think it's the issue.
>
> Could you please retrieve the latest maintenance update for SLE12SP3 and
> try? Otherwise of course you could temporarily enable verbose/debug
> logging by adding a couple of "-v" into "SBD_OPTS" in /etc/sysconfig/sbd.
>
> But frankly, it makes more sense to manually trigger fencing for example
> by "crm node fence" and see if it indeed works correctly.
>
>> And of course you can use the list command on the other node
>> to verify as well.
> The "test" message in the slot might get overwritten soon by a "clear"
> if the sbd daemon is running.
>
> Regards,
>   Yan
>
>> Klaus
>>
>>> The SBD device was added during the initialization of the first
>>> cluster node.
>>>
>>> I have requested help from the SuSE guys, but they haven't given me any
>>> valuable feedback yet!
>>>
>>> Below are some screenshots to explain what I have encountered.
>>> ~~~
>>> On a SuSE11 SP4 HAE cluster, I ran the sbd test command as below:
>>> [screenshot not preserved]
>>> then some information showed up in the local system message log
>>> [screenshot not preserved]
>>> on the second node, we can find that the communication is normal by
>>> [screenshot not preserved]
>>> but when I turned to a SuSE12 SP3 HAE cluster and ran the same command as
>>> above:
>>> [screenshot not preserved]
>>> I didn't get any response in the system message log.
>>> "systemctl status sbd" also doesn't give me any clue on this.
>>> ~~
>>>
>>> What could be the reason for this abnormal behavior? Are there any
>>> problems with my setup?
>>> Any suggestions are appreciated!
>>>
>>> Thanks!
>>>
>>> Regards
>>> FuLong
>>>
>>> ___
>>> Users mailing list:Users@clusterlabs.org
>>> https://lists.clust
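Since a single "-v" can make sbd very chatty, the grep suggestion in this message might be applied as in the sketch below. It assumes the log line contains the text "Received command ... from ..." as quoted in the thread, and that syslog lands in /var/log/messages; both the default path and the helper name are assumptions.

```shell
# Hypothetical helper: show sbd "Received command" messages from a syslog file.
# $1 = log file to search (assumed default: /var/log/messages)
sbd_received_commands() {
    logfile="${1:-/var/log/messages}"
    # Matches lines such as: "... servant: Received command test from <node> ..."
    grep 'Received command .* from ' "$logfile"
}
```

Run it on the receiving node after sending a "test" message from the peer; no output means either the message was not logged at the current verbosity or sbd.service was not restarted after changing SBD_OPTS.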
Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
21.12.2018 12:09, Klaus Wenninger wrote:
> On 12/21/2018 08:15 AM, Fulong Wang wrote:
>> Hello Experts,
>>
>> I'm new to this mailing list.
>> Please kindly forgive me if this mail has disturbed you!
>>
>> Our company is currently evaluating the use of the SuSE HAE on the x86
>> platform.
>> When simulating the storage disaster fail-over, I found that
>> the SBD communication functioned normally on SuSE11 SP4 but abnormally on
>> SuSE12 SP3.
>
> I have no experience with SBD on SLES but I know that handling of the
> logging verbosity-levels has changed recently in the upstream-repo.
> Given that it was done by Yan Gao iirc I'd assume it went into SLES.
> So changing the verbosity of the sbd-daemon might get you back
> these logs.

Do you mean

commit 2dbdee29736fcbf0fe1d41c306959b22d05f72b0
Author: Gao,Yan
Date:   Mon Apr 30 18:02:04 2018 +0200

    Log: upgrade important messages and downgrade unimportant ones

??

This commit actually increased severity for message on target node:

@@ -1180,7 +1180,7 @@ int servant(const char *diskname, int mode, const void* argp)
                }

                if (s_mbox->cmd > 0) {
-                       cl_log(LOG_INFO,
+                       cl_log(LOG_NOTICE,
                               "Received command %s from %s on disk %s",
                               char2cmd(s_mbox->cmd), s_mbox->from, diskname);

and did not change severity for messages on source node (they are still INFO).

> And of course you can use the list command on the other node
> to verify as well.
>
> Klaus
>
>> The SBD device was added during the initialization of the first
>> cluster node.
>>
>> I have requested help from the SuSE guys, but they haven't given me any
>> valuable feedback yet!
>>
>> Below are some screenshots to explain what I have encountered.
>> ~~~
>> On a SuSE11 SP4 HAE cluster, I ran the sbd test command as below:
>> [screenshot not preserved]
>> then some information showed up in the local system message log
>> [screenshot not preserved]
>> on the second node, we can find that the communication is normal by
>> [screenshot not preserved]
>> but when I turned to a SuSE12 SP3 HAE cluster and ran the same command as
>> above:
>> [screenshot not preserved]
>> I didn't get any response in the system message log.
>> "systemctl status sbd" also doesn't give me any clue on this.
>> ~~
>>
>> What could be the reason for this abnormal behavior? Are there any
>> problems with my setup?
>> Any suggestions are appreciated!
>>
>> Thanks!
>>
>> Regards
>> FuLong
Re: [ClusterLabs] SuSE12SP3 HAE SBD Communication Issue
First thanks for your reply, Klaus!

On 2018/12/21 10:09, Klaus Wenninger wrote:
> On 12/21/2018 08:15 AM, Fulong Wang wrote:
>> Hello Experts,
>>
>> I'm new to this mailing list.
>> Please kindly forgive me if this mail has disturbed you!
>>
>> Our company is currently evaluating the use of the SuSE HAE on the x86
>> platform.
>> When simulating the storage disaster fail-over, I found that
>> the SBD communication functioned normally on SuSE11 SP4 but abnormally on
>> SuSE12 SP3.
>
> I have no experience with SBD on SLES but I know that handling of the
> logging verbosity-levels has changed recently in the upstream-repo.
> Given that it was done by Yan Gao iirc I'd assume it went into SLES.
> So changing the verbosity of the sbd-daemon might get you back
> these logs.

Yes, I think it's the issue.

Could you please retrieve the latest maintenance update for SLE12SP3 and try? Otherwise of course you could temporarily enable verbose/debug logging by adding a couple of "-v" into "SBD_OPTS" in /etc/sysconfig/sbd.

But frankly, it makes more sense to manually trigger fencing, for example by "crm node fence", and see if it indeed works correctly.

> And of course you can use the list command on the other node
> to verify as well.

The "test" message in the slot might get overwritten soon by a "clear" if the sbd daemon is running.

Regards,
  Yan

> Klaus
>
>> The SBD device was added during the initialization of the first
>> cluster node.
>>
>> I have requested help from the SuSE guys, but they haven't given me any
>> valuable feedback yet!
>>
>> Below are some screenshots to explain what I have encountered.
>> ~~~
>> On a SuSE11 SP4 HAE cluster, I ran the sbd test command as below:
>> [screenshot not preserved]
>> then some information showed up in the local system message log
>> [screenshot not preserved]
>> on the second node, we can find that the communication is normal by
>> [screenshot not preserved]
>> but when I turned to a SuSE12 SP3 HAE cluster and ran the same command as
>> above:
>> [screenshot not preserved]
>> I didn't get any response in the system message log.
>> "systemctl status sbd" also doesn't give me any clue on this.
>> ~~
>>
>> What could be the reason for this abnormal behavior? Are there any
>> problems with my setup?
>> Any suggestions are appreciated!
>>
>> Thanks!
>>
>> Regards
>> FuLong
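The suggestion to add "-v" to "SBD_OPTS" in /etc/sysconfig/sbd could be scripted roughly as below. This is a sketch under assumptions: the helper name `sbd_add_verbose` is made up, and it only handles an SBD_OPTS value written in double quotes (or a file with no SBD_OPTS line at all). It is best tried on a copy of the file first.

```shell
# Hypothetical helper: append one "-v" to SBD_OPTS in a sysconfig-style file.
# $1 = file to edit, e.g. a copy of /etc/sysconfig/sbd
sbd_add_verbose() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    if grep -q '^SBD_OPTS="' "$conf"; then
        # Insert " -v" just before the closing quote of the existing value.
        sed 's/^SBD_OPTS="\(.*\)"[[:space:]]*$/SBD_OPTS="\1 -v"/' "$conf" > "$tmp"
    elif grep -q '^SBD_OPTS=' "$conf"; then
        # Unquoted or otherwise unexpected value: do not guess, leave the file alone.
        echo "unsupported SBD_OPTS format in $conf" >&2
        rm -f "$tmp"
        return 1
    else
        # No SBD_OPTS line yet: keep the file and add one.
        cat "$conf" > "$tmp"
        printf 'SBD_OPTS="-v"\n' >> "$tmp"
    fi
    # Overwrite in place so ownership and permissions of the file are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# The change only takes effect once sbd.service is restarted, for example
# with "crm cluster stop" followed by "crm cluster start".
```

Calling the helper twice yields "-v -v", which matches the "couple of -v" mentioned above for more detailed logging.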