Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK
On 2014-04-22T14:21:33, Tom Parker wrote:

Hi Tom,

> Has anyone seen this? Do you know what might be causing the flapping?

No, I've never seen this.

> Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with cluster ...

So it connected fine. This is the process maintaining the pcmk connection, so the others can be disregarded.

> Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
> Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
> Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
> Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!

Is this all that is happening here? Judging from this, there should be an unstable Pacemaker cluster to go with it. Are there any crmd/corosync etc. messages?

Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
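Lars's question about crmd/corosync messages comes down to checking what the other cluster daemons logged around each sbd "Pacemaker health check: UNHEALTHY" event. A minimal Python sketch of that check, assuming all daemons log to one syslog file with the classic "Mon DD HH:MM:SS host program..." prefix, that the syslog tags are literally "crmd" and "corosync", that the year is 2014 (syslog omits it), and that a 30-second window is wide enough. The function names and the window are illustrative choices, not part of sbd or Pacemaker:

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix: "Apr 21 22:15:01 qaxen6 <program>...".
# Assumption: the program name is the token after the host, up to ':' or '['.
HEADER_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ (?P<prog>[^:\[\s]+)"
)
YEAR = 2014  # assumption: syslog timestamps carry no year


def parse(line):
    """Return (timestamp, program) for a syslog line, or None if it does not match."""
    m = HEADER_RE.match(line)
    if not m:
        return None
    when = datetime.strptime(f"{YEAR} {m['stamp']}", "%Y %b %d %H:%M:%S")
    return when, m["prog"]


def context_around_unhealthy(lines, programs=("crmd", "corosync"),
                             window=timedelta(seconds=30)):
    """Map each sbd UNHEALTHY event to lines from `programs` logged within `window`."""
    parsed = []
    for line in lines:
        info = parse(line)
        if info:
            parsed.append((info[0], info[1], line.rstrip("\n")))
    events = [when for when, prog, text in parsed
              if prog == "sbd" and text.endswith("Pacemaker health check: UNHEALTHY")]
    return {
        event: [text for when, prog, text in parsed
                if prog in programs and abs(when - event) <= window]
        for event in events
    }
```

Fed the node's whole syslog, an event that maps to an empty list means neither daemon logged anything inside the window, which is itself a useful answer to Lars's question.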
Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK
SBD has a connection to Pacemaker to establish overall cluster health (the -P flag). This seems to be where the problem is; I just don't know what the problem might be.

On 23/04/14 11:32 AM, emmanuel segura wrote:
> What do you mean by "link"?
>
> 2014-04-23 15:23 GMT+02:00 Tom Parker:
>> OK. I have changed that to no_path_retry fail, but I don't think this
>> has anything to do with the errors I am seeing.
>>
>> They seem to be related to sbd's link with my cluster, not with disk I/O.
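Tom's reading can be checked against the log in this thread: every "Pacemaker health check: UNHEALTHY" line from PID 12971 shares its timestamp with an "AIS: Quorum outdated!" line from PID 12974, the process Lars identifies as the one maintaining the Pacemaker connection. A small Python sketch that pairs each UNHEALTHY line with whatever else sbd logged in the same second; the regex covers only the sbd line format quoted here, and other sbd versions may format lines differently:

```python
import re
from collections import defaultdict

# sbd line format as quoted in this thread, e.g.
# "Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!"
SBD_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ sbd: "
    r"\[(?P<pid>\d+)\]: (?P<level>\w+): (?P<msg>.*)$"
)


def unhealthy_triggers(lines):
    """Map each UNHEALTHY timestamp to the other sbd (pid, message) pairs from that second."""
    same_second = defaultdict(list)
    unhealthy = []
    for line in lines:
        m = SBD_RE.match(line.strip())
        if not m:
            continue
        if m["msg"] == "Pacemaker health check: UNHEALTHY":
            unhealthy.append(m["stamp"])
        else:
            same_second[m["stamp"]].append((m["pid"], m["msg"]))
    # .get() avoids inserting empty entries into the defaultdict
    return {stamp: same_second.get(stamp, []) for stamp in unhealthy}
```

On the excerpt quoted in this thread, each UNHEALTHY timestamp maps to a single ("12974", "AIS: Quorum outdated!") entry, which is consistent with the trigger being the quorum check on the Pacemaker side rather than the disk.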
Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK
What do you mean by "link"?

2014-04-23 15:23 GMT+02:00 Tom Parker:
> OK. I have changed that to no_path_retry fail, but I don't think this
> has anything to do with the errors I am seeing.
>
> They seem to be related to sbd's link with my cluster, not with disk I/O.
>
> Tom
Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK
OK. I have changed that to no_path_retry fail, but I don't think this has anything to do with the errors I am seeing.

They seem to be related to sbd's link with my cluster, not with disk I/O.

Tom

On 23/04/14 03:11 AM, emmanuel segura wrote:
> First thing: you are using no_path_retry the wrong way in your
> multipath configuration. Try reading this:
> http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html
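One way to back up "not with disk I/O" is to read each sbd process's lines separately. The start-up messages in the log identify them: 12973 logs "Servant starting for device", 12974 "Monitoring Pacemaker health", and 12971 "Using watchdog device". A Python sketch that derives that PID-to-role mapping from the log itself; the role labels are descriptive names chosen for this sketch, not sbd terminology:

```python
import re

# Matches the sbd part of lines such as
# "Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device ..."
SBD_RE = re.compile(r"sbd: \[(?P<pid>\d+)\]: \w+: (?P<msg>.*)$")

# Start-up messages seen in this thread, mapped to hypothetical role labels.
ROLE_HINTS = {
    "Servant starting for device": "disk servant",
    "Monitoring Pacemaker health": "pacemaker watcher",
    "Using watchdog device": "watchdog owner",
}


def roles_by_pid(lines):
    """Return {pid: role}, using the first matching start-up message per process."""
    roles = {}
    for line in lines:
        m = SBD_RE.search(line)
        if not m:
            continue
        for hint, role in ROLE_HINTS.items():
            if m["msg"].startswith(hint):
                roles.setdefault(m["pid"], role)
    return roles


def lines_for_role(lines, role):
    """Return only the lines logged by the process(es) with the given role."""
    lines = list(lines)  # iterated twice below
    pids = {pid for pid, r in roles_by_pid(lines).items() if r == role}
    result = []
    for line in lines:
        m = SBD_RE.search(line)
        if m and m["pid"] in pids:
            result.append(line)
    return result
```

In the excerpt quoted in this thread, the disk servant (12973) logs nothing after its start-up lines, while every WARN line comes from 12974 or 12971.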
Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK
First thing: you are using no_path_retry the wrong way in your multipath configuration. Try reading this:
http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html

2014-04-22 20:41 GMT+02:00 Tom Parker:
> I have attached the config files to this e-mail. The sbd dump is below.
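As background for the no_path_retry point: per multipath.conf(5), no_path_retry accepts "fail" (fail I/O as soon as all paths are down), "queue" (queue I/O until a path returns), or a number N of path-checker retries, which works out to roughly N x polling_interval seconds of queueing. Assuming the concern is that I/O to the SBD device should give up well before the 45-second watchdog timeout in Tom's sbd header, here is a Python sketch of that arithmetic. The function names and the comparison rule are illustrative assumptions, not taken from the linked Novell page:

```python
from typing import Optional


def max_queue_seconds(no_path_retry: str, polling_interval: int) -> Optional[int]:
    """Approximate how long multipath queues I/O after the last path fails.

    Returns 0 for "fail", None for "queue" (unbounded), otherwise
    retries * polling_interval.
    """
    value = no_path_retry.strip().lower()
    if value == "fail":
        return 0
    if value == "queue":
        return None
    return int(value) * polling_interval


def io_gives_up_before_watchdog(no_path_retry: str, polling_interval: int,
                                watchdog_timeout: int) -> bool:
    """True if queued I/O to the SBD LUN is failed before the watchdog timeout."""
    limit = max_queue_seconds(no_path_retry, polling_interval)
    return limit is not None and limit < watchdog_timeout
```

With Tom's change to no_path_retry fail, the queueing time is zero, so a lost path reaches sbd as an I/O error rather than as a hang.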
Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK
I have attached the config files to this e-mail. The sbd dump is below:

[LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd dump
==Dumping header on disk /dev/mapper/qa-xen-sbd
Header version : 2.1
UUID : ae835596-3d26-4681-ba40-206b4d51149b
Number of slots: 255
Sector size: 512
Timeout (watchdog) : 45
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 90
==Header on disk /dev/mapper/qa-xen-sbd is dumped

On 22/04/14 02:30 PM, emmanuel segura wrote:
> You are missing the cluster configuration, the sbd configuration and
> the multipath config.
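The sbd header in this message can also be sanity-checked mechanically. The sbd(8) man page describes the msgwait timeout as normally twice the watchdog timeout. Here is a Python sketch that parses the "dump" output and applies that rule. Treat the 2x rule as a documented recommendation rather than a hard limit; the helper names are hypothetical:

```python
# Output of "sbd -d /dev/mapper/qa-xen-sbd dump", as posted in this thread.
DUMP = """\
==Dumping header on disk /dev/mapper/qa-xen-sbd
Header version : 2.1
UUID : ae835596-3d26-4681-ba40-206b4d51149b
Number of slots: 255
Sector size: 512
Timeout (watchdog) : 45
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 90
==Header on disk /dev/mapper/qa-xen-sbd is dumped
"""


def parse_dump(text):
    """Return {field: value} for every 'key : value' line of an sbd dump."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("=="):
            continue  # banner lines, not fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def timeout_warnings(fields):
    """Flag a msgwait shorter than twice the watchdog timeout."""
    watchdog = int(fields["Timeout (watchdog)"])
    msgwait = int(fields["Timeout (msgwait)"])
    if msgwait < 2 * watchdog:
        return [f"msgwait {msgwait}s < 2 x watchdog {watchdog}s"]
    return []


print(timeout_warnings(parse_dump(DUMP)))  # prints []
```

For this header (45 s watchdog, 90 s msgwait) the list is empty, so the on-disk timeouts follow that recommendation. This fits Tom's view that the flapping is not a disk-timeout problem.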
Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK
You are missing the cluster configuration, the sbd configuration and the multipath config.

2014-04-22 20:21 GMT+02:00 Tom Parker:
> Has anyone seen this? Do you know what might be causing the flapping?
>
> Apr 21 22:03:03 qaxen6 sbd: [12962]: info: Watchdog enabled.
> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device /dev/mapper/qa-xen-sbd
> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Monitoring Pacemaker health
> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Device /dev/mapper/qa-xen-sbd uuid: ae835596-3d26-4681-ba40-206b4d51149b
> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Legacy plug-in detected, AIS quorum check enabled
> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Waiting to sign in with cluster ...
> Apr 21 22:03:04 qaxen6 sbd: [12971]: notice: Using watchdog device: /dev/watchdog
> Apr 21 22:03:04 qaxen6 sbd: [12971]: info: Set watchdog timeout to 45 seconds.
> Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with cluster ...
> Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
> Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
> Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:03:09 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
> Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:25:08 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 21 22:26:44 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:26:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 21 22:39:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:39:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 21 22:42:44 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:42:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 01:36:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 01:36:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 01:36:34 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 01:36:34 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 06:53:15 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 06:53:15 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 06:54:03 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 06:54:03 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 09:57:21 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 09:57:21 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 09:58:12 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 09:58:12 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 10:59:49 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 10:59:49 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 11:00:41 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 11:00:41 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 11:50:55 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 11:50:55 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 11:51:06 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 11:51:06 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 13:09:12 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 13:09:12 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 13:09:35 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 13:09:35 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 13:31:35 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 13:31:35 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 13:31:44 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 13:31:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 13:32:52 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 13:32:52 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 13:33:01 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 13:33:01 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 13:44:39 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 13:44:39 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 13:44:47 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 13:44:47 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 14:07:42 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 14:07:42 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
> Apr 22 14:07:51 qaxen6 sbd: [12974]: info: Node st
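To quantify the flapping in the log above, here is a Python sketch that pairs each "Pacemaker health check: UNHEALTHY" line with the next "OK" and reports how long the health check stayed unhealthy. It assumes the sbd line format quoted here and, because syslog omits the year, assumes 2014. A trailing UNHEALTHY with no matching OK, as at the truncated end of the excerpt, is not reported:

```python
import re
from datetime import datetime

# Matches sbd lines as quoted in this thread, e.g.
# "Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY"
SBD_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ sbd: "
    r"\[(?P<pid>\d+)\]: (?P<level>\w+): (?P<msg>.*)$"
)
YEAR = 2014  # assumption: syslog timestamps carry no year


def unhealthy_windows(lines):
    """Yield (start, end, seconds) for each UNHEALTHY -> OK interval."""
    start = None
    for line in lines:
        m = SBD_RE.match(line.strip())
        if not m or not m["msg"].startswith("Pacemaker health check:"):
            continue
        when = datetime.strptime(f"{YEAR} {m['stamp']}", "%Y %b %d %H:%M:%S")
        if m["msg"].endswith("UNHEALTHY"):
            if start is None:  # a repeated UNHEALTHY keeps the original start
                start = when
        elif m["msg"].endswith("OK") and start is not None:
            yield start, when, int((when - start).total_seconds())
            start = None


if __name__ == "__main__":
    sample = [
        "Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY",
        "Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK",
    ]
    for start, end, secs in unhealthy_windows(sample):
        print(start.time(), end.time(), secs)  # prints: 22:15:01 22:16:37 96
```

Across the excerpt, the windows range from a few seconds (13:44:39 to 13:44:47) to a few minutes (22:39:24 to 22:42:44), and the health check recovers on its own each time.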