Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-25 Thread Lars Marowsky-Bree
On 2014-04-22T14:21:33, Tom Parker wrote:

Hi Tom,

> Has anyone seen this?  Do you know what might be causing the flapping?

No, I've never seen this.

> Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with
> cluster ...

So it connected fine. This is the process maintaining the pcmk
connection, so the others can be disregarded.
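
To follow only the process that holds the Pacemaker connection, the log
can be split by the PID in square brackets. A minimal Python sketch,
assuming unwrapped syslog-style lines like the ones in this thread; the
function name group_by_pid and the SAMPLE list are illustrative, and
12974 is simply the PID seen in Tom's log:

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) sbd: "
    r"\[(?P<pid>\d+)\]: (?P<level>\w+): (?P<msg>.*)$"
)

def group_by_pid(lines):
    """Return {pid: [(stamp, level, msg), ...]} for every sbd log line."""
    grouped = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # lines that are not sbd messages are skipped
            grouped[int(m["pid"])].append((m["stamp"], m["level"], m["msg"]))
    return dict(grouped)

# A few lines taken from the log in this thread (wrapped lines rejoined).
SAMPLE = [
    "Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device /dev/mapper/qa-xen-sbd",
    "Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Monitoring Pacemaker health",
    "Apr 21 22:03:04 qaxen6 sbd: [12971]: notice: Using watchdog device: /dev/watchdog",
    "Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.",
    "Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN",
]

by_pid = group_by_pid(SAMPLE)
pcmk_lines = by_pid[12974]  # only the process monitoring Pacemaker health
for stamp, level, msg in pcmk_lines:
    print(stamp, level, msg)
```

The other PIDs stay available in by_pid in case they turn out to matter
after all.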

> Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
> Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
> Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
> Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!

Is this all that is logged at these times?

Judging from these messages, the pacemaker cluster itself should be
unstable at the same times.

Are there any crmd, corosync, or similar messages?
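
One way to find the time windows worth searching for crmd/corosync
messages is to pair each "Quorum outdated!" warning with the next
"Node state: online" line. A rough Python sketch, assuming unwrapped,
unquoted syslog lines and an English locale for the month names; the
function name flap_windows and the year argument are illustrative
(syslog timestamps carry no year):

```python
from datetime import datetime

def flap_windows(lines, year=2014):
    """Pair each 'Quorum outdated!' with the next 'Node state: online'.

    Returns a list of (start, end, seconds) tuples.
    """
    windows, start = [], None
    for line in lines:
        # The first 15 characters of a syslog line hold the timestamp.
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if "Quorum outdated!" in line and start is None:
            start = stamp
        elif "Node state: online" in line and start is not None:
            windows.append((start, stamp, int((stamp - start).total_seconds())))
            start = None
    return windows

# Lines from the pacemaker-monitoring process in Tom's log.
SAMPLE = [
    "Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online",
    "Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!",
    "Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online",
    "Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!",
    "Apr 21 22:26:44 qaxen6 sbd: [12974]: info: Node state: online",
    "Apr 21 22:39:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!",
    "Apr 21 22:42:44 qaxen6 sbd: [12974]: info: Node state: online",
]

windows = flap_windows(SAMPLE)
for start, end, seconds in windows:
    print(start, "->", end, f"({seconds}s)")
```

The resulting windows can then be lined up against the crmd and corosync
logs from the same node.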


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-23 Thread Tom Parker
SBD has a connection to pacemaker to establish overall cluster health
(the -P flag).  This seems to be where the problem is.  I just don't
know what the problem might be.

On 23/04/14 11:32 AM, emmanuel segura wrote:
> What do you mean by "link"?
>
>
> 2014-04-23 15:23 GMT+02:00 Tom Parker :
>
>> OK. I have changed that to no_path_retry fail, but I don't think this
>> has anything to do with the errors I am seeing.
>>
>> They seem to be related to sbd's link with my cluster, not with disk I/O.
>>
>> Tom
>>
>> On 23/04/14 03:11 AM, emmanuel segura wrote:
>>> First, you are using no_path_retry the wrong way in your multipath
>>> configuration; try reading this:
>>> http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html
>>>
>>>
>>> 2014-04-22 20:41 GMT+02:00 Tom Parker :
>>>
>>>> I have attached the config files to this e-mail.  The sbd dump is below
>>>>
>>>> [LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd dump
>>>> ==Dumping header on disk /dev/mapper/qa-xen-sbd
>>>> Header version : 2.1
>>>> UUID   : ae835596-3d26-4681-ba40-206b4d51149b
>>>> Number of slots: 255
>>>> Sector size: 512
>>>> Timeout (watchdog) : 45
>>>> Timeout (allocate) : 2
>>>> Timeout (loop) : 1
>>>> Timeout (msgwait)  : 90
>>>> ==Header on disk /dev/mapper/qa-xen-sbd is dumped
>>>>
>>>> On 22/04/14 02:30 PM, emmanuel segura wrote:
>>>>> You are missing the cluster configuration, sbd configuration, and
>>>>> multipath config.

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-23 Thread emmanuel segura
What do you mean by "link"?


2014-04-23 15:23 GMT+02:00 Tom Parker :

> OK. I have changed that to no_path_retry fail, but I don't think this
> has anything to do with the errors I am seeing.
>
> They seem to be related to sbd's link with my cluster, not with disk I/O.
>
> Tom
>
> On 23/04/14 03:11 AM, emmanuel segura wrote:
> > First, you are using no_path_retry the wrong way in your multipath
> > configuration; try reading this:
> > http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html
> >
> >
> > 2014-04-22 20:41 GMT+02:00 Tom Parker :
> >
> >> I have attached the config files to this e-mail.  The sbd dump is below
> >>
> >> [LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd dump
> >> ==Dumping header on disk /dev/mapper/qa-xen-sbd
> >> Header version : 2.1
> >> UUID   : ae835596-3d26-4681-ba40-206b4d51149b
> >> Number of slots: 255
> >> Sector size: 512
> >> Timeout (watchdog) : 45
> >> Timeout (allocate) : 2
> >> Timeout (loop) : 1
> >> Timeout (msgwait)  : 90
> >> ==Header on disk /dev/mapper/qa-xen-sbd is dumped
> >>
> >> On 22/04/14 02:30 PM, emmanuel segura wrote:
> >>> You are missing the cluster configuration, sbd configuration, and
> >>> multipath config.

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-23 Thread Tom Parker
OK. I have changed that to no_path_retry fail, but I don't think this
has anything to do with the errors I am seeing.

They seem to be related to sbd's link with my cluster, not with disk I/O.

Tom

On 23/04/14 03:11 AM, emmanuel segura wrote:
> First, you are using no_path_retry the wrong way in your multipath
> configuration; try reading this:
> http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html
>
>
> 2014-04-22 20:41 GMT+02:00 Tom Parker :
>
>> I have attached the config files to this e-mail.  The sbd dump is below
>>
>> [LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd dump
>> ==Dumping header on disk /dev/mapper/qa-xen-sbd
>> Header version : 2.1
>> UUID   : ae835596-3d26-4681-ba40-206b4d51149b
>> Number of slots: 255
>> Sector size: 512
>> Timeout (watchdog) : 45
>> Timeout (allocate) : 2
>> Timeout (loop) : 1
>> Timeout (msgwait)  : 90
>> ==Header on disk /dev/mapper/qa-xen-sbd is dumped
>>
>> On 22/04/14 02:30 PM, emmanuel segura wrote:
>>> You are missing the cluster configuration, sbd configuration, and
>>> multipath config.

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-23 Thread emmanuel segura
First, you are using no_path_retry the wrong way in your multipath
configuration; try reading this:
http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html
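
To see which no_path_retry values a multipath configuration actually
sets, the file can be scanned line by line. A small Python sketch,
assuming the value to aim for is "fail" (the setting Tom reports
switching to later in this thread); the helper name
no_path_retry_values and the SAMPLE_CONF excerpt are hypothetical, not
the configuration attached to the thread:

```python
import re

# Matches e.g.:  no_path_retry queue   or   no_path_retry "fail"
SETTING_RE = re.compile(r'^\s*no_path_retry\s+"?(?P<value>\w+)"?\s*(?:#.*)?$')

def no_path_retry_values(conf_text):
    """Return [(line_number, value)] for every no_path_retry line found."""
    found = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        m = SETTING_RE.match(line)
        if m:
            found.append((number, m["value"]))
    return found

# Hypothetical multipath.conf excerpt, used only for illustration.
SAMPLE_CONF = """\
defaults {
    user_friendly_names yes
    no_path_retry queue
}
devices {
    device {
        no_path_retry fail
    }
}
"""

settings = no_path_retry_values(SAMPLE_CONF)
not_fail = [(n, v) for n, v in settings if v != "fail"]
for number, value in not_fail:
    print(f"line {number}: no_path_retry {value}")
```

In practice the text would come from reading the real multipath.conf
instead of SAMPLE_CONF.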


2014-04-22 20:41 GMT+02:00 Tom Parker :

> I have attached the config files to this e-mail.  The sbd dump is below
>
> [LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd dump
> ==Dumping header on disk /dev/mapper/qa-xen-sbd
> Header version : 2.1
> UUID   : ae835596-3d26-4681-ba40-206b4d51149b
> Number of slots: 255
> Sector size: 512
> Timeout (watchdog) : 45
> Timeout (allocate) : 2
> Timeout (loop) : 1
> Timeout (msgwait)  : 90
> ==Header on disk /dev/mapper/qa-xen-sbd is dumped
>
> On 22/04/14 02:30 PM, emmanuel segura wrote:
> > You are missing the cluster configuration, sbd configuration, and
> > multipath config.
> >
> >
> > 2014-04-22 20:21 GMT+02:00 Tom Parker :
> >
> >> Has anyone seen this?  Do you know what might be causing the flapping?
> >>
> >> Apr 21 22:03:03 qaxen6 sbd: [12962]: info: Watchdog enabled.
> >> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device
> >> /dev/mapper/qa-xen-sbd
> >> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Monitoring Pacemaker health
> >> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Device /dev/mapper/qa-xen-sbd
> >> uuid: ae835596-3d26-4681-ba40-206b4d51149b
> >> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Legacy plug-in detected, AIS
> >> quorum check enabled
> >> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Waiting to sign in with
> >> cluster ...
> >> Apr 21 22:03:04 qaxen6 sbd: [12971]: notice: Using watchdog device:
> >> /dev/watchdog
> >> Apr 21 22:03:04 qaxen6 sbd: [12971]: info: Set watchdog timeout to 45
> >> seconds.
> >> Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with
> >> cluster ...
> >> Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
> >> Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
> >> Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 21 22:03:09 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
> >> Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 21 22:25:08 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 21 22:26:44 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 21 22:26:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 21 22:39:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 21 22:39:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 21 22:42:44 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 21 22:42:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 22 01:36:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 22 01:36:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 22 01:36:34 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 22 01:36:34 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 22 06:53:15 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 22 06:53:15 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 22 06:54:03 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 22 06:54:03 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 22 09:57:21 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 22 09:57:21 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 22 09:58:12 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 22 09:58:12 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 22 10:59:49 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 22 10:59:49 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 22 11:00:41 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 22 11:00:41 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 22 11:50:55 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 22 11:50:55 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 22 11:51:06 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 22 11:51:06 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 22 13:09:12 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 22 13:09:12 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> >> UNHEALTHY
> >> Apr 22 13:09:35 qaxen6 sbd: [12974]: info: Node state: online
> >> Apr 22 13:09:35 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> >> Apr 22 13:31:35 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> >> Apr 22 13:31

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-22 Thread Tom Parker
I have attached the config files to this e-mail.  The sbd dump is below

[LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd dump
==Dumping header on disk /dev/mapper/qa-xen-sbd
Header version : 2.1
UUID   : ae835596-3d26-4681-ba40-206b4d51149b
Number of slots: 255
Sector size: 512
Timeout (watchdog) : 45
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait)  : 90
==Header on disk /dev/mapper/qa-xen-sbd is dumped
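
For comparing these header values across nodes or against the sbd log,
the dump can be turned into a dictionary. A minimal Python sketch that
parses the output shown above; the function name parse_sbd_dump is
illustrative, and the check that msgwait is at least twice the watchdog
timeout is a commonly cited rule of thumb brought in here as an
assumption, not something stated in this thread:

```python
def parse_sbd_dump(text):
    """Parse the 'key : value' lines of an sbd dump into a dict.

    Purely numeric values become int; everything else stays a string.
    """
    header = {}
    for line in text.splitlines():
        if line.startswith("==") or ":" not in line:
            continue  # skip the banner lines and anything without a key
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        header[key] = int(value) if value.isdigit() else value
    return header

# Output of `sbd -d /dev/mapper/qa-xen-sbd dump` as posted above.
DUMP = """\
==Dumping header on disk /dev/mapper/qa-xen-sbd
Header version : 2.1
UUID   : ae835596-3d26-4681-ba40-206b4d51149b
Number of slots: 255
Sector size: 512
Timeout (watchdog) : 45
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait)  : 90
==Header on disk /dev/mapper/qa-xen-sbd is dumped
"""

header = parse_sbd_dump(DUMP)
watchdog = header["Timeout (watchdog)"]
msgwait = header["Timeout (msgwait)"]
# Assumed rule of thumb: msgwait >= 2 * watchdog.
msgwait_ok = msgwait >= 2 * watchdog
print(f"watchdog={watchdog}s msgwait={msgwait}s rule satisfied: {msgwait_ok}")
```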

On 22/04/14 02:30 PM, emmanuel segura wrote:
> You are missing the cluster configuration, sbd configuration, and
> multipath config.
>
>
> 2014-04-22 20:21 GMT+02:00 Tom Parker :
>
>> Has anyone seen this?  Do you know what might be causing the flapping?
>>
>> Apr 21 22:03:03 qaxen6 sbd: [12962]: info: Watchdog enabled.
>> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device
>> /dev/mapper/qa-xen-sbd
>> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Monitoring Pacemaker health
>> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Device /dev/mapper/qa-xen-sbd
>> uuid: ae835596-3d26-4681-ba40-206b4d51149b
>> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Legacy plug-in detected, AIS
>> quorum check enabled
>> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Waiting to sign in with
>> cluster ...
>> Apr 21 22:03:04 qaxen6 sbd: [12971]: notice: Using watchdog device:
>> /dev/watchdog
>> Apr 21 22:03:04 qaxen6 sbd: [12971]: info: Set watchdog timeout to 45
>> seconds.
>> Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with
>> cluster ...
>> Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
>> Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
>> Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 21 22:03:09 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
>> Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 21 22:25:08 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 21 22:26:44 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 21 22:26:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 21 22:39:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 21 22:39:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 21 22:42:44 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 21 22:42:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 22 01:36:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 22 01:36:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 22 01:36:34 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 22 01:36:34 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 22 06:53:15 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 22 06:53:15 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 22 06:54:03 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 22 06:54:03 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 22 09:57:21 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 22 09:57:21 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 22 09:58:12 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 22 09:58:12 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 22 10:59:49 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 22 10:59:49 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 22 11:00:41 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 22 11:00:41 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 22 11:50:55 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 22 11:50:55 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 22 11:51:06 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 22 11:51:06 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 22 13:09:12 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 22 13:09:12 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 22 13:09:35 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 22 13:09:35 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 22 13:31:35 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 22 13:31:35 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 22 13:31:44 qaxen6 sbd: [12974]: info: Node state: online
>> Apr 22 13:31:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
>> Apr 22 13:32:52 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
>> Apr 22 13:32:52 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
>> UNHEALTHY
>> Apr 22 13:33:01 qaxen6 sbd: [12974

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-22 Thread emmanuel segura
You are missing the cluster configuration, sbd configuration, and
multipath config.


2014-04-22 20:21 GMT+02:00 Tom Parker :

> Has anyone seen this?  Do you know what might be causing the flapping?
>
> Apr 21 22:03:03 qaxen6 sbd: [12962]: info: Watchdog enabled.
> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device
> /dev/mapper/qa-xen-sbd
> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Monitoring Pacemaker health
> Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Device /dev/mapper/qa-xen-sbd
> uuid: ae835596-3d26-4681-ba40-206b4d51149b
> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Legacy plug-in detected, AIS
> quorum check enabled
> Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Waiting to sign in with
> cluster ...
> Apr 21 22:03:04 qaxen6 sbd: [12971]: notice: Using watchdog device:
> /dev/watchdog
> Apr 21 22:03:04 qaxen6 sbd: [12971]: info: Set watchdog timeout to 45
> seconds.
> Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with
> cluster ...
> Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
> Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
> Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:03:09 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
> Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:25:08 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 21 22:26:44 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:26:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 21 22:39:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:39:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 21 22:42:44 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:42:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 01:36:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 01:36:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 01:36:34 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 01:36:34 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 06:53:15 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 06:53:15 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 06:54:03 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 06:54:03 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 09:57:21 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 09:57:21 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 09:58:12 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 09:58:12 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 10:59:49 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 10:59:49 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 11:00:41 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 11:00:41 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 11:50:55 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 11:50:55 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 11:51:06 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 11:51:06 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 13:09:12 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 13:09:12 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 13:09:35 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 13:09:35 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 13:31:35 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 13:31:35 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 13:31:44 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 13:31:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 13:32:52 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 13:32:52 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 13:33:01 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 13:33:01 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 13:44:39 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 13:44:39 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 13:44:47 qaxen6 sbd: [12974]: info: Node state: online
> Apr 22 13:44:47 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
> Apr 22 14:07:42 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 22 14:07:42 qaxen6 sbd: [12971]: WARN: Pacemaker health check:
> UNHEALTHY
> Apr 22 14:07:51 qaxen6 sbd: [12974]: info: Node st