Hi, I have a simple active/passive Corosync/DRBD/XFS/NFS cluster (see http://thread.gmane.org/gmane.linux.highavailability.pacemaker/4672 for config details).
I have run some more failover tests, and in one case I see errors that I would like to share.

Initial situation:
Node 1 is DRBD master, XFS mounted, NFS started
Node 2 is DRBD slave, Pacemaker DC

If I power off node 2, I get the following logs on node 1 (filtered on drbd/fencing):

Feb 22 11:03:20 tnfsa kernel: block drbd0: PingAck did not arrive in time.
Feb 22 11:03:20 tnfsa kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 22 11:03:20 tnfsa kernel: block drbd0: asender terminated
Feb 22 11:03:20 tnfsa kernel: block drbd0: Terminating asender thread
Feb 22 11:03:20 tnfsa kernel: block drbd0: short read expecting header on sock: r=-512
Feb 22 11:03:20 tnfsa kernel: block drbd0: Creating new current UUID
Feb 22 11:03:20 tnfsa kernel: block drbd0: Connection closed
Feb 22 11:03:20 tnfsa kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Feb 22 11:03:20 tnfsa crm-fence-peer.sh[11711]: invoked for nfs
Feb 22 11:03:50 tnfsa crm-fence-peer.sh[11711]: Call cib_create failed (-41): Remote node did not respond
Feb 22 11:03:50 tnfsa crm-fence-peer.sh[11711]: <null>
Feb 22 11:03:50 tnfsa crm-fence-peer.sh[11711]: WARNING could not place the constraint!
Feb 22 11:03:50 tnfsa kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 1 (0x100)
Feb 22 11:03:50 tnfsa kernel: block drbd0: fence-peer helper broken, returned 1
Feb 22 11:03:50 tnfsa kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Feb 22 11:03:50 tnfsa kernel: block drbd0: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Feb 22 11:03:50 tnfsa kernel: block drbd0: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Feb 22 11:03:50 tnfsa kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Feb 22 11:03:50 tnfsa kernel: block drbd0: receiver terminated
Feb 22 11:03:50 tnfsa kernel: block drbd0: Restarting receiver thread
Feb 22 11:03:50 tnfsa kernel: block drbd0: receiver (re)started
Feb 22 11:03:50 tnfsa kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
Feb 22 11:03:50 tnfsa kernel: block drbd0: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Feb 22 11:03:50 tnfsa kernel: block drbd0: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown r--- }
Feb 22 11:03:50 tnfsa kernel: block drbd0: conn( Unconnected -> WFConnection )

The fence-peer helper is invoked at 11:03:20 and gives up 30 seconds later, when the cib_create call fails with "Remote node did not respond". If I check the config afterwards, the fencing 'location' constraint is indeed not there.

I am not sure it is a big deal, but I wanted to share it and hear your insight. This only happens when I power off the current DC, so that the constraint has to be placed by the surviving node.

Thanks a ton,
- Patrick -

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker