This is probably because cman (which is its own cluster stack, used to
provide DLM and quorum to pacemaker on EL6) detected that the node had
failed after the initial fence and called its own fence. You see similar
behaviour when using DRBD; it will also call a fence when the peer dies
(even when it died because of a controlled fence call). In theory,
pacemaker using cman's dlm with DRBD would trigger three fences per
failure. :)
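You can usually confirm this from the cman side. A rough sketch of what I
would run on the survivor (log file locations are from memory and may
differ on your build):

  # does cman's fence domain show a pending or completed fence of its own?
  fence_tool ls
  cman_tool nodes
  # look for a second fence request in the cluster logs
  grep -i fence /var/log/cluster/fenced.log
  grep -iE 'stonith|fence' /var/log/messages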
digimer
On 01/01/14 12:04 AM, emmanuel segura wrote:
Maybe you are missing logs from when you fenced the node? I think clvmd
hung because your node is in an unclean state; use dlm_tool ls to see if
you have any pending fencing operations.
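For example (both commands ship with the dlm/cman packages; the exact
output format varies between builds):

  # list dlm lockspaces and their state; a lockspace waiting on fencing
  # will not grant locks, which is what leaves clvmd hanging
  dlm_tool ls
  # dump dlm_controld's internal log and look for fencing activity
  dlm_tool dump | grep -i fenc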
2014/1/1 Bob Haxo <bh...@sgi.com>
Greetings ... Happy New Year!
I am testing a configuration created from the example in
"Chapter 6. Configuring a GFS2 File System in a Cluster" of the "Red
Hat Enterprise Linux 7.0 Beta Global File System 2" document. The only
addition is stonith:fence_ipmilan. After encountering this issue
when I configured with "crm", I re-configured using "pcs". I've
included the configuration below.
I'm thinking that, in a 2-node cluster, if I run "stonith_admin -F
<peer-node>", then <peer-node> should reboot and cleanly rejoin the
cluster. This is not happening.
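For clarity, the test sequence is essentially the following (run from one
node against its peer; stonith_admin options as I understand them in
1.1.10):

  # fence the peer, then watch it reboot and (ideally) rejoin
  stonith_admin -F mici-admin2
  crm_mon -1
  # fencing history for that node, if this build supports -H
  stonith_admin -H mici-admin2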
What ultimately happens is that after the initially fenced node
reboots, the system from which the stonith_admin -F command was run
is fenced and reboots. The fencing stops there, leaving the cluster
in an appropriate state.
The issue seems to reside with clvmd/lvm. When the initially fenced
node reboots, the clvmd resource fails on the surviving node with a
monitor timeout. I hypothesize there is an issue with locks, but I
have insufficient knowledge of clvmd/lvm locking to prove or disprove
this hypothesis.
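If it helps narrow this down, these are the checks I can run on the
surviving node while clvmd is failed (the lvm.conf path and the "clvmd"
lockspace name are my assumptions):

  # clustered locking should be in effect (locking_type = 3 means clvmd/dlm)
  grep -E '^[[:space:]]*locking_type' /etc/lvm/lvm.conf
  # does a trivial LVM command hang while clvmd is failed?
  timeout 30 vgs
  # if this dlm_tool build supports it, dump the clvmd lockspace
  dlm_tool lockdebug clvmd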
Have I missed something ...
1) Is this expected behavior, i.e., does the fencing node always get
rebooted in turn?
2) Or, maybe I didn't correctly duplicate the Chapter 6 example?
3) Or, perhaps something is wrong or omitted from the Chapter 6 example?
Suggestions will be much appreciated.
Thanks,
Bob Haxo
RHEL6.5
pacemaker-cli-1.1.10-14.el6_5.1.x86_64
crmsh-1.2.5-55.1sgi709r3.rhel6.x86_64
pacemaker-libs-1.1.10-14.el6_5.1.x86_64
cman-3.0.12.1-59.el6_5.1.x86_64
pacemaker-1.1.10-14.el6_5.1.x86_64
corosynclib-1.4.1-17.el6.x86_64
corosync-1.4.1-17.el6.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
Cluster Name: mici
Corosync Nodes:
Pacemaker Nodes:
mici-admin mici-admin2
Resources:
Clone: clusterfs-clone
Meta Attrs: interleave=true target-role=Started
Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/vgha2/lv_clust2 directory=/images
fstype=gfs2 options=defaults,noatime,nodiratime
Operations: monitor on-fail=fence interval=30s
(clusterfs-monitor-interval-30s)
Clone: clvmd-clone
Meta Attrs: interleave=true ordered=true target-role=Started
Resource: clvmd (class=lsb type=clvmd)
Operations: monitor on-fail=fence interval=30s
(clvmd-monitor-interval-30s)
Clone: dlm-clone
Meta Attrs: interleave=true ordered=true
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: monitor on-fail=fence interval=30s
(dlm-monitor-interval-30s)
Stonith Devices:
Resource: p_ipmi_fencing_1 (class=stonith type=fence_ipmilan)
Attributes: ipaddr=128.##.##.78 login=XXXXX passwd=XXXXX
lanplus=1 action=reboot pcmk_host_check=static-list
pcmk_host_list=mici-admin
Meta Attrs: target-role=Started
Operations: monitor start-delay=30 interval=60s timeout=30
(p_ipmi_fencing_1-monitor-60s)
Resource: p_ipmi_fencing_2 (class=stonith type=fence_ipmilan)
Attributes: ipaddr=128.##.##.220 login=XXXXX passwd=XXXXX
lanplus=1 action=reboot pcmk_host_check=static-list
pcmk_host_list=mici-admin2
Meta Attrs: target-role=Started
Operations: monitor start-delay=30 interval=60s timeout=30
(p_ipmi_fencing_2-monitor-60s)
Fencing Levels:
Location Constraints:
Resource: p_ipmi_fencing_1
Disabled on: mici-admin (score:-INFINITY)
(id:location-p_ipmi_fencing_1-mici-admin--INFINITY)
Resource: p_ipmi_fencing_2
Disabled on: mici-admin2 (score:-INFINITY)
(id:location-p_ipmi_fencing_2-mici-admin2--INFINITY)
Ordering Constraints:
start dlm-clone then start clvmd-clone (Mandatory)
(id:order-dlm-clone-clvmd-clone-mandatory)
start clvmd-clone then start clusterfs-clone (Mandatory)
(id:order-clvmd-clone-clusterfs-clone-mandatory)
Colocation Constraints:
clusterfs-clone with clvmd-clone (INFINITY)
(id:colocation-clusterfs-clone-clvmd-clone-INFINITY)
clvmd-clone with dlm-clone (INFINITY)
(id:colocation-clvmd-clone-dlm-clone-INFINITY)
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.10-14.el6_5.1-368c726
last-lrm-refresh: 1388530552
no-quorum-policy: ignore
stonith-enabled: true
Node Attributes:
mici-admin: standby=off
mici-admin2: standby=off
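For completeness, the configuration above was created with roughly the
following pcs commands (following the Chapter 6 example, except that
clvmd uses the lsb agent here; fencing devices omitted; pcs 0.9.x syntax
from my notes, so treat this as a sketch):

  pcs resource create dlm ocf:pacemaker:controld \
      op monitor interval=30s on-fail=fence \
      clone interleave=true ordered=true
  pcs resource create clvmd lsb:clvmd \
      op monitor interval=30s on-fail=fence \
      clone interleave=true ordered=true
  pcs resource create clusterfs ocf:heartbeat:Filesystem \
      device=/dev/vgha2/lv_clust2 directory=/images fstype=gfs2 \
      options=defaults,noatime,nodiratime \
      op monitor interval=30s on-fail=fence \
      clone interleave=true
  pcs constraint order start dlm-clone then clvmd-clone
  pcs constraint order start clvmd-clone then clusterfs-clone
  pcs constraint colocation add clvmd-clone with dlm-clone
  pcs constraint colocation add clusterfs-clone with clvmd-clone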
Last updated: Tue Dec 31 17:15:55 2013
Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
Stack: cman
Current DC: mici-admin2 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
8 Resources configured
Online: [ mici-admin mici-admin2 ]
Full list of resources:
p_ipmi_fencing_1 (stonith:fence_ipmilan): Started mici-admin2
p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
Clone Set: clusterfs-clone [clusterfs]
Started: [ mici-admin mici-admin2 ]
Clone Set: clvmd-clone [clvmd]
Started: [ mici-admin mici-admin2 ]
Clone Set: dlm-clone [dlm]
Started: [ mici-admin mici-admin2 ]
Migration summary:
* Node mici-admin:
* Node mici-admin2:
=====================================================
crm_mon after the fenced node reboots, showing the clvmd failure that
then occurs, which in turn triggers a fencing of that node:
Last updated: Tue Dec 31 17:06:55 2013
Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
Stack: cman
Current DC: mici-admin - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
8 Resources configured
Node mici-admin: UNCLEAN (online)
Online: [ mici-admin2 ]
Full list of resources:
p_ipmi_fencing_1 (stonith:fence_ipmilan): Stopped
p_ipmi_fencing_2 (stonith:fence_ipmilan): Started mici-admin
Clone Set: clusterfs-clone [clusterfs]
Started: [ mici-admin ]
Stopped: [ mici-admin2 ]
Clone Set: clvmd-clone [clvmd]
clvmd (lsb:clvmd): FAILED mici-admin
Stopped: [ mici-admin2 ]
Clone Set: dlm-clone [dlm]
Started: [ mici-admin mici-admin2 ]
Migration summary:
* Node mici-admin:
clvmd: migration-threshold=1000000 fail-count=1 last-failure='Tue Dec 31 17:04:29 2013'
* Node mici-admin2:
Failed actions:
clvmd_monitor_30000 on mici-admin 'unknown error' (1): call=60,
status=Timed Out, last-rc-change='Tue Dec 31 17:04:29 2013',
queued=0ms, exec=0ms
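Next time clvmd fails I can also capture the following from the surviving
node, if that would be useful (log path from memory):

  # is clvmd stuck in uninterruptible sleep, and if so, on what?
  ps -eo pid,stat,wchan:32,cmd | grep -E '[c]lvmd|[d]lm_controld'
  # anything from dlm_controld around the failure time?
  grep -i fenc /var/log/cluster/dlm_controld.log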
--
this is my life and I live it as long as God wills
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org