This is probably because cman (which is its own cluster stack, used to provide DLM and quorum to pacemaker on EL6) detected that the node failed after the initial fence and called its own fence. You see similar behaviour with DRBD: it will also call a fence when the peer dies (even when it died because of a controlled fence call). In theory, pacemaker using cman's DLM with DRBD would trigger three fences per failure. :)
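
If cluster.conf isn't already redirecting cman's fencing to pacemaker, that is worth checking. The usual EL6 setup gives cman a single fence_pcmk device, so that fenced hands fence requests to pacemaker instead of firing its own. A minimal sketch using the node names from the config below (config_version, nodeids, etc. are placeholders):

  <?xml version="1.0"?>
  <cluster config_version="1" name="mici">
    <!-- two-node mode: lets the cluster run without majority quorum -->
    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
      <clusternode name="mici-admin" nodeid="1">
        <fence>
          <method name="pcmk-redirect">
            <!-- fence_pcmk redirects cman's fence request to pacemaker -->
            <device name="pcmk" port="mici-admin"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="mici-admin2" nodeid="2">
        <fence>
          <method name="pcmk-redirect">
            <device name="pcmk" port="mici-admin2"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <fencedevices>
      <fencedevice name="pcmk" agent="fence_pcmk"/>
    </fencedevices>
  </cluster>

Even with that in place, check /var/log/messages after a test to see which daemon actually requested each fence.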

digimer

On 01/01/14 12:04 AM, emmanuel segura wrote:
Maybe you are missing the logs from when you fenced the node? I think
clvmd hung because your node is in an unclean state. Use "dlm_tool ls" to
see if you have any pending fencing operations.
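
For example, on the surviving node (output shape approximate, from memory; the thing to look for is a lockspace stuck waiting on fencing):

  # dlm_tool ls
  dlm lockspaces
  name          clvmd
  id            0x4104eefa
  flags         0x00000004 kern_stop
  change        member 2 joined 1 remove 0 failed 0 seq 1,1
  members       1 2
  new change    member 1 joined 0 remove 1 failed 1 seq 2,2
  new status    wait_messages 0 wait_condition 1 fencing
  new members   1

While "new status ... fencing" is pending, the clvmd lockspace stays blocked, which would explain the hang.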


2014/1/1 Bob Haxo <bh...@sgi.com>

    Greetings ... Happy New Year!

    I am testing a configuration created from the example in "Chapter 6.
    Configuring a GFS2 File System in a Cluster" of the "Red Hat
    Enterprise Linux 7.0 Beta Global File System 2" document.  The only
    addition is stonith:fence_ipmilan.  After encountering this issue
    when I configured with "crm", I re-configured using "pcs".  I've
    included the configuration below.

    I'm thinking that, in a 2-node cluster, if I run "stonith_admin -F
    <peer-node>", then <peer-node> should reboot and cleanly rejoin the
    cluster.  This is not happening.
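
    (Before the test, I sanity-check the stonith setup like this; options
    per the stonith_admin man page:)

      # list the registered fence devices
      stonith_admin -L
      # list the device(s) able to fence the peer
      stonith_admin -l mici-admin2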

    What ultimately happens is that after the initially fenced node
    reboots, the system from which the stonith_admin -F command was run
    is fenced and reboots. The fencing stops there, leaving the cluster
    in an appropriate state.

    The issue seems to reside with clvmd/lvm.  With the reboot of the
    initially fenced node, the clvmd resource fails on the surviving
    node with a flood of errors.  I hypothesize that there is an issue
    with locks, but I have insufficient knowledge of clvmd/lvm locking
    to prove or disprove this hypothesis.
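
    (For what it's worth, here is how I have been poking at the lock
    state, assuming clustered LVM locking via DLM as Chapter 6 sets up:)

      # clvmd requires cluster-wide locking in lvm.conf
      grep locking_type /etc/lvm/lvm.conf      # expect: locking_type = 3
      # dump DLM lock state for the clvmd lockspace on the surviving node
      dlm_tool lockdebug clvmd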

    Have I missed something ...

    1) Is this expected behavior, i.e., does the fencing node always get
    rebooted in turn?

    2) Or maybe I didn't correctly duplicate the Chapter 6 example?

    3) Or perhaps something is wrong with, or omitted from, the Chapter 6
    example?

    Suggestions will be much appreciated.

    Thanks,
    Bob Haxo

    RHEL6.5
    pacemaker-cli-1.1.10-14.el6_5.1.x86_64
    crmsh-1.2.5-55.1sgi709r3.rhel6.x86_64
    pacemaker-libs-1.1.10-14.el6_5.1.x86_64
    cman-3.0.12.1-59.el6_5.1.x86_64
    pacemaker-1.1.10-14.el6_5.1.x86_64
    corosynclib-1.4.1-17.el6.x86_64
    corosync-1.4.1-17.el6.x86_64
    pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64

    Cluster Name: mici
    Corosync Nodes:

    Pacemaker Nodes:
    mici-admin mici-admin2

    Resources:
    Clone: clusterfs-clone
       Meta Attrs: interleave=true target-role=Started
       Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
        Attributes: device=/dev/vgha2/lv_clust2 directory=/images fstype=gfs2 options=defaults,noatime,nodiratime
        Operations: monitor on-fail=fence interval=30s (clusterfs-monitor-interval-30s)
    Clone: clvmd-clone
       Meta Attrs: interleave=true ordered=true target-role=Started
       Resource: clvmd (class=lsb type=clvmd)
        Operations: monitor on-fail=fence interval=30s (clvmd-monitor-interval-30s)
    Clone: dlm-clone
       Meta Attrs: interleave=true ordered=true
       Resource: dlm (class=ocf provider=pacemaker type=controld)
        Operations: monitor on-fail=fence interval=30s (dlm-monitor-interval-30s)

    Stonith Devices:
    Resource: p_ipmi_fencing_1 (class=stonith type=fence_ipmilan)
       Attributes: ipaddr=128.##.##.78 login=XXXXX passwd=XXXXX lanplus=1 action=reboot pcmk_host_check=static-list pcmk_host_list=mici-admin
       Meta Attrs: target-role=Started
       Operations: monitor start-delay=30 interval=60s timeout=30 (p_ipmi_fencing_1-monitor-60s)
    Resource: p_ipmi_fencing_2 (class=stonith type=fence_ipmilan)
       Attributes: ipaddr=128.##.##.220 login=XXXXX passwd=XXXXX lanplus=1 action=reboot pcmk_host_check=static-list pcmk_host_list=mici-admin2
       Meta Attrs: target-role=Started
       Operations: monitor start-delay=30 interval=60s timeout=30 (p_ipmi_fencing_2-monitor-60s)
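
    (Each BMC can also be exercised outside the cluster with the same
    agent, e.g. a status check with lanplus enabled; flags per the
    fence_ipmilan man page:)

      fence_ipmilan -a 128.##.##.78 -l XXXXX -p XXXXX -P -o status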
    Fencing Levels:

    Location Constraints:
       Resource: p_ipmi_fencing_1
         Disabled on: mici-admin (score:-INFINITY) (id:location-p_ipmi_fencing_1-mici-admin--INFINITY)
       Resource: p_ipmi_fencing_2
         Disabled on: mici-admin2 (score:-INFINITY) (id:location-p_ipmi_fencing_2-mici-admin2--INFINITY)
    Ordering Constraints:
       start dlm-clone then start clvmd-clone (Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
       start clvmd-clone then start clusterfs-clone (Mandatory) (id:order-clvmd-clone-clusterfs-clone-mandatory)
    Colocation Constraints:
       clusterfs-clone with clvmd-clone (INFINITY) (id:colocation-clusterfs-clone-clvmd-clone-INFINITY)
       clvmd-clone with dlm-clone (INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)

    Cluster Properties:
    cluster-infrastructure: cman
    dc-version: 1.1.10-14.el6_5.1-368c726
    last-lrm-refresh: 1388530552
    no-quorum-policy: ignore
    stonith-enabled: true
    Node Attributes:
    mici-admin: standby=off
    mici-admin2: standby=off
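
    (For completeness, the resources above were created roughly as in the
    Chapter 6 example, i.e. with commands along these lines; reconstructed
    from memory, so treat as a sketch:)

      pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
      pcs resource create clvmd lsb:clvmd op monitor interval=30s on-fail=fence clone interleave=true ordered=true
      pcs constraint order start dlm-clone then start clvmd-clone
      pcs constraint colocation add clvmd-clone with dlm-clone
      pcs resource create clusterfs Filesystem device="/dev/vgha2/lv_clust2" directory="/images" fstype="gfs2" options="defaults,noatime,nodiratime" op monitor interval=30s on-fail=fence clone interleave=true
      pcs constraint order start clvmd-clone then start clusterfs-clone
      pcs constraint colocation add clusterfs-clone with clvmd-clone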


    Last updated: Tue Dec 31 17:15:55 2013
    Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
    Stack: cman
    Current DC: mici-admin2 - partition with quorum
    Version: 1.1.10-14.el6_5.1-368c726
    2 Nodes configured
    8 Resources configured

    Online: [ mici-admin mici-admin2 ]

    Full list of resources:

    p_ipmi_fencing_1        (stonith:fence_ipmilan):        Started mici-admin2
    p_ipmi_fencing_2        (stonith:fence_ipmilan):        Started mici-admin
    Clone Set: clusterfs-clone [clusterfs]
          Started: [ mici-admin mici-admin2 ]
    Clone Set: clvmd-clone [clvmd]
          Started: [ mici-admin mici-admin2 ]
    Clone Set: dlm-clone [dlm]
          Started: [ mici-admin mici-admin2 ]

    Migration summary:
    * Node mici-admin:
    * Node mici-admin2:

    =====================================================
    crm_mon after the fenced node reboots, showing the clvmd failure
    that then occurs, which in turn triggers a fencing of that node

    Last updated: Tue Dec 31 17:06:55 2013
    Last change: Tue Dec 31 16:57:37 2013 via cibadmin on mici-admin
    Stack: cman
    Current DC: mici-admin - partition with quorum
    Version: 1.1.10-14.el6_5.1-368c726
    2 Nodes configured
    8 Resources configured

    Node mici-admin: UNCLEAN (online)
    Online: [ mici-admin2 ]

    Full list of resources:

    p_ipmi_fencing_1        (stonith:fence_ipmilan):        Stopped
    p_ipmi_fencing_2        (stonith:fence_ipmilan):        Started mici-admin
    Clone Set: clusterfs-clone [clusterfs]
          Started: [ mici-admin ]
          Stopped: [ mici-admin2 ]
    Clone Set: clvmd-clone [clvmd]
          clvmd      (lsb:clvmd):    FAILED mici-admin
          Stopped: [ mici-admin2 ]
    Clone Set: dlm-clone [dlm]
          Started: [ mici-admin mici-admin2 ]

    Migration summary:
    * Node mici-admin:
        clvmd: migration-threshold=1000000 fail-count=1
    last-failure='Tue Dec 31 17:04:29 2013'
    * Node mici-admin2:

    Failed actions:
         clvmd_monitor_30000 on mici-admin 'unknown error' (1): call=60, status=Timed Out, last-rc-change='Tue Dec 31 17:04:29 2013', queued=0ms, exec=0ms












--
this is my life and I live it for as long as God wills





--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
