[Pacemaker] fencing question

Karl Rößmann Wed, 12 Mar 2014 07:25:36 -0700

Hi,

We have a two-node HA cluster running SUSE SLES 11 HA Extension SP3,
latest release.
A resource (Xen) was stopped manually. Its shutdown_timeout is 120s,
but after 60s the node was fenced and shut down by the other node.


Should I change some timeout value?

This is a part of our configuration:
...
primitive fkflmw ocf:heartbeat:Xen \
        meta target-role="Started" is-managed="true" allow-migrate="true" \
        op monitor interval="10" timeout="30" \
        op migrate_from interval="0" timeout="600" \
        op migrate_to interval="0" timeout="600" \
        params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
...
...
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-f3eeaf4" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1394533475" \
        default-action-timeout="60s"
rsc_defaults $id="rsc_defaults-options" \
        resource-stickiness="10" \
        migration-threshold="3"


We had this scenario:

On node ha2infra:

Mar 12 11:59:59 ha2infra pengine[25631]: notice: LogActions: Stop fkflmw (ha2infra) <--------------- Resource fkflmw was stopped manually
Mar 12 11:59:59 ha2infra pengine[25631]: notice: process_pe_message: Calculated Transition 105: /var/lib/pacemaker/pengine/pe-input-519.bz2
Mar 12 11:59:59 ha2infra crmd[25632]: notice: do_te_invoke: Processing graph 105 (ref=pe_calc-dc-1394621999-178) derived from /var/lib/pacemaker/pengine/pe-input-519.bz2
Mar 12 11:59:59 ha2infra crmd[25632]: notice: te_rsc_command: Initiating action 60: stop fkflmw_stop_0 on ha2infra (local)
Mar 12 11:59:59 ha2infra Xen(fkflmw)[22718]: INFO: Xen domain fkflmw will be stopped (timeout: 120s) <--------------- stopping fkflmw

Mar 12 12:00:00 ha2infra mgmtd: [25633]: info: CIB query: cib
Mar 12 12:00:00 ha2infra mgmtd: [25633]: info: CIB query: cib

Mar 12 12:00:59 ha2infra sshd[24992]: Connection closed by 134.105.232.21 [preauth]
Mar 12 12:00:59 ha2infra lrmd[25629]: warning: child_timeout_callback: fkflmw_stop_0 process (PID 22718) timed out
Mar 12 12:00:59 ha2infra lrmd[25629]: warning: operation_finished: fkflmw_stop_0:22718 - timed out after 60000ms <--------------- Stop timed out after 60s (not 120s)
Mar 12 12:00:59 ha2infra crmd[25632]: error: process_lrm_event: LRM operation fkflmw_stop_0 (136) Timed Out (timeout=60000ms)
Mar 12 12:00:59 ha2infra crmd[25632]: warning: status_from_rc: Action 60 (fkflmw_stop_0) on ha2infra failed (target: 0 vs. rc: 1): Error

Mar 12 12:00:59 ha2infra pengine[25631]: warning: unpack_rsc_op_failure: Processing failed op stop for fkflmw on ha2infra: unknown error (1)
Mar 12 12:00:59 ha2infra pengine[25631]: warning: pe_fence_node: Node ha2infra will be fenced because of resource failure(s) <--------------- is this normal?
Mar 12 12:00:59 ha2infra pengine[25631]: warning: stage6: Scheduling Node ha2infra for STONITH


On node ha1infra:

Mar 12 12:00:59 ha1infra stonith-ng[21808]: notice: can_fence_host_with_device: stonith_1 can fence ha2infra: dynamic-list
Mar 12 12:01:01 ha1infra stonith-ng[21808]: notice: log_operation: Operation 'reboot' [23984] (call 2 from crmd.25632) for host 'ha2infra' with device 'stonith_1' returned: 0 (OK)
Mar 12 12:01:05 ha1infra corosync[21794]: [TOTEM ] A processor failed, forming new configuration.
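
One possible reading of the logs, offered as a sketch rather than a confirmed diagnosis: the 60000ms stop timeout equals default-action-timeout="60s", and the primitive defines no "op stop", so the stop operation may simply be inheriting the cluster default while the Xen agent itself waits up to shutdown_timeout="120". Under that assumption, an explicit stop timeout longer than shutdown_timeout might look like the fragment below. The value 150 is an illustrative assumption (anything comfortably above 120s would serve the same purpose), and the added "op stop" line is not part of the original configuration.

```
# ASSUMPTION: the "op stop" line and its timeout="150" are illustrative only;
# the rest of the primitive is unchanged from the configuration shown earlier.
primitive fkflmw ocf:heartbeat:Xen \
        meta target-role="Started" is-managed="true" allow-migrate="true" \
        op monitor interval="10" timeout="30" \
        op stop interval="0" timeout="150" \
        op migrate_from interval="0" timeout="600" \
        op migrate_to interval="0" timeout="600" \
        params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
```

With a stop timeout above shutdown_timeout, the resource agent would have time to finish (or force) the domain shutdown before the cluster declares the stop failed. A failed stop is what leads to the pe_fence_node message here, since the cluster cannot otherwise be sure the resource is really down.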





Karl Roessmann
--
Karl Rößmann                            Tel. +49-711-689-1657
Max-Planck-Institut FKF                 Fax. +49-711-689-1632
Postfach 800 665
70506 Stuttgart                         email k.roessm...@fkf.mpg.de


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
