Hi,

we have a two-node HA cluster running SUSE SLES 11 HA Extension SP3,
at the latest release level.
A resource (Xen) was stopped manually. Its shutdown_timeout is 120s,
but after 60s the node was fenced and shut down by the other node.

Should I change some timeout value?

This is a part of our configuration:
...
primitive fkflmw ocf:heartbeat:Xen \
        meta target-role="Started" is-managed="true" allow-migrate="true" \
        op monitor interval="10" timeout="30" \
        op migrate_from interval="0" timeout="600" \
        op migrate_to interval="0" timeout="600" \
        params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
...
...
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-f3eeaf4" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1394533475" \
        default-action-timeout="60s"
rsc_defaults $id="rsc_defaults-options" \
        resource-stickiness="10" \
        migration-threshold="3"


We had this scenario:

On node ha2infra:

Mar 12 11:59:59 ha2infra pengine[25631]: notice: LogActions: Stop fkflmw (ha2infra) <--------------- Resource fkflmw was stopped manually
Mar 12 11:59:59 ha2infra pengine[25631]: notice: process_pe_message: Calculated Transition 105: /var/lib/pacemaker/pengine/pe-input-519.bz2
Mar 12 11:59:59 ha2infra crmd[25632]: notice: do_te_invoke: Processing graph 105 (ref=pe_calc-dc-1394621999-178) derived from /var/lib/pacemaker/pengine/pe-input-519.bz2
Mar 12 11:59:59 ha2infra crmd[25632]: notice: te_rsc_command: Initiating action 60: stop fkflmw_stop_0 on ha2infra (local)
Mar 12 11:59:59 ha2infra Xen(fkflmw)[22718]: INFO: Xen domain fkflmw will be stopped (timeout: 120s) <--------------- stopping fkflmw
Mar 12 12:00:00 ha2infra mgmtd: [25633]: info: CIB query: cib
Mar 12 12:00:00 ha2infra mgmtd: [25633]: info: CIB query: cib
Mar 12 12:00:59 ha2infra sshd[24992]: Connection closed by 134.105.232.21 [preauth]
Mar 12 12:00:59 ha2infra lrmd[25629]: warning: child_timeout_callback: fkflmw_stop_0 process (PID 22718) timed out
Mar 12 12:00:59 ha2infra lrmd[25629]: warning: operation_finished: fkflmw_stop_0:22718 - timed out after 60000ms <--------------- Stop timed out after 60s (not 120s)
Mar 12 12:00:59 ha2infra crmd[25632]: error: process_lrm_event: LRM operation fkflmw_stop_0 (136) Timed Out (timeout=60000ms)
Mar 12 12:00:59 ha2infra crmd[25632]: warning: status_from_rc: Action 60 (fkflmw_stop_0) on ha2infra failed (target: 0 vs. rc: 1): Error

Mar 12 12:00:59 ha2infra pengine[25631]: warning: unpack_rsc_op_failure: Processing failed op stop for fkflmw on ha2infra: unknown error (1)
Mar 12 12:00:59 ha2infra pengine[25631]: warning: pe_fence_node: Node ha2infra will be fenced because of resource failure(s) <--------------- is this normal?
Mar 12 12:00:59 ha2infra pengine[25631]: warning: stage6: Scheduling Node ha2infra for STONITH

Node ha1infra:

Mar 12 12:00:59 ha1infra stonith-ng[21808]: notice: can_fence_host_with_device: stonith_1 can fence ha2infra: dynamic-list
Mar 12 12:01:01 ha1infra stonith-ng[21808]: notice: log_operation: Operation 'reboot' [23984] (call 2 from crmd.25632) for host 'ha2infra' with device 'stonith_1' returned: 0 (OK)
Mar 12 12:01:05 ha1infra corosync[21794]: [TOTEM ] A processor failed, forming new configuration.
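
Note: the primitive above defines no stop operation, so the stop presumably
inherits default-action-timeout="60s", which would match the 60000ms reported
by lrmd. A stop that fails or times out is normally escalated to fencing when
STONITH is enabled. A minimal sketch of the same primitive with an explicit
stop timeout longer than shutdown_timeout follows; the value 180 is an
assumption, not a tested setting:

# timeout="180" on the stop op is an assumed value, chosen only to exceed shutdown_timeout="120"
primitive fkflmw ocf:heartbeat:Xen \
        meta target-role="Started" is-managed="true" allow-migrate="true" \
        op monitor interval="10" timeout="30" \
        op stop interval="0" timeout="180" \
        op migrate_from interval="0" timeout="600" \
        op migrate_to interval="0" timeout="600" \
        params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"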




Karl Roessmann
--
Karl Rößmann                            Tel. +49-711-689-1657
Max-Planck-Institut FKF                 Fax. +49-711-689-1632
Postfach 800 665
70506 Stuttgart                         email k.roessm...@fkf.mpg.de


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
