Hi,
we have a two-node HA cluster running SUSE SLES 11 SP3 with the HA Extension,
at the latest release level.
A resource (Xen) was stopped manually. Its shutdown_timeout is 120s,
but after 60s the node was fenced and shut down by the other node.
Should I change some timeout value?
This is part of our configuration:
...
primitive fkflmw ocf:heartbeat:Xen \
        meta target-role="Started" is-managed="true" allow-migrate="true" \
        op monitor interval="10" timeout="30" \
        op migrate_from interval="0" timeout="600" \
        op migrate_to interval="0" timeout="600" \
        params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
...
...
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-f3eeaf4" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1394533475" \
        default-action-timeout="60s"
rsc_defaults $id="rsc_defaults-options" \
        resource-stickiness="10" \
        migration-threshold="3"
We had this scenario:
On node ha2infra:
Mar 12 11:59:59 ha2infra pengine[25631]: notice: LogActions: Stop fkflmw (ha2infra) <--------------- Resource fkflmw was stopped manually
Mar 12 11:59:59 ha2infra pengine[25631]: notice: process_pe_message: Calculated Transition 105: /var/lib/pacemaker/pengine/pe-input-519.bz2
Mar 12 11:59:59 ha2infra crmd[25632]: notice: do_te_invoke: Processing graph 105 (ref=pe_calc-dc-1394621999-178) derived from /var/lib/pacemaker/pengine/pe-input-519.bz2
Mar 12 11:59:59 ha2infra crmd[25632]: notice: te_rsc_command: Initiating action 60: stop fkflmw_stop_0 on ha2infra (local)
Mar 12 11:59:59 ha2infra Xen(fkflmw)[22718]: INFO: Xen domain fkflmw will be stopped (timeout: 120s) <--------------- stopping fkflmw
Mar 12 12:00:00 ha2infra mgmtd: [25633]: info: CIB query: cib
Mar 12 12:00:00 ha2infra mgmtd: [25633]: info: CIB query: cib
Mar 12 12:00:59 ha2infra sshd[24992]: Connection closed by 134.105.232.21 [preauth]
Mar 12 12:00:59 ha2infra lrmd[25629]: warning: child_timeout_callback: fkflmw_stop_0 process (PID 22718) timed out
Mar 12 12:00:59 ha2infra lrmd[25629]: warning: operation_finished: fkflmw_stop_0:22718 - timed out after 60000ms <--------------- Stop timed out after 60s (not 120s)
Mar 12 12:00:59 ha2infra crmd[25632]: error: process_lrm_event: LRM operation fkflmw_stop_0 (136) Timed Out (timeout=60000ms)
Mar 12 12:00:59 ha2infra crmd[25632]: warning: status_from_rc: Action 60 (fkflmw_stop_0) on ha2infra failed (target: 0 vs. rc: 1): Error
Mar 12 12:00:59 ha2infra pengine[25631]: warning: unpack_rsc_op_failure: Processing failed op stop for fkflmw on ha2infra: unknown error (1)
Mar 12 12:00:59 ha2infra pengine[25631]: warning: pe_fence_node: Node ha2infra will be fenced because of resource failure(s) <--------------- is this normal?
Mar 12 12:00:59 ha2infra pengine[25631]: warning: stage6: Scheduling Node ha2infra for STONITH
On node ha1infra:
Mar 12 12:00:59 ha1infra stonith-ng[21808]: notice: can_fence_host_with_device: stonith_1 can fence ha2infra: dynamic-list
Mar 12 12:01:01 ha1infra stonith-ng[21808]: notice: log_operation: Operation 'reboot' [23984] (call 2 from crmd.25632) for host 'ha2infra' with device 'stonith_1' returned: 0 (OK)
Mar 12 12:01:05 ha1infra corosync[21794]: [TOTEM ] A processor failed, forming new configuration.
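
One reading of these logs, offered as a guess rather than a confirmed
diagnosis: the primitive defines no explicit stop operation, so the stop
appears to fall back to default-action-timeout="60s", which expires before
the agent's shutdown_timeout of 120s has elapsed. With STONITH in use, a
failed stop is escalated to fencing, which would explain the reboot. A
minimal sketch of a possible change follows. The stop timeout of 180 is an
assumed value (anything comfortably above shutdown_timeout), not a tested
recommendation; everything else is copied from the configuration shown
earlier.

```
# Hypothetical change: add an explicit stop operation whose timeout
# (180 is an assumed value) exceeds shutdown_timeout="120".
primitive fkflmw ocf:heartbeat:Xen \
        meta target-role="Started" is-managed="true" allow-migrate="true" \
        op monitor interval="10" timeout="30" \
        op stop interval="0" timeout="180" \
        op migrate_from interval="0" timeout="600" \
        op migrate_to interval="0" timeout="600" \
        params xmfile="/etc/xen/vm/fkflmw" shutdown_timeout="120"
```

Raising default-action-timeout instead would probably have a similar effect,
but it would also change every other operation that has no timeout of its
own, so a per-operation value seems the narrower change.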
Karl Roessmann
--
Karl Rößmann Tel. +49-711-689-1657
Max-Planck-Institut FKF Fax. +49-711-689-1632
Postfach 800 665
70506 Stuttgart email k.roessm...@fkf.mpg.de
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker