Please try:

# crm resource cleanup WebFS

This will fix it if the resource's fail-count has reached INFINITY.

Regards,
Michael

On 2010/7/22 03:29 PM, Proskurin Kirill wrote:
> Hello all.
>
> I really new to Pacemaker and try to make some test and learn how it is
> all works. I use Clusters From Scratch pdf from clusterlabs.org as how-to.
>
> What we have:
> Debian Lenny 5.0.5 (with kernel 2.6.32-bpo.4-amd64 from backports)
> pacemaker 1.0.8+hg15494-4~bpo50+1
> openais 1.1.2-2~bpo50+1
>
>
> Problem:
> I try to add fs mount resource but get unknown error. If I mount it by
> hands - all is ok.
>
> crm_mon:
>
> ============
> Last updated: Thu Jul 22 08:22:20 2010
> Stack: openais
> Current DC: node01.domain.org - partition with quorum
> Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ node02.domain.org node01.domain.org ]
>
> ClusterIP (ocf::heartbeat:IPaddr2): Started node02.domain.org
> Master/Slave Set: WebData
> Masters: [ node02.domain.org ]
> Slaves: [ node01.domain.org ]
> WebFS (ocf::heartbeat:Filesystem): Started node02.domain.org FAILED
>
> Failed actions:
> WebFS_start_0 (node=node01.domain.org, call=18, rc=1,
> status=complete): unknown error
> WebFS_start_0 (node=node02.domain.org, call=301, rc=1,
> status=complete): unknown error
>
> node01:~# crm_verify -VL
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node01.domain.org: unknown error (1)
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node02.domain.org: unknown error (1)
> crm_verify[1482]: 2010/07/22_08:28:13 WARN: common_apply_stickiness:
> Forcing WebFS away from node01.domain.org after 1000000 failures
> (max=1000000)
>
>
> node01:~# crm configure show
> node node01.domain.org
> node node02.domain.org
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
> params ip="192.168.1.100" cidr_netmask="32" \
> op monitor interval="30s"
> primitive WebFS ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/var/spool/dovecot"
> fstype="ext4" \
> op start interval="0" timeout="60s" \
> op stop interval="0" timeout="60s" \
> meta target-role="Started"
> primitive WebSite ocf:heartbeat:apache \
> params configfile="/etc/apache2/apache2.conf" \
> op monitor interval="1min" \
> op start interval="0" timeout="40s" \
> op stop interval="0" timeout="60s" \
> meta target-role="Started"
> primitive wwwdrbd ocf:linbit:drbd \
> params drbd_resource="drbd0" \
> op monitor interval="60s" \
> op start interval="0" timeout="240s" \
> op stop interval="0" timeout="100s"
> ms WebData wwwdrbd \
> meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true" target-role="Started"
> colocation WebSite-with-WebFS inf: WebSite WebFS
> colocation fs_on_drbd inf: WebFS WebData:Master
> colocation website-with-ip inf: WebSite ClusterIP
> order WebFS-after-WebData inf: WebData:promote WebFS:start
> order WebSite-after-WebFS inf: WebFS WebSite
> order apache-after-ip inf: ClusterIP WebSite
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> last-lrm-refresh="1279717510"
>
>
> In logs:
> Jul 22 08:18:39 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:39 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:39 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:39 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:40 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:40 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:40 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:41 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:41 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:41 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:41 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:42 node01 cibadmin: [1199]: info: Invoked: cibadmin -Ql -o
> resources
> Jul 22 08:18:42 node01 cibadmin: [1200]: info: Invoked: cibadmin -p -R
> -o resources
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> <cib admin_epoch="0" epoch="143" num_updates="2" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> <configuration >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> <resources >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> <primitive id="WebFS" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> <meta_attributes id="WebFS-meta_attributes" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> <nvpair value="Stopped" id="WebFS-meta_attributes-target-role" />
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> </meta_attributes>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> </primitive>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> </resources>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> </configuration>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: -
> </cib>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> <cib admin_epoch="0" epoch="144" num_updates="1" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> <configuration >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> <resources >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> <primitive id="WebFS" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> <meta_attributes id="WebFS-meta_attributes" >
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> <nvpair value="Started" id="WebFS-meta_attributes-target-role" />
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> </meta_attributes>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> </primitive>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> </resources>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> </configuration>
> Jul 22 08:18:42 node01 cib: [1810]: info: log_data_element: cib:diff: +
> </cib>
> Jul 22 08:18:42 node01 cib: [1810]: info: cib_process_request: Operation
> complete: op cib_replace for section resources (origin=local/cibadmin/2,
> version=0.144.1): ok (rc=0)
> Jul 22 08:18:42 node01 cib: [1201]: info: write_cib_contents: Archived
> previous version as /var/lib/heartbeat/crm/cib-89.raw
> Jul 22 08:18:42 node01 cib: [1201]: info: write_cib_contents: Wrote
> version 0.144.0 of the CIB to disk (digest:
> 5f51a15c21330c7ff76862ad9a5193b1)
> Jul 22 08:18:42 node01 cib: [1201]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.woPqNQ (digest:
> /var/lib/heartbeat/crm/cib.bF43Zi)
> Jul 22 08:18:42 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:42 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:42 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:42 node01 crmd: [1814]: info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Jul 22 08:18:42 node01 crmd: [1814]: info: need_abort: Aborting on
> change to admin_epoch
> Jul 22 08:18:42 node01 crmd: [1814]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Jul 22 08:18:42 node01 crmd: [1814]: info: do_state_transition: All 2
> cluster nodes are eligible to run resources.
> Jul 22 08:18:42 node01 crmd: [1814]: info: do_pe_invoke: Query 350:
> Requesting the current CIB: S_POLICY_ENGINE
> Jul 22 08:18:42 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:43 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:43 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:43 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:43 node01 crmd: [1814]: info: do_pe_invoke_callback:
> Invoking the PE: query=350, ref=pe_calc-dc-1279783123-729, seq=152,
> quorate=1
> Jul 22 08:18:43 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:43 node01 pengine: [1813]: info: unpack_config: Node
> scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Jul 22 08:18:43 node01 pengine: [1813]: info: determine_online_status:
> Node node01.domain.org is online
> Jul 22 08:18:43 node01 pengine: [1813]: notice: unpack_rsc_op: Operation
> WebSite_monitor_0 found resource WebSite active on node01.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node01.domain.org: unknown error (1)
> Jul 22 08:18:43 node01 pengine: [1813]: info: determine_online_status:
> Node node02.domain.org is online
> Jul 22 08:18:43 node01 pengine: [1813]: notice: unpack_rsc_op: Operation
> WebSite_monitor_0 found resource WebSite active on node02.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: WARN: unpack_rsc_op: Processing
> failed op WebFS_start_0 on node02.domain.org: unknown error (1)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: native_print:
> ClusterIP#011(ocf::heartbeat:IPaddr2):#011Started node02.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: notice: native_print:
> WebSite#011(ocf::heartbeat:apache):#011Stopped
> Jul 22 08:18:43 node01 pengine: [1813]: notice: clone_print:
> Master/Slave Set: WebData
> Jul 22 08:18:43 node01 pengine: [1813]: notice: short_print: Masters: [
> node02.domain.org ]
> Jul 22 08:18:43 node01 pengine: [1813]: notice: short_print: Slaves: [
> node01.domain.org ]
> Jul 22 08:18:43 node01 pengine: [1813]: notice: native_print:
> WebFS#011(ocf::heartbeat:Filesystem):#011Stopped
> Jul 22 08:18:43 node01 pengine: [1813]: info: get_failcount: WebFS has
> failed 1000000 times on node01.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: WARN: common_apply_stickiness:
> Forcing WebFS away from node01.domain.org after 1000000 failures
> (max=1000000)
> Jul 22 08:18:43 node01 pengine: [1813]: info: native_merge_weights:
> WebData: Rolling back scores from WebFS
> Jul 22 08:18:43 node01 pengine: [1813]: info: native_merge_weights:
> wwwdrbd:0: Rolling back scores from WebFS
> Jul 22 08:18:43 node01 pengine: [1813]: info: native_merge_weights:
> WebData: Rolling back scores from WebFS
> Jul 22 08:18:43 node01 pengine: [1813]: info: master_color: Promoting
> wwwdrbd:0 (Master node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: info: master_color: WebData:
> Promoted 1 instances of a possible 1 to master
> Jul 22 08:18:43 node01 pengine: [1813]: info: master_color: Promoting
> wwwdrbd:0 (Master node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: info: master_color: WebData:
> Promoted 1 instances of a possible 1 to master
> Jul 22 08:18:43 node01 pengine: [1813]: notice: RecurringOp: Start
> recurring monitor (60s) for WebSite on node02.domain.org
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Leave
> resource ClusterIP#011(Started node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Start
> WebSite#011(node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Leave
> resource wwwdrbd:0#011(Master node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Leave
> resource wwwdrbd:1#011(Slave node01.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: notice: LogActions: Start
> WebFS#011(node02.domain.org)
> Jul 22 08:18:43 node01 pengine: [1813]: info: process_pe_message:
> Transition 199: PEngine Input stored in: /var/lib/pengine/pe-input-243.bz2
> Jul 22 08:18:44 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:44 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:44 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:44 node01 crmd: [1814]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jul 22 08:18:44 node01 crmd: [1814]: info: unpack_graph: Unpacked
> transition 199: 4 actions in 4 synapses
> Jul 22 08:18:44 node01 crmd: [1814]: info: do_te_invoke: Processing
> graph 199 (ref=pe_calc-dc-1279783123-729) derived from
> /var/lib/pengine/pe-input-243.bz2
> Jul 22 08:18:44 node01 crmd: [1814]: info: te_rsc_command: Initiating
> action 42: start WebFS_start_0 on node02.domain.org
> Jul 22 08:18:44 node01 crmd: [1814]: info: te_rsc_command: Initiating
> action 5: probe_complete probe_complete on node02.domain.org - no waiting
> Jul 22 08:18:44 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:45 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:45 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:45 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:45 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:46 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:46 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:46 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:46 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> Jul 22 08:18:47 node01 crmd: [1814]: ERROR: stonithd_signon: Can't
> initiate connection to stonithd
> Jul 22 08:18:47 node01 crmd: [1814]: notice: Not currently connected.
> Jul 22 08:18:47 node01 crmd: [1814]: ERROR: te_connect_stonith: Sign-in
> failed: triggered a retry
> Jul 22 08:18:47 node01 crmd: [1814]: info: te_connect_stonith:
> Attempting connection to fencing daemon...
> _______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
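
For anyone checking whether the cleanup suggested at the top applies to their own cluster: the quoted `crm_verify -VL` output contains the tell-tale warning "Forcing WebFS away from node01.domain.org after 1000000 failures (max=1000000)", and 1000000 is the score Pacemaker uses for INFINITY. Below is a minimal sketch in Python that scans pasted log text for that warning and prints the matching cleanup command without running it. The function name `resources_needing_cleanup` and the regular expression are illustrative assumptions based only on the message format shown in this thread; they are not part of Pacemaker.

```python
import re

# Pacemaker represents INFINITY as the score 1000000.
INFINITY = 1_000_000

# Matches the pengine/crm_verify warning quoted in this thread, e.g.
#   Forcing WebFS away from node01.domain.org after 1000000 failures (max=1000000)
# (the pattern is an assumption derived from that single message format)
FORCED_AWAY = re.compile(
    r"Forcing (?P<rsc>\S+) away from (?P<node>\S+) "
    r"after (?P<fails>\d+) failures \(max=(?P<max>\d+)\)"
)


def resources_needing_cleanup(log_text):
    """Return sorted (resource, node) pairs whose fail-count reached INFINITY."""
    # Mail clients wrap long log lines and prefix them with ">";
    # collapse whitespace and quote markers so wrapped messages match again.
    flat = re.sub(r"[\s>]+", " ", log_text)
    hits = set()
    for match in FORCED_AWAY.finditer(flat):
        if int(match.group("fails")) >= INFINITY:
            hits.add((match.group("rsc"), match.group("node")))
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "crm_verify[1482]: 2010/07/22_08:28:13 WARN: common_apply_stickiness:\n"
        "> Forcing WebFS away from node01.domain.org after 1000000 failures\n"
        "> (max=1000000)\n"
    )
    for rsc, node in resources_needing_cleanup(sample):
        # Only print the command from the reply; nothing is executed here.
        print(f"{node}: crm resource cleanup {rsc}")
```

The script only reports; the cleanup itself is still run by hand with `crm resource cleanup WebFS`. If the underlying start failure persists, the fail-count will presumably climb again after the cleanup, so the "unknown error" from the Filesystem start still needs to be investigated separately.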