On Mon, Sep 15, 2008 at 7:34 AM, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> Your stonith agent modifications seem to have broken something as the
> stonith agents are not able to start up (rc=6) under some conditions.

Well, apparently something is very wrong. But what are these "some
conditions"? Where can I find documentation for the said "conditions"?

> On Thu, Sep 11, 2008 at 19:50, Itay Donenhirsch <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I have a problem that has been wrecking my nerves for several days now:
> > I start out from a 4-node cluster; the stations are
> > ibp1-105...ibp3-105 and ibp-standby-105. All is nice and cozy until all
> > hell breaks loose. It happens after a few Ethernet cable pull-outs and
> > re-insertions. The results are not deterministic, but usually end in
> > split-brain-like situations, or in double allocation of resources (two
> > stations get the same resource). There are many warnings that accompany
> > this situation, but I cannot make much of them.
> >
> > I'm working with heartbeat 2.1.4 (no, not using pacemaker!).
> >
> > You can get the logs and all the vital stats from
> > http://itay.bazoo.org/problem.tar.gz (~400kb), incl. cibs, conf, logs,
> > crm_mons, and crm_verify output.
> >
> > *** crm_mon on ibp1-105:
> >
> > ============
> > Last updated: Thu Sep 11 20:40:28 2008
> > Current DC: ibp3-105 (534b8ee0-d476-48ff-806b-5301b2a45037)
> > 4 Nodes configured.
> > 7 Resources configured.
> > ============
> >
> > Node: ibp-standby-105 (049722ba-19df-43a8-a73f-3f3d69eb332f): standby
> > Node: ibp3-105 (534b8ee0-d476-48ff-806b-5301b2a45037): online
> >     ibp2-105_stonith:1 (stonith:external/qod-ipmi)
> >     ibp3_mgmt_ip (ocf::heartbeat:IPaddr2)
> >     ibp3_qod_ha_process (lsb:qod-ha)
> >     ibp1-105_stonith:2 (stonith:external/qod-ipmi)
> >     ibp-standby-105_stonith:2 (stonith:external/qod-ipmi)
> >     ibp3_data0_ip (ocf::heartbeat:IPaddr2)
> > Node: ibp2-105 (b6a94dfe-a247-48fb-a008-556e8598f3e0): online
> >     ibp2_data0_ip (ocf::heartbeat:IPaddr2)
> >     ibp2_qod_ha_process (lsb:qod-ha)
> >     ibp2_mgmt_ip (ocf::heartbeat:IPaddr2)
> >     ibp3-105_stonith:0 (stonith:external/qod-ipmi)
> > Node: ibp1-105 (f64f0bd8-e86e-40c4-8299-fc0bd2239d75): standby
> >
> > Failed actions:
> >     ibp1-105_stonith:0_start_0 (node=ibp1-105, call=30, rc=6): complete
> >     ibp1-105_stonith:1_start_0 (node=ibp1-105, call=28, rc=6): complete
> >     ibp-standby-105_stonith:0_start_0 (node=ibp-standby-105, call=26, rc=6): complete
> >     ibp-standby-105_stonith:1_start_0 (node=ibp-standby-105, call=29, rc=6): complete
> >     ibp2-105_stonith:0_start_0 (node=ibp2-105, call=24, rc=6): complete
> >     ibp2-105_stonith:2_start_0 (node=ibp2-105, call=28, rc=6): complete
> >     ibp3-105_stonith:1_start_0 (node=ibp3-105, call=24, rc=6): complete
> >     ibp3-105_stonith:2_start_0 (node=ibp3-105, call=35, rc=6): complete
> >
> > *** Endless repetitions in the logs on ibp1-105 of:
> >
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: WARN:
> > determine_online_status: Node ibp-standby-105 is unclean
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: ERROR:
> > native_add_running: Resource stonith::external/qod-ipmi:ibp1-105_stonith:0
> > appears to be active on 2 nodes.
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: ERROR: See
> > http://linux-ha.org/v2/faq/resource_too_active for more information.
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: ERROR: unpack_rsc_op:
> > Hard error: ibp-standby-105_stonith:0_start_0 failed with rc=6.
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: ERROR: unpack_rsc_op:
> > Preventing ibp-standby-105_stonith:0 from re-starting anywhere in the cluster
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: WARN: unpack_rsc_op:
> > Processing failed op ibp-standby-105_stonith:0_start_0 on ibp-standby-105: Error
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: WARN: unpack_rsc_op:
> > Compatability handling for failed op ibp-standby-105_stonith:0_start_0 on
> > ibp-standby-105
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: ERROR: unpack_rsc_op:
> > Hard error: ibp-standby-105_stonith:1_start_0 failed with rc=6.
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: ERROR: unpack_rsc_op:
> > Preventing ibp-standby-105_stonith:1 from re-starting anywhere in the cluster
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: WARN: unpack_rsc_op:
> > Processing failed op ibp-standby-105_stonith:1_start_0 on ibp-standby-105: Error
> > Sep 11 19:56:35 [EMAIL PROTECTED] crm_resource: [25821]: WARN: unpack_rsc_op:
> > Compatability handling for failed op ibp-standby-105_stonith:1_start_0 on
> > ibp-standby-105
> >
> > *** And more endless repetitions in the logs on ibp-standby-105 of:
> >
> > Sep 11 19:56:39 [EMAIL PROTECTED] pengine: [29098]: WARN: send queue
> > maximum length(500) exceeded
> > Sep 11 19:56:39 [EMAIL PROTECTED] pengine: [29098]: WARN: send queue
> > maximum length(500) exceeded
> > Sep 11 19:56:39 [EMAIL PROTECTED] pengine: [29098]: WARN: send queue
> > maximum length(500) exceeded
> > Sep 11 19:56:39 [EMAIL PROTECTED] pengine: [29098]: WARN: send queue
> > maximum length(500) exceeded
> > Sep 11 19:56:39 [EMAIL PROTECTED] pengine: [29098]: WARN: send queue
> > maximum length(500) exceeded
> >
> > Another thing (maybe it's related?) that I noticed: I'm using Dells with
> > DRAC5 and trying to stonith them with external/ipmi. It seems that this
> > fails when the stations are powered down, which violates rule 4
> > according to http://www.linux-ha.org/STONITH. Therefore I made a variant
> > of the script and named it "qod-ipmi". It is attached in the mentioned
> > tarball as well.
> >
> > I'll be grateful for any advice, and as soon as possible, I hope...
> >
> > Thanks,
> > Itay
> >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
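[Editor's note] The variant described above amounts to making the agent's status and off handling tolerate a powered-down host, per rule 4 of http://www.linux-ha.org/STONITH (the device must still answer while the managed host is off). The following is a minimal, hypothetical sketch of that idea in the style of an external stonith plugin; it is not the real qod-ipmi script from the tarball, and the parameter names (`ipaddr`, `userid`, `passwd`, `hostname`) and the `ipmitool` invocation are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch (NOT the actual qod-ipmi script) of an external/ipmi
# stonith agent variant.  External stonith plugins receive the operation as
# $1 and device parameters via environment variables; the variable names
# used here are assumptions for illustration.

ipmi() {
    # Talk to the target's BMC over the LAN interface.
    ipmitool -I lan -H "$ipaddr" -U "$userid" -P "$passwd" "$@"
}

stonith_op() {
    case "$1" in
    status)
        # Rule 4: a status query must succeed even while the managed host
        # is powered off.  Any answer from the BMC ("on" or "off") counts
        # as success; only an unreachable BMC is a failure.
        ipmi chassis power status >/dev/null 2>&1
        ;;
    off)
        # Treat a host that is already powered off as a successful "off"
        # rather than reporting an error.
        case "$(ipmi chassis power status 2>/dev/null)" in
        *off*) return 0 ;;
        *)     ipmi chassis power off >/dev/null 2>&1 ;;
        esac
        ;;
    on|reset)
        ipmi chassis power "$1" >/dev/null 2>&1
        ;;
    gethosts)
        # Report the host this device instance can fence.
        echo "$hostname"
        ;;
    *)
        return 1
        ;;
    esac
}

# Dispatch when invoked as a script with an operation argument.
if [ $# -gt 0 ]; then
    stonith_op "$1"
fi
```

The point of the sketch is only the shape of the fix: "power is off" replies are treated as valid answers in `status` and as success in `off`, so fencing an already-down node no longer fails.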
