Thanks for trying to help me. :) Yes, I checked it, and there is no problem with it. I assume it must be a problem with ping or attrd_updater because, as I understand it, the crm gets a timeout from the monitor process and then kills the resource.

"
    p_out=`$p_exe $p_args $OCF_RESKEY_options $host 2>&1`; rc=$?
    case $rc in
        0) active=`expr $active + 1`;;
        1) ping_conditional_log warn "$host is inactive: $p_out";;
        *) ocf_log err "Unexpected result for '$p_exe $p_args $OCF_RESKEY_options $host' $rc: $p_out";;
    esac
done
score=`expr $active \* $OCF_RESKEY_multiplier`
attrd_updater -n $OCF_RESKEY_name -v $score -d $OCF_RESKEY_dampen $attrd_options
rc=$?
case $rc in
    0) ping_conditional_log debug "Updated $OCF_RESKEY_name = $score" ;;
    *) ocf_log warn "Could not update $OCF_RESKEY_name = $score: rc=$rc";;
esac
return $rc
"

As there is no response from this part of the RA, the cluster reacts like this:

"Jan 5 08:40:33 node2 crmd: [5993]: ERROR: process_lrm_event: LRM operation pingd:0_monitor_15000 (48559) Timed Out (timeout=5000ms)"

This is what I assume.

Kind regards,
Patrik

Best Regards

Patrik Rapposch, BSc
System Administration

KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria
Phone: +43 3842 805-915
Fax: +43 3842 82930-500
patrik.rappo...@knapp.com
www.KNAPP.com

Commercial register number: FN 138870x
Commercial register court: Leoben

The information in this e-mail (including any attachment) is confidential and intended to be for the use of the addressee(s) only. If you have received the e-mail by mistake, any disclosure, copy, distribution or use of the contents of the e-mail is prohibited, and you must delete the e-mail from your system. As e-mail can be changed electronically KNAPP assumes no responsibility for any alteration to this e-mail or its attachments. KNAPP has taken every reasonable precaution to ensure that any attachment to this e-mail has been swept for virus. However, KNAPP does not accept any liability for damage sustained as a result of such attachment being virus infected and strongly recommend that you carry out your own virus check before opening any attachment.
Andreas Kurz <andreas.k...@linbit.com>
10.01.2011 15:41
Please reply to: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
To: pacemaker@oss.clusterlabs.org
Cc:
Subject: Re: [Pacemaker] Fw: Antwort: Re: pingd process dies for no reason

On 2011-01-10 13:35, patrik.rappo...@knapp.com wrote:
> Does anyone have an idea, or has anyone had the same problem?

Sorry for the question ;-) ... but of course you checked that your host xxx.xxx.xxx.xxx is pingable from the cluster nodes? My only idea here is a firewall somewhere.

Regards,
Andreas
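One way to test the assumption made earlier in the thread (that either the ping step or the attrd_updater step takes longer than the 5000 ms monitor timeout) would be to time each step by hand on the affected node. The following is only a minimal sketch: time_step is a hypothetical helper and not part of the ping RA, it only has whole-second resolution because it relies on `date +%s`, and the ping flags and the attribute name in the commented example calls are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: run one command and report how long it took, so that the ping step
# and the attrd_updater step of the monitor can be timed separately.
# time_step is a hypothetical helper name, not something the ping RA provides.

# usage: time_step <budget-seconds> <command> [args...]
# Prints "rc=<exit code> elapsed=<seconds>s <OK|SLOW>" and returns the
# exit code of the command. SLOW means elapsed >= budget.
time_step() {
    budget=$1; shift
    start=$(date +%s)          # whole seconds; available on GNU/Linux
    "$@" >/dev/null 2>&1       # discard the command's own output
    rc=$?
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -ge "$budget" ]; then
        verdict=SLOW
    else
        verdict=OK
    fi
    echo "rc=$rc elapsed=${elapsed}s $verdict"
    return $rc
}

# Example calls on a cluster node. The budget of 5 mirrors the 5000 ms monitor
# timeout from the log; the host placeholder, the ping flags and the attribute
# name "timing_test" are assumptions:
#   time_step 5 ping -n -q -c 2 xxx.xxx.xxx.xxx
#   time_step 5 attrd_updater -n timing_test -v 1 -d 5s
```

If one of the two calls regularly reports SLOW, or hangs, that would point to the step that runs into the monitor timeout. If both stay well under the budget, the timeout itself may simply be too tight for occasional slow runs.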
> ----- Forwarded by Patrik Rapposch/KSI on 10.01.2011 13:35 -----
>
> patrik.rappo...@knapp.com
> 07.01.2011 16:38
> Please reply to: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Cc:
> Subject: [Pacemaker] Antwort: Re: pingd process dies for no reason
>
> Hello,
>
> thanks for your fast reply. We use the ping resource, as you can see in
> our config; it is just the id that is called pingd. I admit this is a
> little confusing:
> "<primitive class="ocf" id="pingd" provider="pacemaker" type="ping">"
>
> Kind regards,
> Patrik
>
> Michael Schwartzkopff <mi...@clusterbau.com>
> 07.01.2011 15:02
> Please reply to: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Cc:
> Subject: Re: [Pacemaker] pingd process dies for no reason
>
> On Friday 07 January 2011 14:56:03 patrik.rappo...@knapp.com wrote:
>> Greetings,
>>
>> we have a problem: the ping daemon dies for no apparent reason, and we
>> can't find out why this happened.
>>
>> We use the following versions on SLES 11.1:
>>
>> libpacemaker3-1.1.2-0.6.1
>> pacemaker-mgmt-2.0.0-0.3.10
>> pacemaker-mgmt-client-2.0.0-0.3.10
>> drbd-pacemaker-8.3.8.1-0.2.9
>> libpacemaker-devel-1.1.2-0.6.1
>> pacemaker-1.1.2-0.6.1
>> pacemaker-mgmt-devel-2.0.0-0.3.10
>> libcorosync4-1.2.6-0.2.2
>> corosync-1.2.6-0.2.2
>> libcorosync-devel-1.2.6-0.2.2
>>
>> Here is the important part of the log trace:
>> "
>> Jan 5 08:40:30 node2 lrmd: [5990]: info: rsc:OSR_IP:46535: monitor
>> Jan 5 08:40:30 node2 lrmd: [5990]: info: rsc:Cluster_IP:46533: monitor
>> Jan 5 08:40:33 node2 lrmd: [5990]: WARN: pingd:0:monitor process (PID 23937) timed out (try 1). Killing with signal SIGTERM (15).
>> Jan 5 08:40:33 node2 lrmd: [5990]: WARN: operation monitor[48559] on ocf::ping::pingd:0 for client 5993, its parameters: CRM_meta_clone=[0] host_list=[xxx.xxx.xxx.xxx] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[2] CRM_meta_notify=[false] dampen=[5s] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.2] multiplier=[100] CRM_meta_name=[monitor] CRM_meta_interval=[15000] CRM_meta_timeout=[5000] : pid [23937] timed out
>> Jan 5 08:40:33 node2 crmd: [5993]: ERROR: process_lrm_event: LRM operation pingd:0_monitor_15000 (48559) Timed Out (timeout=5000ms)
>> Jan 5 08:40:33 node2 crmd: [5993]: WARN: update_failcount: Updating failcount for pingd:0 on node2 after failed monitor: rc=-2 (update=value++, time=1294213233)
>> Jan 5 08:40:35 node2 pengine: [5992]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> Jan 5 08:40:35 node2 pengine: [5992]: WARN: unpack_rsc_op: Processing failed op drbd_r0:1_promote_0 on node1: unknown exec error (-2)
>> Jan 5 08:40:35 node2 pengine: [5992]: WARN: unpack_rsc_op: Processing failed op pingd:0_monitor_15000 on node2: unknown exec error (-2)
>> Jan 5 08:40:35 node2 pengine: [5992]: notice: clone_print: Clone Set: pingdclone [pingd]
>> Jan 5 08:40:35 node2 pengine: [5992]: notice: native_print: pingd:0 (ocf::pacemaker:ping): Started node2 FAILED
>> Jan 5 08:40:35 node2 pengine: [5992]: notice: short_print: Started: [ node1 ]"
>>
>> The resource is configured in the following way:
>> <clone id="pingdclone">
>>   <meta_attributes id="pingdclone-meta_attributes">
>>     <nvpair id="pingdclone-meta_attributes-globally-unique" name="globally-unique" value="false"/>
>>   </meta_attributes>
>>   <primitive class="ocf" id="pingd" provider="pacemaker" type="ping">
>>     <instance_attributes id="pingd-instance_attributes">
>>       <nvpair id="pingd-instance_attributes-host_list" name="host_list" value="xxx.xxx.xxx.xxx"/>
>>       <nvpair id="pingd-instance_attributes-multiplier" name="multiplier" value="100"/>
>>       <nvpair id="nvpair-96877c9e-2825-4d7d-997b-944652f89584" name="dampen" value="5s"/>
>>     </instance_attributes>
>>     <operations>
>>       <op id="pingd-monitor-15s" interval="15s" name="monitor" timeout="5s"/>
>>     </operations>
>>   </primitive>
>> </clone>
>>
>> Thanks in advance for your help.
>>
>> Best Regards
>>
>> Patrik Rapposch, BSc
>
> Please use the "ping" resource agent instead of "pingd".
>
> Greetings,
>
> --
> Dr. Michael Schwartzkopff
> Guardinistr. 63
> 81375 München
>
> Tel: (0163) 172 50 98
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker