Hi,

On Tue, Dec 15, 2009 at 09:45:39AM +0100, Michael Schwartzkopff wrote:
> On Tuesday, 15 December 2009 09:37:01, Chris Picton wrote:
> > On Tue, 15 Dec 2009 07:13:29 +0000, Chris Picton wrote:
> > >>> > The monitor op shouldn't make any changes. If the rule has gone
> > >>> > away, the monitor op should return failure to indicate the resource
> > >>> > is broken, which will result in Pacemaker telling the failed
> > >>> > resource to stop and start again. Actually, from the logs it looks
> > >>> > like a restart was attempted, and the stop op reported success, but
> > >>> > the subsequent start failed for some reason.
> > >>> >
> > >>> > Regards,
> > >>> >
> > >>> > Tim
> > >>>
> > >>> Exactly. So the RA seems to have a problem handling this error
> > >>> scenario correctly.
> > >>
> > >> OK. Does anybody know how it should work and where the problem is?
> > >> It seems like it can't find some proc file.
> > >
> > > I will have a go at fixing the RA today to do the following:
> > > 1. Detect the error in monitor and return the correct value
> > > 2. Stop the resource cleanly
> > > 3. Start it up again.
> > >
> > > Will let you know how it goes.
> >
> > The patch below seems to detect this specific failure and stop the
> > resource cleanly.
> >
> > The start operation is then able to start it up again without errors.
> >
> > Chris
> >
> > -------------------
> > --- IPaddr2.orig	2009-12-15 10:07:58.000000000 +0200
> > +++ IPaddr2.new	2009-12-15 10:22:03.000000000 +0200
> > @@ -548,6 +548,7 @@
> >  # returns:
> >  # ok = served (for CIP: + hash bucket)
> >  # partial = served and no hash bucket (CIP only)
> > +# partial2 = served and no CIP iptables rule
> >  # no = nothing
> >  #
> >  ip_served() {
> > @@ -577,6 +578,10 @@
> >  	fi
> >
> >  	# Special handling for the CIP:
> > +	if [ ! -e $IP_CIP_FILE ]; then
> > +		echo "partial2"
> > +		return 0
> > +	fi
> >  	if egrep -q "(^|,)${IP_INC_NO}(,|$)" $IP_CIP_FILE ; then
> >  		echo "ok"
> >  		return 0
> > @@ -620,7 +625,7 @@
> >  		exit $OCF_SUCCESS
> >  	fi
> >
> > -	if [ -n "$IP_CIP" ] && [ $ip_status = "no" ]; then
> > +	if [ -n "$IP_CIP" ] && [ $ip_status = "no" ] || [ $ip_status = "partial2" ]; then
> >  		$MODPROBE ip_conntrack
> >  		$IPTABLES -I INPUT -d $BASEIP -i $NIC -j CLUSTERIP \
> >  			--new \
> > @@ -691,13 +696,14 @@
> >  		fi
> >  	fi
> >  	local ip_status=`ip_served`
> > +	ocf_log info "IP status = $ip_status, IP_CIP=$IP_CIP"
> >
> >  	if [ $ip_status = "no" ]; then
> >  		: Requested interface not in use
> >  		exit $OCF_SUCCESS
> >  	fi
> >
> > -	if [ -n "$IP_CIP" ]; then
> > +	if [ -n "$IP_CIP" ] && [ $ip_status != "partial2" ]; then
> >  		if [ $ip_status = "partial" ]; then
> >  			exit $OCF_SUCCESS
> >  		fi
> > @@ -743,7 +749,7 @@
> >  	ok)
> >  		return $OCF_SUCCESS
> >  		;;
> > -	partial|no)
> > +	partial|no|partial2)
> >  		exit $OCF_NOT_RUNNING
> >  		;;
> >  	*)
>
> Thank you very much for your help!
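
For anyone trying to follow what the patch changes, here is a minimal standalone sketch of the status handling as I read it from the diff. It is not the RA itself: the function names cip_status, monitor_rc and needs_cip_setup are made up for illustration, the address is assumed to be already configured on the interface, and the fallback for an unknown status is my assumption because the diff does not show that branch. The numeric values are the standard OCF return codes.

```shell
#!/bin/sh
# Illustrative sketch only; function names are hypothetical, not from IPaddr2.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# cip_status FILE NODE_NO
# Classify the CLUSTERIP state the way the patched ip_served() appears to,
# assuming the IP address itself is already up on the interface.
cip_status() {
    cip_file=$1
    node_no=$2
    if [ ! -e "$cip_file" ]; then
        # The iptables rule is gone, so its proc file no longer exists.
        echo "partial2"
    elif grep -Eq "(^|,)${node_no}(,|\$)" "$cip_file"; then
        # Our node number is listed: we own a hash bucket.
        echo "ok"
    else
        echo "partial"
    fi
}

# monitor_rc STATUS
# Map a status string to the return code the monitor op would report.
monitor_rc() {
    case "$1" in
        ok) return $OCF_SUCCESS ;;
        partial|no|partial2) return $OCF_NOT_RUNNING ;;
        *) return $OCF_ERR_GENERIC ;;  # assumption: not shown in the diff
    esac
}

# needs_cip_setup IP_CIP STATUS
# The start condition from the patch with explicit grouping. In sh, && and
# || have equal precedence and associate left to right, so the patched
# test reads as (A && B) || C.
needs_cip_setup() {
    { [ -n "$1" ] && [ "$2" = "no" ]; } || [ "$2" = "partial2" ]
}
```

With this reading, a missing proc file yields "partial2", monitor reports the resource as not running, and the following start recreates the CLUSTERIP rule because needs_cip_setup is true for "partial2".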

The patch looks good to me, though I don't really understand what's
going on :) At any rate, we need a bugzilla entry for this fix. Who
can describe the problem?

Cheers,

Dejan

>
> --
> Dr. Michael Schwartzkopff
> MultiNET Services GmbH
> Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
> Tel: +49 - 89 - 45 69 11 0
> Fax: +49 - 89 - 45 69 11 21
> mob: +49 - 174 - 343 28 75
>
> mail: mi...@multinet.de
> web: www.multinet.de
>
> Registered office: 85630 Grasbrunn
> Court of registration: Amtsgericht München HRB 114375
> Managing directors: Günter Jurgeneit, Hubert Martens
>
> ---
>
> PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
> Skype: misch42
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker