2008/4/24 NAKAHIRA Kazutomo <[EMAIL PROTECTED]>:
> hello, all
>
> I tried the same test pattern reported by Hideo Yamauchi,
> and automatic fail-back still occurs with the latest Pacemaker.
> (Pacemaker changeset: bf619298929c, Heartbeat changeset: 54723736ab18)
Oh :-(
Sorry, I just assumed it was the same problem.
> Here is the log output by the PE when executing "crm_resource -C -r
> group1-dummy2 -H dl380g5e".
>
> (snip ha-log)
> pengine[13894]: 2008/04/24_19:02:57 info: common_apply_stickiness:
> Setting failure stickiness for group1-dummy2 on dl380g5e: 727379968
> (snip ha-log)
>
> It seems that if the fail-count becomes INFINITY for any reason and the
> default-resource-failure-stickiness value is defined as "-INFINITY",
> then common_apply_stickiness() calculates an invalid value.
Can you try the following patch?
diff -r 5229c9b520f3 lib/crm/pengine/complex.c
--- a/lib/crm/pengine/complex.c Thu Apr 24 13:20:48 2008 +0200
+++ b/lib/crm/pengine/complex.c Thu Apr 24 15:47:40 2008 +0200
@@ -372,14 +372,19 @@ common_apply_stickiness(resource_t *rsc,
     if(fail_count > 0 && rsc->fail_stickiness != 0) {
 	resource_t *failed = rsc;
+	int score = fail_count * rsc->fail_stickiness;
 	if(is_not_set(rsc->flags, pe_rsc_unique)) {
 	    failed = uber_parent(rsc);
 	}
-	resource_location(failed, node, fail_count * rsc->fail_stickiness,
-			  "fail_stickiness", data_set);
+
+	/* detect and prevent score underflows */
+	if(rsc->fail_stickiness < 0 && (score > 0 || score < -INFINITY)) {
+	    score = -INFINITY;
+	}
+
+	resource_location(failed, node, score, "fail_stickiness", data_set);
 	crm_info("Setting failure stickiness for %s on %s: %d",
-		 failed->id, node->details->uname,
-		 fail_count * rsc->fail_stickiness);
+		 failed->id, node->details->uname, score);
     }
     g_hash_table_destroy(meta_hash);
 }
>
> Best regards,
> NAKAHIRA Kazutomo
>
>
>
> HIDEO YAMAUCHI wrote:
> > Hi,
> >
> >> 2008/4/17 HIDEO YAMAUCHI <[EMAIL PROTECTED]>:
> >>> Hi,
> >>>
> >>> I used Heartbeat-STABLE-2-1-932f11969945.
> >>> I confirmed movement of a simple group resource.
> >>>
> >>> 1) I cause the start of one resource to fail on the active node.
> >>>
> >>> 2) All resources move to the standby node.
> >>>
> >>> 3) I clear the failed resource on the active node with a crm_resource
> >>> command:
> >>> crm_resource -C -r group1-dummy1 -H rh51-pm
> >>>
> >>> 4) All the resources move back to the active node. (Automatic fail-back occurs.)
> >>>
> >>> Node: rh51-pm (fe4ff160-196b-4b5f-b341-5b1ccf666bf1): online
> >>> Node: rh51-pm2 (19ca6bf8-a6a0-4207-ad1f-bd4ed22ebcd4): online
> >>>
> >>> Resource Group: resource_group1
> >>> group1-dummy1 (ocf::heartbeat:Dummy): Started rh51-pm
> >>> group1-dummy2 (ocf::heartbeat:Dummy2): Started rh51-pm
> >>>
> >>>
> >>> I believe that this fail-back did not occur in Ver2.1.3. (at step 4)
> >>>
> >>> Is this intended new behavior in Ver2.1.4?
> >> No, it was a bug that I fixed a few days back - I guess the fix hasn't
> >> been backported yet.
> >
> > OK.
> >
> > I will wait for the bug fix to be merged.
> >
> > Thanks,
> >
> > Hideo Yamauchi.
> >
> >> And, is there a configuration method that avoids fail-back, behaving the
> same way as Ver2.1.3?
> >> _______________________________________________________
> >> Linux-HA-Dev: [email protected]
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >> Home Page: http://linux-ha.org/
> >>
> >
>
>
> --
> ----------------------------------------
> NAKAHIRA Kazutomo
> NTT DATA INTELLILINK CORPORATION
>
>
>