2008/4/24 NAKAHIRA Kazutomo <[EMAIL PROTECTED]>:
> hello, all
>
> I tried the same test pattern reported by Hideo Yamauchi,
> and automatic fail-back still occurs with the latest Pacemaker.
> (Pacemaker changeset: bf619298929c, Heartbeat changeset: 54723736ab18)
Oh :-(
Sorry, I just assumed it was the same problem.
> Here is the log output by the PE when executing "crm_resource -C -r
> group1-dummy2 -H dl380g5e".
>
> (snip ha-log)
> pengine[13894]: 2008/04/24_19:02:57 info: common_apply_stickiness:
> Setting failure stickiness for group1-dummy2 on dl380g5e: 727379968
> (snip ha-log)
>
> It seems that if the fail-count becomes INFINITY for any reason and the
> default-resource-failure-stickiness value is defined as "-INFINITY",
> then common_apply_stickiness() calculates an invalid value.
Can you try the following patch?
diff -r 5229c9b520f3 lib/crm/pengine/complex.c
--- a/lib/crm/pengine/complex.c Thu Apr 24 13:20:48 2008 +0200
+++ b/lib/crm/pengine/complex.c Thu Apr 24 15:47:40 2008 +0200
@@ -372,14 +372,19 @@ common_apply_stickiness(resource_t *rsc,
     if(fail_count > 0 && rsc->fail_stickiness != 0) {
 	resource_t *failed = rsc;
+	int score = fail_count * rsc->fail_stickiness;
 	if(is_not_set(rsc->flags, pe_rsc_unique)) {
 	    failed = uber_parent(rsc);
 	}
-	resource_location(failed, node, fail_count * rsc->fail_stickiness,
-			  "fail_stickiness", data_set);
+
+	/* detect and prevent score underflows */
+	if(rsc->fail_stickiness < 0 && (score > 0 || score < -INFINITY)) {
+	    score = -INFINITY;
+	}
+
+	resource_location(failed, node, score, "fail_stickiness", data_set);
 	crm_info("Setting failure stickiness for %s on %s: %d",
-		 failed->id, node->details->uname,
-		 fail_count * rsc->fail_stickiness);
+		 failed->id, node->details->uname, score);
     }
     g_hash_table_destroy(meta_hash);
 }
>
> Best regards,
> NAKAHIRA Kazutomo
>
>
>
> HIDEO YAMAUCHI wrote:
> > Hi,
> >
> >> 2008/4/17 HIDEO YAMAUCHI <[EMAIL PROTECTED]>:
> >>> Hi,
> >>>
> >>> I used Heartbeat-STABLE-2-1-932f11969945.
> >>> I confirmed movement of a simple group resource.
> >>>
> >>> 1) I cause the start of one resource to fail on the active node.
> >>>
> >>> 2) All resources move to the standby node.
> >>>
> >>> 3) I clear the failed resource on the active node with a crm_resource
> >>> command:
> >>> crm_resource -C -r group1-dummy1 -H rh51-pm
> >>>
> >>> 4) All the resources move back to the active node. (Automatic fail-back occurs.)
> >>>
> >>> Node: rh51-pm (fe4ff160-196b-4b5f-b341-5b1ccf666bf1): online
> >>> Node: rh51-pm2 (19ca6bf8-a6a0-4207-ad1f-bd4ed22ebcd4): online
> >>>
> >>> Resource Group: resource_group1
> >>> group1-dummy1 (ocf::heartbeat:Dummy): Started rh51-pm
> >>> group1-dummy2 (ocf::heartbeat:Dummy2): Started rh51-pm
> >>>
> >>>
> >>> I believe that this fail-back did not occur in Ver2.1.3. (at step 4)
> >>>
> >>> Is this intended new behavior in Ver2.1.4?
> >> No, it was a bug that I fixed a few days back - I guess the fix hasn't
> >> been backported yet.
> >
> > OK.
> >
> > I will wait for the bug fix to be merged.
> >
> > Thanks,
> >
> > Hideo Yamauchi.
> >
> >> And, is there a configuration method that avoids fail-back, behaving the
> same way as Ver2.1.3?
> >> _______________________________________________________
> >> Linux-HA-Dev: [email protected]
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >> Home Page: http://linux-ha.org/
> >>
> >
>
>
> --
> ----------------------------------------
> NAKAHIRA Kazutomo
> NTT DATA INTELLILINK CORPORATION
>
>
>