2008/4/24 NAKAHIRA Kazutomo <[EMAIL PROTECTED]>:
> hello, all
>
> I tried the same test pattern reported by Hideo Yamauchi,
> and automatic fail-back still occurs in the latest Pacemaker.
> (Pacemaker changeset: bf619298929c, Heartbeat changeset: 54723736ab18)
oh :-( sorry, i just assumed it was the same problem

> There is a log output by the PE when executing "crm_resource -C -r
> group1-dummy2 -H dl380g5e":
>
> (snip ha-log)
> pengine[13894]: 2008/04/24_19:02:57 info: common_apply_stickiness:
> Setting failure stickiness for group1-dummy2 on dl380g5e: 727379968
> (snip ha-log)
>
> It seems that if the fail-count becomes INFINITY for any reason and the
> default-resource-failure-stickiness value is defined as "-INFINITY",
> then common_apply_stickiness() calculates an invalid value.

Can you try the following patch?

diff -r 5229c9b520f3 lib/crm/pengine/complex.c
--- a/lib/crm/pengine/complex.c Thu Apr 24 13:20:48 2008 +0200
+++ b/lib/crm/pengine/complex.c Thu Apr 24 15:47:40 2008 +0200
@@ -372,14 +372,19 @@ common_apply_stickiness(resource_t *rsc,
     if(fail_count > 0 && rsc->fail_stickiness != 0) {
         resource_t *failed = rsc;
+        int score = fail_count * rsc->fail_stickiness;
         if(is_not_set(rsc->flags, pe_rsc_unique)) {
             failed = uber_parent(rsc);
         }
-        resource_location(failed, node, fail_count * rsc->fail_stickiness,
-                          "fail_stickiness", data_set);
+
+        /* detect and prevent score underflows */
+        if(rsc->fail_stickiness < 0 && (score > 0 || score < -INFINITY)) {
+            score = -INFINITY;
+        }
+
+        resource_location(failed, node, score, "fail_stickiness", data_set);
         crm_info("Setting failure stickiness for %s on %s: %d",
-                 failed->id, node->details->uname,
-                 fail_count * rsc->fail_stickiness);
+                 failed->id, node->details->uname, score);
     }
 
     g_hash_table_destroy(meta_hash);
 }
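For reference, the bogus score in that log line is consistent with plain
32-bit integer wraparound. Below is a minimal standalone sketch (my own
illustration, not part of the patch), assuming the usual INFINITY score
of 1000000; signed overflow is technically undefined behaviour in C, but
on the platforms we care about it wraps, which is what the PE appears to
be doing:

#include <stdio.h>

#define INFINITY 1000000

int main(void)
{
    int fail_count = INFINITY;   /* fail-count pushed to INFINITY */
    int stickiness = -INFINITY;  /* default-resource-failure-stickiness */

    /* 1000000 * -1000000 = -10^12 does not fit in 32 bits and
     * wraps around to a large *positive* score */
    int score = fail_count * stickiness;
    printf("raw score: %d\n", score);     /* 727379968, as in the log */

    /* the clamp from the patch: a negative stickiness must never
     * yield a score above 0 (or below -INFINITY) */
    if (stickiness < 0 && (score > 0 || score < -INFINITY)) {
        score = -INFINITY;
    }
    printf("clamped score: %d\n", score); /* -1000000 */
    return 0;
}

Note that (-1000000 * 1000000) mod 2^32 is exactly 727379968, the value
from Nakahira's log, so the underflow explanation fits.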
> Best regards,
> NAKAHIRA Kazutomo
>
> HIDEO YAMAUCHI wrote:
> > Hi,
> >
> >> 2008/4/17 HIDEO YAMAUCHI <[EMAIL PROTECTED]>:
> >>> Hi,
> >>>
> >>> I used Heartbeat-STABLE-2-1-932f11969945 and confirmed the
> >>> movement of a simple group resource:
> >>>
> >>> 1) The start of one resource fails on the active node.
> >>>
> >>> 2) All resources move to the standby node.
> >>>
> >>> 3) I clear the failed resource on the active node with a
> >>>    crm_resource command:
> >>>    crm_resource -C -r group1-dummy1 -H rh51-pm
> >>>
> >>> 4) All resources move back to the active node. (Automatic
> >>>    failback occurs.)
> >>>
> >>> Node: rh51-pm (fe4ff160-196b-4b5f-b341-5b1ccf666bf1): online
> >>> Node: rh51-pm2 (19ca6bf8-a6a0-4207-ad1f-bd4ed22ebcd4): online
> >>>
> >>> Resource Group: resource_group1
> >>>     group1-dummy1 (ocf::heartbeat:Dummy): Started rh51-pm
> >>>     group1-dummy2 (ocf::heartbeat:Dummy2): Started rh51-pm
> >>>
> >>> I think failback did not occur in Ver2.1.3 (at step 4).
> >>>
> >>> Is this a new specification as of Ver2.1.4?
> >>
> >> No, it was a bug that I fixed a few days back - I guess the fix
> >> hasn't been backported yet
> >
> > OK.
> >
> > I will wait for the bug fix to be reflected.
> >
> > Thanks,
> >
> > Hideo Yamauchi.
> >
> >>> And, is there a setting method that avoids failback, as in
> >>> Ver2.1.3?
>
> --
> ----------------------------------------
> NAKAHIRA Kazutomo
> NTT DATA INTELLILINK CORPORATION

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/