Hi

Roland G. McIntosh wrote:
No matter how many times I kill IPaddr2 I can't seem to cause a failover in my simple 2 node cluster.

OT, but why do people keep calling 2 node clusters "simple" clusters? Clusters are not simple. Maybe it's a rather "small" cluster.

I'm trying to get it working for the 3 services in my group (HB 2.1.3 on RHEL4 using CentOS packages). I don't understand why showscores.sh shows "INFINITY" for my OCF resources, but an integer value for the IPaddr2 resource.

This is expected. In a colocated group, only the first resource receives the configured integer stickiness value (times the number of resources in that group). Read below.

Here is the output of my showscores.sh:

[EMAIL PROTECTED] rss]$ ./showscores.sh
Resource            Score     Node            Stickiness #Fail    Fail-Stickiness
slink_db            -INFINITY slinkfail       100        0        -30
slink_db            INFINITY  slinkmaster     100        0        -30
slink_ipaddr2       0         slinkfail       100        0        -30
slink_ipaddr2       400       slinkmaster     100        0        -30

As you can see here, you have a node preference of 100 plus 3 * 100 stickiness, which adds up to the 400 shown for slinkmaster.

slink_jboss         -INFINITY slinkfail       100        0        -30
slink_jboss         INFINITY  slinkmaster     100        0        -30

The INFINITY is implicitly given by the colocated group; that way your resources run on the same node. The -INFINITY makes sure they don't run on any node other than the one the first resource was started on.

With a failure stickiness of -30, your group's resources have to fail 14 times (400/30 is about 13.3, rounded up) before the score drops below zero. Is that what you want?
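
The arithmetic above can be sketched in a few lines of shell. This is only an illustration of the numbers in this thread, not code taken from the crm; the function names group_score and failures_to_failover are made up for the example, and it assumes the other node's score is 0, as in the showscores.sh output.

```shell
#!/bin/sh
# Sketch of the score arithmetic discussed in this thread.
# Function names are hypothetical; nothing here talks to the cluster.

# group_score PREFERENCE STICKINESS NUM_RESOURCES
# Score of the group's first resource on the node it currently runs on.
group_score() {
    echo $(( $1 + $2 * $3 ))
}

# failures_to_failover SCORE FAIL_STICKINESS
# Smallest failcount for which SCORE + failcount * FAIL_STICKINESS < 0.
failures_to_failover() {
    score=$1
    penalty=$(( 0 - $2 ))    # fail stickiness is negative, e.g. -30
    echo $(( score / penalty + 1 ))
}

score=$(group_score 100 100 3)
echo "score: $score"
echo "failures needed: $(failures_to_failover "$score" -30)"
```

With the values from the configuration (preference 100, stickiness 100, three resources, failure stickiness -30) this prints a score of 400 and 14 failures.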

I'm using the Mar 2008 version of showscores.sh (thanks Dominik!), so perhaps this is related to the known issue of meta attributes on the group instead of on the primitive.

From your config - no, it's not about that, as you don't have a stickiness meta attribute for the group, just default values.

I've been trying to force a failover like this:

export OCF_RESKEY_ip=192.168.1.222
for nn in `seq 1 15`; do
  /usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop
  sleep 1m
done

After one the score becomes "200".
Then it seems to jump back up to 300 and stays there. It never proceeds down below zero as I expect. I have a colocation constraint, as you can see in my cib.xml.

You don't have any monitor operations for the ipaddr and jboss resources. Failures on them are not detected. Configure monitor operations and try again.
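
As a rough sketch, a monitor operation on the IP address resource could look like the fragment below in the Heartbeat 2.x CIB syntax. The op id, interval and timeout are assumptions chosen for illustration, not values from the posted cib.xml; the same kind of <op> element would be needed on the jboss primitive.

```xml
<primitive id="slink_ipaddr2" class="ocf" provider="heartbeat" type="IPaddr2">
  <operations>
    <!-- id, interval and timeout are illustrative values -->
    <op id="slink_ipaddr2_monitor" name="monitor" interval="10s" timeout="20s"/>
  </operations>
  <!-- existing instance_attributes stay unchanged -->
</primitive>
```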

Also make sure you use a recent version. Otherwise you may hit the bug in 2.1.3's crm where the failcount is not increased. This is fixed in Pacemaker (0.6.x).

This line from your config:

<rsc_colocation id="colocation_MyGroup" from="MyGroup" to="MyGroup" score="INFINITY"/>

is not needed. I don't even know what you want to express with this.

Regards
Dominik

ps. I'll add the group score things to http://www.linux-ha.org/ScoreCalculation soon.

pps. Where did you get the jboss RA? I'd be interested in it.