I did figure it out. The problem is that some of the docs out there are not very clear. I will blog this at www.mlds-networks.com in case you need it later, or you can just look in the mailing list archives.

The format is

stonith_host  HOST_SENDING  external/ipmi  HOST_TO_CONTROL  IPMI_HOST  USER  PASSWORD

So what I really needed:

stonith_host mds2.engin.umich.edu external/ipmi mds1.engin.umich.edu mds1-m.engin.umich.edu USER PASSWORD
stonith_host mds1.engin.umich.edu external/ipmi mds2.engin.umich.edu mds2-m.engin.umich.edu USER PASSWORD

Notice how the first host, the second host and the IPMI host are all different. The first line tells mds2 how to kill mds1 using the mds1-m IPMI device.
The second line tells mds1 how to kill mds2 using mds2-m.
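
The pairing rule above (each node gets a line for fencing the other node through the other node's IPMI device) can be sketched in a few lines of Python. This is only an illustration: stonith_host_lines is a hypothetical helper, not part of heartbeat, and USER and PASSWORD are placeholders just as in the lines above.

```python
# Hypothetical helper, not part of heartbeat: builds ha.cf stonith_host lines
# in the form  stonith_host SENDER external/ipmi TARGET TARGET_IPMI USER PASSWORD
def stonith_host_lines(ipmi_by_node, user, password):
    """Return one stonith_host line per (sender, target) pair of distinct nodes."""
    lines = []
    for sender in ipmi_by_node:
        for target, ipmi_host in ipmi_by_node.items():
            if target == sender:
                continue  # a node should not be set up to fence itself
            lines.append(
                f"stonith_host {sender} external/ipmi {target} {ipmi_host} {user} {password}"
            )
    return lines


# Node name -> hostname of that node's IPMI device (names taken from this thread).
nodes = {
    "mds1.engin.umich.edu": "mds1-m.engin.umich.edu",
    "mds2.engin.umich.edu": "mds2-m.engin.umich.edu",
}

for line in stonith_host_lines(nodes, "USER", "PASSWORD"):
    print(line)
```

The order of the generated lines differs from the order I used above, which should not matter since each line names its own sending host.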

I hope that helps.  Good luck,

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985



On Aug 1, 2008, at 8:37 AM, Chun Tian (binghe) wrote:
Hi, Brock Palen

Good, finally someone is seeing the same thing as me. I just don't know whether stonith/external could turn a return value of 0 into 256, or whether ipmitool itself has bugs when doing a reset.

Brock, can I ask your machine type and model? I have seen non-zero return values when using ipmitool reset on some HP ProLiant DL140/145 servers.

Regards,

Chun Tian (binghe)

I have had some luck getting STONITH to work, but I am still running into a problem that I cannot figure out how to debug.

In ha.cf on each host I have:

stonith_host mds2.engin.umich.edu external/ipmi mds2.engin.umich.edu mds2-m.engin.umich.edu root PASSWORD
stonith_host mds1.engin.umich.edu external/ipmi mds1.engin.umich.edu mds1-m.engin.umich.edu root PASSWORD

Now heartbeat does try to kill the node where I kill heartbeat. In the log I see:

heartbeat[12013]: 2008/07/31_15:47:56 info: Resetting node mds1.engin.umich.edu with [IPMI STONITH device]
heartbeat[12013]: 2008/07/31_15:47:57 info: glib: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi reset mds1.engin.umich.edu' returned 256
heartbeat[12013]: 2008/07/31_15:47:57 ERROR: glib: external_reset_req: 'ipmi reset' for host mds1.engin.umich.edu failed with rc 256
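
One possible reading of the 256 in that log, offered as an assumption rather than something checked against the plugin source: if the number is a raw wait status of the kind system() or pclose() return on Linux, the exit code sits in the high byte, so 256 would mean the ipmi script exited with code 1, and an exit code of 0 would still be logged as 0. A minimal sketch of that arithmetic (decode_wait_status is a hypothetical name):

```python
def decode_wait_status(status):
    """Split a raw wait status into (exit_code, signal_number).

    Assumes the traditional Linux encoding: low 7 bits hold the
    terminating signal, bits 8-15 hold the exit code.
    """
    signal_number = status & 0x7F
    exit_code = (status >> 8) & 0xFF
    return exit_code, signal_number


print(decode_wait_status(256))  # (1, 0): exit code 1, no signal
print(decode_wait_status(0))    # (0, 0): clean exit
```

If that reading is right, the plugin simply failed with exit code 1 when heartbeat called it, rather than a successful 0 being turned into 256.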

I can run:

stonith -t external/ipmi -p "mds1.engin.umich.edu mds1-m.engin.umich.edu root PASSWORD" -T reset mds1.engin.umich.edu

and the dead node will restart. So from the documentation of 1.x style configs I am not sure where to debug why the stonith_host lines do not work.

mds1 and mds2 are the nodes of the cluster, mds1-m and mds2-m are the hostnames of the IPMI devices which have lan configs set up. Note how stonith from the cmd line works just fine, just not in heartbeat.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


