I did figure it out. The problem is that some of the docs out there are
not very clear. I will blog this at www.mlds-networks.com in case you
need it later, or you can just look in the mailing list archives.
The format is:
stonith_host HOST_SENDING external/ipmi HOST_TO_CONTROL IPMI_HOST USER PASSWORD
So what I really needed:
stonith_host mds2.engin.umich.edu external/ipmi mds1.engin.umich.edu mds1-m.engin.umich.edu USER PASSWORD
stonith_host mds1.engin.umich.edu external/ipmi mds2.engin.umich.edu mds2-m.engin.umich.edu USER PASSWORD
Notice how the first host, the second host and the IPMI host are all
different. The first line tells mds2 how to kill mds1 using the mds1-m
IPMI device. The second line tells mds1 how to kill mds2 using mds2-m.
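
If you have more than two nodes, the crosswise pattern can be generated
rather than typed by hand. Here is a rough Python sketch of that idea. The
function name and the node-to-IPMI mapping are made up for illustration,
and it assumes every node should be able to fence every other node, which
may not match your cluster.

```python
def stonith_host_lines(nodes, user, password):
    """Build one stonith_host line per (sender, target) pair.

    nodes maps each cluster node to the hostname of its own IPMI device
    (hypothetical structure, not something heartbeat itself reads).
    """
    lines = []
    for sender in nodes:
        for target, ipmi_host in nodes.items():
            if sender == target:
                continue  # a node is never told how to fence itself
            lines.append(
                f"stonith_host {sender} external/ipmi "
                f"{target} {ipmi_host} {user} {password}"
            )
    return lines


nodes = {
    "mds1.engin.umich.edu": "mds1-m.engin.umich.edu",
    "mds2.engin.umich.edu": "mds2-m.engin.umich.edu",
}
for line in stonith_host_lines(nodes, "USER", "PASSWORD"):
    print(line)
```

For the two nodes here this prints the same two lines as above (in the
opposite order), each on a single line as ha.cf expects.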
I hope that helps. Good luck,
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985
On Aug 1, 2008, at 8:37 AM, Chun Tian (binghe) wrote:
Hi, Brock Palen
Good, finally there's someone who has run into the same thing as me. I
just don't know whether there's any chance that stonith/external would
turn a return value of 0 into 256, or whether ipmitool itself has bugs
when doing a reset.
Brock, can I ask your machine type and model? I ran into some non-zero
return values when using ipmitool reset on some HP ProLiant
DL140/145 servers.
Regards,
Chun Tian (binghe)
I have had some luck getting STONITH to work, but I am still running
into a problem that I cannot figure out how to debug.
In the ha.cf on each host I have:
stonith_host mds2.engin.umich.edu external/ipmi mds2.engin.umich.edu mds2-m.engin.umich.edu root PASSWORD
stonith_host mds1.engin.umich.edu external/ipmi mds1.engin.umich.edu mds1-m.engin.umich.edu root PASSWORD
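
Going by the fix in the reply at the top of this thread, the catch in these
two lines is that the sending host and the host to control are the same
node. A small check like the following Python sketch would flag that. The
function name is hypothetical, and the field layout is only the one
described in this thread, not taken from the heartbeat source.

```python
def check_stonith_host(line):
    """Return a list of problems found in one stonith_host line."""
    fields = line.split()
    # Layout assumed from this thread:
    # stonith_host SENDER external/ipmi TARGET IPMI_HOST USER PASSWORD
    if len(fields) < 7 or fields[0] != "stonith_host":
        return ["not a complete stonith_host line"]
    sender, target, ipmi_host = fields[1], fields[3], fields[4]
    problems = []
    if sender == target:
        problems.append(f"{sender} is configured to fence itself")
    if ipmi_host in (sender, target):
        problems.append("IPMI host is the same as a cluster node name")
    return problems


line = ("stonith_host mds2.engin.umich.edu external/ipmi "
        "mds2.engin.umich.edu mds2-m.engin.umich.edu root PASSWORD")
print(check_stonith_host(line))
```

Run against either line above, it reports that the node is configured to
fence itself; the corrected lines come back with an empty list.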
Now heartbeat does try to kill the node where I kill heartbeat.
In the log I see:
heartbeat[12013]: 2008/07/31_15:47:56 info: Resetting node mds1.engin.umich.edu with [IPMI STONITH device]
heartbeat[12013]: 2008/07/31_15:47:57 info: glib: external_run_cmd: Calling '/usr/lib64/stonith/plugins/external/ipmi reset mds1.engin.umich.edu' returned 256
heartbeat[12013]: 2008/07/31_15:47:57 ERROR: glib: external_reset_req: 'ipmi reset' for host mds1.engin.umich.edu failed with rc 256
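
A note on the 256: it may be a raw wait status rather than a plain exit
code. That is an assumption on my part (I have not checked how the plugin
layer collects the status), but on Linux a wait status carries the child's
exit code in the high byte, so 256 would simply mean "exited with code 1".
A minimal Python sketch of the decoding, with a hypothetical helper name:

```python
def decode_wait_status(status):
    """Split a Linux-style wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # high byte: exit code of the child
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    return exit_code, signal_number


print(decode_wait_status(256))  # (1, 0)
```

If that reading is right, the plugin script exited with 1, which points at
the arguments it was given rather than at a 0 being mangled into 256.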
I can run:
stonith -t external/ipmi -p "mds1.engin.umich.edu mds1-m.engin.umich.edu root PASSWORD" -T reset mds1.engin.umich.edu
and the dead node will restart. From the documentation for 1.x-style
configs, I am not sure where to start debugging why the stonith_host
lines do not work.
mds1 and mds2 are the nodes of the cluster; mds1-m and mds2-m are
the hostnames of the IPMI devices, which have their LAN configs set up.
Note how stonith from the command line works just fine, just not from
within heartbeat.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems