Hi,

On Thu, Jan 24, 2008 at 10:00:45AM +0000, Peter Clapham wrote:
> Hello,
>
> Following local experience with ipmi stonith, we've found that certain
> systems (i.e. Sun X4200) would occasionally fail to perform a reset.
> Using a power off and then a power on proved far more reliable. So if
> scripts are likely to find their way into mixed environments then going
> the route of the lowest common denominator is probably the best / safest
> way to go?

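For illustration only, the off-then-on sequence described in the quoted
message might be sketched roughly as below. This is not the script that
was submitted to the list. The function names, the retry counts and the
injectable `run`/`sleep` parameters are assumptions made for the example;
the only real commands used are `ipmitool ... chassis power off|on|status`.
Success is judged from the reported power status rather than from the
exit code of ipmitool.

```python
import subprocess
import time


def ipmi_power(host, user, password, action, run=subprocess.run):
    """Run `ipmitool ... chassis power <action>` and return its stdout.

    The exit code is deliberately ignored; callers check the power
    status instead. (Passing the password with -P exposes it in the
    process list; acceptable for a sketch only.)
    """
    cmd = ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-P", password,
           "chassis", "power", action]
    result = run(cmd, capture_output=True, text=True)
    return result.stdout


def wait_for_state(host, user, password, wanted, run=subprocess.run,
                   tries=10, delay=2.0, sleep=time.sleep):
    """Poll the power status until it reports `wanted` ("on" or "off")."""
    for _ in range(tries):
        out = ipmi_power(host, user, password, "status", run=run)
        # ipmitool prints e.g. "Chassis Power is off"
        if out.strip().lower().endswith(wanted):
            return True
        sleep(delay)
    return False


def fence_off_then_on(host, user, password, run=subprocess.run, **poll):
    """Power off, confirm off, then power on and confirm on."""
    ipmi_power(host, user, password, "off", run=run)
    if not wait_for_state(host, user, password, "off", run=run, **poll):
        return False  # node never reported "off": treat the fence as failed
    ipmi_power(host, user, password, "on", run=run)
    # Hypothetical policy: also report failure if the node does not come
    # back on. A cluster may prefer to count a confirmed "off" as success.
    return wait_for_state(host, user, password, "on", run=run, **poll)
```

A call such as fence_off_then_on("bmc-host", "admin", "secret", tries=5)
would need ipmitool installed and a reachable BMC; the host name and
credentials here are placeholders.
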
Definitely. However, this seems to be a case where we can't have a
one-size-fits-all solution. Looks like additional configuration is
called for.

> NB. The script we use has been previously submitted to this
> list.

While trawling the Internet in search of information on IPMI, I ran
into a tool called ipmipower. According to the documentation at
least, it looks more robust than ipmitool. It also has a number of
options for controlling power, such as these:

	on-if-off [on|off]      - Toggle on-if-off functionality.
	wait-until-on [on|off]  - Toggle wait-until-on functionality.
	wait-until-off [on|off] - Toggle wait-until-off functionality.

This is worth examining. See

https://computing.llnl.gov/linux/ipmipower.html

> Cautionary footnote. For more recent server systems that claim ipmi
> compatibility it appears that the support for ipmi is at the kernel
> level only (and not via the ilo itself...) so if the system becomes
> unresponsive and a fencing action is sent, it probably won't have any
> effect... For these systems we've resorted to using non-ipmi based
> scripts (i.e. modified rilo etc).
>
> ipmi is a useful standard, but as with any standard it helps end users
> (us) if manufacturers implemented things in a consistent manner :-).

There are already a number of workarounds implemented in the said
ipmipower tool:

ipmipower> workaround-flags
workaround_flags must be specified: idzero,forcepermsg,unexpectedauth,endianseq,authcap,intel20,supermicro20,sun20

Cheers,

Dejan

>
> Pete
>
> > Hi, Dejan
> >
> > It seems that I have the same hardware as you, some HP Proliant DL145
> > with Qlogic BMC which (claims to) support IPMI 1.5
> >
> > I tried the IPMI power cycle function and my server didn't give any
> > response. I think there may be something wrong between the ACPI
> > interface and the server BMC, which left the OS unaware that a soft
> > reset had happened.
> >
> > As for STONITH use, I think you should always use power reset, like
> > the stonith script external/ipmi. Power reset does a quick and
> > "real" reset of the server hardware and does not depend on OS
> > behavior, which is what we need stonith to do.
> >
> > Recently I built a 4-node HA cluster for my site's LVS with pingd and
> > stonith support. I use Debian's Heartbeat 2.1.3 package with a few
> > modifications, and I found some issues with IPMI:
> >
> > 1. stonith2/ipmilan cannot work: a segmentation fault is thrown from
> > the OpenIPMI library.
> > 2. external/ipmi cannot work, because ipmitool power reset sometimes
> > successfully resets my server but doesn't exit with 0, and I have to
> > modify the script to let it exit 0 when called with reset ...
> >
> > Regards,
> >
> > Chun TIAN (binghe)
> >
> > On 2008-1-23, at 11:40, Dejan Muhamedagic wrote:
> >
> >
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems