Hi,

On Thu, Jan 24, 2008 at 10:00:45AM +0000, Peter Clapham wrote:
> Hello,
> 
> 
> Following local experience with ipmi stonith, we've found that certain
> systems (e.g. Sun X4200) would occasionally fail to perform a reset.
> Using a power off and then a power on proved far more reliable. So if
> scripts are likely to find their way into mixed environments then going
> the route of the lowest common denominator is probably the best / safest
> way to go ?

Definitely. However, this seems to be the case where we can't
have a one size fits all solution. Looks like additional
configuration is called for.
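
To make the "additional configuration" idea concrete, here is a rough
sketch of a fencing helper with a selectable method. It assumes ipmitool's
"chassis power" subcommands over the LAN interface; the function names, the
"method" setting and the timeout values are made up for illustration and
are not taken from any existing stonith plugin.

```python
import subprocess
import time


def ipmi_power(host, user, password, action, run=subprocess.run):
    """Run `ipmitool chassis power <action>` over the LAN interface."""
    cmd = ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-P", password,
           "chassis", "power", action]
    return run(cmd, capture_output=True, text=True)


def is_off(host, user, password, run=subprocess.run):
    """Return True if the BMC reports the chassis power as off."""
    result = ipmi_power(host, user, password, "status", run=run)
    # ipmitool prints a line such as "Chassis Power is off"
    return (result.returncode == 0
            and result.stdout.strip().lower().endswith("off"))


def fence(host, user, password, method="offon", run=subprocess.run,
          timeout=30.0, poll=1.0, sleep=time.sleep):
    """Reset a node. `method` is a hypothetical per-cluster setting.

    "reset" issues a single hard reset; "offon" powers off, waits until
    the BMC confirms the off state, then powers on again.
    Returns True on apparent success, False otherwise.
    """
    if method == "reset":
        return ipmi_power(host, user, password, "reset",
                          run=run).returncode == 0
    if method != "offon":
        raise ValueError("unknown method: %r" % (method,))

    if ipmi_power(host, user, password, "off", run=run).returncode != 0:
        return False
    waited = 0.0
    while not is_off(host, user, password, run=run):
        if waited >= timeout:
            return False  # never saw the node go down; do not claim success
        sleep(poll)
        waited += poll
    return ipmi_power(host, user, password, "on", run=run).returncode == 0
```

With something like this, a site with hardware that resets reliably could
keep method="reset", while mixed environments could default to "offon".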

> NB. The script we use has been previously submitted to this
> list.

While trawling the Internet in search of information on IPMI, I
ran into a tool called ipmipower. According to the documentation
at least, it looks like a more robust piece of software than
ipmitool. It also has a number of options for controlling the
power, such as these:

on-if-off [on|off]                      - Toggle on-if-off functionality.
wait-until-on [on|off]                  - Toggle wait-until-on functionality.
wait-until-off [on|off]                 - Toggle wait-until-off functionality.

This is worth examining. See
https://computing.llnl.gov/linux/ipmipower.html
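
For what it's worth, a plugin could assemble an ipmipower command line
along these lines. This is only a sketch: it assumes the command-line
options are spelled like the interactive ones above (--on-if-off,
--wait-until-on, --wait-until-off) and that -h/-u/-p select host, user and
password, so check the man page of your version first. The helper name
ipmipower_argv is invented for this example.

```python
def ipmipower_argv(host, user, password, action, wait=True, on_if_off=False):
    """Build an argument list for a single ipmipower invocation."""
    actions = {
        "on": "--on",
        "off": "--off",
        "reset": "--reset",
        "cycle": "--cycle",
        "status": "--stat",
    }
    if action not in actions:
        raise ValueError("unknown action: %r" % (action,))

    argv = ["ipmipower", "-h", host, "-u", user, "-p", password,
            actions[action]]
    # Only return once the BMC confirms the requested power state.
    if wait and action == "on":
        argv.append("--wait-until-on")
    if wait and action == "off":
        argv.append("--wait-until-off")
    # Ask ipmipower to power the node on if a reset/cycle finds it off.
    if on_if_off and action in ("reset", "cycle"):
        argv.append("--on-if-off")
    return argv
```

The resulting list can be passed straight to subprocess.run(), which avoids
any shell quoting issues with passwords.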

> Cautionary footnote. For more recent server systems that claim ipmi
> compatibility it appears that the support for ipmi is at the kernel
> level only (and not via the ilo itself...) so if the system becomes
> unresponsive and a fencing action is sent, it probably won't have any
> effect... For these systems we've resorted to using non-ipmi based
> scripts (e.g. modified rilo etc.).
> 
> ipmi is a useful standard, but as with any standard it helps end users
> (us) if manufacturers implemented things in a consistent manner :-).

There are already a number of workarounds implemented in
ipmipower:

ipmipower> workaround-flags
workaround_flags must be specified:
idzero,forcepermsg,unexpectedauth,endianseq,authcap,intel20,supermicro20,sun20
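
If the workaround flags were exposed as a plugin parameter, the value could
be checked before it ever reaches the tool. A minimal sketch, using only
the flag names printed above (other ipmipower versions may know a different
set); parse_workaround_flags is a hypothetical helper:

```python
# Flag names as printed by the ipmipower prompt quoted above.
KNOWN_WORKAROUNDS = (
    "idzero", "forcepermsg", "unexpectedauth", "endianseq",
    "authcap", "intel20", "supermicro20", "sun20",
)


def parse_workaround_flags(value):
    """Split a comma-separated flag string and reject unknown names."""
    flags = [f.strip() for f in value.split(",") if f.strip()]
    unknown = [f for f in flags if f not in KNOWN_WORKAROUNDS]
    if unknown:
        raise ValueError("unknown workaround flag(s): " + ", ".join(unknown))
    return flags
```

For the Sun boxes mentioned earlier one would presumably start with
parse_workaround_flags("sun20"), but that is a guess and untested.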

Cheers,

Dejan

> 
> Pete
> > Hi, Dejan
> >
> > It seems that I have the same hardware as you: some HP ProLiant DL145
> > servers with a QLogic BMC which (claims to) support IPMI 1.5.
> >
> > I tried the IPMI power cycle function and got no response from my
> > server. I think something may be wrong between the ACPI interface
> > and the server's BMC, so that the OS never learned that a soft
> > reset had been requested.
> >
> > For STONITH use, I think you should always use power reset, as
> > the stonith script external/ipmi does. Power reset performs a quick,
> > "real" reset of the server hardware and does not depend on OS
> > behavior, which is what we need from stonith.
> >
> > Recently I built a 4-node HA cluster for my site's LVS, with pingd and
> > stonith support. I use Debian's Heartbeat 2.1.3 package with a few
> > small modifications, and I found some issues with IPMI:
> >
> > 1. stonith2/ipmilan does not work: the OpenIPMI library throws a
> > segmentation fault.
> > 2. external/ipmi does not work, because ipmitool power reset sometimes
> > resets my server successfully but does not exit with 0, so I had to
> > modify the script to exit 0 when called with reset ...
> >
> > Regards,
> >
> > Chun TIAN (binghe)
> >
> > On 2008-1-23, at 11:40, Dejan Muhamedagic wrote:
> >
> 
> 
> 
> -- 
>  The Wellcome Trust Sanger Institute is operated by Genome Research 
>  Limited, a charity registered in England with number 1021457 and a 
>  company registered in England with number 2742969, whose registered 
>  office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
