Hi,

I just want to post the workaround I'm using for this STONITH issue, where the remaining server couldn't power on the other one when it's off. It's not "proper", just a quick patch that gets it working for our purposes. At the end of this email I state what I think needs to be done to clean it up properly.

Here's what I did: I changed line 69 in the Heartbeat plugin script external/riloe from:

    off = [ '<SERVER_INFO MODE="write">', '<HOLD_PWR_BTN/>', '</SERVER_INFO>' ]

to:

    off = [ '<SERVER_INFO MODE="write">', '<HOLD_PWR_BTN TOGGLE="YES" />', '</SERVER_INFO>' ]

Now when I power off the standby server, the active server powers it back on. If I power off the active server, the standby turns it back on and then takes over the resources.

I got the idea of adding the TOGGLE argument after adding print statements to the riloe script to see what's being sent to and received from the server. I ran "off" when the server was already off and saw a message saying that HOLD_PWR_BTN without TOGGLE="YES" is ignored when the power is off; the script then exited with an error. So I added the TOGGLE="YES" option, tried "off" again, and it turned the server ON.

The result is that the "off" command now acts as a power state toggle. This is a valid workaround for us because Heartbeat only uses "reset". In our case "reset" is "off" followed by "on", because we're using ilo_can_reset='0', and "on" when the server is already on keeps it on.

Server is on, gets "reset":
 - off turns it off.
 - on turns it on.

Server is off, gets "reset":
 - off turns it on.
 - on has no effect.

From the bit of research I did, I think that <HOLD_PWR_BTN> in newer iLO firmware can only be used to turn the server off. Calling it when the server is already off is what was producing the error. I haven't tested it, but I read that <PRESS_PWR_BTN> is now used to turn the server on. The proper solution might require querying the firmware version and power state from the iLO controller and sending off or on commands based on that.

Tyler Sutherland

On Thu, 2008-09-04 at 16:06 +0200, Dejan Muhamedagic wrote:
> Hi,
>
> On Wed, Sep 03, 2008 at 11:09:29PM +0000, Todd, Conor wrote:
> > If a server is off, it's not part of the cluster, right? In
> > this case, why would a tool for fencing nodes worry about a
> > node which is offline? Even if you argue that the same tool
> > which removed power from the node should be able to re-apply
> > power, why would such a tool do this when the node which was
> > powered-down was killed because it was acting erratically?
>
> Your reasoning is fine, but our software is not perfect. The
> cluster master (i.e. CRM) could ask for a node to be fenced
> even if it is down. The reason: if it can't talk to a node, it
> doesn't know what the node's doing, and there's only one way to
> make sure that it's not doing anything (see also
> startup-fencing). It may also happen while the cluster is up. Not
> optimal, but unavoidable. Hence, a stonith module must be able to
> handle this.
>
> > Just wondering, before I ask any people at HP about this ;)
>
> Many thanks!
>
> Dejan
>
> > - Conor

--
Tyler Sutherland
Network Administration
Software Engineering
Iders Incorporated
100-137 Innovation Drive
Winnipeg, MB, Canada R3T 6B6
email: [EMAIL PROTECTED]
Tel: (204) 779-5400 ext 241
Fax: (204) 779-5444

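For illustration, the "reset" behaviour with the TOGGLE workaround, and the state-aware approach suggested above as the proper fix, can be modelled in a few lines of Python. This is only a sketch: it does not talk to an iLO, every function name is made up for this example (none of them exist in external/riloe), the firmware version check is left out, and the use of PRESS_PWR_BTN for power-on is the untested assumption mentioned above.

```python
# Toy model only: nothing here talks to a real iLO, and every function
# name is hypothetical (none of them exist in external/riloe).


def off_with_toggle(powered_on):
    """'off' after the patch: HOLD_PWR_BTN TOGGLE="YES" flips the state."""
    return not powered_on


def on(powered_on):
    """'on': powers the server on; no effect if it is already on."""
    return True


def reset(powered_on):
    """'reset' with ilo_can_reset='0': 'off' followed by 'on'.

    Returns the power state after each of the two steps.
    """
    after_off = off_with_toggle(powered_on)
    after_on = on(after_off)
    return after_off, after_on


def pick_command(action, powered_on):
    """State-aware alternative: choose the tag from the queried power state.

    Returns the tag to send, or None when the server is already in the
    requested state. PRESS_PWR_BTN for power-on is an untested assumption.
    """
    if action == "off":
        return "HOLD_PWR_BTN" if powered_on else None
    if action == "on":
        return None if powered_on else "PRESS_PWR_BTN"
    raise ValueError("unknown action: %r" % (action,))


if __name__ == "__main__":
    for start in (True, False):
        label = "server on: " if start else "server off:"
        print(label, "state after off/on =", reset(start))
```

In both starting states the model ends with the server powered on after "reset", matching the two cases listed above. pick_command additionally avoids sending HOLD_PWR_BTN to a server that is already off, which is the call that produced the error.
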
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
