Hi
I encountered the same problems when I used ibmrsa-telnet.
These are common problems with STONITH devices that share their
power supply with their host nodes.
Such a device cannot accept any request once its host node is
powered off, because both lose power at the same time.
As a result, no plugin for these STONITH devices can control a
powered-off host, and no information can be obtained from such a host.
Alexander Hofmann wrote:
Would it solve the problem if I requested an ICMP echo before connecting to the
iLO card and returned 0 (success) if I got no response?
If you do this, you had better do the ICMP echo check at "status"
time too. Otherwise, network troubles affecting ICMP echo would
only be discovered at "reset" or "off" time.
Or: can I be sure that the node is already off when its iLO card doesn't
respond? (Point-to-point connection, no routing, etc.)
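If "fence" is set as the value of the "on_fail" attribute of an
operation such as "monitor" or "stop", and a failure of that operation
causes "reset" or "off", then the answer to your question is "no":
in that case the node is still running, so an unresponsive iLO card
does not mean that the node is off.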
Alexander Hofmann wrote:
Hi,
Dejan Muhamedagic wrote:
Hi,
On Wed, Aug 27, 2008 at 01:56:25PM +0200, Alexander Hofmann wrote:
Hello list,
after many hours of trial and error, I got the iLO STONITH configuration
working.
During some tests I noticed the following issues:
Testcase 1: node1 has all resources and node2 is hard powered off.
node1 tries to STONITH node2 but does not succeed.
node1 retries to STONITH node2 every 30 sec.
If I now boot node2, it is shut down by node1 because of the retries.
How can I configure STONITH so that the STONITH plugin is executed only
once or twice within a very short interval?
Testcase 2: node2 has all resources and is hard powered off.
node1 tries to STONITH node2 but does not succeed.
node1 _doesn't_ start the resources! It retries to STONITH node2
every ~30 sec.
Both problems are most probably in the external/riloe stonith
plugin: if a node is powered off, it should report success for
the stonith operation. The point of a stonith operation is to
ensure that a host is down or rebooted. This seems to be a
serious issue with external/riloe.
I've browsed through the source code (Python... brrrr :-) of external/riloe but
could not find the piece of code where the error occurs.
If I send "power off" twice on the iLO command line, I get the following string on
the second execution:
Server power already Off
Perhaps the HTTP command returns the same string and the iLO plugin does not know
how to interpret it:
# stonith -t external/riloe hostlist=node1 ilo_hostname=10.0.2.1
ilo_user=user ilo_password=**** ilo_protocol=2.0 ilo_powerdown_method=button
ilo_can_reset=1 -T off tfdps01
** INFO: external_run_cmd: Calling '/usr/lib/stonith/plugins/external/riloe
off node1' returned 256
** (process:27676): CRITICAL **: external_reset_req: 'riloe off' for host
node1 failed with rc 256
# stonith -t external/riloe hostlist=tfdps01 ilo_hostname=10.0.2.1
ilo_user=tfdps ilo_password=startdfs ilo_protocol=2.0
ilo_powerdown_method=button ilo_can_reset=1 -S
stonith: external/riloe device OK.
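If the HTTP interface does return the same text, the plugin could treat it as success, in line with the point that an already-off host satisfies a STONITH "off" request. A minimal Python sketch, assuming the iLO response is available as a string; `ALREADY_OFF` and `off_exit_code` are hypothetical names, and the exact wording returned by the HTTP interface is an assumption carried over from the command-line output above.

```python
# Hypothetical constant: wording observed on the iLO command line; whether the
# HTTP interface returns exactly this text is an assumption.
ALREADY_OFF = "Server power already Off"


def off_exit_code(ilo_response: str, command_succeeded: bool) -> int:
    """Exit code for an "off" request (0 = success, 1 = failure).

    An "already off" answer counts as success, because the goal of a STONITH
    operation is that the host ends up down.
    """
    if command_succeeded:
        return 0
    # Case-insensitive match, in case capitalization differs between interfaces.
    if ALREADY_OFF.lower() in ilo_response.lower():
        return 0
    return 1
```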
Today, another problem crossed my mind:
If I detach the power cable of one node, I cannot communicate with its
iLO card.
Would it solve the problem if I requested an ICMP echo before connecting to the
iLO card and returned 0 (success) if I got no response?
Or: can I be sure that the node is already off when its iLO card doesn't
respond? (Point-to-point connection, no routing, etc.)
Example: I detached the power cable and executed the following command:
# stonith -t external/riloe hostlist=tfdps01 ilo_hostname=10.0.2.1
ilo_user=tfdps ilo_password=startdfs ilo_protocol=2.0
ilo_powerdown_method=button ilo_can_reset=1 -T off tfdps01
** INFO: external_run_cmd: Calling '/usr/lib/stonith/plugins/external/riloe
status' returned 256
** INFO: external_run_cmd: Calling '/usr/lib/stonith/plugins/external/riloe
off tfdps01' returned 256
** (process:22349): CRITICAL **: external_reset_req: 'riloe off' for host
tfdps01 failed with rc 256
PS: Where can I find a list explaining all possible STONITH plugin return
codes?
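The "returned 256" / "rc 256" in the logs above is most likely not the plugin's own exit code: assuming the value is the raw wait status of the plugin process (as returned by `pclose()`/`waitpid()`), it encodes an exit status of 1. A small Python sketch (Unix only; `describe_rc` is a hypothetical helper) decodes such values with the standard `os.W*` functions. It only interprets the logged numbers; it is not a list of STONITH return codes.

```python
import os


def describe_rc(raw_status: int) -> str:
    """Decode a raw wait status such as the one returned by pclose()/waitpid()."""
    if os.WIFEXITED(raw_status):
        return f"exited with status {os.WEXITSTATUS(raw_status)}"
    if os.WIFSIGNALED(raw_status):
        return f"killed by signal {os.WTERMSIG(raw_status)}"
    return "unknown"


print(describe_rc(256))  # exited with status 1
```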
I made a mistake:
Node: tfdps01 == node1
User: tfdps == user
Thanks,
Dejan
Thanks,
Alex
--
Takenaka Kazuhiro <[EMAIL PROTECTED]>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems