Hi,

I am running some tests in order to implement fencing with two methods, and I got stuck on the WTI configuration, while the IPMI configuration was pretty straightforward.
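
(For reference, the IPMI side was straightforward with fence_ipmilan; here is a placeholder sketch of such a primitive — the address, credentials, and resource name are illustrative, not my real values:)

primitive ipmi_fence01 stonith:fence_ipmilan \
        params ipaddr="192.168.0.101" login="admin" passwd="secret" \
               lanplus="1" action="reboot" \
               pcmk_host_check="static-list" pcmk_host_list="fence01.domain" \
        op monitor interval="30s"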

I have an installation with two nodes on CentOS 6.3 running pacemaker 1.1.7 + corosync 1.4.1. Both servers support IPMI, and both are plugged into a WTI power switch. This is the physical configuration:
{node1,psu1} => WTI_Port1
{node1,psu2} => WTI_Port5
{node2,psu1} => WTI_Port2
{node2,psu2} => WTI_Port6
{WTI,port1..4} => Electrical circuit A
{WTI,port5..8} => Electrical circuit B

I cleared the IPMI configuration and kept only the two WTI fencing primitives in my configuration, to make it as simple as possible:

primitive wti_fence01 stonith:fence_wti \
        params ipaddr="192.168.0.7" action="reboot" verbose="true" \
               pcmk_host_check="static-list" pcmk_host_list="fence01.domain" \
               pcmk_host_map="fence01.domain:1,5" \
               login_timeout="20" shell_timeout="20" \
        op monitor interval="30s"
primitive wti_fence02 stonith:fence_wti \
        params ipaddr="192.168.0.7" action="reboot" verbose="true" \
               pcmk_host_check="static-list" pcmk_host_list="fence02.domain" \
               pcmk_host_map="fence02.domain:2,6" \
               login_timeout="20" shell_timeout="20" \
        op monitor interval="30s"

location wti_fence01-on-fence02 wti_fence01 \
        rule $id="wti_fence01-on-fence02-rule" -inf: #uname eq fence01.domain
location wti_fence02-on-fence01 wti_fence02 \
        rule $id="wti_fence02-on-fence01-rule" -inf: #uname eq fence02.domain
location bind-on-fence02 bind 100: fence01.domain
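
(To reproduce outside the cluster's normal fencing path, stonith-ng can be queried and driven directly with stonith_admin; these two calls assume the tool shipped with this pacemaker version:)

# list the devices stonith-ng considers able to fence the node
stonith_admin --list fence01.domain
# ask stonith-ng to reboot the node through those devices
stonith_admin --reboot fence01.domain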

With this configuration, in /var/log/cluster/corosync.log, mixed in with the full telnet session with the PDU, I can read this error:

Feb 01 12:49:36 [4492] fence02.domain stonith-ng: error: log_operation: wti_fence01: IPS>Failed: Unable to obtain correct plug status or plug is not available

I believe my problem comes from the attribute pcmk_host_map="fence02.domain:2,6". If I reduce each map to a single port, i.e. pcmk_host_map="fence01.domain:1" and pcmk_host_map="fence02.domain:2", the errors no longer appear in the logs. Furthermore, with only one port configured, when I trigger fencing, I can see in the logs that it works fine (a per-port alternative is sketched after the excerpt below):


Feb 01 12:47:12 [4492] fence02.domain stonith-ng: info: initiate_remote_stonith_op: Initiating remote operation reboot for fence01.lyra-network.com: 13eb69d2-6e94-4563-a6f8-60d849ab5926
[...]
Feb 01 12:47:14 [4492] fence02.domain stonith-ng: info: log_operation: wti_fence01: Plug | Name | Password | Status | Boot/Seq. Delay | Default |
Feb 01 12:47:14 [4492] fence02.domain stonith-ng: info: log_operation: wti_fence01: 1 | f01-wti | (undefined) | ON | 1 Sec | ON |
[...]
Feb 01 12:47:14 [4492] fence02.domain stonith-ng: info: log_operation: wti_fence01: 1 | f01-wti | (undefined) | OFF | 1 Sec | ON |
[...]
Feb 01 12:47:14 [4492] fence02.domain stonith-ng: info: log_operation: wti_fence01: 1 | f01-wti | (undefined) | ON | 1 Sec | ON |
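
(A per-port alternative I have not tested, assuming the crm shell here supports fencing_topology: declare one fence_wti primitive per plug and put both in a single topology level, so that both plugs must be fenced for the node to be considered down. The resource names below are made up:)

primitive wti_fence01_p1 stonith:fence_wti \
        params ipaddr="192.168.0.7" pcmk_host_check="static-list" \
               pcmk_host_list="fence01.domain" pcmk_host_map="fence01.domain:1" \
        op monitor interval="30s"
primitive wti_fence01_p5 stonith:fence_wti \
        params ipaddr="192.168.0.7" pcmk_host_check="static-list" \
               pcmk_host_list="fence01.domain" pcmk_host_map="fence01.domain:5" \
        op monitor interval="30s"
fencing_topology \
        fence01.domain: wti_fence01_p1,wti_fence01_p5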

The PDU itself works fine, as I can reboot the ports manually without trouble:
- In CLI mode, with "/boot 1+5" or "/boot 1 5" to reboot the first node.
- In "remote" mode, with the fence agent fence_wti and the following command (no password configured, no confirmation required):
for port in 1 5; do fence_wti -o reboot -a 192.168.0.7 -n $port -v; done
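
(Worth noting: with dual PSUs, rebooting the plugs one after the other never fully removes power from the node, since the other supply keeps it up; a true power cycle needs all plugs off before any comes back on, e.g.:)

for port in 1 5; do fence_wti -o off -a 192.168.0.7 -n $port -v; done
for port in 1 5; do fence_wti -o on  -a 192.168.0.7 -n $port -v; done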

I've reached a dead end here, and I have lost a lot of time trying to figure this out. Am I missing something obvious, am I a newbie who can't make proper use of stonithd, or is this somehow a bug or an incompatibility?

Any help on this will be greatly appreciated!

Thibaut.
