On 1 Aug 2014, at 7:47 am, Andrew Beekhof <and...@beekhof.net> wrote:
>
> On 31 Jul 2014, at 4:46 pm, Cédric Dufour - Idiap Research Institute
> <cedric.duf...@idiap.ch> wrote:
>
>> On 31/07/14 00:17, Andrew Beekhof wrote:
>>> On 31 Jul 2014, at 2:48 am, Cédric Dufour - Idiap Research Institute
>>> <cedric.duf...@idiap.ch> wrote:
>>>
>>>> After packaging pacemaker 1.1.12 for Debian/Wheezy (along corosync 1.4.6
>>>> and libqb 0.17.0), I have successfully initialized a new cluster.
>>>>
>>>> Back to a very simple test cluster, the only problem I have is with
>>>> fencing, which fails altogether with "route_ais_message: Sending message
>>>> to local.stonith-ng failed: ipc delivery failed (rc=-2)" messages:
>>>>
>>>> root@bc1hs22a01:~ # tail /var/log/corosync.rsyslog
>>>> Jul 30 18:41:41 bc1hs22a01 stonith_admin[5411]: notice: crm_log_args:
>>>> Invoked: stonith_admin -F bc1hs22a02
>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]: notice: handle_request:
>>>> Client stonith_admin.5411.fe1388ed wants to fence (off) 'bc1hs22a02' with
>>>> device '(any)'
>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]: notice:
>>>> initiate_remote_stonith_op: Initiating remote operation off for
>>>> bc1hs22a02: 48b69f82-29ad-4c9a-af57-0e60ae5242e4 (0)
>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]: [pcmk ] WARN:
>>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>>> delivery failed (rc=-2)
>>> rc=-2 is coming from send_client_ipc(void *conn, const AIS_Message *
>>> ais_msg)
>>>
>>> specifically:
>>>
>>>     if (conn == NULL) {
>>>         rc = -2;
>>>
>>> So the plugin thinks that stonith-ng isn't connected.
>>> More logs?
>>>
>>
>> I have completed a full restart of the cluster in order to provide the logs
>> at each step; see attached log files:
>> (from node_1/DC)
>> - node_1-corosync-start.log
>> - node_1-pacemaker-start.log
>> - node_1-corosync-node_2_join.log
>> - node_1-pacemaker-node_2_join.log
>> (from node_2)
>> - node_2-corosync-start.log
>> - node_2-pacemaker-start.log
>>
>> The problem manifests itself already in DC start log - because of previous
>> fencing attempt - at 08:19:21 and 08:19:42:
>>
>> root@bc1hs22a01:~ # fgrep 'ipc delivery failed' node_1-corosync-start.log
>> Jul 31 08:19:21 bc1hs22a01 corosync[31057]: [pcmk ] WARN:
>> route_ais_message: Sending message to local.stonith-ng failed: ipc delivery
>> failed (rc=-2)
>> Jul 31 08:19:42 bc1hs22a01 corosync[31057]: [pcmk ] WARN:
>> route_ais_message: Sending message to local.stonith-ng failed: ipc delivery
>> failed (rc=-2)
>>
>> While it would seem (to me) that the stonith plugin successfully connected
>> to the CIB:
>
> It's not the CIB that's the issue:
>
>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]: [pcmk ] WARN:
>>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>>> delivery failed (rc=-2)
>
> That's the pacemaker plugin inside corosync (which uses a completely different
> IPC mechanism).

It looks like there is a name mismatch: the plugin records the connection as
"stonithd", but the failing messages are addressed to "stonith-ng":

Jul 31 08:19:20 bc1hs22a01 corosync[31057]: [pcmk ] info: pcmk_ipc: Recorded connection 0x2543e30 for stonithd/0
Jul 31 08:19:20 bc1hs22a01 corosync[31057]: [pcmk ] debug: process_ais_message: Msg[1] (dest=local:ais, from=bc1hs22a01:stonithd.31092, remote=true, size=6): 31092
...
Jul 31 08:19:21 bc1hs22a01 corosync[31057]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jul 31 08:19:42 bc1hs22a01 corosync[31057]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)

Could you try the following patch?

diff --git a/lib/ais/plugin.c b/lib/ais/plugin.c
index 3d4f369..560e18b 100644
--- a/lib/ais/plugin.c
+++ b/lib/ais/plugin.c
@@ -1508,6 +1508,9 @@ route_ais_message(const AIS_Message * msg, gboolean local_origin)
         /* te messages are routed via the crm */
         dest = crm_msg_crmd;
 
+    } else if (dest == crm_msg_stonith_ng) {
+        dest = crm_msg_stonithd;
+
     } else if (dest >= SIZEOF(pcmk_children)) {
         /* Transient client */

>
> FWIW, the plugin is extremely deprecated, you're encouraged to use
> pacemaker+cman or begin working towards corosync2 + pacemakerd.
>
>>
>> root@bc1hs22a01:~ # fgrep cib_native_signon_raw node_1-pacemaker-start.log
>> Jul 31 08:19:20 [31096] bc1hs22a01 crmd: debug:
>> cib_native_signon_raw: Connection unsuccessful (0 (nil))
>> Jul 31 08:19:20 [31096] bc1hs22a01 crmd: debug:
>> cib_native_signon_raw: Connection to CIB failed: Transport endpoint is
>> not connected
>> Jul 31 08:19:20 [31092] bc1hs22a01 stonithd: debug:
>> cib_native_signon_raw: Connection unsuccessful (0 (nil))
>> Jul 31 08:19:20 [31092] bc1hs22a01 stonithd: debug:
>> cib_native_signon_raw: Connection to CIB failed: Transport endpoint is
>> not connected
>> Jul 31 08:19:21 [31096] bc1hs22a01 crmd: debug:
>> cib_native_signon_raw: Connection to CIB successful
>> Jul 31 08:19:21 [31092] bc1hs22a01 stonithd: debug:
>> cib_native_signon_raw: Connection to CIB successful
>> Jul 31 08:19:25 [31094] bc1hs22a01 attrd: debug:
>> cib_native_signon_raw: Connection to CIB successful
>>
>> Best,
>>
>> Cédric
>>
>> <node_1-corosync-start.log><node_1-pacemaker-start.log><node_1-corosync-node_2_join.log><node_1-pacemaker-node_2_join.log><node_2-corosync-start.log><node_2-pacemaker-start.log>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
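
For anyone following along, the idea behind the patch can be sketched in
isolation. This is only an illustration, not the plugin's code: the enum
values, the struct, and the function names (route_unpatched, route_patched,
demo) are hypothetical stand-ins. The only parts taken from this thread are
the rc=-2 result when no connection is recorded and the aliasing of the
stonith-ng destination to stonithd.

```c
#include <stddef.h>

/* Hypothetical destination ids; the real plugin's enum is different. */
enum dest_id { DEST_CRMD = 0, DEST_STONITHD = 1, DEST_STONITH_NG = 2, DEST_COUNT = 3 };

/* One IPC connection slot per destination; NULL means "not connected". */
struct child {
    const char *name;
    void *conn;
};

/* Mirrors the rc=-2 path quoted in the thread: no connection in the slot. */
static int send_to_slot(const struct child *children, enum dest_id dest)
{
    if (children[dest].conn == NULL) {
        return -2;
    }
    return 0;
}

/* Without the alias, stonith-ng is looked up in its own (empty) slot. */
int route_unpatched(const struct child *children, enum dest_id dest)
{
    return send_to_slot(children, dest);
}

/* With the alias, stonith-ng messages use the slot recorded for stonithd. */
int route_patched(const struct child *children, enum dest_id dest)
{
    if (dest == DEST_STONITH_NG) {
        dest = DEST_STONITHD;
    }
    return send_to_slot(children, dest);
}

/* Builds a table where only "stonithd" is connected and routes one message
 * addressed to stonith-ng. Returns the rc from the selected routing path. */
int demo(int patched)
{
    static int dummy_conn; /* stands in for a recorded connection handle */
    struct child children[DEST_COUNT] = {
        { "crmd", NULL },
        { "stonithd", &dummy_conn },
        { "stonith-ng", NULL },
    };
    return patched ? route_patched(children, DEST_STONITH_NG)
                   : route_unpatched(children, DEST_STONITH_NG);
}
```

In this sketch demo(0) takes the unaliased path and fails the way the log
shows, while demo(1) finds the connection recorded under the stonithd name.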