I have a system with 6 nodes: two controllers and four payloads. A controller switch-over is triggered every night. In some cases it seems that the payload's immnd sends a message but the controller is unable to send a message back. When this happens, clms is also unable to send messages, which blocks all nodes, including the rebooted controller, from joining the cluster. Please see the message log below:
May 6 20:48:40 localhost osafdtmd[2725]: DTM: add New incoming connection to fd : 21
May 6 20:48:40 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1040f old epoch: 7 new epoch:0
May 6 20:48:40 localhost osafimmd[2765]: Detected new IMMND process at node 1040f old epoch: 7 new epoch:0
May 6 20:48:40 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:40 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:41 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:41 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:42 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:42 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:43 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:43 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:44 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:44 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:45 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:45 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:46 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:46 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:47 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:47 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:48 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:48 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:49 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:49 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:50 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:50 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:51 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:51 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:52 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:52 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:53 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:53 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:54 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:54 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:55 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:55 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:56 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:56 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:57 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:57 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:58 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:58 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:48:59 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:48:59 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:00 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:00 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:01 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:01 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:02 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:02 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:03 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:03 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:04 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:04 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:05 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:05 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:06 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:06 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:07 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:07 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:08 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:08 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:09 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:09 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:10 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:10 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:11 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:11 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:12 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:12 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:13 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:13 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:14 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:14 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:15 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:15 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:16 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:16 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:17 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:17 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:18 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:18 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:19 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:19 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:20 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:20 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:21 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:21 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:22 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:22 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:23 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:23 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:24 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:24 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:25 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:25 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:26 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:26 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:27 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:27 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:28 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:28 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:29 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:29 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:30 localhost osafimmd[2765]: IMMD - MDS Send Failed
May 6 20:49:30 localhost osafimmd[2765]: Failed to send accept message to IMMND 1040f
May 6 20:49:31 localhost osafimmnd[2778]: Global discard node received for nodeId:1040f pid:1079
May 6 20:49:31 localhost osafimmnd[2778]: Implementer disconnected 12 <0, 1040f(down)> (MsgQueueService66575)
May 6 20:49:46 localhost osafimmd[2765]: Node 1040f request sync sync-pid:1324 epoch:0
May 6 20:49:48 localhost osafimmnd[2778]: Announce sync, epoch:8
May 6 20:49:48 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
May 6 20:49:48 localhost osafimmd[2765]: Successfully announced sync. New ruling epoch:8
May 6 20:49:48 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_R_AVAILABLE
May 6 20:49:48 localhost immload: Sync starting
May 6 20:49:48 localhost immload: Synced 623 objects in total
May 6 20:49:48 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_FULLY_AVAILABLE 12197
May 6 20:49:48 localhost osafimmnd[2778]: Epoch set to 8 in ImmModel
May 6 20:49:48 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1050f old epoch: 7 new epoch:8
May 6 20:49:48 localhost immload: Sync ending normally
May 6 20:49:48 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1030f old epoch: 7 new epoch:8
May 6 20:49:48 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 10e0f old epoch: 7 new epoch:8
May 6 20:49:48 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1040f old epoch: 0 new epoch:8
May 6 20:49:48 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM SERVER READY
May 6 20:49:49 localhost osafclmd[2825]: Duplicate node join request for CLM node: 'PL-4'. Specify a unique node name in/etc/opensaf/node_name
May 6 20:50:04 localhost osafclmd[2825]: Duplicate node join request for CLM node: 'PL-4'. Specify a unique node name in/etc/opensaf/node_name
May 6 20:50:19 localhost osafclmd[2825]: Duplicate node join request for CLM node: 'PL-4'. Specify a unique node name in/etc/opensaf/node_name
May 6 20:50:19 localhost osafdtmd[2725]: DTM:dtm_comm_socket_recv() failed rc : 21
May 6 20:50:19 localhost osafimmnd[2778]: Global discard node received for nodeId:1040f pid:1324
May 6 20:50:19 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:50:19 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:50:19 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:50:19 localhost osafimmnd[2778]: Implementer connected: 30 (MsgQueueService66575) <551, 1050f>
May 6 20:50:19 localhost osafimmnd[2778]: Implementer locally disconnected. Marking it as doomed 30 <551, 1050f> (MsgQueueService66575)
May 6 20:50:19 localhost osafimmnd[2778]: Implementer disconnected 30 <551, 1050f> (MsgQueueService66575)
May 6 20:51:55 localhost kernel: : device bond1.120 entered promiscuous mode
May 6 20:51:55 localhost kernel: : device bond1 entered promiscuous mode
May 6 20:51:55 localhost kernel: : device eth4 entered promiscuous mode
May 6 20:52:11 localhost osafdtmd[2725]: DTM: add New incoming connection to fd : 21
May 6 20:52:11 localhost osafimmd[2765]: Node 1040f request sync sync-pid:1079 epoch:0
May 6 20:52:11 localhost osafimmnd[2778]: Announce sync, epoch:9
May 6 20:52:11 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
May 6 20:52:11 localhost osafimmd[2765]: Successfully announced sync. New ruling epoch:9
May 6 20:52:11 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_R_AVAILABLE
May 6 20:52:11 localhost immload: Sync starting
May 6 20:52:12 localhost immload: Synced 622 objects in total
May 6 20:52:12 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_FULLY_AVAILABLE 12197
May 6 20:52:12 localhost osafimmnd[2778]: Epoch set to 9 in ImmModel
May 6 20:52:12 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1050f old epoch: 8 new epoch:9
May 6 20:52:12 localhost immload: Sync ending normally
May 6 20:52:12 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1030f old epoch: 8 new epoch:9
May 6 20:52:12 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 10e0f old epoch: 8 new epoch:9
May 6 20:52:12 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1040f old epoch: 0 new epoch:9
May 6 20:52:12 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM SERVER READY
May 6 20:52:12 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:52:12 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:52:12 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:52:21 localhost osafdtmd[2725]: DTM: add New incoming connection to fd : 81
May 6 20:52:21 localhost osafimmd[2765]: New IMMND process is on STANDBY Controller at 1060f
May 6 20:52:21 localhost osafimmd[2765]: IMMND on controller (not currently coord) requests sync
May 6 20:52:21 localhost osafimmd[2765]: Node 1060f request sync sync-pid:2619 epoch:0
May 6 20:52:22 localhost osafimmnd[2778]: Announce sync, epoch:10
May 6 20:52:22 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
May 6 20:52:22 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_R_AVAILABLE
May 6 20:52:22 localhost osafimmd[2765]: Successfully announced sync. New ruling epoch:10
May 6 20:52:22 localhost immload: Sync starting
May 6 20:52:27 localhost immload: Synced 622 objects in total
May 6 20:52:27 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_FULLY_AVAILABLE 12197
May 6 20:52:27 localhost immload: Sync ending normally
May 6 20:52:28 localhost osafimmnd[2778]: Epoch set to 10 in ImmModel
May 6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1050f old epoch: 9 new epoch:10
May 6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1060f old epoch: 0 new epoch:10
May 6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1030f old epoch: 9 new epoch:10
May 6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 10e0f old epoch: 9 new epoch:10
May 6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at node 1040f old epoch: 9 new epoch:10
May 6 20:52:28 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM SERVER READY
May 6 20:52:28 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:52:28 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:52:28 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May 6 20:52:29 localhost osafimmnd[2778]: Implementer (applier) connected: 31 (@safAmfService1060f) <0, 1060f>

Thanks.
Ted
