I have a system with 6 nodes: two controllers and four payloads. A controller 
switch-over is triggered every night. In some cases it seems that the payload 
IMMND sends a message but the controller is not able to send a message back. In 
this case CLMS is also unable to send messages, which blocks all nodes, 
including the rebooted controller, from joining the cluster. Please see the 
message log below:

May  6 20:48:40 localhost osafdtmd[2725]: DTM: add New incoming connection to 
fd : 21
May  6 20:48:40 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1040f old epoch: 7  new epoch:0
May  6 20:48:40 localhost osafimmd[2765]: Detected new IMMND process at node 
1040f old epoch: 7  new epoch:0
May  6 20:48:40 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:40 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:41 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:41 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:42 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:42 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:43 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:43 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:44 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:44 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:45 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:45 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:46 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:46 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:47 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:47 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:48 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:48 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:49 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:49 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:50 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:50 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:51 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:51 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:52 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:52 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:53 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:53 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:54 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:54 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:55 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:55 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:56 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:56 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:57 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:57 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:58 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:58 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:48:59 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:48:59 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:00 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:00 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:01 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:01 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:02 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:02 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:03 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:03 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:04 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:04 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:05 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:05 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:06 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:06 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:07 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:07 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:08 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:08 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:09 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:09 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:10 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:10 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:11 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:11 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:12 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:12 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:13 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:13 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:14 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:14 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:15 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:15 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:16 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:16 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:17 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:17 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:18 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:18 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:19 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:19 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:20 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:20 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:21 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:21 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:22 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:22 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:23 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:23 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:24 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:24 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:25 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:25 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:26 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:26 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:27 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:27 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:28 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:28 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:29 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:29 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:30 localhost osafimmd[2765]: IMMD - MDS Send Failed
May  6 20:49:30 localhost osafimmd[2765]: Failed to send accept message to 
IMMND 1040f
May  6 20:49:31 localhost osafimmnd[2778]: Global discard node received for 
nodeId:1040f pid:1079
May  6 20:49:31 localhost osafimmnd[2778]: Implementer disconnected 12 <0, 
1040f(down)> (MsgQueueService66575)
May  6 20:49:46 localhost osafimmd[2765]: Node 1040f request sync sync-pid:1324 
epoch:0
May  6 20:49:48 localhost osafimmnd[2778]: Announce sync, epoch:8
May  6 20:49:48 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_READY --> 
IMM_SERVER_SYNC_SERVER
May  6 20:49:48 localhost osafimmd[2765]: Successfully announced sync. New 
ruling epoch:8
May  6 20:49:48 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_R_AVAILABLE
May  6 20:49:48 localhost immload: Sync starting
May  6 20:49:48 localhost immload: Synced 623 objects in total
May  6 20:49:48 localhost osafimmnd[2778]: NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 12197
May  6 20:49:48 localhost osafimmnd[2778]: Epoch set to 8 in ImmModel
May  6 20:49:48 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1050f old epoch: 7  new epoch:8
May  6 20:49:48 localhost immload: Sync ending normally
May  6 20:49:48 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1030f old epoch: 7  new epoch:8
May  6 20:49:48 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 10e0f old epoch: 7  new epoch:8
May  6 20:49:48 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1040f old epoch: 0  new epoch:8
May  6 20:49:48 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_SYNC_SERVER 
--> IMM SERVER READY
May  6 20:49:49 localhost osafclmd[2825]: Duplicate node join request for CLM 
node: 'PL-4'. Specify a unique node name in/etc/opensaf/node_name
May  6 20:50:04 localhost osafclmd[2825]: Duplicate node join request for CLM 
node: 'PL-4'. Specify a unique node name in/etc/opensaf/node_name
May  6 20:50:19 localhost osafclmd[2825]: Duplicate node join request for CLM 
node: 'PL-4'. Specify a unique node name in/etc/opensaf/node_name
May  6 20:50:19 localhost osafdtmd[2725]: DTM:dtm_comm_socket_recv() failed rc 
: 21
May  6 20:50:19 localhost osafimmnd[2778]: Global discard node received for 
nodeId:1040f pid:1324
May  6 20:50:19 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:50:19 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:50:19 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:50:19 localhost osafimmnd[2778]: Implementer connected: 30 
(MsgQueueService66575) <551, 1050f>
May  6 20:50:19 localhost osafimmnd[2778]: Implementer locally disconnected. 
Marking it as doomed 30 <551, 1050f> (MsgQueueService66575)
May  6 20:50:19 localhost osafimmnd[2778]: Implementer disconnected 30 <551, 
1050f> (MsgQueueService66575)
May  6 20:51:55 localhost kernel: : device bond1.120 entered promiscuous mode
May  6 20:51:55 localhost kernel: : device bond1 entered promiscuous mode
May  6 20:51:55 localhost kernel: : device eth4 entered promiscuous mode
May  6 20:52:11 localhost osafdtmd[2725]: DTM: add New incoming connection to 
fd : 21
May  6 20:52:11 localhost osafimmd[2765]: Node 1040f request sync sync-pid:1079 
epoch:0
May  6 20:52:11 localhost osafimmnd[2778]: Announce sync, epoch:9
May  6 20:52:11 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_READY --> 
IMM_SERVER_SYNC_SERVER
May  6 20:52:11 localhost osafimmd[2765]: Successfully announced sync. New 
ruling epoch:9
May  6 20:52:11 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_R_AVAILABLE
May  6 20:52:11 localhost immload: Sync starting
May  6 20:52:12 localhost immload: Synced 622 objects in total
May  6 20:52:12 localhost osafimmnd[2778]: NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 12197
May  6 20:52:12 localhost osafimmnd[2778]: Epoch set to 9 in ImmModel
May  6 20:52:12 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1050f old epoch: 8  new epoch:9
May  6 20:52:12 localhost immload: Sync ending normally
May  6 20:52:12 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1030f old epoch: 8  new epoch:9
May  6 20:52:12 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 10e0f old epoch: 8  new epoch:9
May  6 20:52:12 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1040f old epoch: 0  new epoch:9
May  6 20:52:12 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_SYNC_SERVER 
--> IMM SERVER READY
May  6 20:52:12 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:52:12 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:52:12 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:52:21 localhost osafdtmd[2725]: DTM: add New incoming connection to 
fd : 81
May  6 20:52:21 localhost osafimmd[2765]: New IMMND process is on STANDBY 
Controller at 1060f
May  6 20:52:21 localhost osafimmd[2765]: IMMND on controller (not currently 
coord) requests sync
May  6 20:52:21 localhost osafimmd[2765]: Node 1060f request sync sync-pid:2619 
epoch:0
May  6 20:52:22 localhost osafimmnd[2778]: Announce sync, epoch:10
May  6 20:52:22 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_READY --> 
IMM_SERVER_SYNC_SERVER
May  6 20:52:22 localhost osafimmnd[2778]: NODE STATE-> IMM_NODE_R_AVAILABLE
May  6 20:52:22 localhost osafimmd[2765]: Successfully announced sync. New 
ruling epoch:10
May  6 20:52:22 localhost immload: Sync starting
May  6 20:52:27 localhost immload: Synced 622 objects in total
May  6 20:52:27 localhost osafimmnd[2778]: NODE STATE-> 
IMM_NODE_FULLY_AVAILABLE 12197
May  6 20:52:27 localhost immload: Sync ending normally
May  6 20:52:28 localhost osafimmnd[2778]: Epoch set to 10 in ImmModel
May  6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1050f old epoch: 9  new epoch:10
May  6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1060f old epoch: 0  new epoch:10
May  6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1030f old epoch: 9  new epoch:10
May  6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 10e0f old epoch: 9  new epoch:10
May  6 20:52:28 localhost osafimmd[2765]: ACT: New Epoch for IMMND process at 
node 1040f old epoch: 9  new epoch:10
May  6 20:52:28 localhost osafimmnd[2778]: SERVER STATE: IMM_SERVER_SYNC_SERVER 
--> IMM SERVER READY
May  6 20:52:28 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:52:28 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:52:28 localhost osafclmd[2825]: clms_mds_msg_send FAILED: 2
May  6 20:52:29 localhost osafimmnd[2778]: Implementer (applier) connected: 31 
(@safAmfService1060f) <0, 1060f>

Thanks.

Ted
