Hi Mathi,

On the first SC, I used amf-state to check the various entities and all of them
are unlocked/enabled and in-service. All processes are up and look good. On the
second SC, however, no components are enabled. I don't think it gets to the
point of assigning service on the second SC, as I see no further output after
the cold sync; it is after this point that I would expect to see the service
assignments and transitions occur. Given that opensafd is hanging while waiting
for amfnd to send the nid_notify, I am wondering if there is some handshake with
amfd that is not occurring. Does the fact that I am not using PLM/OpenHPI have
any bearing on this?
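
For reference, the checks on the first SC looked roughly like the following,
using the amf-state invocation along the lines Mathi suggested (class names are
just the ones I happened to query):

    # run on SC-1; all SUs/SIs/components showed unlocked/enabled/in-service
    amf-state su all
    amf-state si all
    amf-state comp all
    amf-state node all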

Chad

-----Original Message-----
From: Mathivanan Naickan Palanivelu [mailto:[email protected]] 
Sent: Thursday, October 01, 2015 5:13 AM
To: Chad Lingrell
Cc: [email protected]
Subject: Re: [users] Second Opensaf controller node hanging after cold sync

Hi,

In what state are the SUs? What is the output of $ /etc/init.d/opensafd status,
and what is the state of the different AMF entities (su, si, comp, etc.) on that
node, i.e. the output of $ amf-state <entity_name> all?

Perhaps there is some component that is failing to come up?

Thanks,
Mathi.

----- [email protected] wrote:

> Hello,
> 
> I am currently working on getting OpenSAF (without PLM/OpenHPI) working
> in a VM environment. So far I have been able to bring up one system
> controller and get all my required components enabled. However, when I go
> to bring up the second system controller, it runs through the startup,
> completes the cold sync, and then nothing further appears to happen. I
> have determined that opensafd startup on the second SC is hanging on
> amfnd startup, which I surmise is due to the nid_notify call not being
> reached. It does not seem to be connectivity related, since both sides
> are communicating.
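> 
> A quick way to see where the startup is stuck (nothing beyond standard
> Linux tooling is assumed here) is simple process inspection, for example:
> 
>     # all osaf* daemons, including osafamfnd, show up as running,
>     # but the opensafd start-up itself never completes
>     ps -ef | grep osaf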
> 
> I have turned on trace debugging for amfd on both SCs and for amfnd on
> the SC I am trying to bring up, but so far it is not pointing to
> anything obvious (then again, I am still working to understand the code
> flow).
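> 
> In case it is useful, I have mainly been scanning the trace files with
> something along these lines (the file paths are the ones logtrace reports
> in my syslog):
> 
>     # look for cold sync / nid_notify / assignment activity in the
>     # amfd and amfnd trace files
>     grep -i -e "cold sync" -e nid_notify -e assign \
>         /var/log/opensaf/osafamfd /var/log/opensaf/osafamfnd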
> 
> Looking for any pointers to help isolate the issue.
> 
> Version: OpenSAF 4.5.0
> Compiled config: ./configure --disable-ais-plm --enable-tipc 
> --enable-imm-pbe
> 
> In mds.log I do see an ERR entry, but I do not know if it is related:
> 
> Sep 30 20:39:01.736442 osafamfnd[5679] INFO  |MDTM: User Recd msg len=16
> Sep 30 20:39:01.736460 osafamfnd[5679] INFO  |MDTM: Sending data to upper layer for a single recd message
> Sep 30 20:39:01.736478 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_svc_tbl_get
> Sep 30 20:39:01.736494 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_svc_tbl_get
> Sep 30 20:39:01.736511 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_svc_tbl_get_role
> Sep 30 20:39:01.736528 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_svc_tbl_query
> Sep 30 20:39:01.736544 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_svc_tbl_query
> Sep 30 20:39:01.736561 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_vdest_tbl_get_role
> Sep 30 20:39:01.736578 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_vdest_tbl_get_role
> Sep 30 20:39:01.736595 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_svc_tbl_get_role
> Sep 30 20:39:01.736612 osafamfnd[5679] DBG  |MDS_SND_RCV : Entering mds_mcm_process_recv_snd_msg_common
> Sep 30 20:39:01.736629 osafamfnd[5679] DBG  |MDS_SND_RCV : Entering mds_mcm_do_decode_full_or_flat
> Sep 30 20:39:01.736647 osafamfnd[5679] DBG  |MDS_SND_RCV : calling callback ptr
> Sep 30 20:39:01.736682 osafamfnd[5679] DBG  |MDS_SND_RCV : Leaving mds_mcm_do_decode_full_or_flat
> Sep 30 20:39:01.736751 osafamfnd[5679] ERR  |MDS_SND_RCV: Receive callback failed svc_id = CLMA(35)
>                                                                                            ^^^^^
> Sep 30 20:39:01.736803 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_svc_tbl_get
> Sep 30 20:39:01.736828 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_svc_tbl_get
> Sep 30 20:39:01.736851 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_subtn_tbl_get
> Sep 30 20:39:01.736869 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_subtn_tbl_get
> Sep 30 20:39:01.736887 osafamfnd[5679] INFO  |MDS_SND_RCV: Successfully recd the response also
> Sep 30 20:39:01.736905 osafamfnd[5679] INFO  |MDS_SND_RCV: Deleting the sync send entry with xch_id=4
> 
> Here is the syslog output:
> Sep 30 20:38:59 host-s2 opensafd: Starting OpenSAF Services
> Sep 30 20:38:59 host-s2 kernel: [95954.028402] TIPC: Activated (version 2.0.0)
> Sep 30 20:38:59 host-s2 kernel: [95954.029613] NET: Registered protocol family 30
> Sep 30 20:38:59 host-s2 kernel: [95954.030680] TIPC: Started in single node mode
> Sep 30 20:38:59 host-s2 kernel: [95954.033582] TIPC: Started in network mode
> Sep 30 20:38:59 host-s2 kernel: [95954.034575] TIPC: Own node address <1.1.2>, network identity 1234
> Sep 30 20:38:59 host-s2 kernel: [95954.038442] TIPC: Enabled bearer <eth:eth4>, discovery domain <1.1.0>, priority 10
> Sep 30 20:38:59 host-s2 kernel: [95954.040652] TIPC: Established link <1.1.2:eth4-1.1.1:eth4> on network plane A
> Sep 30 20:38:59 host-s2 osafrded[5588]: Started
> Sep 30 20:38:59 host-s2 osafrded[5588]: NO Peer rde@2010f has active state => Assigning Standby role to this node
> Sep 30 20:39:00 host-s2 osaffmd[5596]: Started
> Sep 30 20:39:00 host-s2 osafimmd[5605]: Started
> Sep 30 20:39:00 host-s2 osafimmd[5605]: NO 2PBE configured with IMMSV_PEER_SC_MAX_WAIT: 30 seconds
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: Started
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO 2PBE configured, IMMSV_PBE_FILE_SUFFIX:.2020f (sync)
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_ISOLATED
> Sep 30 20:39:00 host-s2 osafimmd[5605]: NO SBY: Ruling epoch noted as:155
> Sep 30 20:39:00 host-s2 osafimmd[5605]: NO IMMND coord at 2010f
> Sep 30 20:39:00 host-s2 osafimmd[5605]: NO SBY: SaImmRepositoryInitModeT changed and noted as 'SA_IMM_KEEP_REPOSITORY'
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
> Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2462
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO RepositoryInitModeT is SA_IMM_KEEP_REPOSITORY
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: WA IMM Access Control mode is DISABLED!
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Epoch set to 155 in ImmModel
> Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 154  new epoch:155
> Sep 30 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f
> Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 0  new epoch:155
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
> Sep 30 20:39:01 host-s2 osaflogd[5627]: Started
> Sep 30 20:39:01 host-s2 osaflogd[5627]: NO No or invalid log service configuration object
> Sep 30 20:39:01 host-s2 osaflogd[5627]: NO Log config system: high 0 low 0, application: high 0 low 0
> Sep 30 20:39:01 host-s2 osaflogd[5627]: NO log root directory is: /var/log/opensaf/saflog
> Sep 30 20:39:01 host-s2 osaflogd[5627]: NO LGS_MBCSV_VERSION = 3
> Sep 30 20:39:01 host-s2 osafntfd[5639]: Started
> Sep 30 20:39:01 host-s2 osafntfd[5639]: Started
> Sep 30 20:39:01 host-s2 osafclmd[5648]: logtrace: trace enabled to file /var/log/opensaf/osafclmd, mask=0xffffffff
> Sep 30 20:39:01 host-s2 osafclmd[5648]: Started
> Sep 30 20:39:01 host-s2 osafclmna[5657]: logtrace: trace enabled to file /var/log/opensaf/osafclmna, mask=0xffffffff
> Sep 30 20:39:01 host-s2 osafclmna[5657]: Started
> Sep 30 20:39:01 host-s2 osafclmna[5657]: NO safNode=SC-2,safCluster=Q50clmCluster Joined cluster, nodeid=2020f
> Sep 30 20:39:01 host-s2 osafamfd[5665]: logtrace: trace enabled to file /var/log/opensaf/osafamfd, mask=0xffffffff
> Sep 30 20:39:01 host-s2 osafamfd[5665]: Started
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO PBE-OI established on other SC. Dumping incrementally to file imm.db
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO STARTING SLAVE PBE process.
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO pbe-db-file-path:/etc/opensaf/imm.db.2020f VETERAN:1 B:0
> Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[0] == '/usr/lib64/opensaf/osafimmpbed'
> Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[1] == '--pbe2B'
> Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[2] == '/etc/opensaf/imm.db.2020f'
> Sep 30 20:39:01 host-s2 osafimmpbed: IN Generating DB file from current IMM state. DB file: /etc/opensaf/imm.db.2020f
> Sep 30 20:39:01 host-s2 osafimmpbed: NO Successfully opened empty local sqlite pbe file /tmp/imm.db.RIranA
> Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: Ruling epoch noted as:156
> Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 155  new epoch:156
> Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 155  new epoch:156
> Sep 30 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Epoch set to 156 in ImmModel
> Sep 30 20:39:01 host-s2 osafamfd[5665]: NO got si :safSi=Management-2N,safApp=ManagementApp
> Sep 30 20:39:01 host-s2 osafamfd: Last message 'NO got si :safSi=Man' repeated 13 times, suppressed by syslog-ng on host-s2
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Implementer (applier) connected: 12 (@safAmfService2020f) <8, 2020f>
> Sep 30 20:39:01 host-s2 osafimmpbed: NO Successfully dumped to file /tmp/imm.db.RIranA
> Sep 30 20:39:01 host-s2 osafimmpbed: NO Moved /tmp/imm.db.RIranA to /etc/opensaf/imm.db.2020f.tmp
> Sep 30 20:39:01 host-s2 osafimmpbed: NO Moved /etc/opensaf/imm.db.2020f.tmp to /etc/opensaf/imm.db.2020f
> Sep 30 20:39:01 host-s2 osafimmpbed: NO Removed obsolete journal file: /etc/opensaf/imm.db.2020f-journal
> Sep 30 20:39:01 host-s2 osafimmpbed: NO Removed obsolete db file: /etc/opensaf/imm.db.2020f.prev
> Sep 30 20:39:01 host-s2 osafimmpbed: IN Re-attaching to the new version of /etc/opensaf/imm.db.2020f
> Sep 30 20:39:01 host-s2 osafimmpbed: NO Successfully opened pre-existing sqlite pbe file /etc/opensaf/imm.db.2020f
> Sep 30 20:39:01 host-s2 osafimmpbed: IN saImmRepositoryInit: SA_IMM_KEEP_REPOSITORY - attaching to repository
> Sep 30 20:39:01 host-s2 osafimmpbed: NO pbeDaemon starting with obj-count:808
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Implementer (applier) connected: 13 (@OpenSafImmPBE) <403, 2020f>
> Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Implementer connected: 14 (OsafImmPbeRt_B) <404, 2020f>
> Sep 30 20:39:01 host-s2 osafamfnd[5679]: logtrace: trace enabled to file /var/log/opensaf/osafamfnd, mask=0xffffffff
> Sep 30 20:39:01 host-s2 osafamfnd[5679]: Started
> Sep 30 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f
> Sep 30 20:39:01 host-s2 osafimmpbed: NO Update epoch 156 committing with ccbId:10000000a/4294967306
> Sep 30 20:39:03 host-s2 osafimmnd[5618]: NO PBE slave established on this SC. Dumping incrementally to file imm.db
> Sep 30 20:39:04 host-s2 osafamfd[5665]: NO Cold sync complete!
> 
> Thanks,
> 
> Chad

------------------------------------------------------------------------------
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
