Hi Mathi, On the first SC, I used amf-state to check the various entities and all of them are unlocked/enabled and in-service. All processes are up and look good. It is on the second however, where no components are enabled. I don't think it gets to the point of assigning service on the second SC as I see no further output after the cold sync. It is after this point I would expect to see the service assignments and transitions occur. Given that opensafd is hanging waiting for amfnd to send the nid_notify I am wondering if there is some handshake with amfd that is not occurring. Does the fact that I am not using PLM/OpenHPI have any bearing in this?
Chad -----Original Message----- From: Mathivanan Naickan Palanivelu [mailto:[email protected]] Sent: Thursday, October 01, 2015 5:13 AM To: Chad Lingrell Cc: [email protected] Subject: Re: [users] Second Opensaf controller node hanging after cold sync Hi, In what state are the SUs? what is the output of $/etc/init.d/opensafd status And the output of the different amf entities(su,si,comp, etc.) on that node i.e. output of $ amf-state <entity_name> all Perhaps, there is some component that is failing to come up? Thanks, Mathi. ----- [email protected] wrote: > Hello, > > I am currently working on getting OpenSaf (without PLM/openhpi) > working in a VM environment. I have so far been able to bring up one > system controller and get all my required components enabled. However, > when I go to bring up the second system controller, it runs through > the startup, completes the cold sync and then it appears nothing > further happens. I have determined that opensafd startup on the > second SC is hanging on amfnd startup which I surmise is due to the > nid_notify call not being hit. It does not seem to be connectivity > related since both sides are communicating. > > I have turned on trace debug on for amfd on both SCs and for amfnd on > the SC I am trying to bring up, but so far it is not pointing to > anything obvious (but then again I am still working to understand the > code flow). > > Looking for any pointers to help isolate the issue. > > Version: OpenSAF 4.5.0 > Compiled config: ./configure --disable-ais-plm --enable-tipc > --enable-imm-pbe > > In the mds.log I do see an ERR log but I do not know if is related: > > Sep 30 20:39:01.736442 osafamfnd[5679] INFO |MDTM: User Recd msg > len=16 > Sep 30 20:39:01.736460 osafamfnd[5679] INFO |MDTM: Sending data to > upper layer for a single recd message Sep 30 20:39:01.736478 > osafamfnd[5679] DBG |MCM_DB : Entering : > mds_svc_tbl_get > Sep 30 20:39:01.736494 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > mds_svc_tbl_get > Sep 30 20:39:01.736511 osafamfnd[5679] DBG |MCM_DB : Entering : > mds_svc_tbl_get_role > Sep 30 20:39:01.736528 osafamfnd[5679] DBG |MCM_DB : Entering : > mds_svc_tbl_query > Sep 30 20:39:01.736544 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > mds_svc_tbl_query > Sep 30 20:39:01.736561 osafamfnd[5679] DBG |MCM_DB : Entering : > mds_vdest_tbl_get_role > Sep 30 20:39:01.736578 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > mds_vdest_tbl_get_role > Sep 30 20:39:01.736595 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > mds_svc_tbl_get_role > Sep 30 20:39:01.736612 osafamfnd[5679] DBG |MDS_SND_RCV : Entering > mds_mcm_process_recv_snd_msg_common > Sep 30 20:39:01.736629 osafamfnd[5679] DBG |MDS_SND_RCV : Entering > mds_mcm_do_decode_full_or_flat Sep 30 20:39:01.736647 osafamfnd[5679] > DBG |MDS_SND_RCV : calling callback ptr Sep 30 20:39:01.736682 > osafamfnd[5679] DBG |MDS_SND_RCV : Leaving > mds_mcm_do_decode_full_or_flat Sep 30 20:39:01.736751 osafamfnd[5679] > ERR |MDS_SND_RCV: Receive callback failed svc_id = CLMA(35) > > ^^^^^ > Sep 30 20:39:01.736803 osafamfnd[5679] DBG |MCM_DB : Entering : > mds_svc_tbl_get > Sep 30 20:39:01.736828 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > mds_svc_tbl_get > Sep 30 20:39:01.736851 osafamfnd[5679] DBG |MCM_DB : Entering : > mds_subtn_tbl_get > Sep 30 20:39:01.736869 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > mds_subtn_tbl_get > Sep 30 20:39:01.736887 osafamfnd[5679] INFO |MDS_SND_RCV: > Successfully recd the response also > Sep 30 20:39:01.736905 osafamfnd[5679] INFO |MDS_SND_RCV: Deleting > the sync send entry with xch_id=4 > > Here is the output syslog: > Sep 30 20:38:59 host-s2 opensafd: Starting OpenSAF Services Sep 30 > 20:38:59 host-s2 kernel: [95954.028402] TIPC: Activated (version > 2.0.0) Sep 30 20:38:59 host-s2 kernel: [95954.029613] NET: Registered > protocol family 30 Sep 30 20:38:59 host-s2 kernel: [95954.030680] > TIPC: Started in single node mode Sep 30 20:38:59 host-s2 kernel: > [95954.033582] TIPC: Started in network mode Sep 30 20:38:59 host-s2 > kernel: [95954.034575] TIPC: Own node address <1.1.2>, network > identity 1234 Sep 30 20:38:59 host-s2 kernel: [95954.038442] TIPC: > Enabled bearer <eth:eth4>, discovery domain <1.1.0>, priority 10 Sep > 30 20:38:59 host-s2 kernel: [95954.040652] TIPC: Established link > <1.1.2:eth4-1.1.1:eth4> on network plane A Sep 30 20:38:59 host-s2 > osafrded[5588]: Started Sep 30 20:38:59 host-s2 osafrded[5588]: NO > Peer rde@2010f has active state => Assigning Standby role to this node > Sep 30 20:39:00 host-s2 osaffmd[5596]: Started Sep 30 20:39:00 host-s2 > osafimmd[5605]: Started Sep 30 20:39:00 host-s2 osafimmd[5605]: NO > 2PBE configured with > IMMSV_PEER_SC_MAX_WAIT: 30 seconds > Sep 30 20:39:00 host-s2 osafimmnd[5618]: Started Sep 30 20:39:00 > host-s2 osafimmnd[5618]: NO Persistent Back-End capability configured, > Pbe file:imm.db (suffix may get added) Sep 30 20:39:00 host-s2 > osafimmnd[5618]: NO SERVER STATE: > IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING Sep 30 20:39:00 > host-s2 osafimmnd[5618]: NO 2PBE configured, > IMMSV_PBE_FILE_SUFFIX:.2020f (sync) Sep 30 20:39:00 host-s2 > osafimmnd[5618]: NO SERVER STATE: > IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING Sep 30 > 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: > IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING Sep 30 20:39:00 > host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_ISOLATED Sep 30 > 20:39:00 host-s2 osafimmd[5605]: NO SBY: Ruling epoch noted > as:155 > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO IMMND coord at 2010f Sep 30 > 20:39:00 host-s2 osafimmd[5605]: NO SBY: > SaImmRepositoryInitModeT changed and noted as 'SA_IMM_KEEP_REPOSITORY' > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO NODE STATE-> > IMM_NODE_W_AVAILABLE Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO > SERVER STATE: > IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT Sep 30 20:39:01 > host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2462 > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO RepositoryInitModeT is > SA_IMM_KEEP_REPOSITORY Sep 30 20:39:01 host-s2 osafimmnd[5618]: WA IMM > Access Control mode is DISABLED! > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Epoch set to 155 in > ImmModel Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for > IMMND process at node 2010f old epoch: 154 new epoch:155 Sep 30 > 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f Sep 30 > 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process > at node 2020f old epoch: 0 new epoch:155 Sep 30 20:39:01 host-s2 > osafimmnd[5618]: NO SERVER STATE: > IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY Sep 30 20:39:01 host-s2 > osaflogd[5627]: Started Sep 30 20:39:01 host-s2 osaflogd[5627]: NO No > or invalid log service configuration object Sep 30 20:39:01 host-s2 > osaflogd[5627]: NO Log config system: high 0 low 0, application: high > 0 low 0 Sep 30 20:39:01 host-s2 osaflogd[5627]: NO log root directory > is: > /var/log/opensaf/saflog > Sep 30 20:39:01 host-s2 osaflogd[5627]: NO LGS_MBCSV_VERSION = 3 Sep > 30 20:39:01 host-s2 osafntfd[5639]: Started Sep 30 20:38:59 host-s2 > opensafd: Starting OpenSAF Services Sep 30 20:38:59 host-s2 kernel: > [95954.028402] TIPC: Activated (version 2.0.0) Sep 30 20:38:59 host-s2 > kernel: [95954.029613] NET: Registered protocol family 30 Sep 30 > 20:38:59 host-s2 kernel: [95954.030680] TIPC: Started in single node > mode Sep 30 20:38:59 host-s2 kernel: [95954.033582] TIPC: Started in > network mode Sep 30 20:38:59 host-s2 kernel: [95954.034575] TIPC: Own > node address <1.1.2>, network identity 1234 Sep 30 20:38:59 host-s2 > kernel: [95954.038442] TIPC: Enabled bearer <eth:eth4>, discovery > domain <1.1.0>, priority 10 Sep 30 20:38:59 host-s2 kernel: > [95954.040652] TIPC: Established link <1.1.2:eth4-1.1.1:eth4> on > network plane A Sep 30 20:38:59 host-s2 osafrded[5588]: Started Sep 30 > 20:38:59 host-s2 osafrded[5588]: NO Peer rde@2010f has active state => > Assigning Standby role to this node Sep 30 20:39:00 host-s2 > osaffmd[5596]: Started Sep 30 20:39:00 host-s2 osafimmd[5605]: Started > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO 2PBE configured with > IMMSV_PEER_SC_MAX_WAIT: 30 seconds > Sep 30 20:39:00 host-s2 osafimmnd[5618]: Started Sep 30 20:39:00 > host-s2 osafimmnd[5618]: NO Persistent Back-End capability configured, > Pbe file:imm.db (suffix may get added) Sep 30 20:39:00 host-s2 > osafimmnd[5618]: NO SERVER STATE: > IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING Sep 30 20:39:00 > host-s2 osafimmnd[5618]: NO 2PBE configured, > IMMSV_PBE_FILE_SUFFIX:.2020f (sync) Sep 30 20:39:00 host-s2 > osafimmnd[5618]: NO SERVER STATE: > IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING Sep 30 > 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: > IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING Sep 30 20:39:00 > host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_ISOLATED Sep 30 > 20:39:00 host-s2 osafimmd[5605]: NO SBY: Ruling epoch noted > as:155 > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO IMMND coord at 2010f Sep 30 > 20:39:00 host-s2 osafimmd[5605]: NO SBY: > SaImmRepositoryInitModeT changed and noted as 'SA_IMM_KEEP_REPOSITORY' > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO NODE STATE-> > IMM_NODE_W_AVAILABLE Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO > SERVER STATE: > IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT Sep 30 20:39:01 > host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2462 > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO RepositoryInitModeT is > SA_IMM_KEEP_REPOSITORY Sep 30 20:39:01 host-s2 osafimmnd[5618]: WA IMM > Access Control mode is DISABLED! > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Epoch set to 155 in > ImmModel Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for > IMMND process at node 2010f old epoch: 154 new epoch:155 Sep 30 > 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f Sep 30 > 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process > at node 2020f old epoch: 0 new epoch:155 Sep 30 20:39:01 host-s2 > osafimmnd[5618]: NO SERVER STATE: > IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY Sep 30 20:39:01 host-s2 > osaflogd[5627]: Started Sep 30 20:39:01 host-s2 osaflogd[5627]: NO No > or invalid log service configuration object Sep 30 20:39:01 host-s2 > osaflogd[5627]: NO Log config system: high 0 low 0, application: high > 0 low 0 Sep 30 20:39:01 host-s2 osaflogd[5627]: NO log root directory > is: > /var/log/opensaf/saflog > Sep 30 20:39:01 host-s2 osaflogd[5627]: NO LGS_MBCSV_VERSION = 3 Sep > 30 20:39:01 host-s2 osafntfd[5639]: Started Sep 30 20:39:01 host-s2 > osafntfd[5639]: Started Sep 30 20:39:01 host-s2 osafclmd[5648]: > logtrace: trace enabled to file /var/log/opensaf/osafclmd, > mask=0xffffffff Sep 30 20:39:01 host-s2 osafclmd[5648]: Started Sep 30 > 20:39:01 host-s2 osafclmna[5657]: logtrace: trace enabled to file > /var/log/opensaf/osafclmna, mask=0xffffffff Sep 30 20:39:01 host-s2 > osafclmna[5657]: Started Sep 30 20:39:01 host-s2 osafclmna[5657]: NO > safNode=SC-2,safCluster=Q50clmCluster Joined cluster, nodeid=2020f Sep > 30 20:39:01 host-s2 osafamfd[5665]: logtrace: trace enabled to file > /var/log/opensaf/osafamfd, mask=0xffffffff Sep 30 20:39:01 host-s2 > osafamfd[5665]: Started Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO > PBE-OI established on other SC. Dumping incrementally to file imm.db > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO STARTING SLAVE PBE > process. > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO > pbe-db-file-path:/etc/opensaf/imm.db.2020f VETERAN:1 B:0 Sep 30 > 20:39:01 host-s2 osafimmpbed: IN arg[0] == > '/usr/lib64/opensaf/osafimmpbed' > Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[1] == '--pbe2B' > Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[2] == > '/etc/opensaf/imm.db.2020f' > Sep 30 20:39:01 host-s2 osafimmpbed: IN Generating DB file from > current IMM state. DB file: /etc/opensaf/imm.db.2020f Sep 30 20:39:01 > host-s2 osafimmpbed: NO Successfully opened empty local sqlite pbe > file /tmp/imm.db.RIranA Sep 30 20:39:01 host-s2 osafimmd[5605]: NO > SBY: Ruling epoch noted > as:156 > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND > process at node 2020f old epoch: 155 new epoch:156 Sep 30 20:39:01 > host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node > 2010f old epoch: 155 new epoch:156 Sep 30 20:39:01 host-s2 > osafimmd[5605]: NO IMMND coord at 2010f Sep 30 20:39:01 host-s2 > osafimmnd[5618]: NO Epoch set to 156 in ImmModel Sep 30 20:39:01 > host-s2 osafamfd[5665]: NO got si > :safSi=Management-2N,safApp=ManagementApp > Sep 30 20:39:01 host-s2 osafamfd: Last message 'NO got si :safSi=Man' > repeated 13 times, suppressed by syslog-ng on host-s2 Sep 30 20:39:01 > host-s2 osafimmnd[5618]: NO Implementer (applier) > connected: 12 (@safAmfService2020f) <8, 2020f> Sep 30 20:39:01 host-s2 > osafimmpbed: NO Successfully dumped to file /tmp/imm.db.RIranA Sep 30 > 20:39:01 host-s2 osafimmpbed: NO Moved /tmp/imm.db.RIranA to > /etc/opensaf/imm.db.2020f.tmp Sep 30 20:39:01 host-s2 osafimmpbed: NO > Moved /etc/opensaf/imm.db.2020f.tmp to /etc/opensaf/imm.db.2020f Sep > 30 20:39:01 host-s2 osafimmpbed: NO Removed obsolete journal file: > /etc/opensaf/imm.db.2020f-journal > Sep 30 20:39:01 host-s2 osafimmpbed: NO Removed obsolete db file: > /etc/opensaf/imm.db.2020f.prev > Sep 30 20:39:01 host-s2 osafimmpbed: IN Re-attaching to the new > version of /etc/opensaf/imm.db.2020f Sep 30 20:39:01 host-s2 > osafimmpbed: NO Successfully opened pre-existing sqlite pbe file > /etc/opensaf/imm.db.2020f Sep 30 20:39:01 host-s2 osafimmpbed: IN > saImmRepositoryInit: > SA_IMM_KEEP_REPOSITORY - attaching to repository Sep 30 20:39:01 > host-s2 osafimmpbed: NO pbeDaemon starting with > obj-count:808 > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Implementer (applier) > connected: 13 (@OpenSafImmPBE) <403, 2020f> Sep 30 20:39:01 host-s2 > osafimmnd[5618]: NO Implementer connected: 14 > (OsafImmPbeRt_B) <404, 2020f> > Sep 30 20:39:01 host-s2 osafamfnd[5679]: logtrace: trace enabled to > file /var/log/opensaf/osafamfnd, mask=0xffffffff Sep 30 20:39:01 > host-s2 osafamfnd[5679]: Started Sep 30 20:39:01 host-s2 > osafimmd[5605]: NO IMMND coord at 2010f Sep 30 20:39:01 host-s2 > osafimmpbed: NO Update epoch 156 committing with > ccbId:10000000a/4294967306 Sep 30 20:39:03 host-s2 osafimmnd[5618]: NO > PBE slave established on this SC. Dumping incrementally to file imm.db > Sep 30 20:39:04 host-s2 osafamfd[5665]: NO Cold sync complete! > > Thanks, > > Chad > ---------------------------------------------------------------------- > -------- _______________________________________________ > Opensaf-users mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/opensaf-users ------------------------------------------------------------------------------ _______________________________________________ Opensaf-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-users
