Chad, Is it possible that the standby node is not a part of the cluster? What is the membership status as seen in the CLM node objects? >>> Sep 30 20:39:01.736751 osafamfnd[5679] ERR |MDS_SND_RCV: Receive callback failed svc_id = CLMA(35) >>>
May be the above error is causing AMFND to fail in the processing of the CLM membership callback and thus AMFND is not spawning applications on a node that is not a member of the cluster? If that is the case, then it is strange why the above error is occuring! (I hope the imm.xml is newly generated after re-building with those build options) Mathi. ----- [email protected] wrote: > Hi Mathi, > > On the first SC, I used amf-state to check the various entities and > all of them are unlocked/enabled and in-service. All processes are up > and look good. It is on the second however, where no components are > enabled. I don't think it gets to the point of assigning service on > the second SC as I see no further output after the cold sync. It is > after this point I would expect to see the service assignments and > transitions occur. Given that opensafd is hanging waiting for amfnd to > send the nid_notify I am wondering if there is some handshake with > amfd that is not occurring. Does the fact that I am not using > PLM/OpenHPI have any bearing in this? > > Chad > > -----Original Message----- > From: Mathivanan Naickan Palanivelu [mailto:[email protected]] > > Sent: Thursday, October 01, 2015 5:13 AM > To: Chad Lingrell > Cc: [email protected] > Subject: Re: [users] Second Opensaf controller node hanging after cold > sync > > Hi, > > In what state are the SUs? what is the output of $/etc/init.d/opensafd > status And the output of the different amf entities(su,si,comp, etc.) > on that node i.e. output of $ amf-state <entity_name> all > > Perhaps, there is some component that is failing to come up? > > Thanks, > Mathi. > > ----- [email protected] wrote: > > > Hello, > > > > I am currently working on getting OpenSaf (without PLM/openhpi) > > working in a VM environment. I have so far been able to bring up one > > > system controller and get all my required components enabled. > However, > > when I go to bring up the second system controller, it runs through > > > the startup, completes the cold sync and then it appears nothing > > further happens. I have determined that opensafd startup on the > > second SC is hanging on amfnd startup which I surmise is due to the > > > nid_notify call not being hit. It does not seem to be connectivity > > related since both sides are communicating. > > > > I have turned on trace debug on for amfd on both SCs and for amfnd > on > > the SC I am trying to bring up, but so far it is not pointing to > > anything obvious (but then again I am still working to understand > the > > code flow). > > > > Looking for any pointers to help isolate the issue. > > > > Version: OpenSAF 4.5.0 > > Compiled config: ./configure --disable-ais-plm --enable-tipc > > --enable-imm-pbe > > > > In the mds.log I do see an ERR log but I do not know if is related: > > > > Sep 30 20:39:01.736442 osafamfnd[5679] INFO |MDTM: User Recd msg > > len=16 > > Sep 30 20:39:01.736460 osafamfnd[5679] INFO |MDTM: Sending data to > > > upper layer for a single recd message Sep 30 20:39:01.736478 > > osafamfnd[5679] DBG |MCM_DB : Entering : > > mds_svc_tbl_get > > Sep 30 20:39:01.736494 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > > mds_svc_tbl_get > > Sep 30 20:39:01.736511 osafamfnd[5679] DBG |MCM_DB : Entering : > > mds_svc_tbl_get_role > > Sep 30 20:39:01.736528 osafamfnd[5679] DBG |MCM_DB : Entering : > > mds_svc_tbl_query > > Sep 30 20:39:01.736544 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > > mds_svc_tbl_query > > Sep 30 20:39:01.736561 osafamfnd[5679] DBG |MCM_DB : Entering : > > mds_vdest_tbl_get_role > > Sep 30 20:39:01.736578 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > > mds_vdest_tbl_get_role > > Sep 30 20:39:01.736595 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > > mds_svc_tbl_get_role > > Sep 30 20:39:01.736612 osafamfnd[5679] DBG |MDS_SND_RCV : Entering > > > mds_mcm_process_recv_snd_msg_common > > Sep 30 20:39:01.736629 osafamfnd[5679] DBG |MDS_SND_RCV : Entering > > > mds_mcm_do_decode_full_or_flat Sep 30 20:39:01.736647 > osafamfnd[5679] > > DBG |MDS_SND_RCV : calling callback ptr Sep 30 20:39:01.736682 > > osafamfnd[5679] DBG |MDS_SND_RCV : Leaving > > mds_mcm_do_decode_full_or_flat Sep 30 20:39:01.736751 > osafamfnd[5679] > > ERR |MDS_SND_RCV: Receive callback failed svc_id = CLMA(35) > > > > > ^^^^^ > > Sep 30 20:39:01.736803 osafamfnd[5679] DBG |MCM_DB : Entering : > > mds_svc_tbl_get > > Sep 30 20:39:01.736828 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > > mds_svc_tbl_get > > Sep 30 20:39:01.736851 osafamfnd[5679] DBG |MCM_DB : Entering : > > mds_subtn_tbl_get > > Sep 30 20:39:01.736869 osafamfnd[5679] DBG |MCM_DB : Leaving : S : > > mds_subtn_tbl_get > > Sep 30 20:39:01.736887 osafamfnd[5679] INFO |MDS_SND_RCV: > > Successfully recd the response also > > Sep 30 20:39:01.736905 osafamfnd[5679] INFO |MDS_SND_RCV: Deleting > > > the sync send entry with xch_id=4 > > > > Here is the output syslog: > > Sep 30 20:38:59 host-s2 opensafd: Starting OpenSAF Services Sep 30 > > 20:38:59 host-s2 kernel: [95954.028402] TIPC: Activated (version > > 2.0.0) Sep 30 20:38:59 host-s2 kernel: [95954.029613] NET: > Registered > > protocol family 30 Sep 30 20:38:59 host-s2 kernel: [95954.030680] > > TIPC: Started in single node mode Sep 30 20:38:59 host-s2 kernel: > > [95954.033582] TIPC: Started in network mode Sep 30 20:38:59 host-s2 > > > kernel: [95954.034575] TIPC: Own node address <1.1.2>, network > > identity 1234 Sep 30 20:38:59 host-s2 kernel: [95954.038442] TIPC: > > Enabled bearer <eth:eth4>, discovery domain <1.1.0>, priority 10 Sep > > > 30 20:38:59 host-s2 kernel: [95954.040652] TIPC: Established link > > <1.1.2:eth4-1.1.1:eth4> on network plane A Sep 30 20:38:59 host-s2 > > osafrded[5588]: Started Sep 30 20:38:59 host-s2 osafrded[5588]: NO > > Peer rde@2010f has active state => Assigning Standby role to this > node > > Sep 30 20:39:00 host-s2 osaffmd[5596]: Started Sep 30 20:39:00 > host-s2 > > osafimmd[5605]: Started Sep 30 20:39:00 host-s2 osafimmd[5605]: NO > > 2PBE configured with > > IMMSV_PEER_SC_MAX_WAIT: 30 seconds > > Sep 30 20:39:00 host-s2 osafimmnd[5618]: Started Sep 30 20:39:00 > > host-s2 osafimmnd[5618]: NO Persistent Back-End capability > configured, > > Pbe file:imm.db (suffix may get added) Sep 30 20:39:00 host-s2 > > osafimmnd[5618]: NO SERVER STATE: > > IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING Sep 30 20:39:00 > > > host-s2 osafimmnd[5618]: NO 2PBE configured, > > IMMSV_PBE_FILE_SUFFIX:.2020f (sync) Sep 30 20:39:00 host-s2 > > osafimmnd[5618]: NO SERVER STATE: > > IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING Sep 30 > > 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: > > IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING Sep 30 > 20:39:00 > > host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_ISOLATED Sep 30 > > 20:39:00 host-s2 osafimmd[5605]: NO SBY: Ruling epoch noted > > as:155 > > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO IMMND coord at 2010f Sep > 30 > > 20:39:00 host-s2 osafimmd[5605]: NO SBY: > > SaImmRepositoryInitModeT changed and noted as > 'SA_IMM_KEEP_REPOSITORY' > > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO NODE STATE-> > > IMM_NODE_W_AVAILABLE Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO > > SERVER STATE: > > IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT Sep 30 20:39:01 > > host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE > 2462 > > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO RepositoryInitModeT is > > SA_IMM_KEEP_REPOSITORY Sep 30 20:39:01 host-s2 osafimmnd[5618]: WA > IMM > > Access Control mode is DISABLED! > > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Epoch set to 155 in > > ImmModel Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch > for > > IMMND process at node 2010f old epoch: 154 new epoch:155 Sep 30 > > 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f Sep 30 > > 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process > > > at node 2020f old epoch: 0 new epoch:155 Sep 30 20:39:01 host-s2 > > osafimmnd[5618]: NO SERVER STATE: > > IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY Sep 30 20:39:01 host-s2 > > > osaflogd[5627]: Started Sep 30 20:39:01 host-s2 osaflogd[5627]: NO > No > > or invalid log service configuration object Sep 30 20:39:01 host-s2 > > > osaflogd[5627]: NO Log config system: high 0 low 0, application: > high > > 0 low 0 Sep 30 20:39:01 host-s2 osaflogd[5627]: NO log root > directory > > is: > > /var/log/opensaf/saflog > > Sep 30 20:39:01 host-s2 osaflogd[5627]: NO LGS_MBCSV_VERSION = 3 Sep > > > 30 20:39:01 host-s2 osafntfd[5639]: Started Sep 30 20:38:59 host-s2 > > > opensafd: Starting OpenSAF Services Sep 30 20:38:59 host-s2 kernel: > > > [95954.028402] TIPC: Activated (version 2.0.0) Sep 30 20:38:59 > host-s2 > > kernel: [95954.029613] NET: Registered protocol family 30 Sep 30 > > 20:38:59 host-s2 kernel: [95954.030680] TIPC: Started in single node > > > mode Sep 30 20:38:59 host-s2 kernel: [95954.033582] TIPC: Started in > > > network mode Sep 30 20:38:59 host-s2 kernel: [95954.034575] TIPC: > Own > > node address <1.1.2>, network identity 1234 Sep 30 20:38:59 host-s2 > > > kernel: [95954.038442] TIPC: Enabled bearer <eth:eth4>, discovery > > domain <1.1.0>, priority 10 Sep 30 20:38:59 host-s2 kernel: > > [95954.040652] TIPC: Established link <1.1.2:eth4-1.1.1:eth4> on > > network plane A Sep 30 20:38:59 host-s2 osafrded[5588]: Started Sep > 30 > > 20:38:59 host-s2 osafrded[5588]: NO Peer rde@2010f has active state > => > > Assigning Standby role to this node Sep 30 20:39:00 host-s2 > > osaffmd[5596]: Started Sep 30 20:39:00 host-s2 osafimmd[5605]: > Started > > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO 2PBE configured with > > IMMSV_PEER_SC_MAX_WAIT: 30 seconds > > Sep 30 20:39:00 host-s2 osafimmnd[5618]: Started Sep 30 20:39:00 > > host-s2 osafimmnd[5618]: NO Persistent Back-End capability > configured, > > Pbe file:imm.db (suffix may get added) Sep 30 20:39:00 host-s2 > > osafimmnd[5618]: NO SERVER STATE: > > IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING Sep 30 20:39:00 > > > host-s2 osafimmnd[5618]: NO 2PBE configured, > > IMMSV_PBE_FILE_SUFFIX:.2020f (sync) Sep 30 20:39:00 host-s2 > > osafimmnd[5618]: NO SERVER STATE: > > IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING Sep 30 > > 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: > > IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING Sep 30 > 20:39:00 > > host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_ISOLATED Sep 30 > > 20:39:00 host-s2 osafimmd[5605]: NO SBY: Ruling epoch noted > > as:155 > > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO IMMND coord at 2010f Sep > 30 > > 20:39:00 host-s2 osafimmd[5605]: NO SBY: > > SaImmRepositoryInitModeT changed and noted as > 'SA_IMM_KEEP_REPOSITORY' > > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO NODE STATE-> > > IMM_NODE_W_AVAILABLE Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO > > SERVER STATE: > > IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT Sep 30 20:39:01 > > host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE > 2462 > > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO RepositoryInitModeT is > > SA_IMM_KEEP_REPOSITORY Sep 30 20:39:01 host-s2 osafimmnd[5618]: WA > IMM > > Access Control mode is DISABLED! > > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Epoch set to 155 in > > ImmModel Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch > for > > IMMND process at node 2010f old epoch: 154 new epoch:155 Sep 30 > > 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f Sep 30 > > 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process > > > at node 2020f old epoch: 0 new epoch:155 Sep 30 20:39:01 host-s2 > > osafimmnd[5618]: NO SERVER STATE: > > IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY Sep 30 20:39:01 host-s2 > > > osaflogd[5627]: Started Sep 30 20:39:01 host-s2 osaflogd[5627]: NO > No > > or invalid log service configuration object Sep 30 20:39:01 host-s2 > > > osaflogd[5627]: NO Log config system: high 0 low 0, application: > high > > 0 low 0 Sep 30 20:39:01 host-s2 osaflogd[5627]: NO log root > directory > > is: > > /var/log/opensaf/saflog > > Sep 30 20:39:01 host-s2 osaflogd[5627]: NO LGS_MBCSV_VERSION = 3 Sep > > > 30 20:39:01 host-s2 osafntfd[5639]: Started Sep 30 20:39:01 host-s2 > > > osafntfd[5639]: Started Sep 30 20:39:01 host-s2 osafclmd[5648]: > > logtrace: trace enabled to file /var/log/opensaf/osafclmd, > > mask=0xffffffff Sep 30 20:39:01 host-s2 osafclmd[5648]: Started Sep > 30 > > 20:39:01 host-s2 osafclmna[5657]: logtrace: trace enabled to file > > /var/log/opensaf/osafclmna, mask=0xffffffff Sep 30 20:39:01 host-s2 > > > osafclmna[5657]: Started Sep 30 20:39:01 host-s2 osafclmna[5657]: NO > > > safNode=SC-2,safCluster=Q50clmCluster Joined cluster, nodeid=2020f > Sep > > 30 20:39:01 host-s2 osafamfd[5665]: logtrace: trace enabled to file > > > /var/log/opensaf/osafamfd, mask=0xffffffff Sep 30 20:39:01 host-s2 > > osafamfd[5665]: Started Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO > > > PBE-OI established on other SC. Dumping incrementally to file imm.db > > > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO STARTING SLAVE PBE > > process. > > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO > > pbe-db-file-path:/etc/opensaf/imm.db.2020f VETERAN:1 B:0 Sep 30 > > 20:39:01 host-s2 osafimmpbed: IN arg[0] == > > '/usr/lib64/opensaf/osafimmpbed' > > Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[1] == '--pbe2B' > > Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[2] == > > '/etc/opensaf/imm.db.2020f' > > Sep 30 20:39:01 host-s2 osafimmpbed: IN Generating DB file from > > current IMM state. DB file: /etc/opensaf/imm.db.2020f Sep 30 > 20:39:01 > > host-s2 osafimmpbed: NO Successfully opened empty local sqlite pbe > > file /tmp/imm.db.RIranA Sep 30 20:39:01 host-s2 osafimmd[5605]: NO > > SBY: Ruling epoch noted > > as:156 > > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND > > > process at node 2020f old epoch: 155 new epoch:156 Sep 30 20:39:01 > > > host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node > > > 2010f old epoch: 155 new epoch:156 Sep 30 20:39:01 host-s2 > > osafimmd[5605]: NO IMMND coord at 2010f Sep 30 20:39:01 host-s2 > > osafimmnd[5618]: NO Epoch set to 156 in ImmModel Sep 30 20:39:01 > > host-s2 osafamfd[5665]: NO got si > > :safSi=Management-2N,safApp=ManagementApp > > Sep 30 20:39:01 host-s2 osafamfd: Last message 'NO got si > :safSi=Man' > > repeated 13 times, suppressed by syslog-ng on host-s2 Sep 30 > 20:39:01 > > host-s2 osafimmnd[5618]: NO Implementer (applier) > > connected: 12 (@safAmfService2020f) <8, 2020f> Sep 30 20:39:01 > host-s2 > > osafimmpbed: NO Successfully dumped to file /tmp/imm.db.RIranA Sep > 30 > > 20:39:01 host-s2 osafimmpbed: NO Moved /tmp/imm.db.RIranA to > > /etc/opensaf/imm.db.2020f.tmp Sep 30 20:39:01 host-s2 osafimmpbed: > NO > > Moved /etc/opensaf/imm.db.2020f.tmp to /etc/opensaf/imm.db.2020f Sep > > > 30 20:39:01 host-s2 osafimmpbed: NO Removed obsolete journal file: > > /etc/opensaf/imm.db.2020f-journal > > Sep 30 20:39:01 host-s2 osafimmpbed: NO Removed obsolete db file: > > /etc/opensaf/imm.db.2020f.prev > > Sep 30 20:39:01 host-s2 osafimmpbed: IN Re-attaching to the new > > version of /etc/opensaf/imm.db.2020f Sep 30 20:39:01 host-s2 > > osafimmpbed: NO Successfully opened pre-existing sqlite pbe file > > /etc/opensaf/imm.db.2020f Sep 30 20:39:01 host-s2 osafimmpbed: IN > > saImmRepositoryInit: > > SA_IMM_KEEP_REPOSITORY - attaching to repository Sep 30 20:39:01 > > host-s2 osafimmpbed: NO pbeDaemon starting with > > obj-count:808 > > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Implementer (applier) > > connected: 13 (@OpenSafImmPBE) <403, 2020f> Sep 30 20:39:01 host-s2 > > > osafimmnd[5618]: NO Implementer connected: 14 > > (OsafImmPbeRt_B) <404, 2020f> > > Sep 30 20:39:01 host-s2 osafamfnd[5679]: logtrace: trace enabled to > > > file /var/log/opensaf/osafamfnd, mask=0xffffffff Sep 30 20:39:01 > > host-s2 osafamfnd[5679]: Started Sep 30 20:39:01 host-s2 > > osafimmd[5605]: NO IMMND coord at 2010f Sep 30 20:39:01 host-s2 > > osafimmpbed: NO Update epoch 156 committing with > > ccbId:10000000a/4294967306 Sep 30 20:39:03 host-s2 osafimmnd[5618]: > NO > > PBE slave established on this SC. Dumping incrementally to file > imm.db > > Sep 30 20:39:04 host-s2 osafamfd[5665]: NO Cold sync complete! > > > > Thanks, > > > > Chad > > > ---------------------------------------------------------------------- > > -------- _______________________________________________ > > Opensaf-users mailing list > > [email protected] > > https://lists.sourceforge.net/lists/listinfo/opensaf-users ------------------------------------------------------------------------------ _______________________________________________ Opensaf-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-users
