I have resolved the issue. One of the components being initialized performs 
interface reconfiguration (removing/rebinding/renaming) of a couple of 
interfaces (eth2/eth3). TIPC was configured on an interface above these two 
(eth4), and (for a reason I have not yet determined) this reconfiguration 
interfered with the TIPC link. Moving the TIPC bearer back to eth0 made the 
issue disappear.
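For anyone who hits something similar, the quickest way I found to see whether
the reconfiguration had disturbed the bearer was to inspect TIPC directly.
A sketch using the TIPC 2.0 tipc-config tool; the exact flags may vary by
version (check tipc-config --help), and the bearer names below are from our
setup:

```shell
# Show the bearers TIPC has enabled; eth:eth4 should still be listed
# here if the bearer survived the interface rework.
tipc-config -b

# Show the links to peer nodes; the link to the other SC
# (e.g. <1.1.2:eth4-1.1.1:eth4>) should be up.
tipc-config -l

# The workaround that fixed it for us: move the bearer off eth4.
tipc-config -bd=eth:eth4   # disable the bearer on eth4
tipc-config -be=eth:eth0   # enable a bearer on eth0
```

The lasting fix was to change the interface OpenSAF hands to TIPC at startup
so the bearer never lands on eth4 in the first place (on our install that is
the TIPC_ETH_IF setting in nid.conf, if I recall the name correctly).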

Chad

-----Original Message-----
From: Mathivanan Naickan Palanivelu [mailto:[email protected]] 
Sent: Thursday, October 01, 2015 9:30 AM
To: Chad Lingrell
Cc: [email protected]
Subject: Re: [users] Second Opensaf controller node hanging after cold sync

Chad,

Is it possible that the standby node is not a part of the cluster?
What is the membership status as seen in the CLM node objects?
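Something like the following should show it (a sketch; the node DN is the one
from your syslog, and saClmNodeIsMember is the membership attribute I would
expect on the CLM node object):

```shell
# Dump the CLM node object for the second SC; saClmNodeIsMember
# should be 1 (SA_TRUE) if CLM considers the node a cluster member.
immlist safNode=SC-2,safCluster=Q50clmCluster

# Or fetch just the membership attribute:
immlist -a saClmNodeIsMember safNode=SC-2,safCluster=Q50clmCluster
```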
>>>
Sep 30 20:39:01.736751 osafamfnd[5679] ERR  |MDS_SND_RCV: Receive callback failed svc_id = CLMA(35)
>>>

Maybe the above error is causing AMFND to fail in processing the CLM 
membership callback, and thus AMFND is not spawning applications on a node 
that is not a member of the cluster?
If that is the case, then it is strange that the above error is occurring!

(I hope the imm.xml is newly generated after re-building with those build 
options)

Mathi.


----- [email protected] wrote:

> Hi Mathi,
> 
> On the first SC, I used amf-state to check the various entities, and all
> of them are unlocked/enabled and in-service. All processes are up and look
> good. It is on the second SC, however, that no components are enabled. I
> don't think it gets to the point of assigning service on the second SC, as
> I see no further output after the cold sync; it is after this point that I
> would expect to see the service assignments and transitions occur. Given
> that opensafd is hanging waiting for amfnd to send the nid_notify, I am
> wondering if there is some handshake with amfd that is not occurring.
> Does the fact that I am not using PLM/OpenHPI have any bearing on this?
> 
> Chad
> 
> -----Original Message-----
> From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
> 
> Sent: Thursday, October 01, 2015 5:13 AM
> To: Chad Lingrell
> Cc: [email protected]
> Subject: Re: [users] Second Opensaf controller node hanging after cold 
> sync
> 
> Hi,
> 
> In what state are the SUs? What is the output of
> $ /etc/init.d/opensafd status
> and of the different AMF entities (su, si, comp, etc.) on that node, i.e.
> the output of
> $ amf-state <entity_name> all
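> Concretely, something like the following (a sketch; the DNs are examples
> from a typical OpenSAF model, substitute the ones from your imm.xml):

```shell
# Overall status of the OpenSAF daemons on the node:
/etc/init.d/opensafd status

# Admin/oper/presence/readiness state of all SUs, and HA state of the SIs:
amf-state su all
amf-state si all

# A specific entity can be queried by DN, e.g.:
amf-state comp all "safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF"
```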
> 
> Perhaps, there is some component that is failing to come up?
> 
> Thanks,
> Mathi.
> 
> ----- [email protected] wrote:
> 
> > Hello,
> > 
> > I am currently working on getting OpenSAF (without PLM/OpenHPI) working
> > in a VM environment. I have so far been able to bring up one system
> > controller and get all my required components enabled. However, when I
> > go to bring up the second system controller, it runs through the
> > startup, completes the cold sync, and then it appears nothing further
> > happens. I have determined that opensafd startup on the second SC is
> > hanging on amfnd startup, which I surmise is due to the nid_notify call
> > not being hit. It does not seem to be connectivity related, since both
> > sides are communicating.
> > 
> > I have turned trace debug on for amfd on both SCs and for amfnd on the
> > SC I am trying to bring up, but so far it is not pointing to anything
> > obvious (but then again I am still working to understand the code flow).
> > 
> > Looking for any pointers to help isolate the issue.
> > 
> > Version: OpenSAF 4.5.0
> > Compiled config: ./configure --disable-ais-plm --enable-tipc
> > --enable-imm-pbe
> > 
> > In the mds.log I do see an ERR log, but I do not know if it is related:
> > 
> > Sep 30 20:39:01.736442 osafamfnd[5679] INFO  |MDTM: User Recd msg len=16
> > Sep 30 20:39:01.736460 osafamfnd[5679] INFO  |MDTM: Sending data to upper layer for a single recd message
> > Sep 30 20:39:01.736478 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_svc_tbl_get
> > Sep 30 20:39:01.736494 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_svc_tbl_get
> > Sep 30 20:39:01.736511 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_svc_tbl_get_role
> > Sep 30 20:39:01.736528 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_svc_tbl_query
> > Sep 30 20:39:01.736544 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_svc_tbl_query
> > Sep 30 20:39:01.736561 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_vdest_tbl_get_role
> > Sep 30 20:39:01.736578 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_vdest_tbl_get_role
> > Sep 30 20:39:01.736595 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_svc_tbl_get_role
> > Sep 30 20:39:01.736612 osafamfnd[5679] DBG  |MDS_SND_RCV : Entering mds_mcm_process_recv_snd_msg_common
> > Sep 30 20:39:01.736629 osafamfnd[5679] DBG  |MDS_SND_RCV : Entering mds_mcm_do_decode_full_or_flat
> > Sep 30 20:39:01.736647 osafamfnd[5679] DBG  |MDS_SND_RCV : calling callback ptr
> > Sep 30 20:39:01.736682 osafamfnd[5679] DBG  |MDS_SND_RCV : Leaving mds_mcm_do_decode_full_or_flat
> > Sep 30 20:39:01.736751 osafamfnd[5679] ERR  |MDS_SND_RCV: Receive callback failed svc_id = CLMA(35)
> >                                              ^^^^^
> > Sep 30 20:39:01.736803 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_svc_tbl_get
> > Sep 30 20:39:01.736828 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_svc_tbl_get
> > Sep 30 20:39:01.736851 osafamfnd[5679] DBG  |MCM_DB : Entering : mds_subtn_tbl_get
> > Sep 30 20:39:01.736869 osafamfnd[5679] DBG  |MCM_DB : Leaving : S : mds_subtn_tbl_get
> > Sep 30 20:39:01.736887 osafamfnd[5679] INFO  |MDS_SND_RCV: Successfully recd the response also
> > Sep 30 20:39:01.736905 osafamfnd[5679] INFO  |MDS_SND_RCV: Deleting the sync send entry with xch_id=4
> > 
> > Here is the output syslog:
> > Sep 30 20:38:59 host-s2 opensafd: Starting OpenSAF Services
> > Sep 30 20:38:59 host-s2 kernel: [95954.028402] TIPC: Activated (version 2.0.0)
> > Sep 30 20:38:59 host-s2 kernel: [95954.029613] NET: Registered protocol family 30
> > Sep 30 20:38:59 host-s2 kernel: [95954.030680] TIPC: Started in single node mode
> > Sep 30 20:38:59 host-s2 kernel: [95954.033582] TIPC: Started in network mode
> > Sep 30 20:38:59 host-s2 kernel: [95954.034575] TIPC: Own node address <1.1.2>, network identity 1234
> > Sep 30 20:38:59 host-s2 kernel: [95954.038442] TIPC: Enabled bearer <eth:eth4>, discovery domain <1.1.0>, priority 10
> > Sep 30 20:38:59 host-s2 kernel: [95954.040652] TIPC: Established link <1.1.2:eth4-1.1.1:eth4> on network plane A
> > Sep 30 20:38:59 host-s2 osafrded[5588]: Started
> > Sep 30 20:38:59 host-s2 osafrded[5588]: NO Peer rde@2010f has active state => Assigning Standby role to this node
> > Sep 30 20:39:00 host-s2 osaffmd[5596]: Started
> > Sep 30 20:39:00 host-s2 osafimmd[5605]: Started
> > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO 2PBE configured with IMMSV_PEER_SC_MAX_WAIT: 30 seconds
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: Started
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO Persistent Back-End capability configured, Pbe file:imm.db (suffix may get added)
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO 2PBE configured, IMMSV_PBE_FILE_SUFFIX:.2020f (sync)
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_ISOLATED
> > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO SBY: Ruling epoch noted as:155
> > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO IMMND coord at 2010f
> > Sep 30 20:39:00 host-s2 osafimmd[5605]: NO SBY: SaImmRepositoryInitModeT changed and noted as 'SA_IMM_KEEP_REPOSITORY'
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
> > Sep 30 20:39:00 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2462
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO RepositoryInitModeT is SA_IMM_KEEP_REPOSITORY
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: WA IMM Access Control mode is DISABLED!
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Epoch set to 155 in ImmModel
> > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 154  new epoch:155
> > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f
> > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 0  new epoch:155
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
> > Sep 30 20:39:01 host-s2 osaflogd[5627]: Started
> > Sep 30 20:39:01 host-s2 osaflogd[5627]: NO No or invalid log service configuration object
> > Sep 30 20:39:01 host-s2 osaflogd[5627]: NO Log config system: high 0 low 0, application: high 0 low 0
> > Sep 30 20:39:01 host-s2 osaflogd[5627]: NO log root directory is: /var/log/opensaf/saflog
> > Sep 30 20:39:01 host-s2 osaflogd[5627]: NO LGS_MBCSV_VERSION = 3
> > Sep 30 20:39:01 host-s2 osafntfd[5639]: Started
> 
> 
> > Sep 30 20:39:01 host-s2 osafclmd[5648]: logtrace: trace enabled to file /var/log/opensaf/osafclmd, mask=0xffffffff
> > Sep 30 20:39:01 host-s2 osafclmd[5648]: Started
> > Sep 30 20:39:01 host-s2 osafclmna[5657]: logtrace: trace enabled to file /var/log/opensaf/osafclmna, mask=0xffffffff
> > Sep 30 20:39:01 host-s2 osafclmna[5657]: Started
> > Sep 30 20:39:01 host-s2 osafclmna[5657]: NO safNode=SC-2,safCluster=Q50clmCluster Joined cluster, nodeid=2020f
> > Sep 30 20:39:01 host-s2 osafamfd[5665]: logtrace: trace enabled to file /var/log/opensaf/osafamfd, mask=0xffffffff
> > Sep 30 20:39:01 host-s2 osafamfd[5665]: Started
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO PBE-OI established on other SC. Dumping incrementally to file imm.db
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO STARTING SLAVE PBE process.
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO pbe-db-file-path:/etc/opensaf/imm.db.2020f VETERAN:1 B:0
> > Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[0] == '/usr/lib64/opensaf/osafimmpbed'
> > Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[1] == '--pbe2B'
> > Sep 30 20:39:01 host-s2 osafimmpbed: IN arg[2] == '/etc/opensaf/imm.db.2020f'
> > Sep 30 20:39:01 host-s2 osafimmpbed: IN Generating DB file from current IMM state. DB file: /etc/opensaf/imm.db.2020f
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO Successfully opened empty local sqlite pbe file /tmp/imm.db.RIranA
> > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: Ruling epoch noted as:156
> > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 155  new epoch:156
> > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 155  new epoch:156
> > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Epoch set to 156 in ImmModel
> > Sep 30 20:39:01 host-s2 osafamfd[5665]: NO got si :safSi=Management-2N,safApp=ManagementApp
> > Sep 30 20:39:01 host-s2 osafamfd: Last message 'NO got si :safSi=Man' repeated 13 times, suppressed by syslog-ng on host-s2
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Implementer (applier) connected: 12 (@safAmfService2020f) <8, 2020f>
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO Successfully dumped to file /tmp/imm.db.RIranA
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO Moved /tmp/imm.db.RIranA to /etc/opensaf/imm.db.2020f.tmp
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO Moved /etc/opensaf/imm.db.2020f.tmp to /etc/opensaf/imm.db.2020f
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO Removed obsolete journal file: /etc/opensaf/imm.db.2020f-journal
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO Removed obsolete db file: /etc/opensaf/imm.db.2020f.prev
> > Sep 30 20:39:01 host-s2 osafimmpbed: IN Re-attaching to the new version of /etc/opensaf/imm.db.2020f
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO Successfully opened pre-existing sqlite pbe file /etc/opensaf/imm.db.2020f
> > Sep 30 20:39:01 host-s2 osafimmpbed: IN saImmRepositoryInit: SA_IMM_KEEP_REPOSITORY - attaching to repository
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO pbeDaemon starting with obj-count:808
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Implementer (applier) connected: 13 (@OpenSafImmPBE) <403, 2020f>
> > Sep 30 20:39:01 host-s2 osafimmnd[5618]: NO Implementer connected: 14 (OsafImmPbeRt_B) <404, 2020f>
> > Sep 30 20:39:01 host-s2 osafamfnd[5679]: logtrace: trace enabled to file /var/log/opensaf/osafamfnd, mask=0xffffffff
> > Sep 30 20:39:01 host-s2 osafamfnd[5679]: Started
> > Sep 30 20:39:01 host-s2 osafimmd[5605]: NO IMMND coord at 2010f
> > Sep 30 20:39:01 host-s2 osafimmpbed: NO Update epoch 156 committing with ccbId:10000000a/4294967306
> > Sep 30 20:39:03 host-s2 osafimmnd[5618]: NO PBE slave established on this SC. Dumping incrementally to file imm.db
> > Sep 30 20:39:04 host-s2 osafamfd[5665]: NO Cold sync complete!
> > 
> > Thanks,
> > 
> > Chad
------------------------------------------------------------------------------
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
