We are using OpenSAF 4.4.0. In one of our environments a cluster was up and 
running, and then each node in the cluster was stopped and started with the 
opensafd script. All nodes restarted and rejoined the cluster without errors 
except for one payload node (appbox4). That node would not rejoin the cluster, 
even after the appbox4 machine was rebooted. Each attempt to start the node 
resulted in the OpenSAF processes going defunct and hanging until the opensafd 
stop command was executed. The resolution was to stop and start the entire 
cluster. That is not a good solution for a continuously available system, so 
we would like to know the following:

1)  What could possibly cause this problem?

2)  Is there another way of resolving this situation other than stopping and 
starting the entire cluster?


We really would appreciate any suggestions or help with this issue.


The messages logs on the payload, the active controller, and the standby 
controller contain the following for one such start attempt:

Payload messages log:
Feb  6 12:54:22 appbox4 opensafd: Starting OpenSAF Services
Feb  6 12:54:22 appbox4 osafdtmd[63170]: Started
Feb  6 12:54:22 appbox4 osafimmnd[63192]: Started
Feb  6 12:54:22 appbox4 osafdtmd[63170]: NO Established contact with 'appbox1'
Feb  6 12:54:22 appbox4 osafdtmd[63170]: NO Established contact with 'dbbox2'
Feb  6 12:54:22 appbox4 osafdtmd[63170]: NO Established contact with 'appbox3'
Feb  6 12:54:22 appbox4 osafdtmd[63170]: NO Established contact with 'dbbox1'
Feb  6 12:54:22 appbox4 osafdtmd[63170]: NO Established contact with 'appbox2'
Feb  6 12:54:22 appbox4 osafimmnd[63192]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Feb  6 12:54:23 appbox4 osafimmnd[63192]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Feb  6 12:54:23 appbox4 osafimmnd[63192]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Feb  6 12:54:23 appbox4 osafimmnd[63192]: NO NODE STATE-> IMM_NODE_ISOLATED
Feb  6 12:54:23 appbox4 osafimmnd[63192]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Feb  6 12:54:23 appbox4 osafimmnd[63192]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Feb  6 12:54:25 appbox4 osafimmnd[63192]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2316
Feb  6 12:54:25 appbox4 osafimmnd[63192]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
Feb  6 12:54:25 appbox4 osafimmnd[63192]: NO Epoch set to 27 in ImmModel
Feb  6 12:54:25 appbox4 osafimmnd[63192]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
Feb  6 12:54:25 appbox4 osafclmna[63225]: Started
Feb  6 12:54:25 appbox4 osafclmna[63225]: NO safNode=appbox4,safCluster=myClmCluster Joined cluster, nodeid=20d0f
Feb  6 12:54:25 appbox4 osafamfnd[63240]: Started

At this point startup hangs and the OpenSAF processes on appbox4 go defunct. 
We then executed the opensafd stop command:
Feb  6 12:57:57 appbox4 opensafd: Stopping OpenSAF Services

We are not sure whether it is significant, but the last messages logged while 
the Node Director was still OK are:

Feb  6 10:16:39 appbox4 osafamfnd[13603]: ER ncsmds_api for 0 FAILED, dest=20d0f0000acd4
Feb  6 10:16:49 appbox4 osafamfnd[13603]: saImmOmInitialize FAILED, rc = 5


Active controller messages log:
Feb  6 12:54:22 appbox3 osafdtmd[22105]: NO Established contact with 'appbox4'
Feb  6 12:54:23 appbox3 osafimmd[22159]: NO Node 20d0f request sync sync-pid:63192 epoch:0
Feb  6 12:54:23 appbox3 osafimmnd[22175]: NO Announce sync, epoch:27
Feb  6 12:54:23 appbox3 osafimmnd[22175]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Feb  6 12:54:23 appbox3 osafimmd[22159]: NO Successfully announced sync. New ruling epoch:27
Feb  6 12:54:23 appbox3 osafimmnd[22175]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Feb  6 12:54:23 appbox3 osafimmloadd: NO Sync starting
Feb  6 12:54:25 appbox3 osafimmloadd: IN Synced 3291 objects in total
Feb  6 12:54:25 appbox3 osafimmnd[22175]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 15141
Feb  6 12:54:25 appbox3 osafimmloadd: NO Sync ending normally
Feb  6 12:54:25 appbox3 osafimmnd[22175]: NO Epoch set to 27 in ImmModel
Feb  6 12:54:25 appbox3 osafimmd[22159]: NO ACT: New Epoch for IMMND process at node 20b0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox3 osafimmd[22159]: NO ACT: New Epoch for IMMND process at node 20e0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox3 osafimmnd[22175]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM SERVER READY
Feb  6 12:54:25 appbox3 osafimmd[22159]: NO ACT: New Epoch for IMMND process at node 20f0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox3 osafimmd[22159]: NO ACT: New Epoch for IMMND process at node 20a0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox3 osafimmd[22159]: NO ACT: New Epoch for IMMND process at node 20c0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox3 osafimmd[22159]: NO ACT: New Epoch for IMMND process at node 20d0f old epoch: 0  new epoch:27
Feb  6 12:54:25 appbox3 osafamfd[22260]: NO Node 'appbox4' joined the cluster



Standby controller messages log:
Feb  6 12:54:22 appbox1 osafdtmd[14345]: NO Established contact with 'appbox4'
Feb  6 12:54:23 appbox1 osafimmd[14398]: NO SBY: Ruling epoch noted as:27
Feb  6 12:54:23 appbox1 osafimmd[14398]: NO IMMND coord at 20b0f
Feb  6 12:54:23 appbox1 osafimmnd[14414]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Feb  6 12:54:25 appbox1 osafimmnd[14414]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 15642
Feb  6 12:54:25 appbox1 osafimmnd[14414]: NO Epoch set to 27 in ImmModel
Feb  6 12:54:25 appbox1 osafimmd[14398]: NO SBY: New Epoch for IMMND process at node 20b0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox1 osafimmd[14398]: NO IMMND coord at 20b0f
Feb  6 12:54:25 appbox1 osafimmd[14398]: NO SBY: New Epoch for IMMND process at node 20e0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox1 osafimmd[14398]: NO SBY: New Epoch for IMMND process at node 20f0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox1 osafimmd[14398]: NO SBY: New Epoch for IMMND process at node 20a0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox1 osafimmd[14398]: NO SBY: New Epoch for IMMND process at node 20c0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox1 osafimmd[14398]: NO SBY: New Epoch for IMMND process at node 20d0f old epoch: 0  new epoch:27
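
In case it helps anyone compare the controller logs across start attempts, 
here is a minimal Python sketch that pulls the IMMD epoch announcements out of 
a messages log and lists the nodes that came in with old epoch 0 (in the logs 
above that is 20d0f, i.e. appbox4). The function names and the sample text are 
ours, purely for illustration. The script only matches the log wording shown 
above and is not part of OpenSAF.

```python
import re

# Matches the IMMD "New Epoch" lines exactly as worded in the logs above.
# \s+ also matches a newline, so lines wrapped by a mail client still match.
EPOCH_RE = re.compile(
    r"(ACT|SBY): New Epoch for IMMND process at\s+node\s+(\w+)\s+"
    r"old epoch:\s*(\d+)\s+new epoch:\s*(\d+)"
)


def epoch_changes(log_text):
    """Return (role, node_id, old_epoch, new_epoch) for each announcement."""
    return [
        (role, node, int(old), int(new))
        for role, node, old, new in EPOCH_RE.findall(log_text)
    ]


def nodes_with_old_epoch_zero(log_text):
    """Return the sorted node ids whose announced old epoch was 0."""
    return sorted({node for _, node, old, _ in epoch_changes(log_text) if old == 0})


# Two entries copied from the active controller log above (still wrapped).
SAMPLE = """\
Feb  6 12:54:25 appbox3 osafimmd[22159]: NO ACT: New Epoch for IMMND process at
node 20c0f old epoch: 26  new epoch:27
Feb  6 12:54:25 appbox3 osafimmd[22159]: NO ACT: New Epoch for IMMND process at
node 20d0f old epoch: 0  new epoch:27
"""

if __name__ == "__main__":
    for change in epoch_changes(SAMPLE):
        print(change)
    print(nodes_with_old_epoch_zero(SAMPLE))
```

To run it against a real log, read the file into a string and pass it to 
epoch_changes() in place of SAMPLE.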


Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| www.NetCracker.com
Proven Partner to Communications Service Providers



