The issue came up again. Could anyone tell me how to identify what is going wrong? I am using OpenSAF 4.2.0.
Below is what is seen from the node requesting sync:
-----------------------------------------------------
Jul 15 09:42:52 WR20-64_32 osafimmnd[30384]: Started
Jul 15 09:42:52 WR20-64_32 osafimmnd[30384]: Initialization Success
Jul 15 09:42:52 WR20-64_32 osafimmnd[30384]: Director Service is up
Jul 15 09:42:52 WR20-64_32 /etc/redhat-lsb/lsb_start_daemon: osafimmnd startup - OK
Jul 15 09:42:52 WR20-64_32 /etc/redhat-lsb/lsb_log_message: - OK
Jul 15 09:42:52 WR20-64_32 osafimmnd[30384]: SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jul 15 09:42:52 WR20-64_32 osafimmnd[30384]: SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jul 15 09:42:53 WR20-64_32 osafimmnd[30384]: REQUESTING SYNC
Jul 15 09:42:53 WR20-64_32 osafimmnd[30384]: SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jul 15 09:42:53 WR20-64_32 osafimmnd[30384]: NODE STATE-> IMM_NODE_ISOLATED
Jul 15 09:42:57 WR20-64_32 osafdtmd[30313]: DTM: dtm_node_add failed .node_ip : 192.168.211.181
Jul 15 09:43:13 WR20-64_32 osafimmnd[30384]: This node still waiting to be sync'ed after 20 seconds
Jul 15 09:43:33 WR20-64_32 osafimmnd[30384]: This node still waiting to be sync'ed after 40 seconds
Jul 15 09:43:53 WR20-64_32 osafimmnd[30384]: This node still waiting to be sync'ed after 60 seconds
Jul 15 09:44:13 WR20-64_32 osafimmnd[30384]: This node still waiting to be sync'ed after 80 seconds
Jul 15 09:44:33 WR20-64_32 osafimmnd[30384]: REQUESTING SYNC AGAIN 1000
Jul 15 09:44:33 WR20-64_32 osafimmnd[30384]: This node still waiting to be sync'ed after 100 seconds
Jul 15 09:44:33 WR20-64_32 osafimmnd[30384]: Redundant sync request, when IMM_NODE_ISOLATED
Jul 15 09:44:53 WR20-64_32 osafimmnd[30384]: This node still waiting to be sync'ed after 120 seconds
Jul 15 09:45:13 WR20-64_32 osafimmnd[30384]: This node still waiting to be sync'ed after 140 seconds
Jul 15 09:45:26 WR20-64_32 syslog-ng[3560]: STATS: dropped 0
Jul 15 09:45:33 WR20-64_32 osafimmnd[30384]: This node still waiting to be sync'ed after 160 seconds
-----------------------------------------------------

And below is from the other SC:
----------------------------------------------------------------
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Implementer connected: 10 (MsgQueueService133135) <0, 2080f>
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safSISU=safSu=PL-5\#safSg=NoRed\#safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safCSIComp=safComp=CPND\#safSu=PL-5\#safSg=NoRed\#safApp=OpenSAF,safCsi=CPND,safSi=NoRed1,safApp=OpenSAF' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safCSIComp=safComp=GLND\#safSu=PL-5\#safSg=NoRed\#safApp=OpenSAF,safCsi=GLND,safSi=NoRed1,safApp=OpenSAF' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safCSIComp=safComp=MQND\#safSu=PL-5\#safSg=NoRed\#safApp=OpenSAF,safCsi=MQND,safSi=NoRed1,safApp=OpenSAF' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safCSIComp=safComp=IMMND\#safSu=PL-5\#safSg=NoRed\#safApp=OpenSAF,safCsi=IMMND,safSi=NoRed1,safApp=OpenSAF' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safCSIComp=safComp=SMFND\#safSu=PL-5\#safSg=NoRed\#safApp=OpenSAF,safCsi=SMFND,safSi=NoRed1,safApp=OpenSAF' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safCSIComp=safComp=AMFWDOG\#safSu=PL-5\#safSg=NoRed\#safApp=OpenSAF,safCsi=AMFWDOG,safSi=NoRed1,safApp=OpenSAF' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safSISU=safSu=PL-5\#safSg=DpAmfFGW\#safApp=DpAmfFGWType,safSi=DpAmfFGW,safApp=DpAmfFGWType' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safCSIComp=safComp=DpAmfFGW\#safSu=PL-5\#safSg=DpAmfFGW\#safApp=DpAmfFGWType,safCsi=DpAmfFGW,safSi=DpAmfFGW,safApp=DpAmfFGWType' by Impl id: 3
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safCkpt=fgw,safApp=safCkptService' by Impl id: 8
Jul 16 01:42:14 WR20-64_25 osafimmnd[22998]: Create runtime object 'safReplica=safNode=PL-5\#safCluster=myClmCluster,safCkpt=fgw,safApp=safCkptService' by Impl id: 8
Jul 16 01:42:52 WR20-64_25 osafdtmd[22925]: DTM: add New incoming connection to fd : 77
Jul 16 01:42:52 WR20-64_25 osafdtmd[22925]: DTM: add New incoming connection to fd : 78
Jul 16 01:42:53 WR20-64_25 osaffmd[22965]: Peer Node_id 328207 : EE_ID safEE=Linux_os_hosting_clm_node,safHE=4500_slot_14,safDomain=domain_1
Jul 16 01:42:53 WR20-64_25 osafimmd[22981]: New IMMND process is on ACTIVE Controller at 2030f
Jul 16 01:42:53 WR20-64_25 osafimmd[22981]: New IMMND process is on STANDBY Controller at 5020f
Jul 16 01:42:53 WR20-64_25 osafimmd[22981]: IMMND on controller (not currently coord) requests sync
Jul 16 01:42:53 WR20-64_25 osafimmd[22981]: Node 2030f request sync sync-pid:940 epoch:0
Jul 16 01:42:53 WR20-64_25 osafimmd[22981]: IMMND on controller (not currently coord) requests sync
Jul 16 01:42:53 WR20-64_25 osafimmd[22981]: Node 5020f request sync sync-pid:30384 epoch:0
Jul 16 01:44:33 WR20-64_25 osafimmd[22981]: IMMND on controller (not currently coord) requests sync
Jul 16 01:44:33 WR20-64_25 osafimmd[22981]: Node 2030f request sync sync-pid:940 epoch:0
Jul 16 01:44:33 WR20-64_25 osafimmd[22981]: IMMND on controller (not currently coord) requests sync
Jul 16 01:44:33 WR20-64_25 osafimmd[22981]: Node 5020f request sync sync-pid:30384 epoch:0
Jul 16 01:46:13 WR20-64_25 osafimmd[22981]: IMMND on controller (not currently coord) requests sync
Jul 16 01:46:13 WR20-64_25 osafimmd[22981]: Node 2030f request sync sync-pid:940 epoch:0
Jul 16 01:46:13 WR20-64_25 osafimmd[22981]: IMMND on controller (not currently coord) requests sync
Jul 16 01:46:13 WR20-64_25 osafimmd[22981]: Node 5020f request sync sync-pid:30384 epoch:0
Jul 16 01:47:53 WR20-64_25 osafimmd[22981]: IMMND on controller (not currently coord) requests sync
Jul 16 01:47:53 WR20-64_25 osafimmd[22981]: Node 2030f request sync sync-pid:940 epoch:0
Jul 16 01:47:53 WR20-64_25 osafimmd[22981]: IMMND on controller (not currently coord) requests sync
Jul 16 01:47:53 WR20-64_25 osafimmd[22981]: Node 5020f request sync sync-pid:30384 epoch:0
----------------------------------------------------------------

Thanks.

Ted

-----Original Message-----
From: Anders Bjornerstedt [mailto:[email protected]]
Sent: Monday, July 14, 2014 4:56 PM
To: Yao Cheng LIANG
Cc: [email protected]; santosh satapathy
Subject: Re: [users] One of the controller wait for sync

Hi,

Sounds like you don't have a shared file system mounted between SC1 and SC2.
That means you cannot run what is called 1PBE, which relies on a shared filesystem.
But you can run 0PBE or 2PBE. PBE = Persistent Back End.

But in any case, your initial problem of SC2 not getting synced is strange.
If you have not already done so, you need to read the documentation for the IMM:
either the OpenSAF_IMMSV_PR.doc or the osaf/services/saf/immsv/README,
in particular the overview parts that explain imm loading, imm sync and PBE.

/Anders Bjornerstedt

Yao Cheng LIANG wrote:
> Dear Anders,
>
> Thanks for the clarification. By "sync" here I mean copying the file from one
> node to the other.
>
> Ted
>
> -----Original Message-----
> From: Anders Bjornerstedt [mailto:[email protected]]
> Sent: Monday, July 14, 2014 4:48 PM
> To: Yao Cheng LIANG
> Cc: [email protected]; santosh satapathy
> Subject: Re: [users] One of the controller wait for sync
>
> Hi,
>
> There is no such thing as "syncing the imm.xml file".
> Sync is a protocol where the IMMND at one of the SCs broadcasts the imm
> contents (from memory) to any nodes that are "empty" and ready to receive the
> sync/data. Any node that has sent a sync request is ready to receive the sync.
>
> An imm.xml file can be used for loading.
> Sync is performed by nodes that *missed* loading.
>
> (An imm.xml file can also be used to create a ccb using 'immcfg -f' but I
> don't think that is what you meant).
>
> /Anders Bjornerstedt
>
> Yao Cheng LIANG wrote:
>
>> Thanks. I resolved the issue by syncing the imm.xml file on the two
>> controllers. /Ted
>>
>> -----Original Message-----
>> From: Anders Bjornerstedt [mailto:[email protected]]
>> Sent: Monday, July 14, 2014 4:13 PM
>> To: Yao Cheng LIANG
>> Cc: [email protected]; santosh satapathy
>> Subject: Re: [users] One of the controller wait for sync
>>
>> Hi,
>>
>> The sync request from SC2 clearly reaches SC1.
>> Is any sync started at SC1?
>> I can't see, because the syslog snippet from SC1 is minimal, truncated right
>> after the request arrives.
>>
>> /Anders Bjornerstedt
>>
>> Yao Cheng LIANG wrote:
>>
>>> Dear all,
>>>
>>> I am using OpenSAF 4.2.2, and when I start SC-2 after SC-1, the messages
>>> below appear in /var/log/messages on SC-1:
>>> ------------------------------------------------------
>>> Jul 12 22:35:26 localhost osaffmd[11690]: Peer Node_id 328207 : EE_ID safEE=Linux_os_hosting_clm_node,safHE=4500_slot_14,safDomain=domain_1
>>> Jul 12 22:35:26 localhost osafimmd[11706]: New IMMND process is on STANDBY Controller at 5020f
>>> Jul 12 22:35:26 localhost osafimmd[11706]: IMMND on controller (not currently coord) requests sync
>>> Jul 12 22:35:26 localhost osafimmd[11706]: Node 5020f request sync sync-pid:8930 epoch:0
>>> ------------------------------------------------------
>>>
>>> while on SC-2, the messages below appear in /var/log/messages:
>>> ------------------------------------------------------
>>> Jul 12 22:35:26 WR20-64_32 opensafd: Starting OpenSAF Services
>>> Jul 12 22:35:26 WR20-64_32 osafdtmd[8860]: Started
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_start_daemon: osafdtmd startup - OK
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_log_message: - OK
>>> Jul 12 22:35:26 WR20-64_32 osafrded[8878]: Started
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_start_daemon: osafrded startup - OK
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_log_message: - OK
>>> Jul 12 22:35:26 WR20-64_32 osafrded[8878]: rde@5030f has active state => Standby role
>>> Jul 12 22:35:26 WR20-64_32 osaffmd[8897]: Started
>>> Jul 12 22:35:26 WR20-64_32 osaffmd[8897]: EE_ID : safEE=Linux_os_hosting_clm_node,safHE=4500_slot_14,safDomain=domain_1
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_start_daemon: osaffmd startup - OK
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_log_message: - OK
>>> Jul 12 22:35:26 WR20-64_32 osafimmd[8913]: Started
>>> Jul 12 22:35:26 WR20-64_32 osafimmd[8913]: Initialization Success, role STANDBY
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_start_daemon: osafimmd startup - OK
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_log_message: - OK
>>> Jul 12 22:35:26 WR20-64_32 osafimmnd[8930]: Started
>>> Jul 12 22:35:26 WR20-64_32 osafimmnd[8930]: Initialization Success
>>> Jul 12 22:35:26 WR20-64_32 osafimmnd[8930]: Director Service is up
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_start_daemon: osafimmnd startup - OK
>>> Jul 12 22:35:26 WR20-64_32 /etc/redhat-lsb/lsb_log_message: - OK
>>> Jul 12 22:35:26 WR20-64_32 osafimmnd[8930]: SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
>>> Jul 12 22:35:26 WR20-64_32 osafimmnd[8930]: SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
>>> Jul 12 22:35:26 WR20-64_32 osafimmnd[8930]: REQUESTING SYNC
>>> Jul 12 22:35:26 WR20-64_32 osafimmnd[8930]: SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
>>> Jul 12 22:35:26 WR20-64_32 osafimmnd[8930]: NODE STATE-> IMM_NODE_ISOLATED
>>> Jul 12 22:35:46 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 20 seconds
>>> Jul 12 22:36:06 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 40 seconds
>>> Jul 12 22:36:26 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 60 seconds
>>> Jul 12 22:36:46 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 80 seconds
>>> Jul 12 22:37:06 WR20-64_32 osafimmnd[8930]: REQUESTING SYNC AGAIN 1000
>>> Jul 12 22:37:06 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 100 seconds
>>> Jul 12 22:37:06 WR20-64_32 osafimmnd[8930]: Redundant sync request, when IMM_NODE_ISOLATED
>>> Jul 12 22:37:16 WR20-64_32 osafdtmd[8860]: DTM:dtm_comm_socket_recv() failed rc : 22
>>> Jul 12 22:37:26 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 120 seconds
>>> Jul 12 22:37:46 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 140 seconds
>>> Jul 12 22:37:52 WR20-64_32 osafimmd[8913]: IMMND DOWN on active controller f3 detected at standby immd!! f2. Possible failover
>>> Jul 12 22:37:52 WR20-64_32 osafimmd[8913]: Resend of fevs message 1855, will not mbcp to peer IMMD
>>> Jul 12 22:37:52 WR20-64_32 osafimmd[8913]: Message count:1856 + 1 != 1856
>>> Jul 12 22:38:06 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 160 seconds
>>> Jul 12 22:38:26 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 180 seconds
>>> Jul 12 22:38:46 WR20-64_32 osafimmnd[8930]: REQUESTING SYNC AGAIN 2000
>>> Jul 12 22:38:46 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 200 seconds
>>> Jul 12 22:38:46 WR20-64_32 osafimmnd[8930]: Redundant sync request, when IMM_NODE_ISOLATED
>>> Jul 12 22:38:53 WR20-64_32 osafdtmd[8860]: DTM: add New incoming connection to fd : 22
>>> Jul 12 22:39:06 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 220 seconds
>>> Jul 12 22:39:26 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 240 seconds
>>> Jul 12 22:39:46 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 260 seconds
>>> Jul 12 22:40:06 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 280 seconds
>>> Jul 12 22:40:26 WR20-64_32 osafimmnd[8930]: REQUESTING SYNC AGAIN 3000
>>> Jul 12 22:40:26 WR20-64_32 osafimmnd[8930]: This node still waiting to be sync'ed after 300 seconds
>>> Jul 12 22:40:26 WR20-64_32 osafimmnd[8930]: Redundant sync request, when IMM_NODE_ISOLATED
>>> ------------------------------------------------------
>>>
>>> But if I reverse the order, i.e. start SC-2 and then SC-1, both
>>> controllers start successfully.
>>>
>>> Could anyone tell me what's wrong?
>>>
>>> Thanks.
>>>
>>> Ted
>>>
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> This message (including any attachments) is for the named
>>> addressee(s)'s use only. It may contain sensitive, confidential,
>>> private proprietary or legally privileged information intended for a
>>> specific individual and purpose, and is protected by law. If you are
>>> not the intended recipient, please immediately delete it and all
>>> copies of it from your system, destroy any hard copies of it and
>>> notify the sender. Any use, disclosure, copying, or distribution of
>>> this message and/or any attachments is strictly prohibited.
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> _______________________________________________
>>> Opensaf-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
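[Editor's note for readers of the archive] The "DTM: dtm_node_add failed" line in the first log suggests the inter-SC transport (DTM) never came up cleanly, which would explain why the sync broadcast never reaches the isolated node. A quick sanity check along the lines Anders suggests might look like the sketch below. It is only a sketch: the peer IP is taken from the log above, but the port number, file paths and tools (nc, md5sum) are assumptions about a typical installation, not something confirmed in this thread; check DTM_TCP_LISTENING_PORT in your dtmd.conf and adjust.

```shell
#!/bin/sh
# Sanity checks for an SC that is stuck in IMM_SERVER_SYNC_PENDING.
# All paths/ports below are assumptions for a typical OpenSAF install.

PEER_IP=192.168.211.181   # peer SC address, from the dtm_node_add failure above
DTM_PORT=6700             # assumed; see DTM_TCP_LISTENING_PORT in /etc/opensaf/dtmd.conf

# 1. Can this SC reach the peer's DTM port at all?
#    A dtm_node_add failure usually points at basic connectivity
#    (firewall, routing, or mismatched dtmd.conf addresses).
if nc -z -w 3 "$PEER_IP" "$DTM_PORT" 2>/dev/null; then
    echo "DTM port $DTM_PORT on $PEER_IP is reachable"
else
    echo "cannot reach $PEER_IP:$DTM_PORT - check firewall/routing/dtmd.conf" >&2
fi

# 2. Do the two SCs load the same initial configuration?
#    With 0PBE each SC loads its own local imm.xml, so the files should match.
#    Compare this hash with the one on the peer SC.
md5sum /etc/opensaf/imm.xml 2>/dev/null || echo "imm.xml not found at assumed path"

# 3. Is a shared filesystem mounted? (Required for 1PBE, per Anders' reply.)
mount | grep -i -e nfs -e drbd || echo "no shared filesystem detected (1PBE not possible)"
```

If check 1 fails in one direction only, that matches the symptom in this thread: the sync request travels one way over an already-established connection, but the sync data can never be delivered back.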
