I tested as below:

With TIPC (changeset: 21a73092ad97):

1. With 550 AMF components on a single node: I performed unlock-in and all components were instantiated.
2. Configured 550 AMF components dynamically on each node and performed unlock-in on all components; all were instantiated on each node (total components in the cluster: 550 * 2 = 1100).
So everything worked fine, but "opensafd stop" hung and all OpenSAF processes were left in a defunct state:

root 24136     1 7 17:42 ? 00:00:05 /usr/local/lib/opensaf/osafdtmd
root 24243     1 6 17:42 ? 00:00:04 /usr/local/lib/opensaf/osafamfd --tracemask=0xffffffff
root 24253     1 7 17:42 ? 00:00:04 /usr/local/lib/opensaf/osafamfnd --tracemask=0xffffffff
root 28880 24253 0 17:43 ? 00:00:00 [osaf-clmd] <defunct>
root 28881 24253 0 17:43 ? 00:00:00 [osaf-ckptd] <defunct>
root 28882 24253 0 17:43 ? 00:00:00 [osaf-evtd] <defunct>
root 28883 24253 0 17:43 ? 00:00:00 [osaf-fmd] <defunct>
root 28884 24253 0 17:43 ? 00:00:00 [osaf-lckd] <defunct>
root 28885 24253 0 17:43 ? 00:00:00 [osaf-logd] <defunct>
root 28886 24253 0 17:43 ? 00:00:00 [osaf-msgd] <defunct>
root 28887 24253 0 17:43 ? 00:00:00 [osaf-ntfd] <defunct>
root 28888 24253 0 17:43 ? 00:00:00 [osaf-rded] <defunct>
root 28889 24253 0 17:43 ? 00:00:00 [osaf-smfd] <defunct>
root 28890 24253 0 17:43 ? 00:00:00 [osaf-immd] <defunct>
root 28891 24253 0 17:43 ? 00:00:00 [osaf-ckptnd] <defunct>
root 28892 24253 0 17:43 ? 00:00:00 [osaf-lcknd] <defunct>
root 28893 24253 0 17:43 ? 00:00:00 [osaf-msgnd] <defunct>
root 28894 24253 0 17:43 ? 00:00:00 [osaf-clmna] <defunct>
root 28895 24253 0 17:43 ? 00:00:00 [osaf-immnd] <defunct>
root 28896 24253 0 17:43 ? 00:00:00 [osaf-smfnd] <defunct>
root 28897 24253 0 17:43 ? 00:00:00 [osaf-amfwd] <defunct>

With TCP (changeset: 21a73092ad97, with #define DTM_INTRANODE_MAX_PROCESSES 650):

1. With 550 AMF components on a single node: I performed unlock-in and all components were instantiated.
2. Configured 550 AMF components dynamically on each node and performed unlock-in on all components; all were instantiated on each node (total components in the cluster: 550 * 2 = 1100).

So everything worked fine here as well, but "opensafd stop" hung and all OpenSAF processes were left in a defunct state, as shown above.

I then took a dump and started SC-1. Once the components were up and running on SC-1, I started SC-2 and everything came up fine. I then restarted SC-2, and SC-2 failed to come up with the following errors:

Oct 22 17:59:47 PM_SC-2 osafamfd[32221]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
Oct 22 17:59:48 PM_SC-2 osafimmnd[32164]: NO Implementer locally disconnected. Marking it as doomed 15 <9, 2020f> (@safAmfService2020f)
Oct 22 17:59:48 PM_SC-2 osafamfnd[32238]: NO saAmfCtDefQuiescingCompleteTimeout for 'safVersion=4.0.0,safCompType=AmfDemo1' initialized with saAmfCtDefCallbackTimeout
Oct 22 17:59:48 PM_SC-2 osafimmnd[32164]: NO Implementer disconnected 15 <9, 2020f> (@safAmfService2020f)
Oct 22 17:59:50 PM_SC-2 osafamfnd[32238]: ER ncsmds_api for 0 FAILED, dest=0
Oct 22 17:59:50 PM_SC-2 osafamfnd[32238]: NO saAmfCtDefQuiescingCompleteTimeout for 'safVersion=4.0.0,safCompType=AmfDemo1' initialized with saAmfCtDefCallbackTimeout
Oct 22 17:59:50 PM_SC-2 osafamfnd[32238]: ER ncsmds_api for 0 FAILED, dest=0

Registration failed for all components.

Thanks
-Nagu

> -----Original Message-----
> From: Neelakanta Reddy
> Sent: 22 October 2014 12:43
> To: Shu Wang; [email protected]
> Cc: Lisa Ann Lentz-Liddell
> Subject: Re: [users] Max number of SUs/Components in a cluster?
>
> Hi,
>
> There is a limitation, DTM_INTRANODE_MAX_PROCESSES, in the present OpenSAF.
> An enhancement ticket has been opened:
> https://sourceforge.net/p/opensaf/tickets/1187/
> Based on the discussion there, the limit can be increased or made configurable.
>
> /Neel.
>
> On Wednesday 22 October 2014 03:56 AM, Shu Wang wrote:
> > When testing the cluster for max size, what configuration do you use?
> > Are we able to get a copy of that to understand the look of the cluster?
> >
> > What is the maximum number of:
> > - service groups per cluster tested?
> > - service units per cluster tested?
> > - components per cluster tested?
> > - service units on a node tested?
> > - components on a node tested?
> >
> > While continuing to investigate the problem below, we found:
> > - On a controller node, OpenSAF fails when the component count for that node reaches 78.
> > - On a payload node, OpenSAF fails when the component count for that node reaches 90.
> >
> > In addition, there are:
> > - 22 OpenSAF components on a controller node.
> > - 10 OpenSAF components on a payload node.
> >
> > Total components running on a node:
> > - Controller node: 78 + 22 = 100
> > - Payload node: 90 + 10 = 100
> >
> > We looked through the OpenSAF code for defines with a value of 100 and found, in osaf/services/infrastructure/dtms/dtm:
> >
> > dtm_intra.c:#define DTM_INTRANODE_MAX_PROCESSES 100
> >
> > We changed that value to 115 and retried our test. Increasing it to 115 allowed the total component count to go above 100, and of course OpenSAF then failed when we unlocked the SU that pushed the component count over 115.
> >
> > Does anyone know of any ill effect if we change DTM_INTRANODE_MAX_PROCESSES to a larger value, e.g. 250? dtm_intra.c uses it only for the number of items in two arrays, and it is contained within that .c file. (A sketch of this pattern is included at the end of this thread.)
> >
> > We tried this change only with OpenSAF 4.4 and TCP.
> >
> > Thanks.
> >
> > Shu Wang | Senior Analyst | +1(407)708-5117 or x3917 | www.NetCracker.com
> > Proven Partner to Communications Service Providers
> >
> > -----Original Message-----
> > From: Neelakanta Reddy [mailto:[email protected]]
> > Sent: Tuesday, October 21, 2014 2:55 AM
> > To: [email protected]; Shu Wang
> > Cc: Lisa Ann Lentz-Liddell
> > Subject: Re: [users] Max number of SUs/Components in a cluster?
> >
> > Hi,
> >
> > Comments inline.
> >
> > /Neel.
> >
> > On Tuesday 21 October 2014 04:41 AM, Shu Wang wrote:
> >> The IMM documentation states:
> >>
> >> Applications that intend to add their own IMM classes and IMM objects need to be aware that capacity is limited. OpenSAF 4.1 has been system tested with up to 350 000 objects of average size 300 bytes. It is not advisable to generate larger IMM contents than that.
> >>
> >> What is the definition of an object?
> > The 300 bytes is the size of each object (i.e. the accumulated size of the assigned attributes of a class). The size of an object depends on the number of attributes and the types of attributes of its class. (A rough size estimate is included at the end of this thread.)
> >> We have a cluster defined across 6 nodes with a total of 12 SGs, a total of 64 SUs, and a total of 292 components. We can start OpenSAF successfully across the nodes and unlock all SUs with no problems.
> >>
> >> The cluster definition was increased to 6 nodes, 15 SGs, a total of 56 SUs, and a total of 388 components. We are able to start OpenSAF on all nodes successfully, but as soon as a little over 300 components have been unlocked, things start to fall apart. The OpenSAF processes start to die and the cluster is no longer usable.
> > SUs, SGs and components are internally objects for IMM. Increasing to "6 nodes, 15 SGs, a total of 56 SUs, and a total of 388 components" should not have caused any IMM-related problems.
> >> Oct 19 16:29:06 colobus osafamfnd[3649]: NO Assigned 'safSi=amfSDFSISI1.3,safApp=olcApp' ACTIVE to 'safSu=amfSDFSISU1.4,safSg=amfSDFSISG1,safApp=olcApp'
> >> Oct 19 16:29:07 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
> >> Oct 19 16:29:09 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
> >> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
> >> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send to ntfa failed rc: 2
> >> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
> >> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
> >> ....
> >> Oct 19 16:33:24 colobus ntpd_initres[2608]: host name not found: 0.rhel.pool.ntp.org
> >> ....
> >> Oct 19 16:35:18 colobus osafimmnd[3549]: NO Implementer disconnected 14 <0, 22b0f> (MsgQueueService142095)
> >> Oct 19 16:35:19 colobus osafclmd[3602]: NO proc_initialize_msg: send failed. dest:22b0f00007a77
> >> Oct 19 16:35:19 colobus osafimmnd[3549]: NO Global discard node received for nodeId:22b0f pid:31333
> >> Oct 19 16:35:20 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
> >> Oct 19 16:35:22 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
> >> Oct 19 16:35:24 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
> >> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer connected: 16 (MsgQueueService142095) <12021, 2280f>
> >> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer locally disconnected. Marking it as doomed 16 <12021, 2280f> (MsgQueueService142095)
> >> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer disconnected 16 <12021, 2280f> (MsgQueueService142095)
> >> Oct 19 16:35:26 colobus osafamfd[3631]: NO Node 'bedrazzas.monkey.lab' left the cluster
> >>
> >> Have we reached a maximum number of SUs/Components that can be started within a single OpenSAF cluster?
> > OpenSAF 4.4/4.5 is tested for 70 nodes.
> >> We have tried the above with OpenSAF 4.4 and OpenSAF 4.5, and with both TCP and TIPC; all fail similarly.
> > This is likely an application problem or something that needs timeout adjustments. Please share the syslog messages of all the nodes.
> >> Thank you!
> >>
> >> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917 | www.NetCracker.com
> >> Proven Partner to Communications Service Providers
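A rough size check against the IMM capacity figures quoted above, using the 300-byte average from the IMM documentation and the larger cluster Shu describes (the object count is only an estimate, since SIs, CSIs and other AMF objects are not listed):

  tested IMM content:  350 000 objects * 300 bytes = ~105 MB
  cluster above:       388 components + 56 SUs + 15 SGs (plus their SIs/CSIs) = on the order of a thousand objects

That is far below the tested IMM capacity, which is consistent with the comment above that the failures should not be IMM-related.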
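On the DTM_INTRANODE_MAX_PROCESSES question, a minimal sketch of the pattern is below. It is not the actual dtm_intra.c code; the structure, field and function names are illustrative. It shows how a compile-time define used to size two per-node tables typically looks, and one way the limit could be made run-time configurable along the lines discussed in ticket #1187. Raising the define itself should mostly cost a little extra static memory; whether the rest of DTM/MDS scales to more local processes is the open question.

/* Illustrative sketch only -- not the actual dtm_intra.c code; names and
 * fields are hypothetical.  DTM_INTRANODE_MAX_PROCESSES bounds two
 * per-node tables that hold one slot per local process connected to
 * osafdtmd. */
#include <stdint.h>
#include <stdlib.h>

#define DTM_INTRANODE_MAX_PROCESSES 100   /* current compile-time limit */

struct dtm_proc_slot {          /* hypothetical per-process bookkeeping */
        int      fd;            /* local socket of the connected process */
        uint32_t pid;
        int      in_use;
};

/* Compile-time sizing, as in the current code base: */
static struct dtm_proc_slot proc_tbl[DTM_INTRANODE_MAX_PROCESSES];
static int fd_tbl[DTM_INTRANODE_MAX_PROCESSES + 1];    /* +1 for the listener */

/* One possible run-time configurable variant: the limit is read from
 * configuration (e.g. an environment variable) at start-up and the
 * tables are allocated once, before any local process connects. */
static struct dtm_proc_slot *proc_tbl_dyn;
static int *fd_tbl_dyn;
static int max_local_processes;

int dtm_intranode_tables_init(int configured_max)
{
        max_local_processes = (configured_max > 0)
            ? configured_max : DTM_INTRANODE_MAX_PROCESSES;
        proc_tbl_dyn = calloc((size_t)max_local_processes,
                              sizeof(*proc_tbl_dyn));
        fd_tbl_dyn = calloc((size_t)max_local_processes + 1,
                            sizeof(*fd_tbl_dyn));
        return (proc_tbl_dyn != NULL && fd_tbl_dyn != NULL) ? 0 : -1;
}

With the static version, going from 100 to 250 slots grows the two tables by roughly 150 * (sizeof(struct dtm_proc_slot) + sizeof(int)), i.e. a few kilobytes.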
