Hi,

OpenSAF currently has a hard limit, DTM_INTRANODE_MAX_PROCESSES. An enhancement ticket is open for it (https://sourceforge.net/p/opensaf/tickets/1187/). Depending on the discussion there, the limit can be increased or made configurable.
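To illustrate what "configurable" could mean here, below is a minimal sketch in C. It is not the code in dtm_intra.c and not what the ticket proposes. It assumes only what the thread states: a compile-time define of 100 that sizes per-node arrays. All names (`proc_table_t`, `parse_limit`, `proc_table_init`, `proc_table_add`, `proc_table_free`, `SKETCH_DEFAULT_MAX_PROCESSES`) are hypothetical. The table is sized at start-up from a parsed value instead of a fixed define, and an add beyond the limit is rejected explicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* Default taken from the thread: dtm_intra.c defines the limit as 100. */
#define SKETCH_DEFAULT_MAX_PROCESSES 100

/* Hypothetical per-node table of local process ids; not an OpenSAF type. */
typedef struct {
    int *pids;        /* storage sized at init time, not at compile time */
    size_t capacity;  /* configured limit */
    size_t count;     /* slots currently in use */
} proc_table_t;

/* Parse a limit from text, e.g. the value of a configuration variable.
 * Returns fallback for NULL, empty, non-numeric, or non-positive input. */
size_t parse_limit(const char *text, size_t fallback)
{
    if (text == NULL || *text == '\0')
        return fallback;

    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (*end != '\0' || value <= 0)
        return fallback;
    return (size_t)value;
}

/* Allocate the table with the configured capacity. Returns 0 or -1. */
int proc_table_init(proc_table_t *t, size_t capacity)
{
    assert(t != NULL);
    t->pids = calloc(capacity, sizeof *t->pids);
    if (t->pids == NULL)
        return -1;
    t->capacity = capacity;
    t->count = 0;
    return 0;
}

/* Register one process. Returns -1 once the configured limit is reached,
 * so the caller can log and reject instead of overrunning the array. */
int proc_table_add(proc_table_t *t, int pid)
{
    assert(t != NULL);
    if (t->count >= t->capacity)
        return -1;
    t->pids[t->count++] = pid;
    return 0;
}

/* Release the storage and reset the table. */
void proc_table_free(proc_table_t *t)
{
    assert(t != NULL);
    free(t->pids);
    t->pids = NULL;
    t->capacity = 0;
    t->count = 0;
}
```

A caller could pass the value of a configuration or environment variable (name to be decided, none is defined in this thread) to `parse_limit` with `SKETCH_DEFAULT_MAX_PROCESSES` as the fallback, and hand the result to `proc_table_init`. A node that needs more than 100 local processes would then need a configuration change rather than a rebuild.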
/Neel.

On Wednesday 22 October 2014 03:56 AM, Shu Wang wrote:
> When testing the cluster for max size what configuration do you use? Are we
> able to get a copy of that to understand the look of the cluster?
> What is the max number of :
> - service groups per cluster tested?
> - service units per cluster tested?
> - components per cluster tested?
> - service units on a node tested?
> - components on a node tested?
>
> While continuing to investigate the below problem, we found :
> On a controller node, when the component count for that node reaches 78,
> opensaf fails
> On a payload node, when the component count for that node reaches 90,
> opensaf fails
>
> There are :
> On a controller node, there are 22 opensaf components
> On a payload node, there are 10 opensaf components
>
> Total components running on a node:
> Controller node: 78 + 22 = 100
> Payload node: 90 + 10 = 100
>
> We looked through the OpenSAF code for defines that had a value of 100. In
> osaf/services/infrastructure/dtms/dtm
>
> dtm_intra.c:#define DTM_INTRANODE_MAX_PROCESSES 100
>
> We changed that value to 115 and retried our test. Increasing to 115 allowed
> the total component count to go above 100 and of course then OpenSAF failed
> when we unlocked the SU that pushed the component count to over 115.
>
> Does anyone know of any ill effect if we change the
> DTM_INTRANODE_MAX_PROCESSES to a larger value e.g. 250? The dtm_intra.c is
> using it for the number of items in 2 different arrays and that is contained
> within that .c.
>
> We tried this change only with OpenSAF 4.4 and TCP.
>
> Thanks.
>
> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| www.NetCracker.com
> Proven Partner to Communications Service Providers
>
>
> -----Original Message-----
> From: Neelakanta Reddy [mailto:[email protected]]
> Sent: Tuesday, October 21, 2014 2:55 AM
> To: [email protected]; Shu Wang
> Cc: Lisa Ann Lentz-Liddell
> Subject: Re: [users] Max number of SUs/Components in a cluster?
>
> Hi,
>
> Comments inline.
>
> /Neel.
>
> On Tuesday 21 October 2014 04:41 AM, Shu Wang wrote:
>> The IMM documentation states:
>>
>> Applications that intend to add their own imm classes and imm objects need
>> to be aware that capacity is limited. OpenSAF4.1 has been system tested with
>> up to 350 000 objects of average size 300 bytes. It is not advisable to
>> generate larger imm-contents than that.
>> What is the definition of an object?
> The 300 bytes is the size of each object (the accumulated size of the
> attributes assigned for a class). The size of an object depends on the
> number of attributes and the types of attributes of its class.
>> We have a cluster defined across 6 nodes with a total of 12 SGs, a total of
>> 64 SUs, and a total of 292 components. We can start OpenSAF successfully
>> across the nodes and unlock all SUs with no problems.
>>
>> The cluster definition was increased to 6 nodes, 15 SGs, a total of 56 SUs,
>> and a total of 388 components. We are able to start OpenSAF on all nodes
>> successfully but as soon as a little over 300 components have been unlocked,
>> things start to fall apart. The opensaf processes start to die and the
>> cluster is no longer usable.
> The SUs, SGs and components are internally objects for IMM. Increasing to "6
> nodes, 15 SGs, a total of 56 SUs, and a total of 388 components"
> should not have caused any IMM related problems.
>
>> Oct 19 16:29:06 colobus osafamfnd[3649]: NO Assigned 'safSi=amfSDFSISI1.3,safApp=olcApp' ACTIVE to 'safSu=amfSDFSISU1.4,safSg=amfSDFSISG1,safApp=olcApp'
>> Oct 19 16:29:07 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:29:09 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
>> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send to ntfa failed rc: 2
>> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
>> Oct 19 16:29:09 colobus osafntfd[3587]: ER ntfs_mds_msg_send FAILED
>> ....
>> Oct 19 16:33:24 colobus ntpd_initres[2608]: host name not found: 0.rhel.pool.ntp.org
>> ....
>> Oct 19 16:35:18 colobus osafimmnd[3549]: NO Implementer disconnected 14 <0, 22b0f> (MsgQueueService142095)
>> Oct 19 16:35:19 colobus osafclmd[3602]: NO proc_initialize_msg: send failed. dest:22b0f00007a77
>> Oct 19 16:35:19 colobus osafimmnd[3549]: NO Global discard node received for nodeId:22b0f pid:31333
>> Oct 19 16:35:20 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:35:22 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:35:24 colobus osafamfnd[3649]: NO 'safComp=amfOELCElfComp2.3.1,safSu=amfOELCSU2.3,safSg=amfOELCSG2,safApp=olcApp' faulted due to 'avaDown' : Recovery is 'componentRestart'
>> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer connected: 16 (MsgQueueService142095) <12021, 2280f>
>> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer locally disconnected. Marking it as doomed 16 <12021, 2280f> (MsgQueueService142095)
>> Oct 19 16:35:26 colobus osafimmnd[3549]: NO Implementer disconnected 16 <12021, 2280f> (MsgQueueService142095)
>> Oct 19 16:35:26 colobus osafamfd[3631]: NO Node 'bedrazzas.monkey.lab' left the cluster
>>
>> Have we reached a max of the number of SUs/Components that can be started
>> within a single OpenSAF cluster?
> OpenSAF 4.4/4.5 is tested for 70 nodes.
>> We have tried the above with OpenSAF 4.4 and OpenSAF 4.5 and with both TCP
>> and TIPC, all fail similarly.
> This should have been an application problem or adjustments related to
> timeouts. Please share the syslog messages of all the nodes.
>> Thank you!
>>
>> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917|
>> www.NetCracker.com Proven Partner to Communications Service Providers
>>
>>
>> ________________________________
>> The information transmitted herein is intended only for the person or entity
>> to which it is addressed and may contain confidential, proprietary and/or
>> privileged material.
>> Any review, retransmission, dissemination or other use
>> of, or taking of any action in reliance upon, this information by persons or
>> entities other than the intended recipient is prohibited. If you received
>> this in error, please contact the sender and delete the material from any
>> computer.
>> ------------------------------------------------------------------------------
>> Comprehensive Server Monitoring with Site24x7.
>> Monitor 10 servers for $9/Month.
>> Get alerted through email, SMS, voice calls or mobile push notifications.
>> Take corrective actions from your mobile device.
>> http://p.sf.net/sfu/Zoho
>> _______________________________________________
>> Opensaf-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
