Hi Thierry,

My machine has 30GB of RAM, and the 389-ds version is 1.3.4.

ldapsearch shows that nsslapd-cachememsize has been updated to 200MB:

    ldapsearch -LLL -o ldif-wrap=no -D "cn=directory manager" -w 'mypassword' -b 'cn=userRoot,cn=ldbm database,cn=plugins,cn=config' | grep nsslapd-cachememsize
    nsslapd-cachememsize: 209715200

So it seems to have been updated, though the warning in the log (WARNING: ipaca: entry cache size 10485760B is less than db size 11599872B) confuses me a bit.

There's one more entry in the ldapsearch output that looks a bit low:

    nsslapd-dncachememsize: 10485760
    maxdncachesize: 10485760

Should I raise these to a higher value as well?

At the time the issue happened, both the memory usage and the overall load of the system were very low. I will try to reproduce the issue, at least in my QA environment, probably by simulating simultaneous parallel logins to a large number of hosts.

Thanks,
Rakesh

On Mon, Aug 29, 2016 at 8:16 PM, thierry bordaz <tbor...@redhat.com> wrote:
> Hi Rakesh,
>
> These tuning values may depend on the memory available on your machine. nsslapd-cachememsize allows the entry cache to consume up to 200MB, but its memory footprint is known to go above that. 200MB for both looks pretty good to me. How large is your machine? What is your version of 389-ds?
>
> Those warnings do not change your settings. They just point out that the entry caches of 'ipaca' and 'retrocl' are small, which is fine. The size of the entry cache matters mostly for userRoot.
> You may double-check the actual values, after a restart, with ldapsearch on 'cn=userRoot,cn=ldbm database,cn=plugins,cn=config' and 'cn=config,cn=ldbm database,cn=plugins,cn=config'.
>
> One step is to measure the response time of DS, to know whether or not it is responsible for the hang.
> The logs, and possibly pstack output taken during those intermittent hangs, will help to determine that.
>
> regards
> thierry
>
> On 08/29/2016 04:25 PM, Rakesh Rajasekharan wrote:
>
> I tried increasing nsslapd-dbcachesize and nsslapd-cachememsize to 200MB in my QA environments.
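The entry cache warnings discussed here have a fixed shape (backend name, entry cache size, db size), so they can be checked mechanically. The Python sketch below is illustrative only and is not part of any 389-ds tooling: it parses those WARNING lines, suggests an nsslapd-cachememsize equal to the db size plus a hypothetical 25% headroom rounded up to a whole MiB, and prints an LDIF modify record in the same form as the userRoot one used in this thread. It assumes the ipaca and changelog backends sit under cn=<backend>,cn=ldbm database,cn=plugins,cn=config just like userRoot. Per Thierry's reply, small entry caches on ipaca and the retro changelog are usually fine, so acting on this output is optional. For reference, 209715200 bytes is exactly 200 x 1024 x 1024, i.e. 200 MiB.

```python
import re

# Shape of the 389-ds startup warning quoted in the thread, e.g.
# "WARNING: ipaca: entry cache size 10485760B is less than db size 11599872B; ..."
WARNING_RE = re.compile(
    r"WARNING: (?P<backend>[\w-]+): entry cache size (?P<cache>\d+)B "
    r"is less than db size (?P<db>\d+)B"
)

MIB = 1024 * 1024


def parse_cache_warnings(log_text):
    """Return {backend: (cache_bytes, db_bytes)} for every warning found."""
    found = {}
    for m in WARNING_RE.finditer(log_text):
        found[m.group("backend")] = (int(m.group("cache")), int(m.group("db")))
    return found


def suggested_cachememsize(db_bytes, headroom=1.25):
    """db size times a hypothetical headroom factor, rounded up to a whole MiB."""
    target = int(db_bytes * headroom)
    return -(-target // MIB) * MIB  # ceiling division to the next MiB boundary


def ldif_for(backend, size_bytes):
    """Build an LDIF modify record like the userRoot one used in the thread.

    Assumes the backend entry lives under cn=ldbm database,cn=plugins,cn=config.
    """
    return (
        f"dn: cn={backend},cn=ldbm database,cn=plugins,cn=config\n"
        "changetype: modify\n"
        "replace: nsslapd-cachememsize\n"
        f"nsslapd-cachememsize: {size_bytes}\n"
    )
```

A possible use: feed the errors log to parse_cache_warnings, then print ldif_for(name, suggested_cachememsize(db)) for each backend you decide to enlarge, save it to a file, apply it with ldapmodify -D "cn=directory manager" -W -f <file> as was done for userRoot, and re-check with ldapsearch after a restart.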
>
> However, in my log files I still see these messages:
> [29/Aug/2016:04:34:37 +0000] - WARNING: ipaca: entry cache size 10485760B is less than db size 11599872B; We recommend to increase the entry cache size nsslapd-cachememsize.
> [29/Aug/2016:04:34:37 +0000] - WARNING: changelog: entry cache size 2097152B is less than db size 441647104B; We recommend to increase the entry cache size nsslapd-cachememsize.
>
> These are the LDIF files I used to modify the values.
>
> Modify the entry cache size:
> cat modify-cache-mem-size.ldif
> dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
> changetype: modify
> replace: nsslapd-cachememsize
> nsslapd-cachememsize: 209715200
>
> Modify the DB cache size:
> cat modfy-db-cache-size.ldif
> dn: cn=config,cn=ldbm database,cn=plugins,cn=config
> changetype: modify
> replace: nsslapd-dbcachesize
> nsslapd-dbcachesize: 209715200
>
> After modifying them, I restarted the IPA services.
>
> Is there anything else I need to take care of? The logs suggest it is still not picking up the updated values.
>
> Thanks
> Rakesh
>
> On Mon, Aug 29, 2016 at 6:07 PM, Rakesh Rajasekharan <rakesh.rajasekha...@gmail.com> wrote:
>
>> Hi Thierry,
>>
>> Because of the issues, we had to revert to the OpenLDAP setup we were running earlier in production.
>>
>> I have now made a few TCP-related changes in sysctl.conf and have also increased nsslapd-dbcachesize and nsslapd-cachememsize to 200MB.
>>
>> I will start migrating hosts back to IPA again and see whether I hit the earlier issue.
>>
>> I will update once I have something.
>>
>> Thanks,
>> Rakesh
>>
>> On Thu, Aug 25, 2016 at 2:17 PM, thierry bordaz <tbor...@redhat.com> wrote:
>>
>>> On 08/25/2016 10:15 AM, Rakesh Rajasekharan wrote:
>>>
>>> All of the troubleshooting seems fine.
>>>
>>> However, running logconv.pl gives me this output:
>>>
>>> ----- Recommendations -----
>>>
>>> 1.
>>>    You have unindexed components. This can be caused by a search on an unindexed attribute, or by returned results exceeding the allidsthreshold. Unindexed components are not recommended. To refuse unindexed searches, switch 'nsslapd-require-index' to 'on' under your database entry (e.g. cn=UserRoot,cn=ldbm database,cn=plugins,cn=config).
>>>
>>> 2. You have a significant difference between binds and unbinds. You may want to investigate this difference.
>>>
>>> I feel this could be a pointer to things going slow and IPA hanging. I think I now have something I can use to nail down this issue.
>>>
>>> On a side note, I was earlier running OpenLDAP and migrated over to FreeIPA.
>>>
>>> Thanks
>>> Rakesh
>>>
>>> On Wed, Aug 24, 2016 at 12:38 PM, Petr Spacek <pspa...@redhat.com> wrote:
>>>
>>>> On 23.8.2016 18:44, Rakesh Rajasekharan wrote:
>>>> > I think there's something seriously wrong with my system.
>>>> >
>>>> > I'm not able to run any IPA commands.
>>>> >
>>>> > klist
>>>> > Ticket cache: KEYRING:persistent:0:0
>>>> > Default principal: ad...@xyz.com
>>>> >
>>>> > Valid starting       Expires              Service principal
>>>> > 2016-08-23T16:26:36  2016-08-24T16:26:22  krbtgt/xyz....@xyz.com
>>>> >
>>>> > [root@prod-ipa-master-1a :~] ipactl status
>>>> > Directory Service: RUNNING
>>>> > krb5kdc Service: RUNNING
>>>> > kadmin Service: RUNNING
>>>> > ipa_memcached Service: RUNNING
>>>> > httpd Service: RUNNING
>>>> > pki-tomcatd Service: RUNNING
>>>> > ipa-otpd Service: RUNNING
>>>> > ipa: INFO: The ipactl command was successful
>>>> >
>>>> > [root@prod-ipa-master :~] ipa user-find p-testuser
>>>> > ipa: ERROR: Kerberos error: ('Unspecified GSS failure.
>>>> > Minor code may provide more information', 851968)/("Cannot contact any KDC for realm 'XYZ.COM'", -1765328228)
>>>
>>> Hi Rakesh,
>>>
>>> If you have a reproducible test case, could you rerun the command above? While it runs, you may monitor the DS process load (top). If it is high, you may take some pstacks of it.
>>> Also, could you attach the part of the DS access logs covering the command?
>>>
>>> regards
>>> thierry
>>>
>>>> This is weird, because the server seems to be up.
>>>>
>>>> Please follow
>>>> http://www.freeipa.org/page/Troubleshooting#Authentication.2FKerberos
>>>>
>>>> Petr^2 Spacek
>>>>
>>>> > Thanks
>>>> >
>>>> > Rakesh
>>>> >
>>>> > On Tue, Aug 23, 2016 at 10:01 PM, Rakesh Rajasekharan <rakesh.rajasekha...@gmail.com> wrote:
>>>> >
>>>> >> I changed the logging level to 4 by modifying nsslapd-accesslog-level.
>>>> >>
>>>> >> But the hang is still there, though I don't see the segfault now.
>>>> >>
>>>> >> On Tue, Aug 23, 2016 at 9:02 PM, Rakesh Rajasekharan <rakesh.rajasekha...@gmail.com> wrote:
>>>> >>
>>>> >>> My disk was getting filled too fast.
>>>> >>>
>>>> >>> The logs under /var/log/dirsrv were growing to around 5 GB, quickly filling up the disk.
>>>> >>>
>>>> >>> Is there a way to make the logging less verbose?
>>>> >>>
>>>> >>> On Tue, Aug 23, 2016 at 6:41 PM, Petr Spacek <pspa...@redhat.com> wrote:
>>>> >>>
>>>> >>>> On 23.8.2016 15:07, Rakesh Rajasekharan wrote:
>>>> >>>>> I was able to fix that, maybe temporarily. When I checked the network, there was another process running and consuming a lot of bandwidth (I have no idea who did that.
>>>> >>>>> I need to seriously start restricting people's access to this machine.)
>>>> >>>>>
>>>> >>>>> After killing it, performance improved drastically.
>>>> >>>>>
>>>> >>>>> But now I have suddenly started experiencing the same hang again.
>>>> >>>>>
>>>> >>>>> This time, I get the following errors when checking dmesg:
>>>> >>>>>
>>>> >>>>> [ 301.236976] ns-slapd[3124]: segfault at 0 ip 00007f1de416951c sp 00007f1dee1dba70 error 4 in libcos-plugin.so[7f1de4166000+b000]
>>>> >>>>> [ 1116.248431] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies. Check SNMP counters.
>>>> >>>>> [11831.397037] ns-slapd[22550]: segfault at 0 ip 00007f533d82251c sp 00007f5347894a70 error 4 in libcos-plugin.so[7f533d81f000+b000]
>>>> >>>>> [11832.727989] ns-slapd[22606]: segfault at 0 ip 00007f6231eb951c sp 00007f623bf2ba70 error 4 in libcos-plugin.so[7f6231eb6000+b00
>>>> >>>>
>>>> >>>> Okay, this one is serious. The LDAP server crashed.
>>>> >>>>
>>>> >>>> 1. Make sure all your packages are up-to-date.
>>>> >>>>
>>>> >>>> Please see
>>>> >>>> http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes
>>>> >>>> for further instructions on how to debug this.
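The segfault lines quoted here share a pattern worth checking. Subtracting the mapping base shown in brackets from the instruction pointer (ip) gives the faulting offset inside libcos-plugin.so: 0x7f1de416951c - 0x7f1de4166000 and 0x7f533d82251c - 0x7f533d81f000 both come to 0x351c, and so does the truncated third line (0x7f6231eb951c - 0x7f6231eb6000), which suggests the same instruction faults every time. On x86, "segfault at 0" with "error 4" decodes to a user-mode read of a page that is not present at address 0, consistent with a NULL pointer dereference. The Python sketch below just automates that arithmetic, assuming the dmesg line format shown in the thread; it does not replace the FAQ's crash-debugging procedure.

```python
import re

# dmesg segfault line format as quoted in the thread, e.g.
# "ns-slapd[3124]: segfault at 0 ip 00007f1de416951c sp 00007f1dee1dba70
#  error 4 in libcos-plugin.so[7f1de4166000+b000]"
SEGV_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>\d+) "
    r"in (?P<lib>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def crash_offsets(dmesg_text):
    """Yield (pid, library, offset, error_code) for each complete segfault line.

    offset = ip - mapping base reported by the kernel. Truncated lines that
    lack the closing "]" are skipped.
    """
    for m in SEGV_RE.finditer(dmesg_text):
        offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
        yield int(m.group("pid")), m.group("lib"), offset, int(m.group("err"))


def decode_x86_error(code):
    """Decode the low three bits of the x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }
```

With the 389-ds debuginfo packages installed, an offset like 0x351c can usually be resolved to a source line with addr2line -e <path to libcos-plugin.so> 0x351c. The library path differs between distributions, so treat it as an assumption, and this works most directly when the reported mapping starts at the library's load address, as is typical for shared objects.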
>>>> >>>>
>>>> >>>> Petr^2 Spacek
>>>> >>>>
>>>> >>>>> and in /var/log/dirsrv/example-com/errors:
>>>> >>>>>
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291138 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291139 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291140 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291141 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291142 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291143 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291144 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291145 (rc: 32)
>>>> >>>>> [23/Aug/2016:12:49:50 +0000] - Retry count exceeded in delete
>>>> >>>>> [23/Aug/2016:12:49:50 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3292734 (rc: 51)
>>>> >>>>>
>>>> >>>>> Can I do something about this error?
>>>> >>>>> I tried to restart IPA a couple of times, but that did not help.
>>>> >>>>>
>>>> >>>>> Thanks
>>>> >>>>> Rakesh
>>>> >>>>>
>>>> >>>>> On Mon, Aug 22, 2016 at 2:27 PM, Petr Spacek <pspa...@redhat.com> wrote:
>>>> >>>>>
>>>> >>>>>> On 19.8.2016 19:32, Rakesh Rajasekharan wrote:
>>>> >>>>>>> I am running my setup on the AWS cloud, and entropy is low, at around 180.
>>>> >>>>>>>
>>>> >>>>>>> I plan to increase it by installing haveged. But could low entropy by any chance cause this issue of intermittent hangs?
>>>> >>>>>>> Also, the hang is mostly observed when registering around 20 clients together.
>>>> >>>>>>
>>>> >>>>>> Possibly, I'm not sure. If you want to dig into this, I would do this:
>>>> >>>>>> 1. Look at which process hangs on the client (using the pstree command or so):
>>>> >>>>>> $ pstree
>>>> >>>>>>
>>>> >>>>>> 2. Look at which server and port the hanging client is connected to:
>>>> >>>>>> $ lsof -p <PID of the hanging process>
>>>> >>>>>>
>>>> >>>>>> 3. Jump to the server and see which process is bound to the target port:
>>>> >>>>>> $ netstat -pn
>>>> >>>>>>
>>>> >>>>>> 4. See where the process is hanging:
>>>> >>>>>> $ strace -p <PID of the hanging process>
>>>> >>>>>>
>>>> >>>>>> I hope it helps.
>>>> >>>>>>
>>>> >>>>>> Petr^2 Spacek
>>>> >>>>>>
>>>> >>>>>>> On Fri, Aug 19, 2016 at 7:24 PM, Rakesh Rajasekharan <rakesh.rajasekha...@gmail.com> wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Yes, there seems to be something worrying. I faced this today as well.
>>>> >>>>>>>> There are around 280 hosts left, and when I try adding them to IPA, the slowness begins.
>>>> >>>>>>>>
>>>> >>>>>>>> All the IPA commands, like ipa user-find, become very slow to respond.
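Rakesh's entropy figure (around 180) is the kernel's estimate, which Linux exposes in /proc/sys/kernel/random/entropy_avail. The Python sketch below reads and samples that value so it can be watched while a batch of clients registers, to see whether dips line up with the hangs. The 1000-bit threshold is a hypothetical cut-off chosen for illustration, not a documented limit; Petr's pstree, lsof, netstat and strace steps remain the direct way to see where a process is actually blocked.

```python
import time
from pathlib import Path

# Linux exposes the kernel's current entropy estimate, in bits, at this path.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")

# Hypothetical cut-off for illustration only; not a documented kernel limit.
LOW_ENTROPY_BITS = 1000


def parse_entropy(text):
    """Convert the file's contents (e.g. "180\\n") to an int."""
    return int(text.strip())


def sample_entropy(samples=10, interval=1.0, path=ENTROPY_PATH):
    """Read the estimate `samples` times, sleeping `interval` seconds between reads."""
    readings = []
    for i in range(samples):
        readings.append(parse_entropy(Path(path).read_text()))
        if i < samples - 1:
            time.sleep(interval)
    return readings


def low_readings(readings, threshold=LOW_ENTROPY_BITS):
    """Return only the readings below the threshold, in their original order."""
    return [bits for bits in readings if bits < threshold]
```

For example, start sample_entropy(samples=60) on the IPA server just before registering a batch of around 20 clients, then pass the result to low_readings to see how often the estimate dropped.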
>>>> >>>>>>>>
>>>> >>>>>>>> There are not many SYN_RECV connections, though: around 80-90 before, and only around 20 today.
>>>> >>>>>>>>
>>>> >>>>>>>> For now, I have increased tcp_max_syn_backlog to 5000.
>>>> >>>>>>>> The slowness seems to have gone for now, but I will try adding the clients again tomorrow and see how it goes.
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks
>>>> >>>>>>>> Rakesh
>>>> >>>>>>>>
>>>> >>>>>>>> On Fri, Aug 19, 2016 at 12:58 PM, Petr Spacek <pspa...@redhat.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> On 18.8.2016 17:23, Rakesh Rajasekharan wrote:
>>>> >>>>>>>>>> Hi,
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I am migrating to FreeIPA from OpenLDAP and have around 4000 clients.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I had opened another thread on that, but chose to start a new one here as it's a separate issue.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I was able to change nsslapd-maxdescriptors by adding an LDIF file:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> cat nsslapd-modify.ldif
>>>> >>>>>>>>>> dn: cn=config
>>>> >>>>>>>>>> changetype: modify
>>>> >>>>>>>>>> replace: nsslapd-maxdescriptors
>>>> >>>>>>>>>> nsslapd-maxdescriptors: 17000
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> and running the ldapmodify command.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I have now started moving clients from OpenLDAP to FreeIPA and have moved close to 2000 clients today.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> However, I have noticed that IPA hangs intermittently.
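The "Possible SYN flooding on port 88" kernel message and the SYN_RECV counts mentioned here can be cross-checked from /proc/net/tcp and /proc/net/tcp6. In those tables each socket's local address ends in a hex port (88 is 0x0058) and the "st" column is a hex TCP state (03 is SYN_RECV, 0A is LISTEN). The Python sketch below counts sockets per state for one local port; it is an illustration under those assumptions, roughly what filtering netstat output by port would show, and is not an IPA tool. The backlog limit raised in this thread can be read back from /proc/sys/net/ipv4/tcp_max_syn_backlog.

```python
from collections import Counter
from pathlib import Path

# Hex state codes used in the "st" column of /proc/net/tcp and /proc/net/tcp6.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(table_text, local_port):
    """Count sockets per TCP state whose local port equals `local_port`."""
    counts = Counter()
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        port = int(local_address.rsplit(":", 1)[1], 16)  # port is hex after ':'
        if port == local_port:
            counts[TCP_STATES.get(state, state)] += 1
    return counts


def kdc_socket_states(port=88):
    """Combine IPv4 and IPv6 tables on a live Linux host (default: KDC port 88)."""
    total = Counter()
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            total += count_states(path.read_text(), port)
    return total
```

Running kdc_socket_states() repeatedly while clients enroll would show whether SYN_RECV sockets on port 88 pile up. When the kernel falls back to SYN cookies, some half-open connections may not appear in the table at all, so a low count does not rule out the backlog being exhausted.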
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Running kinit admin returns the error below:
>>>> >>>>>>>>>> kinit: Generic error (see e-text) while getting initial credentials
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> In /var/log/messages, I see this entry:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> prod-ipa-master-int kernel: [104090.315801] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies. Check SNMP counters.
>>>> >>>>>>>>>
>>>> >>>>>>>>> I would be worried about this message. Maybe the kernel or firewall is doing something fishy behind your back and blocking some connections or so.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Petr^2 Spacek
>>>> >>>>>>>>>
>>>> >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Started Session 4885 of user root.
>>>> >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Starting Session 4885 of user root.
>>>> >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Started Session 4886 of user root.
>>>> >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Starting Session 4886 of user root.
>>>> >>>>>>>>>> Aug 18 13:02:40 prod-ipa-master-int python[28984]: ansible-command Invoked with creates=None executable=None shell=True args= removes=None warn=True chdir=None
>>>> >>>>>>>>>> Aug 18 13:04:37 prod-ipa-master-int sssd_be: GSSAPI Error: Unspecified GSS failure.
>>>> >>>>>>>>>> Minor code may provide more information (KDC returned error string: PROCESS_TGS)
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Could it be due to the initial load of adding the clients, or is there something else I need to take care of?
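Two log symptoms in this thread can be tallied directly: the log-analysis recommendations about unindexed searches and about binds versus unbinds, and the DSRetroclPlugin "could not delete change record" errors. The Python sketch below assumes the usual 389-ds access-log line shapes (conn=N op=N BIND or UNBIND, and notes=U on RESULT lines for unindexed searches) and maps the errors-log "rc:" values to standard LDAP result code names from RFC 4511, where 32 is noSuchObject and 51 is busy. It is a rough cross-check, not a replacement for the 389-ds log analyzer. A gap between binds and unbinds often just means clients close connections without sending an UNBIND, so treat it as a hint rather than proof of a fault.

```python
import re
from collections import Counter

# Assumed 389-ds access-log line shapes, e.g.
#   [...] conn=12 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI
#   [...] conn=12 op=4 UNBIND
#   [...] conn=12 op=2 RESULT err=0 tag=101 nentries=0 etime=1 notes=U
OP_RE = re.compile(r"conn=\d+ op=-?\d+ (BIND|UNBIND)\b")
UNINDEXED_RE = re.compile(r"conn=\d+ op=-?\d+ RESULT .*\bnotes=\S*U")

# errors-log shape: "... could not delete change record 3291138 (rc: 32)"
RETROCL_RE = re.compile(r"could not delete change record (\d+) \(rc: (\d+)\)")

# Subset of the LDAP result codes defined in RFC 4511.
LDAP_RESULT_CODES = {
    0: "success",
    32: "noSuchObject",
    49: "invalidCredentials",
    50: "insufficientAccessRights",
    51: "busy",
    53: "unwillingToPerform",
}


def bind_unbind_counts(access_log):
    """Return (binds, unbinds) found in the access-log text."""
    ops = Counter(m.group(1) for m in OP_RE.finditer(access_log))
    return ops["BIND"], ops["UNBIND"]


def count_unindexed(access_log):
    """Count RESULT lines flagged notes=U (unindexed search)."""
    return sum(1 for _ in UNINDEXED_RE.finditer(access_log))


def retrocl_failures(errors_log):
    """Count failed change-record deletions per LDAP result code name."""
    return Counter(
        LDAP_RESULT_CODES.get(int(rc), f"rc={rc}")
        for _record, rc in RETROCL_RE.findall(errors_log)
    )
```

Run against the access log slice Thierry asked for, a high count_unindexed result would point at the searches to index, while a retrocl_failures result dominated by "busy" would fit the "Retry count exceeded in delete" line rather than a missing-record problem.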
-- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project