Re: [Freeipa-users] Freeipa 4.2.0 hangs intermittently

Rakesh Rajasekharan Mon, 29 Aug 2016 05:43:47 -0700

Hi Thierry,

Because of these issues, we had to revert to the OpenLDAP setup we were
previously running in production.


I have now made a few TCP-related changes in sysctl.conf and have also
increased nsslapd-dbcachesize and nsslapd-cachememsize to 200MB.
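
For reference, a cache-size change of this kind could be expressed as an
LDIF for ldapmodify. The sketch below only generates such an LDIF. It is not
the exact change that was applied: it assumes "200MB" means 200 * 1024 * 1024
bytes, that the backend is the userRoot database named in the logconv.pl
recommendation quoted further down, and the helper name cache_ldif is made up
for illustration.

```python
# Sketch: build an LDIF that sets both 389-ds cache sizes.
# Assumptions (not from the original mail): sizes are given in bytes,
# 1 MB = 1024 * 1024 bytes, and the backend is "userRoot".
MB = 1024 * 1024
LDBM_PLUGIN = "cn=ldbm database,cn=plugins,cn=config"


def cache_ldif(size_mb, backend="userRoot"):
    """Return an LDIF with two modify records, separated by a blank line."""
    size = size_mb * MB
    records = [
        # nsslapd-dbcachesize is set on the ldbm plugin's global config entry.
        [
            f"dn: cn=config,{LDBM_PLUGIN}",
            "changetype: modify",
            "replace: nsslapd-dbcachesize",
            f"nsslapd-dbcachesize: {size}",
        ],
        # nsslapd-cachememsize (entry cache) is set per backend.
        [
            f"dn: cn={backend},{LDBM_PLUGIN}",
            "changetype: modify",
            "replace: nsslapd-cachememsize",
            f"nsslapd-cachememsize: {size}",
        ],
    ]
    return "\n\n".join("\n".join(r) for r in records) + "\n"


print(cache_ldif(200))
```

The output can be saved to a file and passed to ldapmodify in the same way as
the nsslapd-maxdescriptors LDIF shown later in this thread.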

I will start migrating hosts back to IPA again and see whether I hit the
earlier issue.

I will post an update once I have something.


Thanks,
Rakesh



On Thu, Aug 25, 2016 at 2:17 PM, thierry bordaz <tbor...@redhat.com> wrote:

>
>
> On 08/25/2016 10:15 AM, Rakesh Rajasekharan wrote:
>
> All of the troubleshooting seems fine.
>
>
> However, running logconv.pl gives me this output:
>
> ----- Recommendations -----
>
>  1.  You have unindexed components, this can be caused from a search on an
> unindexed attribute, or your returned results exceeded the
> allidsthreshold.  Unindexed components are not recommended. To refuse
> unindexed searches, switch 'nsslapd-require-index' to 'on' under your
> database entry (e.g. cn=UserRoot,cn=ldbm database,cn=plugins,cn=config).
>
>  2.  You have a significant difference between binds and unbinds.  You may
> want to investigate this difference.
>
>
> I feel this could be a pointer to things going slow and IPA hanging. I
> think I now have something I can use to try to nail down this issue.
>
> On a side note, I was earlier running OpenLDAP and migrated over to
> FreeIPA.
>
> Thanks
> Rakesh
>
>
>
> On Wed, Aug 24, 2016 at 12:38 PM, Petr Spacek <pspa...@redhat.com> wrote:
>
>> On 23.8.2016 18:44, Rakesh Rajasekharan wrote:
>> > I think there's something seriously wrong with my system.
>> >
>> > I am not able to run any IPA commands.
>> >
>> > klist
>> > Ticket cache: KEYRING:persistent:0:0
>> > Default principal: ad...@xyz.com
>> >
>> > Valid starting       Expires              Service principal
>> > 2016-08-23T16:26:36  2016-08-24T16:26:22  krbtgt/ <xyz....@xyz.com>
>> xyz....@xyz.com
>> >
>> >
>> > [root@prod-ipa-master-1a :~] ipactl status
>> > Directory Service: RUNNING
>> > krb5kdc Service: RUNNING
>> > kadmin Service: RUNNING
>> > ipa_memcached Service: RUNNING
>> > httpd Service: RUNNING
>> > pki-tomcatd Service: RUNNING
>> > ipa-otpd Service: RUNNING
>> > ipa: INFO: The ipactl command was successful
>> >
>> >
>> >
>> > [root@prod-ipa-master :~] ipa user-find p-testuser
>> > ipa: ERROR: Kerberos error: ('Unspecified GSS failure.  Minor code may
>> > provide more information', 851968)/("Cannot contact any KDC for realm '
>> > XYZ.COM'", -1765328228)
>>
>
> Hi Rakesh,
>
> Since you have a reproducible test case, would you rerun the command above?
> While it is being processed, you could monitor the DS process load (top).
> If it is high, you could take some pstacks of it.
> Also, would you attach the part of the DS access logs taken during the
> command?
>
> regards
> thierry
>
> >
>>
>> This is weird because the server seems to be up.
>>
>> Please follow
>> http://www.freeipa.org/page/Troubleshooting#Authentication.2FKerberos
>>
>> Petr^2 Spacek
>>
>> >
>> >
>> > Thanks
>> >
>> > Rakesh
>> >
>> > On Tue, Aug 23, 2016 at 10:01 PM, Rakesh Rajasekharan <
>> > rakesh.rajasekha...@gmail.com> wrote:
>> >
>> >> I changed the logging level to 4 by modifying nsslapd-accesslog-level.
>> >>
>> >> But the hang is still there, though I don't see the segfault now.
>> >>
>> >>
>> >>
>> >>
>> >> On Tue, Aug 23, 2016 at 9:02 PM, Rakesh Rajasekharan <
>> >> rakesh.rajasekha...@gmail.com> wrote:
>> >>
>> >>> My disk was filling up too fast.
>> >>>
>> >>> The logs under /var/log/dirsrv were growing to around 5 GB, quickly
>> >>> filling it up.
>> >>>
>> >>> Is there a way to make the logging less verbose?
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Aug 23, 2016 at 6:41 PM, Petr Spacek <pspa...@redhat.com>
>> wrote:
>> >>>
>> >>>> On 23.8.2016 15:07, Rakesh Rajasekharan wrote:
>> >>>>> I was able to fix that, maybe temporarily. When I checked the
>> >>>>> network, there was another process running and consuming a lot of
>> >>>>> network bandwidth (I have no idea who did that; I need to seriously
>> >>>>> start restricting people's access to this machine).
>> >>>>>
>> >>>>> After killing that, performance improved drastically.
>> >>>>>
>> >>>>> But now, suddenly, I have started experiencing the same hang.
>> >>>>>
>> >>>>> This time, I get the following errors when I check dmesg:
>> >>>>>
>> >>>>> [  301.236976] ns-slapd[3124]: segfault at 0 ip 00007f1de416951c sp 00007f1dee1dba70 error 4 in libcos-plugin.so[7f1de4166000+b000]
>> >>>>> [ 1116.248431] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies.  Check SNMP counters.
>> >>>>> [11831.397037] ns-slapd[22550]: segfault at 0 ip 00007f533d82251c sp 00007f5347894a70 error 4 in libcos-plugin.so[7f533d81f000+b000]
>> >>>>> [11832.727989] ns-slapd[22606]: segfault at 0 ip 00007f6231eb951c sp 00007f623bf2ba70 error 4 in libcos-plugin.so[7f6231eb6000+b00
>> >>>>
>> >>>> Okay, this one is serious. The LDAP server crashed.
>> >>>>
>> >>>> 1. Make sure all your packages are up-to-date.
>> >>>>
>> >>>> Please see
>> >>>> http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes
>> >>>> for further instructions on how to debug this.
>> >>>>
>> >>>> Petr^2 Spacek
>> >>>>
>> >>>>>
>> >>>>> and in /var/log/dirsrv/example-com/errors
>> >>>>>
>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291138 (rc: 32)
>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291139 (rc: 32)
>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291140 (rc: 32)
>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291141 (rc: 32)
>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291142 (rc: 32)
>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291143 (rc: 32)
>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291144 (rc: 32)
>> >>>>> [23/Aug/2016:12:49:36 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3291145 (rc: 32)
>> >>>>> [23/Aug/2016:12:49:50 +0000] - Retry count exceeded in delete
>> >>>>> [23/Aug/2016:12:49:50 +0000] DSRetroclPlugin - delete_changerecord: could not delete change record 3292734 (rc: 51)
>> >>>>>
>> >>>>>
>> >>>>> Can I do something about this error? I tried to restart IPA a
>> >>>>> couple of times, but that did not help.
>> >>>>>
>> >>>>> Thanks
>> >>>>> Rakesh
>> >>>>>
>> >>>>> On Mon, Aug 22, 2016 at 2:27 PM, Petr Spacek <pspa...@redhat.com>
>> >>>> wrote:
>> >>>>>
>> >>>>>> On 19.8.2016 19:32, Rakesh Rajasekharan wrote:
>> >>>>>>> I am running my setup on the AWS cloud, and entropy is low, at
>> >>>>>>> around 180.
>> >>>>>>>
>> >>>>>>> I plan to increase it by installing haveged. But would low entropy
>> >>>>>>> by any chance cause this intermittent hang?
>> >>>>>>> Also, the hang is mostly observed when registering around 20
>> >>>>>>> clients together.
>> >>>>>>
>> >>>>>> Possibly, I'm not sure. If you want to dig into this, I would do
>> >>>>>> this:
>> >>>>>> 1. look what process hangs on client (using pstree command or so)
>> >>>>>> $ pstree
>> >>>>>>
>> >>>>>> 2. look to what server and port is the hanging client connected to
>> >>>>>> $ lsof -p <PID of the hanging process>
>> >>>>>>
>> >>>>>> 3. jump to server and see what process is bound to the target port
>> >>>>>> $ netstat -pn
>> >>>>>>
>> >>>>>> 4. see where the process is hanging
>> >>>>>> $ strace -p <PID of the hanging process>
>> >>>>>>
>> >>>>>> I hope it helps.
>> >>>>>>
>> >>>>>> Petr^2 Spacek
>> >>>>>>
>> >>>>>>> On Fri, Aug 19, 2016 at 7:24 PM, Rakesh Rajasekharan <
>> >>>>>>> rakesh.rajasekha...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> Yes, there seems to be something worrying. I have faced this
>> >>>>>>>> today as well.
>> >>>>>>>> There are a few hosts left, around 280, and when I try adding
>> >>>>>>>> them to IPA, the slowness begins.
>> >>>>>>>>
>> >>>>>>>> All the IPA commands, like ipa user-find, become very slow in
>> >>>>>>>> responding.
>> >>>>>>>>
>> >>>>>>>> There are not many connections in SYN_RECV, though: just around
>> >>>>>>>> 80-90, and today only around 20.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> I have for now increased tcp_max_syn_backlog to 5000.
>> >>>>>>>> For now the slowness seems to have gone, but I will try adding
>> >>>>>>>> the clients again tomorrow and see how it goes.
>> >>>>>>>>
>> >>>>>>>> Thanks
>> >>>>>>>> Rakesh
>> >>>>>>>>
>> >>>>>>>> The issues
>> >>>>>>>>
>> >>>>>>>> On Fri, Aug 19, 2016 at 12:58 PM, Petr Spacek <
>> pspa...@redhat.com>
>> >>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> On 18.8.2016 17:23, Rakesh Rajasekharan wrote:
>> >>>>>>>>>> Hi
>> >>>>>>>>>>
>> >>>>>>>>>> I am migrating to FreeIPA from OpenLDAP and have around 4000
>> >>>>>>>>>> clients.
>> >>>>>>>>>>
>> >>>>>>>>>> I had opened another thread on that, but chose to start a new
>> >>>>>>>>>> one here as it's a separate issue.
>> >>>>>>>>>>
>> >>>>>>>>>> I was able to change nsslapd-maxdescriptors by adding an LDIF
>> >>>>>>>>>> file:
>> >>>>>>>>>>
>> >>>>>>>>>> cat nsslapd-modify.ldif
>> >>>>>>>>>> dn: cn=config
>> >>>>>>>>>> changetype: modify
>> >>>>>>>>>> replace: nsslapd-maxdescriptors
>> >>>>>>>>>> nsslapd-maxdescriptors: 17000
>> >>>>>>>>>>
>> >>>>>>>>>> and running the ldapmodify command
>> >>>>>>>>>>
>> >>>>>>>>>> I have now started moving clients from OpenLDAP to FreeIPA
>> >>>>>>>>>> and have moved close to 2000 clients today.
>> >>>>>>>>>>
>> >>>>>>>>>> However, I have noticed that IPA hangs intermittently.
>> >>>>>>>>>>
>> >>>>>>>>>> Running kinit admin returns the error below:
>> >>>>>>>>>> kinit: Generic error (see e-text) while getting initial credentials
>> >>>>>>>>>>
>> >>>>>>>>>> In /var/log/messages, I see this entry:
>> >>>>>>>>>>
>> >>>>>>>>>>  prod-ipa-master-int kernel: [104090.315801] TCP: request_sock_TCP: Possible SYN flooding on port 88. Sending cookies.  Check SNMP counters.
>> >>>>>>>>>
>> >>>>>>>>> I would be worried about this message. Maybe the kernel/firewall
>> >>>>>>>>> is doing something fishy behind your back and blocking some
>> >>>>>>>>> connections or so.
>> >>>>>>>>>
>> >>>>>>>>> Petr^2 Spacek
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Started Session 4885 of user root.
>> >>>>>>>>>> Aug 18 13:00:01 prod-ipa-master-int systemd[1]: Starting Session 4885 of user root.
>> >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Started Session 4886 of user root.
>> >>>>>>>>>> Aug 18 13:01:01 prod-ipa-master-int systemd[1]: Starting Session 4886 of user root.
>> >>>>>>>>>> Aug 18 13:02:40 prod-ipa-master-int python[28984]: ansible-command Invoked with creates=None executable=None shell=True args= removes=None warn=True chdir=None
>> >>>>>>>>>> Aug 18 13:04:37 prod-ipa-master-int sssd_be: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (KDC returned error string: PROCESS_TGS)
>> >>>>>>>>>>
>> >>>>>>>>>> Could it be that this is due to the initial load of adding the
>> >>>>>>>>>> clients, or is there something else that I need to take care of?
>>
>
>
>
>
>
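
As a side note on the ns-slapd segfault lines quoted above from dmesg: the
offset of the crash inside libcos-plugin.so can be worked out by subtracting
the library's load address (the value in square brackets) from the
instruction pointer. The sketch below does that for the three quoted lines,
with the mail line wraps joined. The regular expression and the helper name
crash_offsets are illustrative assumptions, not part of any FreeIPA or 389-ds
tooling.

```python
import re

# dmesg lines as quoted in this thread, with mail line wraps joined.
DMESG = [
    "[  301.236976] ns-slapd[3124]: segfault at 0 ip 00007f1de416951c "
    "sp 00007f1dee1dba70 error 4 in libcos-plugin.so[7f1de4166000+b000]",
    "[11831.397037] ns-slapd[22550]: segfault at 0 ip 00007f533d82251c "
    "sp 00007f5347894a70 error 4 in libcos-plugin.so[7f533d81f000+b000]",
    # This line is truncated in the original mail and is kept as-is.
    "[11832.727989] ns-slapd[22606]: segfault at 0 ip 00007f6231eb951c "
    "sp 00007f623bf2ba70 error 4 in libcos-plugin.so[7f6231eb6000+b00",
]

# Capture the instruction pointer, the library name and its load address.
SEGFAULT = re.compile(
    r"segfault at [0-9a-f]+ ip (?P<ip>[0-9a-f]+) .* in "
    r"(?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+"
)


def crash_offsets(lines):
    """Return (library, offset) pairs, where offset = ip - load address."""
    result = []
    for line in lines:
        match = SEGFAULT.search(line)
        if match:
            ip = int(match.group("ip"), 16)
            base = int(match.group("base"), 16)
            result.append((match.group("lib"), ip - base))
    return result


for lib, offset in crash_offsets(DMESG):
    print(f"{lib}+{offset:#x}")
```

All three crashes resolve to the same offset in libcos-plugin.so, and
"segfault at 0" indicates a read from address 0, so they look like the same
NULL-pointer dereference each time. With the matching debuginfo package
installed, an offset like this can typically be resolved to a source line
with a tool such as addr2line; the core-file procedure in the 389-ds
debugging FAQ linked above remains the more complete route.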

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
