On 20-04-2020 15:16, thierry bordaz wrote: > On 4/20/20 3:02 PM, Kees Bakker wrote: >> On 20-04-2020 14:51, Rob Crittenden wrote: >>> Kees Bakker via FreeIPA-users wrote: >>>> On 20-04-2020 09:58, Kees Bakker via FreeIPA-users wrote: >>>>> On 20-04-2020 09:09, Florence Blanc-Renaud wrote: >>>>>> On 4/20/20 8:28 AM, Kees Bakker via FreeIPA-users wrote: >>>>>>> Hey, >>>>>>> >>>>>>> I'm looking for advice how to analyse/debug this. >>>>>>> >>>>>>> On one of the masters the dirsrv is unresponsive. It runs, but every >>>>>>> attempt to connect it hangs. >>>>>>> >>>>>>> The command "systemctl status" does not show anything alarming >>>>>>> >>>>>>> ● dirsrv@EXAMPLE-COM.service - 389 Directory Server EXAMPLE-COM. >>>>>>> Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; >>>>>>> vendor preset: disabled) >>>>>>> Active: active (running) since vr 2020-04-17 13:46:25 CEST; 1h >>>>>>> 33min ago >>>>>>> Process: 3123 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl >>>>>>> /etc/dirsrv/slapd-%i/dse.ldif (code=exited, status=0/SUCCESS) >>>>>>> Main PID: 3134 (ns-slapd) >>>>>>> Status: "slapd started: Ready to process requests" >>>>>>> CGroup: >>>>>>> /system.slice/system-dirsrv.slice/dirsrv@EXAMPLE-COM.service >>>>>>> └─3134 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-EXAMPLE-COM >>>>>>> -i /var/run/dirsrv/slapd-EXAMPLE-COM.pid >>>>>>> >>>>>>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>>>>>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>>>>>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>>>>>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>>>>>> apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 2 >>>>>>> apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>>>>>> apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>>>>>> apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>>>>>> apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1 >>>>>>> apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 2 >>>>>>> >>>>>>> However, an ldapsearch command hangs forever >>>>>>> >>>>>>> [root@rotte ~]# ldapsearch -H ldaps://linge.example.com -D >>>>>>> uid=keesbtest,cn=users,cn=accounts,dc=example,dc=com -W -LLL -o >>>>>>> ldif-wrap=no -b cn=users,cn=accounts,dc=example,dc=com >>>>>>> '(&(objectClass=person)(memberOf=cn=admins,cn=groups,cn=accounts,dc=example,dc=com))' >>>>>>> uid >>>>>>> Enter LDAP Password: >>>>>>> >>>>>>> Even if I use the socket >>>>>>> (ldapi://%2fvar%2frun%2fslapd-EXAMPLE-COM.socket) the ldapsearch >>>>>>> command hangs. >>>>>>> >>>>>>> "ipactl status" hangs >>>>>>> >>>>>>> "kinit" hangs >>>>>>> >>>>>>> >>>>>> Hi, >>>>>> you can start by having a look at dirsrv error log in >>>>>> /var/log/dirsrv-slapd-YOUR_DOMAIN/errors, and the journal. >>>>>> >>>>>> The FAQ page of 389 also explains a few troubleshooting steps: >>>>>> http://www.port389.org/docs/389ds/FAQ/faq.html#Troubleshooting >>>>> I did exactly that, look at the "errors" log, but there was no clue, at >>>>> least >>>>> not for me. Strange enough it kept running for a few hours and then it >>>>> was hanging again. >>>>> >>>>> I tried the command "ipctl restart", but that was hanging forever. >>>>> However "systemctl restart dirsrv@MY-DOMAIN" was able to restart >>>>> it after several minutes. Meanwhile the sn-slapd process was using 100% >>>>> CPU. >>>>> >>>>> Another remark I want to make. Every ldap connection (ldapsearch, >>>>> whatever) >>>>> hangs for ever. No timeout, nothing. >>>>> >>>>> When it rains, it pours, they say. There is another master with the same >>>>> symptom. >>>>> I'm getting nervous now. >>>>> >>>>> Thanks for the Troubleshooting link. I'll have to dive into the deep, I >>>>> guess. >>>> Could it be a deadlock? >>>> >>>> [root@linge ~]# grep -a1 '^#0.*lock' slapd-stacktrace.1587374239.txt >>>> Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)): >>>> #0 0x00007ff929430144 in pthread_rwlock_rdlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190cc49c in map_rdlock () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> -- >>>> Thread 7 (Thread 0x7ff8f7255700 (LWP 14490)): >>>> #0 0x00007ff929430144 in pthread_rwlock_rdlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190cc49c in map_rdlock () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> -- >>>> Thread 4 (Thread 0x7ff8f5a52700 (LWP 14493)): >>>> #0 0x00007ff929430144 in pthread_rwlock_rdlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190cc49c in map_rdlock () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> -- >>>> Thread 2 (Thread 0x7ff92c355700 (LWP 15679)): >>>> #0 0x00007ff92943035e in pthread_rwlock_wrlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190b6639 in backend_be_pre_write_cb () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> >>>> Without debuginfo the trace of these threads look like this: >>>> >>>> Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)): >>>> #0 0x00007ff929430144 in pthread_rwlock_rdlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190cc49c in map_rdlock () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> #2 0x00007ff9190b8745 in backend_search_cb () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> #3 0x00007ff92bcd9028 in plugin_call_func () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #4 0x00007ff92bcd92e3 in plugin_call_plugins () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #5 0x00007ff92bccc0d7 in op_shared_search () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #6 0x0000562454427bbe in do_search () >>>> #7 0x000056245441595a in connection_threadmain () >>>> #8 0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so >>>> #9 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0 >>>> #10 0x00007ff928ad888d in clone () at /lib64/libc.so.6 >>>> >>>> Thread 7 (Thread 0x7ff8f7255700 (LWP 14490)): >>>> #0 0x00007ff929430144 in pthread_rwlock_rdlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190cc49c in map_rdlock () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> #2 0x00007ff9190b8745 in backend_search_cb () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> #3 0x00007ff92bcd9028 in plugin_call_func () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #4 0x00007ff92bcd92e3 in plugin_call_plugins () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #5 0x00007ff92bccc0d7 in op_shared_search () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #6 0x0000562454427bbe in do_search () >>>> #7 0x000056245441595a in connection_threadmain () >>>> #8 0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so >>>> #9 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0 >>>> #10 0x00007ff928ad888d in clone () at /lib64/libc.so.6 >>>> >>>> Thread 4 (Thread 0x7ff8f5a52700 (LWP 14493)): >>>> #0 0x00007ff929430144 in pthread_rwlock_rdlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190cc49c in map_rdlock () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> #2 0x00007ff9190b8745 in backend_search_cb () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> #3 0x00007ff92bcd9028 in plugin_call_func () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #4 0x00007ff92bcd92e3 in plugin_call_plugins () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #5 0x00007ff92bccc0d7 in op_shared_search () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #6 0x0000562454427bbe in do_search () >>>> #7 0x000056245441595a in connection_threadmain () >>>> #8 0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so >>>> #9 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0 >>>> #10 0x00007ff928ad888d in clone () at /lib64/libc.so.6 >>>> >>>> Thread 2 (Thread 0x7ff92c355700 (LWP 15679)): >>>> #0 0x00007ff92943035e in pthread_rwlock_wrlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190b6639 in backend_be_pre_write_cb () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> #2 0x00007ff92bcd9028 in plugin_call_func () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #3 0x00007ff92bcd92e3 in plugin_call_plugins () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #4 0x00007ff91b08ac88 in ldbm_back_delete () at >>>> /usr/lib64/dirsrv/plugins/libback-ldbm.so >>>> #5 0x00007ff92bc89feb in op_shared_delete () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #6 0x00007ff92bc8a1c8 in delete_internal_pb () at >>>> /usr/lib64/dirsrv/libslapd.so.0 >>>> #7 0x00007ff9196f131e in delete_changerecord () at >>>> /usr/lib64/dirsrv/plugins/libretrocl-plugin.so >>>> #8 0x00007ff9196f1605 in changelog_trim_thread_fn () at >>>> /usr/lib64/dirsrv/plugins/libretrocl-plugin.so >>>> #9 0x00007ff929a8cc5b in _pt_root () at /lib64/libnspr4.so >>>> #10 0x00007ff92942ce65 in start_thread () at /lib64/libpthread.so.0 >>>> #11 0x00007ff928ad888d in clone () at /lib64/libc.so.6 >>>> >>>> With the following (stripped) debuginfo you can see that it is hanging >>>> while deleting >>>> a changerecord. >>>> >>>> Thread 2 (Thread 0x7ff92c355700 (LWP 15679)): >>>> #0 0x00007ff92943035e in pthread_rwlock_wrlock () at >>>> /lib64/libpthread.so.0 >>>> #1 0x00007ff9190b6639 in backend_be_pre_write_cb () at >>>> /usr/lib64/dirsrv/plugins/schemacompat-plugin.so >>>> #2 0x00007ff92bcd9028 in plugin_call_func (list=0x5624565a3340, >>>> operation=operation@entry=453, pb=pb@entry=0x5624629c28 >>>> 40, call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2028 >>>> n = <optimized out> >>>> func = 0x7ff9190b65a0 <backend_be_pre_write_cb> >>>> rc = <optimized out> >>>> return_value = 0 >>>> count = 0 >>>> #3 0x00007ff92bcd92e3 in plugin_call_list (pb=0x5624629c2840, >>>> operation=453, list=<optimized out>) at ldap/servers/slap >>>> d/plugin.c:1972 >>>> p = 0x5624564ea2c0 >>>> locked = <optimized out> >>>> plugin_list_number = 3 >>>> rc = 0 >>>> do_op = <optimized out> >>>> #4 0x00007ff92bcd92e3 in plugin_call_plugins (pb=pb@entry=0x5624629c2840, >>>> whichfunction=whichfunction@entry=453) at lda >>>> p/servers/slapd/plugin.c:442 >>>> p = 0x5624564ea2c0 >>>> locked = <optimized out> >>>> plugin_list_number = 3 >>>> rc = 0 >>>> do_op = <optimized out> >>>> #5 0x00007ff91b08ac88 in ldbm_back_delete (pb=0x5624629c2840) at >>>> ldap/servers/slapd/back-ldbm/ldbm_delete.c:373 >>>> be = 0x5624564fc000 >>>> inst = 0x56245635c680 >>>> li = 0x5624562d3a40 >>>> e = 0x562462377810 >>>> tombstone = 0x0 >>>> original_tombstone = 0x0 >>>> tmptombstone = 0x0 >>>> dn = 0x56245d90fa70 "changenumber=879530,cn=changelog" >>>> txn = {back_txn_txn = 0x562459c2cc60} >>>> parent_txn = 0x0 >>>> retval = 0 >>>> msg = <optimized out> >>>> errbuf = 0x0 >>>> retry_count = <optimized out> >>>> disk_full = 0 >>>> parent_found = <optimized out> >>>> ruv_c_init = 0 >>>> parent_modify_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, >>>> attr_encrypt = 0} >>>> ruv_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, >>>> attr_encrypt = 0} >>>> ... >>>> >>> What distribution are you using and what version of 389-ds? >>> >> Centos7 >> >> [root@linge ~]# rpm -qa 389\* >> 389-ds-base-debuginfo-1.3.9.1-13.el7_7.x86_64 >> 389-ds-base-libs-1.3.9.1-13.el7_7.x86_64 >> 389-ds-base-1.3.9.1-13.el7_7.x86_64 >> >> [root@linge ~]# rpm -qa centos-release >> centos-release-7-7.1908.0.el7.centos.x86_64 >> >> > This is a known bug [1]. > With the same bug there are two deadlock scenario but only one is fixed (for > example in slapi-nis-0.56.4-1 [2]). > A fix for the second one is under tests. > > At the moment I would recommend the workaround [3]. The drawback is a growth > of the retroCL database but unless you have a very high rate of update it > should not be a concern. > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1751295 > [2] https://koji.fedoraproject.org/koji/buildinfo?buildID=1457771 > [3] https://bugzilla.redhat.com/show_bug.cgi?id=1751295#c5
OK, thanks for the help. I will try that. Two remarks. 1) I did an update a few days ago. (1.3.8.4-18.el7_6 => 1.3.9.1-13.el7_7). Then two days after the update this hang started to happen. 2) the nasty thing about this is, that 398-ds was still alive, but all connections to it hang forever (even "ipactl restart") -- Kees _______________________________________________ FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org