On (12/09/16 11:09), Lachlan Musicman wrote:
>We saw another sssd crash on the weekend (well, Friday night).
>
>Centos 7, sssd 1.14.0 from COPR
>
Please upgrade to 1.14.1 from copr.
>Everything has worked fine for over a month until Friday.
>
>According to the log sssd_nss on the host in question:
>
> - at about 16:18, watchdog_handler killed a process for a timer overflow.
> - there is some flopping about as nss/sssd tries to reconnect
> - at 16:19:12 we see this:
>
>(Fri Sep 9 16:19:12 2016) [sssd[nss]] [sbus_dispatch] (0x0400): SBUS is
>reconnecting. Deferring.
>
> - Which continues until 16:20:56
>
>(Fri Sep 9 16:20:56 2016) [sssd[nss]] [sbus_dispatch] (0x0400): SBUS is
>reconnecting. Deferring.
>
>Note that there are 9,573,091 lines of this, at about 80,000 msgs per
>second.
>
> - nss seems to stumble back to life at this point (there are no logs on
>the freeipa server unfortunately)
>
> - at every 15 min interval we see this (I think this might be zabbix
>polling sssd):
>
>(Fri Sep 9 18:30:01 2016) [sssd[nss]] [get_client_cred] (0x0020):
>SELINUX_getpeercon failed [-1][Unknown error -1].

What is the state of SELinux on your machine?
Please share the output of "sestatus".

LS

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project