[SSSD-users] Re: login hangs with enumerate = true

2017-06-14 Thread Jakub Hrozek
On Tue, Jun 13, 2017 at 06:21:28PM +, Joakim Tjernlund wrote:
> On Tue, 2017-06-13 at 18:01 +0200, Jakub Hrozek wrote:
> > On Tue, Jun 13, 2017 at 12:12:05PM +, Joakim Tjernlund wrote:
> > > > It is now :) was in the wrong section before
> > > 
> > > timeout = 30 in domain section SEEMS to help, no problem since yesterday.
> > > What did I really do here?
> > 
> > There is already a ticket to document this better, but tl;dr: there is
> > a watchdog that kills the process, presuming it is stuck, unless an
> > internal event that resets the watchdog is received within three ticks
> > of the 'timeout' value.
> > 
> > When sssd writes a very large number of entries to the cache, the write
> > operations block the event loop and prevent delivery of the watchdog
> > reset, which results in the process being killed.
> 
> Hmm, on a tmpfs 3*10 secs should be more than enough for that, I think.
> Also, the process (the domain process) was never dead; it was eating CPU instead.

Well, I was not precise earlier: it doesn't have to be writes. For
example, the loop you showed checks whether all members of a group are
already cached by searching for each member in turn. That is not a write,
but it can also block the process.
___
sssd-users mailing list -- sssd-users@lists.fedorahosted.org
To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org


[SSSD-users] Re: login hangs with enumerate = true

2017-06-14 Thread Jakub Hrozek
On Tue, Jun 13, 2017 at 06:18:24PM +, Joakim Tjernlund wrote:
> On Tue, 2017-06-13 at 17:59 +0200, Jakub Hrozek wrote:
> > On Tue, Jun 13, 2017 at 12:34:41PM +, Joakim Tjernlund wrote:
> > > > timeout = 30 in domain section SEEMS to help, no problem since 
> > > > yesterday.
> > > > What did I really do here?
> > > > 
> > > 
> > > However, now I see that the output of getent group/getent group  is
> > > incomplete: members are missing.
> > > And it varies between machines; even ones that have enumerate = false
> > > have an incomplete member list for a random group name.
> > 
> > Probably bug whack-a-mole:
> > https://pagure.io/SSSD/sssd/issue/3369
> > Please check whether the debug logs contain messages from the "cleanup
> > task".
> 
> Nothing in the logs, what debug level do I need to see this?

5 or higher.
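
For reference, the two settings discussed in this thread would look roughly like the fragment below in sssd.conf. This is only a sketch: the section name [domain/infinera.com] is assumed from the domain name in the quoted log lines, and the values are simply the ones mentioned here, not tuned recommendations.

```ini
[domain/infinera.com]
# The default is 10 according to this thread; 30 gives the watchdog more slack.
timeout = 30
# 5 or higher is needed to see the "cleanup task" messages.
debug_level = 5
```

sssd has to be restarted for the changes to take effect.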


[SSSD-users] Re: login hangs with enumerate = true

2017-06-13 Thread Joakim Tjernlund
On Tue, 2017-06-13 at 18:01 +0200, Jakub Hrozek wrote:
> On Tue, Jun 13, 2017 at 12:12:05PM +, Joakim Tjernlund wrote:
> > > It is now :) was in the wrong section before
> > 
> > timeout = 30 in domain section SEEMS to help, no problem since yesterday.
> > What did I really do here?
> 
> There is already a ticket to document this better, but tl;dr: there is a
> watchdog that kills the process, presuming it is stuck, unless an internal
> event that resets the watchdog is received within three ticks of the
> 'timeout' value.
> 
> When sssd writes a very large number of entries to the cache, the write
> operations block the event loop and prevent delivery of the watchdog
> reset, which results in the process being killed.

Hmm, on a tmpfs 3*10 secs should be more than enough for that, I think.
Also, the process (the domain process) was never dead; it was eating CPU instead.

 Jocke


[SSSD-users] Re: login hangs with enumerate = true

2017-06-13 Thread Joakim Tjernlund
On Tue, 2017-06-13 at 17:59 +0200, Jakub Hrozek wrote:
> On Tue, Jun 13, 2017 at 12:34:41PM +, Joakim Tjernlund wrote:
> > > timeout = 30 in domain section SEEMS to help, no problem since yesterday.
> > > What did I really do here?
> > > 
> > 
> > However, now I see that the output of getent group/getent group  is
> > incomplete: members are missing.
> > And it varies between machines; even ones that have enumerate = false
> > have an incomplete member list for a random group name.
> 
> Probably bug whack-a-mole:
> https://pagure.io/SSSD/sssd/issue/3369
> Please check whether the debug logs contain messages from the "cleanup
> task".

Nothing in the logs. What debug level do I need to see this?


[SSSD-users] Re: login hangs with enumerate = true

2017-06-13 Thread Jakub Hrozek
On Tue, Jun 13, 2017 at 12:12:05PM +, Joakim Tjernlund wrote:
> > It is now :) was in the wrong section before
> 
> timeout = 30 in domain section SEEMS to help, no problem since yesterday.
> What did I really do here?

There is already a ticket to document this better, but tl;dr: there is a
watchdog that kills the process, presuming it is stuck, unless an internal
event that resets the watchdog is received within three ticks of the
'timeout' value.

When sssd writes a very large number of entries to the cache, the write
operations block the event loop and prevent delivery of the watchdog
reset, which results in the process being killed.
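
To illustrate the mechanism, here is a toy Python model of such a watchdog. It is not SSSD code: the names Watchdog, MAX_TICKS and survives are made up for this sketch, and it assumes the process is killed on the third tick that passes without a reset.

```python
from dataclasses import dataclass

# Per the thread: the process is killed after three ticks without a reset.
MAX_TICKS = 3


@dataclass
class Watchdog:
    timeout: int    # seconds between ticks (the 'timeout' option)
    ticks: int = 0  # ticks seen since the last reset

    def tick(self) -> bool:
        """Count one tick; return True if the process is presumed stuck."""
        self.ticks += 1
        return self.ticks >= MAX_TICKS

    def reset(self) -> None:
        """Runs only when the event loop gets to deliver the reset event."""
        self.ticks = 0


def survives(timeout: int, blocked_seconds: int) -> bool:
    """Model one operation that blocks the event loop for blocked_seconds."""
    watchdog = Watchdog(timeout)
    # Ticks keep firing while the loop is blocked; no reset can be delivered.
    for _ in range(blocked_seconds // timeout):
        if watchdog.tick():
            return False  # killed before the blocking operation finished
    watchdog.reset()  # the loop is free again and the reset is delivered
    return True


if __name__ == "__main__":
    print(survives(timeout=10, blocked_seconds=45))
    print(survives(timeout=30, blocked_seconds=45))
```

In this model a 45-second stall is fatal with timeout = 10 but harmless with timeout = 30, which is why raising the value can hide the symptom without making the blocking operation any faster.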


[SSSD-users] Re: login hangs with enumerate = true

2017-06-13 Thread Jakub Hrozek
On Tue, Jun 13, 2017 at 12:34:41PM +, Joakim Tjernlund wrote:
> > timeout = 30 in domain section SEEMS to help, no problem since yesterday.
> > What did I really do here?
> > 
> 
> However, now I see that the output of getent group/getent group  is
> incomplete: members are missing.
> And it varies between machines; even ones that have enumerate = false
> have an incomplete member list for a random group name.

Probably bug whack-a-mole:
https://pagure.io/SSSD/sssd/issue/3369
Please check whether the debug logs contain messages from the "cleanup
task".


[SSSD-users] Re: login hangs with enumerate = true

2017-06-13 Thread Joakim Tjernlund
On Tue, 2017-06-13 at 14:12 +0200, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 18:06 +0200, Joakim Tjernlund wrote:
> > On Mon, 2017-06-12 at 17:51 +0200, Jakub Hrozek wrote:
> > > On Mon, Jun 12, 2017 at 03:38:28PM +, Joakim Tjernlund wrote:
> > > > On Mon, 2017-06-12 at 17:32 +0200, Joakim Tjernlund wrote:
> > > > > On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > > > > > On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > > > > > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > > > > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund 
> > > > > > > > wrote:
> > > > > > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund 
> > > > > > > > > > wrote:
> > > > > > > > > > > both 1.15.2 and git master hangs after less than 24 hour 
> > > > > > > > > > > on
> > > > > > > > > > > a server.
> > > > > > > > > > > 
> > > > > > > > > > > I can see this repeating the domain log:
> > > > > > > > > > > 
> > > > > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > > > [0xf65ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > > > [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > > > [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > > > [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > > 
> > > > > > > > > > This is caused by too long write to disk.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Can I just increase the timeout for now? I will patch the
> > > > > > > > > code if needed.
> > > > > > > > > On this server we need enumerate = true ATM, cannot just turn
> > > > > > > > > it off.
> > > > > > > > 
> > > > > > > > Oh, sure. The other alternative might be to mount the cache to
> > > > > > > > tmpfs.
> > > > > > > 
> > > > > > > After mounting a tmpfs this morning on /var/lib/sss/db, the error
> > > > > > > has returned.
> > > > > > > There seems to be an additional problem here.
> > > > > > > 
> > > > > > > I don't think this AD is that big either:
> > > > > > > # > getent passwd | wc -l
> > > > > > > 3236
> > > > > > > # > getent group | wc -l
> > > > > > > 885
> > > > > > > 
> > > > > > > Any ideas?
> > > > > > 
> > > > > > Can you get a pstack of when the process is 'stuck' ?
> > > > > > 
> > > > > > Does increasing the 'timeout' parameter from its default '10' to 
> > > > > > maybe
> > > > > > 30 in the domain section help?
> > > > > 
> > > > > I see ALOT of this in the log( figured I look before I restart sssd)
> > > > > 
> > > > > 
> > > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] 
> > > > > [sdap_find_entry_by_origDN] (0x4000): Searching cache for 
> > > > > [CN=Jovy\20Sena,OU=Sunnyvale,OU=CorpUsers,DC=infinera,DC=com].
> > > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > > Added timed event "ltdb_callback": 0x4c28c00
> > > > > 
> > > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > > Added timed event "ltdb_timeout": 0x4c28cc0
> > > > > 
> > > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > > Running timer event 0x4c28c00 "ltdb_callback"
> > > > > 
> > > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > > Destroying timer event 0x4c28cc0 "ltdb_timeout"
> > > > > 
> > > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > > Ending timer event 0x4c28c00 "ltdb_callback"
> > > > > 
> > > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > > Added timed event "ltdb_callback": 0x34ccf50
> > > > > 
> > > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > > Added timed event "ltdb_timeout

[SSSD-users] Re: login hangs with enumerate = true

2017-06-13 Thread Joakim Tjernlund
On Mon, 2017-06-12 at 18:06 +0200, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 17:51 +0200, Jakub Hrozek wrote:
> > On Mon, Jun 12, 2017 at 03:38:28PM +, Joakim Tjernlund wrote:
> > > On Mon, 2017-06-12 at 17:32 +0200, Joakim Tjernlund wrote:
> > > > On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > > > > On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > > > > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > > > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund 
> > > > > > > > > wrote:
> > > > > > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > > > > > a server.
> > > > > > > > > > 
> > > > > > > > > > I can see this repeating the domain log:
> > > > > > > > > > 
> > > > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > > [0xf65ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > > [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > > [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > > [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > 
> > > > > > > > > This is caused by too long write to disk.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Can I just increase the timeout for now? I will patch the code
> > > > > > > > if needed.
> > > > > > > > On this server we need enumerate = true ATM, cannot just turn it
> > > > > > > > off.
> > > > > > > 
> > > > > > > Oh, sure. The other alternative might be to mount the cache to
> > > > > > > tmpfs.
> > > > > > 
> > > > > > After mounting a tmpfs this morning on /var/lib/sss/db, the error
> > > > > > has returned.
> > > > > > There seems to be an additional problem here.
> > > > > > 
> > > > > > I don't think this AD is that big either:
> > > > > > # > getent passwd | wc -l
> > > > > > 3236
> > > > > > # > getent group | wc -l
> > > > > > 885
> > > > > > 
> > > > > > Any ideas?
> > > > > 
> > > > > Can you get a pstack of when the process is 'stuck' ?
> > > > > 
> > > > > Does increasing the 'timeout' parameter from its default '10' to maybe
> > > > > 30 in the domain section help?
> > > > 
> > > > I see ALOT of this in the log( figured I look before I restart sssd)
> > > > 
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] 
> > > > [sdap_find_entry_by_origDN] (0x4000): Searching cache for 
> > > > [CN=Jovy\20Sena,OU=Sunnyvale,OU=CorpUsers,DC=infinera,DC=com].
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Added timed event "ltdb_callback": 0x4c28c00
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Added timed event "ltdb_timeout": 0x4c28cc0
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Running timer event 0x4c28c00 "ltdb_callback"
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Destroying timer event 0x4c28cc0 "ltdb_timeout"
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Ending timer event 0x4c28c00 "ltdb_callback"
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Added timed event "ltdb_callback": 0x34ccf50
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Added timed event "ltdb_timeout": 0x34cd0c0
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Running timer event 0x34ccf50 "ltdb_callback"
> > > > 
> > > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > > Destroying timer event 0x34cd0c0 "ltdb_timeout"

[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Joakim Tjernlund
On Mon, 2017-06-12 at 17:57 +0200, Jakub Hrozek wrote:
> On Mon, Jun 12, 2017 at 03:21:43PM +, Joakim Tjernlund wrote:
> > On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > > On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > > > a server.
> > > > > > > > 
> > > > > > > > I can see this repeating the domain log:
> > > > > > > > 
> > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > [0xf65ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > 
> > > > > > > This is caused by too long write to disk.
> > > > > > > 
> > > > > > 
> > > > > > Can I just increase the timeout for now? I will patch the code if
> > > > > > needed.
> > > > > > On this server we need enumerate = true ATM, cannot just turn it off.
> > > > > 
> > > > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > > > 
> > > > After mounting a tmpfs this morning on /var/lib/sss/db, the error has
> > > > returned.
> > > > There seems to be an additional problem here.
> > > > 
> > > > I don't think this AD is that big either:
> > > > # > getent passwd | wc -l
> > > > 3236
> > > > # > getent group | wc -l
> > > > 885
> > > > 
> > > > Any ideas?
> > > 
> > > Can you get a pstack of when the process is 'stuck' ?
> > 
> > Don't know what pstack is ?
> 
> Sorry, it's a utility that prints the backtrace of a process, e.g.:
> pstack $(pidof sssd_be)
> #0  0x7f5fa5ae9db3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
> #1  0x7f5fa61ca8ca in epoll_event_loop (tvalp=0x7ffd78977bf0, epoll_ev=0xb44e70) at ../tevent_epoll.c:642
> #2  epoll_event_loop_once (ev=, location=) at ../tevent_epoll.c:926
> #3  0x7f5fa61c8f0a in std_event_loop_once (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent_standard.c:114
> #4  0x7f5fa61c50e0 in _tevent_loop_once (ev=ev@entry=0xb44c30, location=location@entry=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent.c:533
> #5  0x7f5fa61c527b in tevent_common_loop_wait (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent.c:637
> #6  0x7f5fa61c8e9a in std_event_loop_wait (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent_standard.c:140
> #7  0x7f5faa173f10 in server_loop (main_ctx=0xb46080) at /sssd/src/util/server.c:719
> #8  0x004093ff in main (argc=8, argv=0x7ffd78978028) at /sssd/src/providers/data_provider_be.c:589
> 
> I don't know about Gentoo, but on RHEL/Fedora, it's part of the gdb
> package.

I see, it's not in native Gentoo but can be found in external overlays.
Not sure this will help, though, as sssd is burning CPU when it gets into
this state.

 Jocke 


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Joakim Tjernlund
On Mon, 2017-06-12 at 17:51 +0200, Jakub Hrozek wrote:
> On Mon, Jun 12, 2017 at 03:38:28PM +, Joakim Tjernlund wrote:
> > On Mon, 2017-06-12 at 17:32 +0200, Joakim Tjernlund wrote:
> > > On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > > > On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > > > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund 
> > > > > > > > wrote:
> > > > > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > > > > a server.
> > > > > > > > > 
> > > > > > > > > I can see this repeating the domain log:
> > > > > > > > > 
> > > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > [0xf65ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > > [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > 
> > > > > > > > This is caused by too long write to disk.
> > > > > > > > 
> > > > > > > 
> > > > > > > Can I just increase the timeout for now? I will patch the code if
> > > > > > > needed.
> > > > > > > On this server we need enumerate = true ATM, cannot just turn it
> > > > > > > off.
> > > > > > 
> > > > > > Oh, sure. The other alternative might be to mount the cache to
> > > > > > tmpfs.
> > > > > 
> > > > > After mounting a tmpfs this morning on /var/lib/sss/db, the error has
> > > > > returned.
> > > > > There seems to be an additional problem here.
> > > > > 
> > > > > I don't think this AD is that big either:
> > > > > # > getent passwd | wc -l
> > > > > 3236
> > > > > # > getent group | wc -l
> > > > > 885
> > > > > 
> > > > > Any ideas?
> > > > 
> > > > Can you get a pstack of when the process is 'stuck' ?
> > > > 
> > > > Does increasing the 'timeout' parameter from its default '10' to maybe
> > > > 30 in the domain section help?
> > > 
> > > I see ALOT of this in the log( figured I look before I restart sssd)
> > > 
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] 
> > > [sdap_find_entry_by_origDN] (0x4000): Searching cache for 
> > > [CN=Jovy\20Sena,OU=Sunnyvale,OU=CorpUsers,DC=infinera,DC=com].
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> > > timed event "ltdb_callback": 0x4c28c00
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> > > timed event "ltdb_timeout": 0x4c28cc0
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > Running timer event 0x4c28c00 "ltdb_callback"
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > Destroying timer event 0x4c28cc0 "ltdb_timeout"
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > Ending timer event 0x4c28c00 "ltdb_callback"
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> > > timed event "ltdb_callback": 0x34ccf50
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> > > timed event "ltdb_timeout": 0x34cd0c0
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > Running timer event 0x34ccf50 "ltdb_callback"
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > Destroying timer event 0x34cd0c0 "ltdb_timeout"
> > > 
> > > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > > Ending timer event 0x34ccf50 "ltdb_callback"
> > 
> > After just adding timeout = 30 and restarting sssd, it still hung. Had to
> > clear out (saved a copy first)
>

[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Jakub Hrozek
On Mon, Jun 12, 2017 at 03:21:43PM +, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > > a server.
> > > > > > > 
> > > > > > > I can see this repeating the domain log:
> > > > > > > 
> > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context [0xf65ce0] 
> > > > > > > on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > 
> > > > > > This is caused by too long write to disk.
> > > > > > 
> > > > > 
> > > > > Can I just increase the timeout for now? I will patch the code if
> > > > > needed.
> > > > > On this server we need enumerate = true ATM, cannot just turn it off.
> > > > 
> > > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > > 
> > > After mounting a tmpfs this morning on /var/lib/sss/db, the error has
> > > returned.
> > > There seems to be an additional problem here.
> > > 
> > > I don't think this AD is that big either:
> > > # > getent passwd | wc -l
> > > 3236
> > > # > getent group | wc -l
> > > 885
> > > 
> > > Any ideas?
> > 
> > Can you get a pstack of when the process is 'stuck' ?
> 
> Don't know what pstack is ?

Sorry, it's a utility that prints the backtrace of a process, e.g.:
pstack $(pidof sssd_be)
#0  0x7f5fa5ae9db3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x7f5fa61ca8ca in epoll_event_loop (tvalp=0x7ffd78977bf0, epoll_ev=0xb44e70) at ../tevent_epoll.c:642
#2  epoll_event_loop_once (ev=, location=) at ../tevent_epoll.c:926
#3  0x7f5fa61c8f0a in std_event_loop_once (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent_standard.c:114
#4  0x7f5fa61c50e0 in _tevent_loop_once (ev=ev@entry=0xb44c30, location=location@entry=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent.c:533
#5  0x7f5fa61c527b in tevent_common_loop_wait (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent.c:637
#6  0x7f5fa61c8e9a in std_event_loop_wait (ev=0xb44c30, location=0x7f5faa19cbbd "/sssd/src/util/server.c:719") at ../tevent_standard.c:140
#7  0x7f5faa173f10 in server_loop (main_ctx=0xb46080) at /sssd/src/util/server.c:719
#8  0x004093ff in main (argc=8, argv=0x7ffd78978028) at /sssd/src/providers/data_provider_be.c:589

I don't know about Gentoo, but on RHEL/Fedora, it's part of the gdb
package.


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Jakub Hrozek
On Mon, Jun 12, 2017 at 03:32:22PM +, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > > a server.
> > > > > > > 
> > > > > > > I can see this repeating the domain log:
> > > > > > > 
> > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context [0xf65ce0] 
> > > > > > > on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > 
> > > > > > This is caused by too long write to disk.
> > > > > > 
> > > > > 
> > > > > Can I just increase the timeout for now? I will patch the code if 
> > > > > needed.
> > > > > On this server we need enumerate = true ATM, cannot just turn it off.
> > > > 
> > > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > > 
> > > After mounting a tmpfs this morning on /var/lib/sss/db, the error has 
> > > returned.
> > > Seems to be an additional problem here.
> > > 
> > > I don't think this AD is that big either:
> > > # > getent passwd | wc -l
> > > 3236
> > > # > getent group | wc -l
> > > 885
> > > 
> > > Any ideas?
> > 
> > Can you get a pstack of when the process is 'stuck' ?
> > 
> > Does increasing the 'timeout' parameter from its default '10' to maybe
> > 30 in the domain section help?
> 
> I see A LOT of this in the log (figured I'd look before I restart sssd)

Right, this is sssd looking up members for a group it is processing. It
is one of the pieces we need to refactor in the next version, because
the sdap_async_groups.c module can end up looking up the same member for
the same group several times during a single group-save operation (IIRC;
this is from memory, from when I was working on performance enhancements
in the previous version).
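
To illustrate the repeated-lookup cost, here is a minimal Python sketch. It
is not SSSD code: `save_group_members` and `find_by_orig_dn` are hypothetical
stand-ins for the group-save step and the per-member cache search. It only
shows the general idea of remembering results for the duration of one
operation, so each distinct member DN is searched at most once:

```python
def save_group_members(member_dns, find_by_orig_dn):
    """Resolve every member DN, searching the cache at most once per DN.

    find_by_orig_dn is a hypothetical stand-in for a per-member cache search.
    """
    seen = {}        # DN -> lookup result, valid for this one operation only
    resolved = []
    for dn in member_dns:
        if dn not in seen:
            seen[dn] = find_by_orig_dn(dn)   # the expensive search
        resolved.append(seen[dn])
    return resolved
```

With a member list that references the same DN three times, the search
function would be called once for that DN instead of three times.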

> 
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] 
> [sdap_find_entry_by_origDN] (0x4000): Searching cache for 
> [CN=Jovy\20Sena,OU=Sunnyvale,OU=CorpUsers,DC=infinera,DC=com].
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> timed event "ltdb_callback": 0x4c28c00
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> timed event "ltdb_timeout": 0x4c28cc0
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Running 
> timer event 0x4c28c00 "ltdb_callback"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> Destroying timer event 0x4c28cc0 "ltdb_timeout"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Ending 
> timer event 0x4c28c00 "ltdb_callback"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> timed event "ltdb_callback": 0x34ccf50
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> timed event "ltdb_timeout": 0x34cd0c0
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Running 
> timer event 0x34ccf50 "ltdb_callback"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> Destroying timer event 0x34cd0c0 "ltdb_timeout"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Ending 
> timer event 0x34ccf50 "ltdb_callback"

[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Jakub Hrozek
On Mon, Jun 12, 2017 at 03:38:28PM +, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 17:32 +0200, Joakim Tjernlund wrote:
> > On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > > On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > > > a server.
> > > > > > > > 
> > > > > > > > I can see this repeating the domain log:
> > > > > > > > 
> > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > [0xf65ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > > [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > 
> > > > > > > This is caused by too long write to disk.
> > > > > > > 
> > > > > > 
> > > > > > Can I just increase the timeout for now? I will patch the code if 
> > > > > > needed.
> > > > > > On this server we need enumerate = true ATM, cannot just turn it off.
> > > > > 
> > > > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > > > 
> > > > After mounting a tmpfs this morning on /var/lib/sss/db, the error has 
> > > > returned.
> > > > Seems to be an additional problem here.
> > > > 
> > > > I don't think this AD is that big either:
> > > > # > getent passwd | wc -l
> > > > 3236
> > > > # > getent group | wc -l
> > > > 885
> > > > 
> > > > Any ideas?
> > > 
> > > Can you get a pstack of when the process is 'stuck' ?
> > > 
> > > Does increasing the 'timeout' parameter from its default '10' to maybe
> > > 30 in the domain section help?
> > 
> > I see A LOT of this in the log (figured I'd look before I restart sssd)
> > 
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] 
> > [sdap_find_entry_by_origDN] (0x4000): Searching cache for 
> > [CN=Jovy\20Sena,OU=Sunnyvale,OU=CorpUsers,DC=infinera,DC=com].
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> > timed event "ltdb_callback": 0x4c28c00
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> > timed event "ltdb_timeout": 0x4c28cc0
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Running 
> > timer event 0x4c28c00 "ltdb_callback"
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > Destroying timer event 0x4c28cc0 "ltdb_timeout"
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Ending 
> > timer event 0x4c28c00 "ltdb_callback"
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> > timed event "ltdb_callback": 0x34ccf50
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> > timed event "ltdb_timeout": 0x34cd0c0
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Running 
> > timer event 0x34ccf50 "ltdb_callback"
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> > Destroying timer event 0x34cd0c0 "ltdb_timeout"
> > 
> > (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Ending 
> > timer event 0x34ccf50 "ltdb_callback"
> 
> After just adding timout = 30 and restarting sssd it still hung. Had to clear 
> out(saved a copy first)
^^^
There is a typo here, I wonder if you used the correct spelling in the
config? Also, did you add the option to the domain section?
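
Both questions can be checked mechanically. A minimal Python sketch follows,
assuming sssd.conf parses as a plain INI file and that the domain section is
named [domain/infinera.com] (taken from the log above); `check_timeout` is a
made-up helper, not part of sssd:

```python
import configparser
import difflib


def check_timeout(conf_text, domain):
    """Return a list of human-readable problems; empty means it looks fine."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = f"domain/{domain}"
    if not parser.has_section(section):
        return [f"no [{section}] section"]

    problems = []
    options = parser.options(section)
    if "timeout" not in options:
        # Catch near-miss spellings such as 'timout'.
        for name in difflib.get_close_matches("timeout", options, cutoff=0.8):
            problems.append(f"[{section}] has '{name}', did you mean 'timeout'?")
        # Catch the option sitting in some other section instead.
        for other in parser.sections():
            if other != section and parser.has_option(other, "timeout"):
                problems.append(f"'timeout' is set in [{other}], not [{section}]")
        if not problems:
            problems.append(f"'timeout' is not set in [{section}]")
    return problems
```

Feeding it the contents of /etc/sssd/sssd.conf and the domain name should
return an empty list once the option is spelled correctly and sits in the
domain section.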

> the sssd cache as well for normal function.
> 
>  Jocke

[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Joakim Tjernlund
On Mon, 2017-06-12 at 17:32 +0200, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> > On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > > a server.
> > > > > > > 
> > > > > > > I can see this repeating the domain log:
> > > > > > > 
> > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context [0xf65ce0] 
> > > > > > > on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x239cce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x1421ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] 
> > > > > > > (0x0010): A transaction is still active in ldb context 
> > > > > > > [0x1cb0ce0] on /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > 
> > > > > > This is caused by too long write to disk.
> > > > > > 
> > > > > 
> > > > > Can I just increase the timeout for now? I will patch the code if 
> > > > > needed.
> > > > > On this server we need enumerate = true ATM, cannot just turn it off.
> > > > 
> > > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > > 
> > > After mounting a tmpfs this morning on /var/lib/sss/db, the error has 
> > > returned.
> > > Seems to be an additional problem here.
> > > 
> > > I don't think this AD is that big either:
> > > # > getent passwd | wc -l
> > > 3236
> > > # > getent group | wc -l
> > > 885
> > > 
> > > Any ideas?
> > 
> > Can you get a pstack of when the process is 'stuck' ?
> > 
> > Does increasing the 'timeout' parameter from its default '10' to maybe
> > 30 in the domain section help?
> 
> I see A LOT of this in the log (figured I'd look before I restart sssd)
> 
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] 
> [sdap_find_entry_by_origDN] (0x4000): Searching cache for 
> [CN=Jovy\20Sena,OU=Sunnyvale,OU=CorpUsers,DC=infinera,DC=com].
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> timed event "ltdb_callback": 0x4c28c00
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> timed event "ltdb_timeout": 0x4c28cc0
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Running 
> timer event 0x4c28c00 "ltdb_callback"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> Destroying timer event 0x4c28cc0 "ltdb_timeout"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Ending 
> timer event 0x4c28c00 "ltdb_callback"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> timed event "ltdb_callback": 0x34ccf50
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added 
> timed event "ltdb_timeout": 0x34cd0c0
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Running 
> timer event 0x34ccf50 "ltdb_callback"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): 
> Destroying timer event 0x34cd0c0 "ltdb_timeout"
> 
> (Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Ending 
> timer event 0x34ccf50 "ltdb_callback"

After just adding timout = 30 and restarting sssd, it still hung. I had to clear
out the sssd cache as well (after saving a copy first) to get back to normal
function.

 Jocke


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Joakim Tjernlund
On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > a server.
> > > > > > 
> > > > > > I can see this repeating the domain log:
> > > > > > 
> > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): 
> > > > > > A transaction is still active in ldb context [0xf65ce0] on 
> > > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): 
> > > > > > A transaction is still active in ldb context [0x239cce0] on 
> > > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): 
> > > > > > A transaction is still active in ldb context [0x1421ce0] on 
> > > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): 
> > > > > > A transaction is still active in ldb context [0x1cb0ce0] on 
> > > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > 
> > > > > This is caused by too long write to disk.
> > > > > 
> > > > 
> > > > Can I just increase the timeout for now? I will patch the code if 
> > > > needed.
> > > > On this server we need enumerate = true ATM, cannot just turn it off.
> > > 
> > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > 
> > After mounting a tmpfs this morning on /var/lib/sss/db, the error has 
> > returned.
> > Seems to be an additional problem here.
> > 
> > I don't think this AD is that big either:
> > # > getent passwd | wc -l
> > 3236
> > # > getent group | wc -l
> > 885
> > 
> > Any ideas?
> 
> Can you get a pstack of when the process is 'stuck' ?
> 
> Does increasing the 'timeout' parameter from its default '10' to maybe
> 30 in the domain section help?

I see A LOT of this in the log (figured I'd look before I restart sssd)


(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [sdap_find_entry_by_origDN] 
(0x4000): Searching cache for 
[CN=Jovy\20Sena,OU=Sunnyvale,OU=CorpUsers,DC=infinera,DC=com].
(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added timed 
event "ltdb_callback": 0x4c28c00

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added timed 
event "ltdb_timeout": 0x4c28cc0

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Running 
timer event 0x4c28c00 "ltdb_callback"

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Destroying 
timer event 0x4c28cc0 "ltdb_timeout"

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Ending 
timer event 0x4c28c00 "ltdb_callback"

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added timed 
event "ltdb_callback": 0x34ccf50

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Added timed 
event "ltdb_timeout": 0x34cd0c0

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Running 
timer event 0x34ccf50 "ltdb_callback"

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Destroying 
timer event 0x34cd0c0 "ltdb_timeout"

(Mon Jun 12 15:55:09 2017) [sssd[be[infinera.com]]] [ldb] (0x4000): Ending 
timer event 0x34ccf50 "ltdb_callback"


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Joakim Tjernlund
On Mon, 2017-06-12 at 16:01 +0200, Jakub Hrozek wrote:
> On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> > On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > > a server.
> > > > > > 
> > > > > > I can see this repeating the domain log:
> > > > > > 
> > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): 
> > > > > > A transaction is still active in ldb context [0xf65ce0] on 
> > > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): 
> > > > > > A transaction is still active in ldb context [0x239cce0] on 
> > > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): 
> > > > > > A transaction is still active in ldb context [0x1421ce0] on 
> > > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): 
> > > > > > A transaction is still active in ldb context [0x1cb0ce0] on 
> > > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > 
> > > > > This is caused by too long write to disk.
> > > > > 
> > > > 
> > > > Can I just increase the timeout for now? I will patch the code if 
> > > > needed.
> > > > On this server we need enumerate = true ATM, cannot just turn it off.
> > > 
> > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > 
> > After mounting a tmpfs this morning on /var/lib/sss/db, the error has 
> > returned.
> > Seems to be an additional problem here.
> > 
> > I don't think this AD is that big either:
> > # > getent passwd | wc -l
> > 3236
> > # > getent group | wc -l
> > 885
> > 
> > Any ideas?
> 
> Can you get a pstack of when the process is 'stuck' ?

I don't know what pstack is.

> 
> Does increasing the 'timeout' parameter from its default '10' to maybe
> 30 in the domain section help?

Will try.


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Jakub Hrozek
On Mon, Jun 12, 2017 at 01:53:27PM +, Joakim Tjernlund wrote:
> On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> > On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > > a server.
> > > > > 
> > > > > I can see this repeating the domain log:
> > > > > 
> > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] 
> > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > > > transaction is still active in ldb context [0xf65ce0] on 
> > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] 
> > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > > > transaction is still active in ldb context [0x239cce0] on 
> > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] 
> > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > > > transaction is still active in ldb context [0x1421ce0] on 
> > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] 
> > > > > [orderly_shutdown] (0x0010): SIGTERM: killing children
> > > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > > > transaction is still active in ldb context [0x1cb0ce0] on 
> > > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > 
> > > > This is caused by too long write to disk.
> > > > 
> > > 
> > > Can I just increase the timeout for now? I will patch the code if needed.
> > > On this server we need enumerate = true ATM, cannot just turn it off.
> > 
> > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> 
> After mounting a tmpfs this morning on /var/lib/sss/db, the error has 
> returned.
> Seems to be an additional problem here.
> 
> I don't think this AD is that big either:
> # > getent passwd | wc -l
> 3236
> # > getent group | wc -l
> 885
> 
> Any ideas?

Can you get a pstack of when the process is 'stuck' ?

Does increasing the 'timeout' parameter from its default '10' to maybe
30 in the domain section help?
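
As a side check on the quoted domain log: the SIGTERM entries look evenly
spaced, which would fit the backend being killed and restarted on a fixed
cycle. A small Python sketch to compute the gaps is below. The log lines are
copied from above with the mail wrapping undone; `sigterm_gaps` is just an
illustrative helper name:

```python
import re
from datetime import datetime

LOG = """\
(Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
"""

# Timestamp in parentheses at the start of an orderly_shutdown line.
STAMP = re.compile(r"^\((.+?)\) .*\[orderly_shutdown\]")


def sigterm_gaps(log_text):
    """Return the seconds between consecutive orderly_shutdown entries."""
    times = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(sigterm_gaps(LOG))  # [53, 53, 53]
```

If raising 'timeout' changes that spacing, that would be a hint that the
kills are tied to the timeout setting; this is an assumption to verify, not
something the log alone proves.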


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Joakim Tjernlund
On Sun, 2017-06-11 at 20:55 +0200, Jakub Hrozek wrote:
> On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> > On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > > both 1.15.2 and git master hangs after less than 24 hour on
> > > > a server.
> > > > 
> > > > I can see this repeating the domain log:
> > > > 
> > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > > > (0x0010): SIGTERM: killing children
> > > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > > transaction is still active in ldb context [0xf65ce0] on 
> > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > > > (0x0010): SIGTERM: killing children
> > > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > > transaction is still active in ldb context [0x239cce0] on 
> > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > > > (0x0010): SIGTERM: killing children
> > > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > > transaction is still active in ldb context [0x1421ce0] on 
> > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > > > (0x0010): SIGTERM: killing children
> > > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > > transaction is still active in ldb context [0x1cb0ce0] on 
> > > > /var/lib/sss/db/cache_infinera.com.ldb
> > > 
> > > This is caused by too long write to disk.
> > > 
> > 
> > Can I just increase the timeout for now? I will patch the code if needed.
> > On this server we need enumerate = true ATM, cannot just turn it off.
> 
> Oh, sure. The other alternative might be to mount the cache to tmpfs.

After mounting a tmpfs this morning on /var/lib/sss/db, the error has returned.
Seems to be an additional problem here.

I don't think this AD is that big either:
# > getent passwd | wc -l
3236
# > getent group | wc -l
885

Any ideas?

 Jocke


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread John Hodrien

On Mon, 12 Jun 2017, Joakim Tjernlund wrote:


> hmm, aren't the "offline" login creds stored here as well? Then having a RAM fs
> will delete the offline creds on each reboot. Is there a way around this?


You could sync it elsewhere on shutdown perhaps?

So far we've got away with not using tmpfs on machines that need stored
credentials.
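
A rough Python sketch of the sync idea, under stated assumptions: the backup
path is made up, `sync_dir` is a hypothetical helper, and the copy would have
to run while sssd is stopped. Note that shutil.copytree copies permission bits
and timestamps but does not preserve file ownership, so that would need
separate handling:

```python
import shutil
from pathlib import Path


def sync_dir(src, dst):
    """Copy everything under src into dst, creating dst if needed."""
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        return False      # nothing to copy, e.g. no backup exists yet
    shutil.copytree(src, dst, dirs_exist_ok=True)   # needs Python 3.8+
    return True


# Hypothetical paths: the tmpfs-backed cache and a persistent copy on disk.
CACHE = "/var/lib/sss/db"
BACKUP = "/var/lib/sss/db.persist"

# At shutdown, after sssd has stopped:  sync_dir(CACHE, BACKUP)
# At boot, before sssd starts:          sync_dir(BACKUP, CACHE)
```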

jh


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Joakim Tjernlund
On Mon, 2017-06-12 at 10:29 +0200, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 09:19 +0100, John Hodrien wrote:
> > On Sun, 11 Jun 2017, Jakub Hrozek wrote:
> > 
> > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > 
> > I'm an advocate of this method.  With older versions of SSSD, against our
> > relatively large AD, the performance boost from running with tmpfs was
> > immense.  This advantage has been reducing over time, as a normally
> > configured SSSD's performance has improved greatly in our configuration.
> 
> Testing this now. It is a bit strange that even if you have enumerate = true,
> the first time I do getent group it pauses for a little while even if I wait
> a few mins for the cache to populate.

hmm, aren't the "offline" login creds stored here as well? Then having a RAM fs
will delete the offline creds on each reboot. Is there a way around this?

 Jocke


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Jakub Hrozek
On Mon, Jun 12, 2017 at 08:29:29AM +, Joakim Tjernlund wrote:
> On Mon, 2017-06-12 at 09:19 +0100, John Hodrien wrote:
> > On Sun, 11 Jun 2017, Jakub Hrozek wrote:
> > 
> > > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> > 
> > I'm an advocate of this method.  With older versions of SSSD, against our
> > relatively large AD, the performance boost from running with tmpfs was
> > immense.  This advantage has been reducing over time, as a normally
> > configured SSSD's performance has improved greatly in our configuration.
> 
> Testing this now. It is a bit strange that even if you have enumerate = true,
> the first time I do getent group it pauses for a little while even if I wait
> a few mins for the cache to populate.

sssd blocks on the very first enumeration to avoid replying with no results
or only partial results.


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread Joakim Tjernlund
On Mon, 2017-06-12 at 09:19 +0100, John Hodrien wrote:
> On Sun, 11 Jun 2017, Jakub Hrozek wrote:
> 
> > Oh, sure. The other alternative might be to mount the cache to tmpfs.
> 
> I'm an advocate of this method.  With older versions of SSSD, against our
> relatively large AD, the performance boost from running with tmpfs was
> immense.  This advantage has been reducing over time, as a normally configured
> SSSD's performance has improved greatly in our configuration.

Testing this now. It is a bit strange that even if you have enumerate = true,
the first time I do getent group it pauses for a little while even if I wait
a few mins for the cache to populate.

 Jocke


[SSSD-users] Re: login hangs with enumerate = true

2017-06-12 Thread John Hodrien

On Sun, 11 Jun 2017, Jakub Hrozek wrote:


> Oh, sure. The other alternative might be to mount the cache to tmpfs.


I'm an advocate of this method.  With older versions of SSSD, against our
relatively large AD, the performance boost from running with tmpfs was
immense.  This advantage has been reducing over time, as a normally configured
SSSD's performance has improved greatly in our configuration.

jh


[SSSD-users] Re: login hangs with enumerate = true

2017-06-11 Thread Jakub Hrozek
On Sat, Jun 10, 2017 at 07:56:47AM +, Joakim Tjernlund wrote:
> On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> > On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > > both 1.15.2 and git master hangs after less than 24 hour on
> > > a server.
> > > 
> > > I can see this repeating the domain log:
> > > 
> > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > > (0x0010): SIGTERM: killing children
> > > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > transaction is still active in ldb context [0xf65ce0] on 
> > > /var/lib/sss/db/cache_infinera.com.ldb
> > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > > (0x0010): SIGTERM: killing children
> > > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > transaction is still active in ldb context [0x239cce0] on 
> > > /var/lib/sss/db/cache_infinera.com.ldb
> > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > > (0x0010): SIGTERM: killing children
> > > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > transaction is still active in ldb context [0x1421ce0] on 
> > > /var/lib/sss/db/cache_infinera.com.ldb
> > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > > (0x0010): SIGTERM: killing children
> > > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > > transaction is still active in ldb context [0x1cb0ce0] on 
> > > /var/lib/sss/db/cache_infinera.com.ldb
> > 
> > This is caused by too long write to disk.
> > 
> 
> Can I just increase the timeout for now? I will patch the code if needed.
> On this server we need enumerate = true ATM, cannot just turn it off.

Oh, sure. The other alternative might be to mount the cache to tmpfs.

> 
> 
> > > 
> > > Ideas?
> > 
> > Disable enumeration or move the cache to tmpfs. Enumeration won't work
> > well with large domains, sorry.
> 
> And never will?

We are doing incremental performance improvements. There is a round
planned for the next upstream version. I'm afraid we don't have any
patches yet, but Sumit and I have been throwing ideas around, so we
already know what to do. But please keep in mind that enumerating a
large forest amounts to keeping a local replica, which is going to be
costly.


[SSSD-users] Re: login hangs with enumerate = true

2017-06-10 Thread Joakim Tjernlund
On Sat, 2017-06-10 at 08:24 +0200, Jakub Hrozek wrote:
> On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> > Both 1.15.2 and git master hang after less than 24 hours on
> > a server.
> > 
> > I can see this repeating in the domain log:
> > 
> > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > (0x0010): SIGTERM: killing children
> > (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > transaction is still active in ldb context [0xf65ce0] on 
> > /var/lib/sss/db/cache_infinera.com.ldb
> > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > (0x0010): SIGTERM: killing children
> > (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > transaction is still active in ldb context [0x239cce0] on 
> > /var/lib/sss/db/cache_infinera.com.ldb
> > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > (0x0010): SIGTERM: killing children
> > (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > transaction is still active in ldb context [0x1421ce0] on 
> > /var/lib/sss/db/cache_infinera.com.ldb
> > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> > (0x0010): SIGTERM: killing children
> > (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> > transaction is still active in ldb context [0x1cb0ce0] on 
> > /var/lib/sss/db/cache_infinera.com.ldb
> 
> This is caused by a write to disk taking too long.
> 

Can I just increase the timeout for now? I will patch the code if needed.
On this server we need enumerate = true ATM, cannot just turn it off.


> > 
> > Ideas?
> 
> Disable enumeration or move the cache to tmpfs. Enumeration won't work
> well with large domains, sorry.

And never will?

 Jocke


[SSSD-users] Re: login hangs with enumerate = true

2017-06-09 Thread Jakub Hrozek
On Fri, Jun 09, 2017 at 04:28:45PM +, Joakim Tjernlund wrote:
> Both 1.15.2 and git master hang after less than 24 hours on
> a server.
> 
> I can see this repeating in the domain log:
> 
> (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> (0x0010): SIGTERM: killing children
> (Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> transaction is still active in ldb context [0xf65ce0] on 
> /var/lib/sss/db/cache_infinera.com.ldb
> (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> (0x0010): SIGTERM: killing children
> (Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> transaction is still active in ldb context [0x239cce0] on 
> /var/lib/sss/db/cache_infinera.com.ldb
> (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> (0x0010): SIGTERM: killing children
> (Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> transaction is still active in ldb context [0x1421ce0] on 
> /var/lib/sss/db/cache_infinera.com.ldb
> (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [orderly_shutdown] 
> (0x0010): SIGTERM: killing children
> (Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [ldb] (0x0010): A 
> transaction is still active in ldb context [0x1cb0ce0] on 
> /var/lib/sss/db/cache_infinera.com.ldb

This is caused by a write to disk taking too long.
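
As a rough check of how regular the shutdowns are, the timestamps in the
quoted log can be compared directly. The Python sketch below is only an
illustration: the log lines are copied from the excerpt above with the wrapped
lines rejoined, the function name sigterm_intervals is made up for the
example, and the date parsing assumes an English (C) locale for the day and
month names.

```python
import re
from datetime import datetime

# orderly_shutdown lines from the quoted log, with wrapped lines rejoined.
LOG = """\
(Fri Jun  9 18:21:49 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Fri Jun  9 18:22:42 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Fri Jun  9 18:23:35 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Fri Jun  9 18:24:28 2017) [sssd[be[infinera.com]]] [orderly_shutdown] (0x0010): SIGTERM: killing children
"""

# Capture the parenthesised timestamp of lines logged by orderly_shutdown.
STAMP = re.compile(r"^\((?P<ts>[^)]+)\) .*\[orderly_shutdown\]")


def sigterm_intervals(log_text):
    """Return the seconds between consecutive orderly_shutdown entries."""
    times = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            # Assumes English day/month abbreviations (C locale).
            times.append(
                datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
            )
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(sigterm_intervals(LOG))  # prints [53, 53, 53]
```

The backend is terminated every 53 seconds in this excerpt, which looks more
like a periodic timeout firing than like random crashes.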

> 
> Ideas?

Disable enumeration or move the cache to tmpfs. Enumeration won't work
well with large domains, sorry.