FreeBSD 9.1 - openldap slapd lockups, mutex problems

2013-01-22 Thread Kai Gallasch
Hi.

(I am sending this to the "stable" list, because it may be kernel related.)

On 9.1-RELEASE I am witnessing lockups of the openldap slapd daemon.

The slapd runs for some days and then hangs, consuming high amounts of CPU.
In this state slapd can only be restarted by SIGKILL.

 # procstat -kk 71195
  PID    TID COMM             TDNAME           KSTACK
71195 149271 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678
    __umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 194998 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _cv_wait_sig+0x12e
    seltdwait+0x110 kern_select+0x6ef sys_select+0x5d amd64_syscall+0x546
    Xfast_syscall+0xf7
71195 195544 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 196183 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_timedwait_sig+0x19 _sleep+0x2d4 userret+0x9e
    doreti_ast+0x1f
71195 197966 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198446 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198453 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198563 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 199520 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200038 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200670 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200674 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200675 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201179 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201180 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201181 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201183 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201189 slapd            -                mi_switch+0x186
    sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d
    _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63
    amd64_syscall+0x546 Xfast_syscall+0xf7

When I try to stop slapd through the rc script, I can see in the logs that the
process is waiting indefinitely for a thread to terminate.
Other multithreaded server processes are running on the server without problems
(apache-worker, mysqld, bind, etc.).
On UFS2, slapd runs fine without showing the error.
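A quick way to see where a hung slapd is stuck is to tally the kernel stacks that `procstat -kk` reports per thread. The Python sketch below is an illustration only: the frame names it matches are taken from the output above, the labels (`umutex`, `select`, `umtx_wait`, `other`) are invented for this example, and it assumes procstat's native one-row-per-thread layout with no spaces inside COMM or TDNAME. Threads whose stacks contain `_do_lock_umutex` are sleeping in the kernel while waiting for a contested userland (pthread) mutex.

```python
import subprocess
from collections import Counter

# Frame names that mean "blocked acquiring a userland pthread mutex"
# (FreeBSD umutex), as seen in the procstat output above.
UMUTEX_FRAMES = {"_do_lock_umutex", "__umtx_op_wait_umutex"}


def classify(stack):
    """Return a coarse, illustrative label for one thread's kernel stack."""
    # Strip the "+0x..." offset from each frame, keeping the function name.
    names = {frame.split("+", 1)[0] for frame in stack}
    if names & UMUTEX_FRAMES:
        return "umutex"
    if "kern_select" in names:
        return "select"
    if "__umtx_op_wait" in names:
        return "umtx_wait"
    return "other"


def summarize(lines):
    """Count threads per label in `procstat -kk PID` output.

    Assumes one row per thread (as procstat prints it, before any mail
    line wrapping), whitespace-separated columns PID TID COMM TDNAME
    KSTACK..., and no spaces inside COMM or TDNAME.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row and anything too short to be a thread row.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        counts[classify(fields[4:])] += 1
    return counts


def summarize_pid(pid):
    """Run procstat (FreeBSD only) for `pid` and summarize its output."""
    out = subprocess.run(
        ["procstat", "-kk", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return summarize(out.splitlines())
```

Applied to the 18 threads listed above (with each mail-wrapped row rejoined onto one line), this sketch would count 15 under `umutex` and one each under `umtx_wait`, `select` and `other`. On a live system, `summarize_pid(71195)` runs procstat itself and only works on FreeBSD.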


Things I have tried already to stop the lockups:

- running openldap-server23 and openldap24, both with different BDB backend
versions
- tuning the BDB init file
- reducin

Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems

2013-01-22 Thread Adam McDougall

On 01/22/13 05:19, Kai Gallasch wrote:

> Hi.
>
> (I am sending this to the "stable" list, because it may be kernel related.)
>
> On 9.1-RELEASE I am witnessing lockups of the openldap slapd daemon.
>
> The slapd runs for some days and then hangs, consuming high amounts of CPU.
> In this state slapd can only be restarted by SIGKILL.
>
>   # procstat -kk 71195
>   PID    TID COMM             TDNAME           KSTACK
> 71195 149271 slapd            -                mi_switch+0x186
>     sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678
>     __umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7
>
> On UFS2 slapd runs fine, without showing the error.
> Has anyone else running openldap-server on FreeBSD 9.1 inside a jail seen
> similar problems?


I have seen openldap spin the CPU and even run out of memory and get
killed on some of our test systems running ~9.1-RELEASE with ZFS. No jails.
I'm not sure what would have put load on our test systems other than
nightly scripts. I had to focus my attention on other servers, so I
don't have one to inspect at this point, but I won't be surprised if I
see this in production. Thanks for the tip about it being ZFS-related;
I'll let you know if I find anything out. This is mostly a "me too"
reply.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems

2013-02-13 Thread Pierre Guinoiseau
On 22/01/2013 10:55:48, Adam McDougall  wrote:

> On 01/22/13 05:19, Kai Gallasch wrote:
> > Hi.
> >
> > (I am sending this to the "stable" list, because it may be kernel related.)
> >
> > On 9.1-RELEASE I am witnessing lockups of the openldap slapd daemon.
> >
> > The slapd runs for some days and then hangs, consuming high amounts of CPU.
> > In this state slapd can only be restarted by SIGKILL.
> >
> >   # procstat -kk 71195
> >   PID    TID COMM             TDNAME           KSTACK
> > 71195 149271 slapd            -                mi_switch+0x186
> >     sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678
> >     __umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7
> 
> > On UFS2 slapd runs fine, without showing the error.
> > Has anyone else running openldap-server on FreeBSD 9.1 inside a jail seen 
> > similar problems?
> 
> I have seen openldap spin the cpu and even run out of memory to get 
> killed on some of our test systems running ~9.1-rel with zfs.  No jails.
> I'm not sure what would have put load on our test systems other than 
> nightly scripts.  I had to focus my attention on other servers so I 
> don't have one to inspect at this point, but I won't be surprised if I 
> see this in production.  Thanks for the tip about it being ZFS related, 
> and I'll let you know if I find anything out.  This is mostly a "me too" 
> reply.

Hi,

I have the same problem too, inside a jail stored on ZFS. I've tried various
tuning options in slapd.conf, but none of them fixed the problem. While slapd is
hanging, db_stat -c shows that all locks are in use. I've tried setting the lock
limit really high, far more than normally needed, but it didn't help. I may have
the same problem with amavisd-new, but I have to verify that to be sure the
symptoms are similar.

I had no problem at all with the same setup on FreeBSD 8.2R; it was my most
stable service back then. I haven't tried 9.0R.
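Pierre's observation that `db_stat -c` shows every lock in use can be checked mechanically. The sketch below parses the lock-region counters and reports how full the lock table is. It assumes each counter line has the form `<count><TAB><description>` and that the two relevant descriptions are "Maximum number of locks possible" and "Number of current locks"; both are assumptions about the Berkeley DB version in use and may need adjusting.

```python
import re

# Assumed line shape for `db_stat -c` counters: "<count><TAB><description>",
# e.g. "1000\tMaximum number of locks possible". Hex values and section
# headers do not match and are ignored. The exact descriptions are an
# assumption and may differ between Berkeley DB versions.
LINE_RE = re.compile(r"^\s*(\d+)\t(.+?)\s*$")

MAX_LOCKS = "Maximum number of locks possible"
CUR_LOCKS = "Number of current locks"


def parse_lock_stats(text):
    """Map each counter description to its integer value."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stats[m.group(2)] = int(m.group(1))
    return stats


def lock_utilization(text, max_key=MAX_LOCKS, cur_key=CUR_LOCKS):
    """Fraction of the lock table in use (1.0 means every lock is taken)."""
    stats = parse_lock_stats(text)
    return stats[cur_key] / stats[max_key]


def locks_exhausted(text, threshold=0.95):
    """True when lock usage is at or above `threshold` of the maximum."""
    return lock_utilization(text) >= threshold
```

Feeding it periodic snapshots of `db_stat -c` run against the slapd database environment (its `-h` option selects the environment home directory) would show whether the lock table fills up shortly before each hang. The DB_CONFIG directives `set_lk_max_locks`, `set_lk_max_lockers` and `set_lk_max_objects` are the usual place to raise those limits, although, as Pierre reports, raising them did not prevent the hang here.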





Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems

2013-02-14 Thread Oliver Brandmueller
Hi,

On Thu, Feb 14, 2013 at 03:13:57AM +0100, Pierre Guinoiseau wrote:
> > I have seen openldap spin the cpu and even run out of memory to get 
> > killed on some of our test systems running ~9.1-rel with zfs.
[...]
> I've the same problem too, inside a jail, stored on ZFS. I've tried various
> tuning in slapd.conf, but none fixed the problem. While hanging, db_stat -c
> shows that all locks are being used, I've tried to set the limit really high,
> far more than normally needed, but it didn't help. I may have the same problem
> with amavisd-new but I've to verify that to be sure the symptoms are similar.

I am running amd64 9.1-STABLE r245456 (from about Jan 15) with
openldap-server-2.4.33_2, which depends on libltdl-2.4.2 and
db46-4.6.21.4.

The system is ZFS-only for the local filesystems, where openldap is
running (it has several NFS mounts for other purposes, though). It has been up
and running for about a month now (29 days) and has never shown any
problematic behaviour with slapd.

I have ~10 SEARCH requests per second on average and only a few
ADD/MODIFY/DELETE operations. Binds and unbinds make up about 1/10th of the
requests. It runs in slurpd slave mode for my master LDAP server.

zroot/var/db runs with compression=off and dedup=off; zroot is a mirrored
pool on 2 Intel SATA SSD drives inside GPT partitions. Swap is on a ZFS
zvol.

- Oliver


-- 
| Oliver Brandmueller  http://sysadm.in/ o...@sysadm.in |
|I am the Internet. So help I God. |




Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems

2013-03-04 Thread Pierre Guinoiseau
Hi,

I've tested it in an 8.3R jail on a 9.1R host with the same setup, and the
problem is still there. So it may be a kernel bug in 9.1R.

On 14/02/2013 10:19:45, Oliver Brandmueller  wrote:

> Hi,
> 
> On Thu, Feb 14, 2013 at 03:13:57AM +0100, Pierre Guinoiseau wrote:
> > > I have seen openldap spin the cpu and even run out of memory to get 
> > > killed on some of our test systems running ~9.1-rel with zfs.
> [...]
> > I've the same problem too, inside a jail, stored on ZFS. I've tried various
> > tuning in slapd.conf, but none fixed the problem. While hanging, db_stat -c
> > shows that all locks are being used, I've tried to set the limit really high,
> > far more than normally needed, but it didn't help. I may have the same problem
> > with amavisd-new but I've to verify that to be sure the symptoms are similar.
> 
> I have amd64 9.1-STABLE r245456 (about Jan 15) running. I have openldap 
> openldap-server-2.4.33_2 running, depending on libltdl-2.4.2 and 
> db46-4.6.21.4 .
> 
> The system is zfs only (for the local filesystems, where openldap is 
> running - it has several NFS mounts for other purposes though). It's up 
> and running for about a month now (29 days) and never showed any 
> problematic behaviour regarding to slapd.
> 
> I have ~10 SEARCH requests per seconds avg and only minor 
> ADD/MODIFY/DELETE operations. It has several binds und unbinds, about 
> 1/10th of the requests. It runs in slurpd slave mode for my master LDAP.
> 
> zroot/var/db runs with compression=off, dedup=off, zroot is a mirrored 
> pool on 2 Intel SATA SSD drives inside a GPT partition. Swap is on a ZFS 
> zvol.
> 
> - Oliver
> 
> 
> -- 
> | Oliver Brandmueller  http://sysadm.in/ o...@sysadm.in |
> |I am the Internet. So help I God. |






Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems

2013-03-25 Thread Kai Gallasch
On 22.01.2013 at 11:19, Kai Gallasch wrote:
> Hi.
> 
> (I am sending this to the "stable" list, because it may be kernel related.)
> 
> On 9.1-RELEASE I am witnessing lockups of the openldap slapd daemon.
> 
> The slapd runs for some days and then hangs, consuming high amounts of CPU.
> In this state slapd can only be restarted by SIGKILL.


Short update:

I tried all I could to isolate the problem.

What I am certain of is that the problem lies in the BDB backend for openldap
itself, or in how slapd interacts with it.
My knowledge is not sufficient to debug BDB itself; it appears to be quite a
complex gearbox.

Also, as a side note, I had to learn that the new owner of BDB (Oracle) does
not give a toss about keeping old links to the BDB documentation intact (for
example, the information you'd need to tune your BDB via "DB_CONFIG", or to
understand it better).

In the end I decided to drop the BDB backend for my running slapd installations
and switch over to MDB [1,2] as the backend; since then the problems have disappeared.

So if you are plagued by the same slapd lockups and rely on your LDAP
directory, switching backends will give you some peace of mind.
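A common way to move an existing slapd database to back-mdb is to dump it with slapcat while slapd is stopped, change the database section of slapd.conf to `database mdb` (with `directory` pointing at a new, empty directory and a `maxsize` large enough for the data), and reload it with slapadd. The Python sketch below only assembles, and optionally runs, those commands; the suffix, LDIF path and configuration file names are hypothetical placeholders, and this is not necessarily the procedure Kai followed.

```python
import subprocess


def migration_steps(suffix, ldif, old_conf, new_conf):
    """Return argv lists for moving one database from back-bdb to back-mdb.

    All arguments are hypothetical placeholders. Assumes slapd is stopped
    and that new_conf is a copy of old_conf whose database section uses
    "database mdb" with "directory" pointing at a new, empty directory.
    """
    return [
        # 1. Dump the existing back-bdb database to LDIF.
        ["slapcat", "-f", old_conf, "-b", suffix, "-l", ldif],
        # 2. Check that the new configuration parses (-u: dry-run mode).
        ["slaptest", "-u", "-f", new_conf],
        # 3. Load the LDIF into the new back-mdb database.
        ["slapadd", "-f", new_conf, "-b", suffix, "-l", ldif],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)
```

With `dry_run=True` (the default) it only prints the three commands, so they can be reviewed before touching a production directory. If slapadd is run as root, the files it creates in the new directory will typically need their ownership changed to the user slapd runs as before slapd is started again.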

Thanks to all who have replied! And thanks to Sleepycat for all the fish.

Kai Gallasch.


[1] http://manpages.ubuntu.com/manpages/precise/man5/slapd-mdb.5.html
[2] http://www.openldap.org/doc/admin24/backends.html