Hi All, Reading about the recent locking-issues that some ppl are seeing, i thought i'd contribute a little (mostly data) :-)
Since some time (don't know exactly when it started, but it seems to be since somewhere this year....), we're seeing an increase in difference between locks requested and locks released when doing db_stat -c -h /var/amavis/db Just like other reports, we saw things like Feb 24 22:46:45 hostname amavis[11739]: (!!)TROUBLE in child_init_hook: BDB no dbS: Lock table is out of available locker entries, . at (eval 61) line 30. Feb 24 22:46:45 hostname amavis[11739]: (!)_DIE: Suicide in child_init_hook: BDB no dbS: Lock table is out of available locker entries, . at (eval 61) line 30.\n (quite a lot, actually) We started monitoring the request/release difference with: watch -n 1 'req=""; rel=""; db_stat -c -h /var/amavis/db/ | grep "Total number of locks re" | cut -f1 | while read nr; do if [ "$req" == "" ]; then req=$nr; else rel=$nr; echo "$req - $rel" | bc -l ; fi; done' Just to give an impression of how fast this difference is increasing, here's a short snippet with timestamps, right after a restart of amavis: Thu Feb 28 11:53:30 CET 2008 132 Thu Feb 28 11:53:31 CET 2008 132 Thu Feb 28 11:53:32 CET 2008 132 Thu Feb 28 11:53:33 CET 2008 226 Thu Feb 28 11:53:34 CET 2008 291 Thu Feb 28 11:53:35 CET 2008 374 Thu Feb 28 11:53:36 CET 2008 377 Thu Feb 28 11:53:37 CET 2008 435 Thu Feb 28 11:53:38 CET 2008 489 Thu Feb 28 11:53:39 CET 2008 552 Thu Feb 28 11:53:40 CET 2008 558 Thu Feb 28 11:53:41 CET 2008 651 Thu Feb 28 11:53:42 CET 2008 792 Thu Feb 28 11:53:43 CET 2008 905 Thu Feb 28 11:53:44 CET 2008 946 Thu Feb 28 11:53:45 CET 2008 1167 Thu Feb 28 11:53:46 CET 2008 1237 Thu Feb 28 11:53:48 CET 2008 1337 Thu Feb 28 11:53:49 CET 2008 1599 Thu Feb 28 11:53:51 CET 2008 1742 Thu Feb 28 11:53:52 CET 2008 1817 Thu Feb 28 11:53:53 CET 2008 1881 Thu Feb 28 11:53:54 CET 2008 1910 Thu Feb 28 11:53:55 CET 2008 2027 Thu Feb 28 11:53:56 CET 2008 2116 Thu Feb 28 11:53:57 CET 2008 2290 Thu Feb 28 11:53:58 CET 2008 2379 Thu Feb 28 11:53:59 CET 2008 2445 Thu Feb 28 11:54:00 CET 2008 2503 Thu Feb 28 11:54:01 CET 2008 2560 Thu Feb 28 11:54:02 CET 2008 2648 Thu Feb 28 11:54:03 CET 2008 2652 Thu Feb 28 11:54:04 CET 2008 2652 Thu Feb 28 11:54:05 CET 2008 2652 This was done with a fresh start of amavis. Postfix was started at 11:53:32 and stopped at 11:54:03. During that time, 79 mails we're passed to amavis (4x clean, 5x spammy, 70x spam). We've not seen any process crashes of any sort, btw. amavis-nanny is reporting that everything seems fine... However, an increase of 2520 locks that were requested but not released over just 79 mails.... that's almost 32 per email ?! We have no clue where to look anymore (spitted through lots of loglevel=5 logs and found nothing strange....), so if you guys can provide any pointers about where to look or how to investigate further, please do ! :) OK, now for the version-info: We're running CentOS 4, fully up-to-date. postfix-2.2.10-1.1.el4 amavisd-new-2.5.3-2.el4.rf spamassassin-3.2.4-1.el4.rf db4-4.2.52-7.3.el4 perl-BerkeleyDB-0.33-1.el4.rf Furthermore, we've patched amavis with the patch that Mark sent yesterday to the list. That didn't change anything, from the looks of it... And finally, just to add even more to this long email, the output from db_stat -c -h /var/amavis/db/ at this time of writing (quite a lot of emails later then the above stats...) All seems fine to me, except for the large difference in requested/releases locks... 536 Last allocated locker ID. 2147M Current maximum unused locker ID. 5 Number of lock modes. 1000 Maximum number of locks possible. 1000 Maximum number of lockers possible. 1000 Maximum number of lock objects possible. 100 Number of current locks. 105 Maximum number of locks at any one time. 200 Number of current lockers. 200 Maximum number of lockers at any one time. 4 Number of current lock objects. 6 Maximum number of lock objects at any one time. 109549 Total number of locks requested. 76738 Total number of locks released. 0 Total number of lock requests failing because DB_LOCK_NOWAIT was set. 41 Total number of locks not immediately available due to conflicts. 0 Number of deadlocks. 0 Lock timeout value. 0 Number of locks that have timed out. 0 Transaction timeout value. 0 Number of transactions that have timed out. 440KB The size of the lock region.. 22 The number of region locks granted after waiting. 221743 The number of region locks granted without waiting. So, can anyone shine some light on this, please ? And if anyone has a suggestion on how to capture an email that might me causing this behaviour: we haven't been able to do that (it's quite a lot email that passes through this host, so hard to do by hand :) ) Cheers, Taco. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3 AMaViS-HowTos:http://www.amavis.org/howto/