Taco, > Reading about the recent locking-issues that some ppl are seeing, i > thought i'd contribute a little (mostly data) :-) > > Since some time (don't know exactly when it started, but it seems to be > since somewhere this year....), we're seeing an increase in difference > between locks requested and locks released when doing db_stat -c -h > /var/amavis/db > > Just like other reports, we saw things like > Feb 24 22:46:45 hostname amavis[11739]: (!!)TROUBLE in child_init_hook: > BDB no dbS: Lock table is out of available locker entries ... > (quite a lot, actually) > > We started monitoring the request/release difference with: > > watch -n 1 'req=""; rel=""; db_stat -c -h /var/amavis/db/ | grep "Total > number of locks re" | cut -f1 | while read nr; do if [ "$req" == "" ]; then > req=$nr; else rel=$nr; echo "$req - $rel" | bc -l ; fi; done'
That is a good approach. > Just to give an impression of how fast this difference is increasing, > here's a short snippet with timestamps, right after a restart of amavis: > > Thu Feb 28 11:53:30 CET 2008 > 132 > Thu Feb 28 11:53:31 CET 2008 > 132 > Thu Feb 28 11:53:32 CET 2008 > 132 > Thu Feb 28 11:53:33 CET 2008 > 226 > Thu Feb 28 11:53:34 CET 2008 > 291 > Thu Feb 28 11:53:35 CET 2008 > 374 > Thu Feb 28 11:53:36 CET 2008 > 377 > Thu Feb 28 11:53:37 CET 2008 > 435 > Thu Feb 28 11:53:38 CET 2008 > 489 > Thu Feb 28 11:53:39 CET 2008 > 552 > Thu Feb 28 11:53:40 CET 2008 > 558 > Thu Feb 28 11:53:41 CET 2008 > 651 > Thu Feb 28 11:53:42 CET 2008 > 792 > Thu Feb 28 11:53:43 CET 2008 > 905 > Thu Feb 28 11:53:44 CET 2008 > 946 > Thu Feb 28 11:53:45 CET 2008 > 1167 > Thu Feb 28 11:53:46 CET 2008 > 1237 > Thu Feb 28 11:53:48 CET 2008 > 1337 > Thu Feb 28 11:53:49 CET 2008 > 1599 > Thu Feb 28 11:53:51 CET 2008 > 1742 > Thu Feb 28 11:53:52 CET 2008 > 1817 > Thu Feb 28 11:53:53 CET 2008 > 1881 > Thu Feb 28 11:53:54 CET 2008 > 1910 > Thu Feb 28 11:53:55 CET 2008 > 2027 > Thu Feb 28 11:53:56 CET 2008 > 2116 > Thu Feb 28 11:53:57 CET 2008 > 2290 > Thu Feb 28 11:53:58 CET 2008 > 2379 > Thu Feb 28 11:53:59 CET 2008 > 2445 > Thu Feb 28 11:54:00 CET 2008 > 2503 > Thu Feb 28 11:54:01 CET 2008 > 2560 > Thu Feb 28 11:54:02 CET 2008 > 2648 > Thu Feb 28 11:54:03 CET 2008 > 2652 > Thu Feb 28 11:54:04 CET 2008 > 2652 > Thu Feb 28 11:54:05 CET 2008 > 2652 > > This was done with a fresh start of amavis. Postfix was started at > 11:53:32 and stopped at 11:54:03. During that time, 79 mails we're > passed to amavis (4x clean, 5x spammy, 70x spam). > > We've not seen any process crashes of any sort, btw. amavis-nanny is > reporting that everything seems fine... However, an increase of 2520 locks > that were requested but not released over just 79 mails.... that's almost > 32 per email ?! Perhaps this could explain it: bdb 4.4.20 Change Log: Concurrent Data Store Changes: 1. Lock upgrades and downgrades are now accounted for separately from lock requests and releases. [#11155] In fact I see the following at our site (using db 4.6.21 from ports, FreeBSD): $ db_stat-4.6 -c -h /var/amavis/db/ 17802 Last allocated locker ID 0x7fffffff Current maximum unused locker ID 5 Number of lock modes 1000 Maximum number of locks possible 1000 Maximum number of lockers possible 1000 Maximum number of lock objects possible 100 Number of current locks 119 Maximum number of locks at any one time 125 Number of current lockers 127 Maximum number of lockers at any one time 4 Number of current lock objects 8 Maximum number of lock objects at any one time 12M Total number of locks requested (12353027) 12M Total number of locks released (12352927) 5274871 Total number of locks upgraded 10M Total number of locks downgraded (10578206) 9878 Lock requests not available due to conflicts, for which we waited 0 Lock requests not available due to conflicts, for which we did not wait 0 Number of deadlocks 0 Lock timeout value 0 Number of locks that have timed out 0 Transaction timeout value 0 Number of transactions that have timed out 464KB The size of the lock region 3915 The number of region locks that required waiting (0%) (and I'm not seing any bdb lock issues - I searched the log for the last few months) So if 'Total number of locks upgraded/downgraded' were counted against locks requested/released, I'd probably see something like you are seing with db 4.2.52. > We have no clue where to look anymore (spitted through lots of loglevel=5 > logs and found nothing strange....), so if you guys can provide any > pointers about where to look or how to investigate further, please do ! :) > > OK, now for the version-info: > > We're running CentOS 4, fully up-to-date. > postfix-2.2.10-1.1.el4 > amavisd-new-2.5.3-2.el4.rf > spamassassin-3.2.4-1.el4.rf > db4-4.2.52-7.3.el4 > perl-BerkeleyDB-0.33-1.el4.rf I'd suggest upgrading bdb to 4.4.20 (or at least to 4.3.29). > Furthermore, we've patched amavis with the patch that Mark sent yesterday > to the list. That didn't change anything, from the looks of it... Good. Ensuring there are no crashes is a pre-requisite to further bdb troubleshooting, otherwise anything is possible. > And finally, just to add even more to this long email, the output from > db_stat -c -h /var/amavis/db/ at this time of writing (quite a lot of > emails later then the above stats...) All seems fine to me, except for the > large difference in requested/releases locks... > > 536 Last allocated locker ID. > 2147M Current maximum unused locker ID. > 5 Number of lock modes. > 1000 Maximum number of locks possible. > 1000 Maximum number of lockers possible. > 1000 Maximum number of lock objects possible. > 100 Number of current locks. > 105 Maximum number of locks at any one time. > 200 Number of current lockers. > 200 Maximum number of lockers at any one time. > 4 Number of current lock objects. > 6 Maximum number of lock objects at any one time. > 109549 Total number of locks requested. > 76738 Total number of locks released. > 0 Total number of lock requests failing because DB_LOCK_NOWAIT > was set. > 41 Total number of locks not immediately available > due to conflicts. > 0 Number of deadlocks. > 0 Lock timeout value. > 0 Number of locks that have timed out. > 0 Transaction timeout value. > 0 Number of transactions that have timed out. > 440KB The size of the lock region.. > 22 The number of region locks granted after waiting. > 221743 The number of region locks granted without waiting. > > So, can anyone shine some light on this, please ? > And if anyone has a suggestion on how to capture an email that might me > causing this behaviour: we haven't been able to do that (it's quite a lot > email that passes through this host, so hard to do by hand :) ) Enabling a clean quarantine, or an archival quarantine would be one possibility. Here are some possibly relevant entries in bdb release notes: http://www.oracle.com/technology/software/products/berkeley-db/db/ 4.3.29 Concurrent Data Store Changes: Fix cursor locks for environments opened without DB_THREAD so that they use the same locker ID. This eliminates many common cases of application self-deadlock, particularly in CDS. [#9742] Queue Access Method Changes: 5 Fix a bug where the Queue access method might fail to release a record lock when running without transactions. [#9487] Locking Subsystem Changes: 11 Fix a bug that would fail to upgrade a write lock when moving a cursor off a previously deleted record. [#11042] Utility Changes: 4 Fix a bug in the locking statistics where “current locks” included failed lock requests. [#9314] 4.4.20 Concurrent Data Store Changes: 1. Lock upgrades and downgrades are now accounted for separately from lock requests and releases. [#11155] Queue Access Method Changes: 2. Fix a bug where updates that do not use transactions, but do enable locking, failed to release locks. [#11669] Mark ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ AMaViS-user mailing list AMaViS-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/amavis-user AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3 AMaViS-HowTos:http://www.amavis.org/howto/