Hi All,

Reading about the recent locking-issues that some ppl are seeing, i 
thought i'd contribute a little (mostly data) :-)

Since some time (don't know exactly when it started, but it seems to be 
since somewhere this year....), we're seeing an increase in difference 
between locks requested and locks released when doing db_stat -c -h 
/var/amavis/db

Just like other reports, we saw things like

Feb 24 22:46:45 hostname amavis[11739]: (!!)TROUBLE in child_init_hook: BDB no 
dbS: Lock table is out of available locker entries, . at (eval 61) line 30.
Feb 24 22:46:45 hostname amavis[11739]: (!)_DIE: Suicide in child_init_hook: 
BDB no dbS: Lock table is out of available locker entries, . at (eval 61) line 
30.\n

(quite a lot, actually)

We started monitoring the request/release difference with:

watch -n 1 'req=""; rel=""; db_stat -c -h /var/amavis/db/ | grep "Total number 
of locks re" | cut -f1 | while read nr; do if [ "$req" == "" ]; then req=$nr; 
else rel=$nr; echo "$req - $rel" | bc -l ; fi; done'

Just to give an impression of how fast this difference is increasing, 
here's a short snippet with timestamps, right after a restart of amavis:

Thu Feb 28 11:53:30 CET 2008
132
Thu Feb 28 11:53:31 CET 2008
132
Thu Feb 28 11:53:32 CET 2008
132
Thu Feb 28 11:53:33 CET 2008
226
Thu Feb 28 11:53:34 CET 2008
291
Thu Feb 28 11:53:35 CET 2008
374
Thu Feb 28 11:53:36 CET 2008
377
Thu Feb 28 11:53:37 CET 2008
435
Thu Feb 28 11:53:38 CET 2008
489
Thu Feb 28 11:53:39 CET 2008
552
Thu Feb 28 11:53:40 CET 2008
558
Thu Feb 28 11:53:41 CET 2008
651
Thu Feb 28 11:53:42 CET 2008
792
Thu Feb 28 11:53:43 CET 2008
905
Thu Feb 28 11:53:44 CET 2008
946
Thu Feb 28 11:53:45 CET 2008
1167
Thu Feb 28 11:53:46 CET 2008
1237
Thu Feb 28 11:53:48 CET 2008
1337
Thu Feb 28 11:53:49 CET 2008
1599
Thu Feb 28 11:53:51 CET 2008
1742
Thu Feb 28 11:53:52 CET 2008
1817
Thu Feb 28 11:53:53 CET 2008
1881
Thu Feb 28 11:53:54 CET 2008
1910
Thu Feb 28 11:53:55 CET 2008
2027
Thu Feb 28 11:53:56 CET 2008
2116
Thu Feb 28 11:53:57 CET 2008
2290
Thu Feb 28 11:53:58 CET 2008
2379
Thu Feb 28 11:53:59 CET 2008
2445
Thu Feb 28 11:54:00 CET 2008
2503
Thu Feb 28 11:54:01 CET 2008
2560
Thu Feb 28 11:54:02 CET 2008
2648
Thu Feb 28 11:54:03 CET 2008
2652
Thu Feb 28 11:54:04 CET 2008
2652
Thu Feb 28 11:54:05 CET 2008
2652



This was done with a fresh start of amavis. Postfix was started at 
11:53:32 and stopped at 11:54:03. During that time, 79 mails we're 
passed to amavis (4x clean, 5x spammy, 70x spam).

We've not seen any process crashes of any sort, btw. amavis-nanny is 
reporting that everything seems fine... However, an increase of 2520 locks 
that were requested but not released over just 79 mails.... that's almost 
32 per email ?!

We have no clue where to look anymore (spitted through lots of loglevel=5 
logs and found nothing strange....), so if you guys can provide any 
pointers about where to look or how to investigate further, please do ! :)

OK, now for the version-info:

We're running CentOS 4, fully up-to-date.
postfix-2.2.10-1.1.el4
amavisd-new-2.5.3-2.el4.rf
spamassassin-3.2.4-1.el4.rf
db4-4.2.52-7.3.el4
perl-BerkeleyDB-0.33-1.el4.rf

Furthermore, we've patched amavis with the patch that Mark sent yesterday 
to the list. That didn't change anything, from the looks of it...


And finally, just to add even more to this long email, the output from 
db_stat -c -h /var/amavis/db/ at this time of writing (quite a lot of 
emails later then the above stats...) All seems fine to me, except for the 
large difference in requested/releases locks...

536     Last allocated locker ID.
2147M   Current maximum unused locker ID.
5       Number of lock modes.
1000    Maximum number of locks possible.
1000    Maximum number of lockers possible.
1000    Maximum number of lock objects possible.
100     Number of current locks.
105     Maximum number of locks at any one time.
200     Number of current lockers.
200     Maximum number of lockers at any one time.
4       Number of current lock objects.
6       Maximum number of lock objects at any one time.
109549  Total number of locks requested.
76738   Total number of locks released.
0       Total number of lock requests failing because DB_LOCK_NOWAIT was set.
41      Total number of locks not immediately available due to conflicts.
0       Number of deadlocks.
0       Lock timeout value.
0       Number of locks that have timed out.
0       Transaction timeout value.
0       Number of transactions that have timed out.
440KB   The size of the lock region..
22      The number of region locks granted after waiting.
221743  The number of region locks granted without waiting.



So, can anyone shine some light on this, please ?
And if anyone has a suggestion on how to capture an email that might me 
causing this behaviour: we haven't been able to do that (it's quite a lot 
email that passes through this host, so hard to do by hand :) )


Cheers,

Taco.


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/

Reply via email to