Taco,

> Reading about the recent locking-issues that some ppl are seeing, i
> thought i'd contribute a little (mostly data) :-)
>
> Since some time (don't know exactly when it started, but it seems to be
> since somewhere this year....), we're seeing an increase in difference
> between locks requested and locks released when doing db_stat -c -h
> /var/amavis/db
>
> Just like other reports, we saw things like
>   Feb 24 22:46:45 hostname amavis[11739]: (!!)TROUBLE in child_init_hook:
>    BDB no dbS: Lock table is out of available locker entries ...
> (quite a lot, actually)
>
> We started monitoring the request/release difference with:
>
> watch -n 1 'req=""; rel=""; db_stat -c -h /var/amavis/db/ | grep "Total
> number of locks re" | cut -f1 | while read nr; do if [ "$req" == "" ]; then
> req=$nr; else rel=$nr; echo "$req - $rel" | bc -l ; fi; done'

That is a good approach.

> Just to give an impression of how fast this difference is increasing,
> here's a short snippet with timestamps, right after a restart of amavis:
>
> Thu Feb 28 11:53:30 CET 2008 > 132
> Thu Feb 28 11:53:31 CET 2008 > 132
> Thu Feb 28 11:53:32 CET 2008 > 132
> Thu Feb 28 11:53:33 CET 2008 > 226
> Thu Feb 28 11:53:34 CET 2008 > 291
> Thu Feb 28 11:53:35 CET 2008 > 374
> Thu Feb 28 11:53:36 CET 2008 > 377
> Thu Feb 28 11:53:37 CET 2008 > 435
> Thu Feb 28 11:53:38 CET 2008 > 489
> Thu Feb 28 11:53:39 CET 2008 > 552
> Thu Feb 28 11:53:40 CET 2008 > 558
> Thu Feb 28 11:53:41 CET 2008 > 651
> Thu Feb 28 11:53:42 CET 2008 > 792
> Thu Feb 28 11:53:43 CET 2008 > 905
> Thu Feb 28 11:53:44 CET 2008 > 946
> Thu Feb 28 11:53:45 CET 2008 > 1167
> Thu Feb 28 11:53:46 CET 2008 > 1237
> Thu Feb 28 11:53:48 CET 2008 > 1337
> Thu Feb 28 11:53:49 CET 2008 > 1599
> Thu Feb 28 11:53:51 CET 2008 > 1742
> Thu Feb 28 11:53:52 CET 2008 > 1817
> Thu Feb 28 11:53:53 CET 2008 > 1881
> Thu Feb 28 11:53:54 CET 2008 > 1910
> Thu Feb 28 11:53:55 CET 2008 > 2027
> Thu Feb 28 11:53:56 CET 2008 > 2116
> Thu Feb 28 11:53:57 CET 2008 > 2290
> Thu Feb 28 11:53:58 CET 2008 > 2379
> Thu Feb 28 11:53:59 CET 2008 > 2445
> Thu Feb 28 11:54:00 CET 2008 > 2503
> Thu Feb 28 11:54:01 CET 2008 > 2560
> Thu Feb 28 11:54:02 CET 2008 > 2648
> Thu Feb 28 11:54:03 CET 2008 > 2652
> Thu Feb 28 11:54:04 CET 2008 > 2652
> Thu Feb 28 11:54:05 CET 2008 > 2652
>
> This was done with a fresh start of amavis. Postfix was started at
> 11:53:32 and stopped at 11:54:03. During that time, 79 mails we're
> passed to amavis (4x clean, 5x spammy, 70x spam).
>
> We've not seen any process crashes of any sort, btw. amavis-nanny is
> reporting that everything seems fine... However, an increase of 2520 locks
> that were requested but not released over just 79 mails.... that's almost
> 32 per email ?!

Perhaps this could explain it:

bdb 4.4.20 Change Log:

Concurrent Data Store Changes:
1. Lock upgrades and downgrades are now accounted for separately
   from lock requests and releases. [#11155]


In fact I see the following at our site (using db 4.6.21
from ports, FreeBSD):

$ db_stat-4.6 -c -h /var/amavis/db/
17802   Last allocated locker ID
0x7fffffff      Current maximum unused locker ID
5       Number of lock modes
1000    Maximum number of locks possible
1000    Maximum number of lockers possible
1000    Maximum number of lock objects possible
100     Number of current locks
119     Maximum number of locks at any one time
125     Number of current lockers
127     Maximum number of lockers at any one time
4       Number of current lock objects
8       Maximum number of lock objects at any one time
12M     Total number of locks requested (12353027)
12M     Total number of locks released  (12352927)
5274871 Total number of locks upgraded
10M     Total number of locks downgraded (10578206)
9878    Lock requests not available due to conflicts, for which we waited
0       Lock requests not available due to conflicts, for which we
        did not wait
0       Number of deadlocks
0       Lock timeout value
0       Number of locks that have timed out
0       Transaction timeout value
0       Number of transactions that have timed out
464KB   The size of the lock region
3915    The number of region locks that required waiting (0%)

(and I'm not seing any bdb lock issues - I searched the log
for the last few months)

So if 'Total number of locks upgraded/downgraded' were counted
against locks requested/released, I'd probably see something
like you are seing with db 4.2.52.

> We have no clue where to look anymore (spitted through lots of loglevel=5
> logs and found nothing strange....), so if you guys can provide any
> pointers about where to look or how to investigate further, please do ! :)
>
> OK, now for the version-info:
>
> We're running CentOS 4, fully up-to-date.
> postfix-2.2.10-1.1.el4
> amavisd-new-2.5.3-2.el4.rf
> spamassassin-3.2.4-1.el4.rf
> db4-4.2.52-7.3.el4
> perl-BerkeleyDB-0.33-1.el4.rf

I'd suggest upgrading bdb to 4.4.20 (or at least to 4.3.29).

> Furthermore, we've patched amavis with the patch that Mark sent yesterday
> to the list. That didn't change anything, from the looks of it...

Good. Ensuring there are no crashes is a pre-requisite to
further bdb troubleshooting, otherwise anything is possible.

> And finally, just to add even more to this long email, the output from
> db_stat -c -h /var/amavis/db/ at this time of writing (quite a lot of
> emails later then the above stats...) All seems fine to me, except for the
> large difference in requested/releases locks...
>
> 536     Last allocated locker ID.
> 2147M   Current maximum unused locker ID.
> 5       Number of lock modes.
> 1000    Maximum number of locks possible.
> 1000    Maximum number of lockers possible.
> 1000    Maximum number of lock objects possible.
> 100     Number of current locks.
> 105     Maximum number of locks at any one time.
> 200     Number of current lockers.
> 200     Maximum number of lockers at any one time.
> 4       Number of current lock objects.
> 6       Maximum number of lock objects at any one time.
> 109549  Total number of locks requested.
> 76738   Total number of locks released.
> 0       Total number of lock requests failing because DB_LOCK_NOWAIT
>         was set.
> 41      Total number of locks not immediately available
>         due to conflicts.
> 0       Number of deadlocks. 
> 0       Lock timeout value.
> 0       Number of locks that have timed out.
> 0       Transaction timeout value.
> 0       Number of transactions that have timed out.
> 440KB   The size of the lock region..
> 22      The number of region locks granted after waiting.
> 221743  The number of region locks granted without waiting.
>
> So, can anyone shine some light on this, please ?
> And if anyone has a suggestion on how to capture an email that might me
> causing this behaviour: we haven't been able to do that (it's quite a lot
> email that passes through this host, so hard to do by hand :) )

Enabling a clean quarantine, or an archival quarantine
would be one possibility.


Here are some possibly relevant entries in bdb release notes:
http://www.oracle.com/technology/software/products/berkeley-db/db/

4.3.29
Concurrent Data Store Changes:
  Fix cursor locks for environments opened without DB_THREAD so that they
  use the same locker ID. This eliminates many common cases of application
  self-deadlock, particularly in CDS.  [#9742]
Queue Access Method Changes:
5 Fix a bug where the Queue access method might fail to release a record
  lock when running without transactions.  [#9487]
Locking Subsystem Changes:
11 Fix a bug that would fail to upgrade a write lock when moving a cursor
  off a previously deleted record. [#11042]
Utility Changes:
4 Fix a bug in the locking statistics where “current locks” included
  failed lock requests.  [#9314]

4.4.20
Concurrent Data Store Changes:
1. Lock upgrades and downgrades are now accounted for separately
   from lock requests and releases. [#11155]
Queue Access Method Changes:
2. Fix a bug where updates that do not use transactions,
   but do enable locking,  failed to release locks. [#11669]


Mark

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/

Reply via email to