Thanks for the responses.
A few questions: will running 'check_whitelist' affect our server's
performance? Do I risk creating other problems if I leave things as they
are until our sys admin returns? :)
On 7/18/07, Matt Kettler [EMAIL PROTECTED] wrote:
Tammy George wrote:
Hello.
Our Linux server is running SpamAssassin version 3.1.5.
Backups started dying with 'inactivity timeout'. I dug around and found
the following:
drwx------  3 vscan vscan           512 Jul 18 16:28 .
-rw-------  1 vscan vscan 1099983372288 Jul 18 16:28 auto-whitelist
-rw-------  1 vscan vscan    1205862400 Jul 18 16:28 bayes_seen
-rw-------  1 vscan vscan      10846208 Jul 18 16:28 bayes_toks
-rw-------  1 vscan vscan         18240 Jul 18 16:28 bayes_journal
drwxr-x--- 12 vscan vscan          1024 Jul 18 12:12 ..
-rw-------  1 vscan vscan       2654208 Jan 26  2005 bayes_toks.expire42066
-rw-------  1 vscan vscan        606208 Mar 30  2004 bayes_toks.expire93303
drwxr-xr-x  2 vscan vscan           512 Jan 28  2004 old
-rw-r--r--  1 vscan vscan          1165 Jan 27  2004 user_prefs
A du -k shows auto-whitelist as being 1747968 KB.
Surprisingly, we aren't experiencing any problems other than the
backups. Our site handles A LOT of email.
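
The gap between the size in the listing (about 1.1 TB) and the du -k figure (about 1.7 GB) suggests that auto-whitelist is a sparse file, which may be why only the backups notice it: a tool that reads the file end to end sees the full apparent size. The following is a minimal sketch for checking this, assuming GNU stat; the function name sparse_report is made up for illustration.

```shell
#!/bin/sh
# Compare a file's apparent size (what ls -l shows) with the space
# actually allocated on disk (what du reports).
# Assumption: GNU stat (the -c format option) is available.
sparse_report() {
    file=$1
    apparent=$(stat -c %s "$file")      # size in bytes, as shown by ls -l
    blocks=$(stat -c %b "$file")        # number of blocks allocated
    blocksize=$(stat -c %B "$file")     # size in bytes of each %b block
    allocated=$((blocks * blocksize))   # bytes really used on disk
    echo "apparent=$apparent allocated=$allocated"
}
```

Running sparse_report /path/to/auto-whitelist and seeing an allocated value far below the apparent value would be consistent with a sparse file.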
After I send this email, I'm going to look into check_whitelist and
trim_whitelist (and probably sa-learn for the bayes files). However,
any suggestions would be most appreciated! Our sys admin is on
vacation, and he's our expert.
For the auto-whitelist file, you need to run this command:
check_whitelist --clean /path/to/auto-whitelist
That said, IMHO, the AWL isn't really ready for production use on large
systems unless you're going to run it on SQL and use your own scripts to
do expiry.
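
The check_whitelist command above could be wrapped so that a copy of the file is kept before it is cleaned. This is only a sketch under assumptions: clean_awl is a made-up name, cp --sparse=always assumes GNU cp, and the copy itself may be slow on a file this large. The second argument exists only so a different command can be substituted for a dry run.

```shell
#!/bin/sh
# Keep a copy of the auto-whitelist file, then clean it.
# Assumptions: GNU cp, and check_whitelist is on the PATH of the user
# that owns the file (vscan in the listing above).
clean_awl() {
    awl=$1
    cleaner=${2:-check_whitelist}   # hypothetical override for dry runs

    # Preserve holes so the copy does not grow to the apparent size.
    cp --sparse=always "$awl" "$awl.bak" || return 1

    # Same invocation as suggested in the thread.
    "$cleaner" --clean "$awl"
}
```

It would be called as clean_awl /path/to/auto-whitelist, ideally at a quiet time of day.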
The bayes_toks and bayes_journal files auto-expire, so you don't need to
do anything to them.
The bayes_seen file doesn't have any kind of date information, so it
can't auto-expire. However, you can remove the file reasonably safely.
This file is just a list of all the messages that have already been run
through sa-learn. The only drawback to deleting it is that it will allow
you to re-train a message that you've already learned. So if you
maintain a massive directory of files to be relearned but don't clean
it out, you might have a minor amount of over-learning (no big deal).
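
A cautious way to act on the bayes_seen advice above is to move the file aside rather than delete it outright, so it can be put back if anything looks wrong. A minimal sketch, assuming the directory layout from the listing; retire_bayes_seen is a made-up name, and it is an assumption that SpamAssassin creates a fresh bayes_seen when it next needs one.

```shell
#!/bin/sh
# Move bayes_seen out of the way instead of removing it.
# The argument is the directory that holds the bayes files.
retire_bayes_seen() {
    dir=$1
    seen="$dir/bayes_seen"

    if [ ! -f "$seen" ]; then
        echo "no bayes_seen in $dir" >&2
        return 1
    fi

    # Keep the old file under a dated name; delete it later once
    # things are confirmed to work without it.
    mv "$seen" "$seen.old.$(date +%Y%m%d)"
}
```

Once mail has been flowing normally for a while, the dated copy can be removed to reclaim the space.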
Thanks in advance for any advice.