Re: huge auto-whitelist file etc

2007-07-19 Thread Tammy George

Thanks for the responses.

A few questions: will running 'check_whitelist' affect our server's
performance?  Do I risk creating other problems if I leave things as they
are until our sys admin returns?  :)




On 7/18/07, Matt Kettler [EMAIL PROTECTED] wrote:


Tammy George wrote:
 Hello.

 Our Linux server is running SpamAssassin version 3.1.5.

 Backups started dying with 'inactivity timeout'.  Dug around and found
 the following:

 drwx------   3 vscan  vscan            512 Jul 18 16:28 .
 -rw-------   1 vscan  vscan  1099983372288 Jul 18 16:28 auto-whitelist
 -rw-------   1 vscan  vscan     1205862400 Jul 18 16:28 bayes_seen
 -rw-------   1 vscan  vscan       10846208 Jul 18 16:28 bayes_toks
 -rw-------   1 vscan  vscan          18240 Jul 18 16:28 bayes_journal
 drwxr-x---  12 vscan  vscan           1024 Jul 18 12:12 ..
 -rw-------   1 vscan  vscan        2654208 Jan 26  2005 bayes_toks.expire42066
 -rw-------   1 vscan  vscan         606208 Mar 30  2004 bayes_toks.expire93303
 drwxr-xr-x   2 vscan  vscan            512 Jan 28  2004 old
 -rw-r--r--   1 vscan  vscan           1165 Jan 27  2004 user_prefs

 A du -k shows auto-whitelist as being 1747968.
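
The gap between the size in the listing (1099983372288 bytes) and the du -k figure (1747968 KB) suggests auto-whitelist is a sparse file: its apparent length is far larger than the disk space actually allocated to it. That would be consistent with a backup tool timing out while reading through the holes, though the thread does not confirm it. A minimal sketch for checking this, where the function names size_report and looks_sparse are made up for illustration and the comparison is only a heuristic:

```shell
# Print "<apparent bytes> <allocated KB>" for one file.
# ls -ln puts the apparent size in field 5; du -k reports allocated space.
size_report() {
    f=$1
    apparent=$(ls -ln -- "$f" | awk '{print $5}')
    allocated_kb=$(du -k -- "$f" | awk '{print $1}')
    echo "$apparent $allocated_kb"
}

# Succeed when the apparent size exceeds the allocated space.
# Heuristic only: very small files can also trip this on some filesystems.
looks_sparse() {
    apparent_bytes=$1
    allocated_kb=$2
    [ "$apparent_bytes" -gt $((allocated_kb * 1024)) ]
}
```

With the numbers from the listing, `looks_sparse 1099983372288 1747968 && echo "probably sparse"` prints "probably sparse", since 1747968 KB is only about 1.8 GB.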

 Surprisingly, we aren't experiencing any problems other than the
 backups.  Our site handles A LOT of email.

 After I send this email, I'm going to look into check_whitelist and
 trim_whitelist (and probably sa-learn re: the bayes files), however,
 any suggestions would be most appreciated!  Our sys admin is on
 vacation and he's our expert.

For the auto-whitelist file, you need to run this command:

   check_whitelist --clean /path/to/auto-whitelist

That said, IMHO, the AWL isn't really ready for production use on large
systems unless you're going to run it on SQL and use your own scripts to
do expiry.

The bayes_toks and bayes_journal files auto-expire, so you don't need to
do anything to them.

The bayes_seen file doesn't have any kind of date information, so it
can't auto-expire. However, you can remove the file reasonably safely.
This file is just a list of all the messages that have already been run
through sa-learn. The only drawback to deleting it is that it will allow
you to re-train a message that you've already learned. So if you
maintain a massive directory of files to be relearned but don't clean
it out, you might have a minor amount of over-learning (no big deal).
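
A cautious variant of the bayes_seen advice above is to move the file aside first and delete it only once everything still behaves normally. The sketch below is an illustration, not part of SpamAssassin: the function name set_aside_seen and the ".old" suffix are hypothetical, and it assumes SpamAssassin recreates bayes_seen on its own the next time it needs it. Note that the renamed copy still occupies the same disk space until it is removed.

```shell
# Move bayes_seen aside rather than deleting it outright.
# $1 is the directory holding the Bayes files (hypothetical argument).
set_aside_seen() {
    dir=$1
    if [ -f "$dir/bayes_seen" ]; then
        # Keep the old copy so it can be restored if anything misbehaves.
        mv -- "$dir/bayes_seen" "$dir/bayes_seen.old"
    else
        echo "no bayes_seen in $dir" >&2
        return 1
    fi
}
```

It would be run as the user that owns the files (vscan in the listing), for example `set_aside_seen /path/to/bayes-dir`, with the path filled in for the local setup.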




 Thanks in advance for any advice.





huge auto-whitelist file etc

2007-07-18 Thread Tammy George

Hello.

Our Linux server is running SpamAssassin version 3.1.5.

Backups started dying with 'inactivity timeout'.  Dug around and found the
following:

drwx------   3 vscan  vscan            512 Jul 18 16:28 .
-rw-------   1 vscan  vscan  1099983372288 Jul 18 16:28 auto-whitelist
-rw-------   1 vscan  vscan     1205862400 Jul 18 16:28 bayes_seen
-rw-------   1 vscan  vscan       10846208 Jul 18 16:28 bayes_toks
-rw-------   1 vscan  vscan          18240 Jul 18 16:28 bayes_journal
drwxr-x---  12 vscan  vscan           1024 Jul 18 12:12 ..
-rw-------   1 vscan  vscan        2654208 Jan 26  2005 bayes_toks.expire42066
-rw-------   1 vscan  vscan         606208 Mar 30  2004 bayes_toks.expire93303
drwxr-xr-x   2 vscan  vscan            512 Jan 28  2004 old
-rw-r--r--   1 vscan  vscan           1165 Jan 27  2004 user_prefs

A du -k shows auto-whitelist as being 1747968.

Surprisingly, we aren't experiencing any problems other than the backups.
Our site handles A LOT of email.

After I send this email, I'm going to look into check_whitelist and
trim_whitelist (and probably sa-learn re: the bayes files), however, any
suggestions would be most appreciated!  Our sys admin is on vacation and
he's our expert.

Thanks in advance for any advice.