Andrew Savory wrote:

Hi,

On 17 Apr 2004, at 18:59, Stefano Mazzocchi wrote:

Find out how this works here:

http://www.betaversion.org/~stefano/software/erathostenes/index.html


Interesting! But when you say "the assumption is that you *never* delete anything" ... do you mean in perpetuity? How realistic do you think this is, given the ~40kb payload of most virus mails these days? Over the last 6 months, I've accumulated over a gigabyte of such mail ... that's a pretty high cost in disk space!

Or do you discard after retraining?

In this version of the script, if you remove an email from the spam folder, the spam database is untrained on that message. This lets you stop false positives from escalating (if you can still spot them!).

In the previous incarnation, the script did not do this. The problem was that a false positive (or, much more frequently, ham that you moved into the spam folder by mistake and didn't notice before cron called the trainer) would "pollute" the database.

To be able to "undo" training (this is not needed often, but it's a nice feature) you need to save everything at all times. Actually, the way the script works today is that it makes a local copy of each email on your server, so not only do you save everything, you have two copies of it.
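
To make the train/untrain idea concrete, here is a minimal sketch in Python. It is not the actual script: the names `sync`, `tokens`, `folder`, `archive` and `counts` are hypothetical, and the whitespace tokenizer is a stand-in for whatever the real filter does. It only illustrates why the local copy is needed: once a message has left the spam folder, the copy is the only place its tokens can be read from to untrain them.

```python
from collections import Counter


def tokens(text):
    # Hypothetical tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def sync(folder, archive, counts):
    """Bring the spam token counts in line with the spam folder.

    folder:  dict of message id -> text, the current spam folder
    archive: dict of message id -> text, local copies of trained messages
    counts:  Counter of spam token counts (the "database")
    """
    # New messages in the folder: train on them and keep a local copy.
    for msg_id in folder.keys() - archive.keys():
        counts.update(tokens(folder[msg_id]))
        archive[msg_id] = folder[msg_id]

    # Messages removed from the folder: untrain using the local copy.
    for msg_id in archive.keys() - folder.keys():
        counts.subtract(tokens(archive[msg_id]))
        del archive[msg_id]

    # Drop tokens whose count fell to zero or below.
    for tok in [t for t, n in counts.items() if n <= 0]:
        del counts[tok]
```

Calling `sync` from cron after each folder change would then both train on new spam and undo training for anything you moved back out.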

Note that it is entirely possible, in case your disk space is limited, to modify the script to remove binary attachments from email.
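
As a rough illustration of that modification, the sketch below uses Python's standard `email` package to drop non-text parts from a multipart message before it is stored. The function names `strip_binary` and `_prune` are made up for this example and the script itself may do things quite differently. Note the assumptions: `text/*` and `message/*` parts are kept as they are, and a message that is not multipart at all is returned with its body untouched.

```python
import email
from email import policy


def _prune(part):
    # Only real multipart containers are filtered.
    if part.get_content_maintype() != "multipart":
        return
    kept = []
    for sub in part.get_payload():
        maintype = sub.get_content_maintype()
        if maintype == "multipart":
            _prune(sub)            # filter nested containers recursively
            kept.append(sub)
        elif maintype in ("text", "message"):
            kept.append(sub)       # keep readable parts
        # anything else (application/*, image/*, ...) is dropped
    part.set_payload(kept)


def strip_binary(raw):
    """Return the message bytes with binary attachments removed."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    _prune(msg)
    return msg.as_bytes()
```

The text of the mail is still there for the trainer, while the bulky virus payloads no longer take up space in the local copy.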

Anyway, in my case, I have accumulated 320 MB of spam in the last 6 months. Disk space is not that big of a deal these days, especially on servers.

--
Stefano.

