Dear Allori Lorenzo,
I have developed a useful concept for making the Bayesian filter learn.

On the mail relay server I have a small IMAP server with three accounts 
configured on it: Spam, No-spam, and Learn. The Spam and No-spam accounts 
can be used by any user to collect false positives (No-spam) and untagged 
real spam (Spam). These two accounts are shared by all users to reduce the 
RAM used by the IMAP service. For privacy, after a mail arrives in the 
Spam or No-spam inbox it is taken out of that account and dropped into a 
separate folder of the Learn account (so the two shared accounts appear 
empty). The next step is for the system administrator to check that each 
mail is in the right folder (in case users have made some mistakes ;) ). 
Then he can launch the sa-learn script. 
As regards security, I'm using Dovecot IMAP, and these accounts cannot 
receive mail from outside; you can only drop mails in from a mail client.
(The good thing is you can decide which users can have this feature.)
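
The step that empties the shared inboxes into the Learn account could be
sketched roughly as below. This is only an illustration, not the script
actually in use: it assumes the accounts are stored as Maildir directories
(one of the formats Dovecot supports), and the function name drain_maildir
and any paths passed to it are hypothetical.

```python
import mailbox


def drain_maildir(shared_dir, learn_dir):
    """Move every message from a shared drop-box Maildir into a review
    Maildir, leaving the shared one empty. Returns the number moved.

    Both paths are assumptions for illustration; a directory that does
    not exist yet is created as an empty Maildir.
    """
    src = mailbox.Maildir(str(shared_dir), create=True)
    dst = mailbox.Maildir(str(learn_dir), create=True)
    moved = 0
    # Snapshot the keys first so removing messages does not disturb
    # the iteration.
    for key in list(src.keys()):
        dst.add(src[key])   # copy into the review folder
        src.remove(key)     # then delete from the shared inbox
        moved += 1
    src.flush()
    dst.flush()
    return moved
```

Run from cron every few minutes, once for the Spam inbox and once for the
No-spam inbox, something like this would keep the shared accounts empty
between user drops.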

Do you think it would be interesting to integrate this into OpenProtect?
  
Thanks for your suggestions. Your idea is certainly interesting and yes, it would be a nice feature to integrate into OpenProtect.

Some things to make sure of:

  1. The IMAP accounts shouldn't receive mails through SMTP.
  2. The IMAP server should be fairly lightweight and secure, because making users patch the IMAP server frequently will be painful.
  3. Authorized users (using a username/password pair) should copy the false positives (ham mail tagged as spam) to the Notspam IMAP account and false negatives (spam tagged as ham) to the Spam IMAP account.
  4. Mails in the Spam and Notspam IMAP accounts should be moved to another pair of IMAP accounts, where the admin can view the mails and make sure that the users are not poisoning the Bayesian database. For example, if a user copies a valid ham mail to the Spam IMAP account, then mails from that sender could be wrongly tagged as spam in the future.
  5. Once the admin is certain of these mails, he should move them to another set of IMAP accounts, where they're fed to sa-learn by a cron job and then purged, or archived if possible.
  6. Rebuild the Bayesian database every day or so to avoid slowdown due to fragmentation.

The above usage of three pairs of IMAP accounts seems a little cumbersome. If someone else has another idea, don't hesitate to suggest it. We'll discuss the merits of that method too.
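
As a rough illustration of the cron job in step 5, here is a minimal Python sketch that feeds the two admin-approved folders to sa-learn. It is only a sketch under assumptions: the function names are hypothetical, the folder paths are whatever the final set of accounts maps to on disk, and it relies only on sa-learn's --spam and --ham options with a directory argument.

```python
import subprocess


def sa_learn_commands(spam_dir, ham_dir):
    """Build the sa-learn invocations for the admin-approved folders.

    spam_dir and ham_dir are hypothetical paths to directories holding
    one message per file (for a Maildir, its cur/ subdirectory).
    """
    return [
        ["sa-learn", "--spam", str(spam_dir)],
        ["sa-learn", "--ham", str(ham_dir)],
    ]


def run_learning(spam_dir, ham_dir, runner=subprocess.run):
    """Run each sa-learn command in order.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit, so a failed spam run stops before the ham run.
    """
    for cmd in sa_learn_commands(spam_dir, ham_dir):
        runner(cmd, check=True)
```

The runner parameter exists only so the job can be dry-run or tested without SpamAssassin installed. Purging or archiving the learned mails, and the periodic rebuild from step 6, would be separate steps after this one.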

Regarding the choice of IMAP server, I'd like some suggestions on your preferred IMAP server for this Bayesian learning. If it is a secure piece of software and as easily installable as djbdns, then there would be no worries about keeping the IMAP server current and patching it for new vulnerabilities.
PS: for these things, should I write directly to you or to the mailing 
list?
  
The mailing list is a better medium, as more people will be aware of what we're doing and can give their suggestions. But you can also mail me personally if you feel that is better. :)


cheers,
Karthikeyan, S.
-- 
S.Karthikeyan | Ph: +91 (0) 44 52166646 Fax: +91 (0) 44 52079957
Opencomputing Technologies | http://opencompt.com 
Server Side E-Mail Protection.
