Bayes Misidentification

2007-06-04 Thread Ben Lentz

Greetings list!

Starting Friday, June 1st, every email that passes through my site-wide 
SpamAssassin system has been coming through with BAYES_99. I've been 
running with Bayes for months without any accuracy problems, and I can't 
figure out what has changed.


I am storing the Bayes data in a MySQL database. I tried truncating the 
database on Friday when I first detected this issue, but sure enough, 
all my external messages are now coming through with BAYES_99 again.


I don't trust the Bayes system any more and after many user complaints, 
I've opted to turn it off. However, setting use_bayes 0 doesn't seem to 
do anything; messages are still coming through with BAYES_99.


Is anyone else having this issue? Is my database just being poisoned 
over and over again?


Thanks for any input anyone can provide.


Re: Bayes Misidentification

2007-06-04 Thread Jari Fredriksson
I had a similar problem a week or two ago.

I have a site-wide system, and I use the user "spam" to run everything.

However, it seemed that user root had somehow accumulated Bayes data under its 
own account, and spamd was in fact using root's account for all scanning 
(that's why truncating spam's data did not help).

The problem seemed to go away when I added the -q option to the spamd startup; 
that way it seems to use the correct user id for the MySQL connection too. 
Without it, it was using root.

That's how I understood it, anyway.

Regards,
jarif









Re: Bayes Misidentification

2007-06-04 Thread Craig Carriere
Just a guess and probably wrong, but if you encrypt your data in MySQL,
are you sure your system can read the key file and decrypt the data?
If not, Bayes will be fed encrypted mail and will soon become
corrupted.  Also, have you tried simply deleting everything from your MySQL
Bayes databases and retraining?





Re: Bayes Misidentification

2007-06-04 Thread arni

Jari Fredriksson wrote:
> I had a similar problem a week or two ago.

Are you both using autolearn only, or do you also learn manually with 
sa-learn (or similar)?

You probably poisoned your Bayes db by learning ham as spam.

If you're using autolearning: adjust your scores and make sure you don't 
have false positives, as these are very bad.
If you're learning manually: you can't trust your users to classify spam 
for your global database. Users are users, and 99% of all mistakes happen 
in front of the keyboard.

Solution for now: if you can still find out which ham you learned as spam, 
unlearn it. If you can't, you'll have to revert to a Bayes backup. If 
you don't have one, you'll have to start over.


arni



Re: Bayes Misidentification

2007-06-04 Thread Ben Lentz



Craig Carriere wrote:
> Just a guess and probably wrong, but if you encrypt your data in mySQL
> are you sure your system can read the key file and de-crypt the data?
> If not bayes will be feed encrypted mail and will soon become
> corrupted.  Also have you tried to simply delete all from your mySQL
> bayes bases and retrain it?



  
Yes, that's what I was hoping would happen when I truncated the _seen, 
_tokens, and _expire tables on Friday. By Saturday afternoon, false 
positives were being generated, with BAYES_99 being the largest 
contributing factor.


I've since dropped the tables and recreated them (in case the table 
structure has changed between versions; I recently upgraded to 3.2.0 
when it was released).


I'm not sure I know what you mean when you say I've got encrypted data 
in MySQL. I didn't establish any keys or anything like that to 
communicate with MySQL, I just set the bayes_store_module, 
bayes_sql_dsn, bayes_sql_username, and bayes_sql_password settings.


My Bayes database is based on a little IMAP-derived user feedback 
data, but the vast majority of it is trained by the auto-learning system.


Re: Bayes Misidentification

2007-06-04 Thread Ben Lentz




Jari Fredriksson wrote:
> I had a similar problem a week or two ago.
>
> I have a site-wide system, and I use the user "spam" to run everything.
>
> However, it seemed that user root had somehow accumulated Bayes data under
> its own account, and spamd was in fact using root's account for all
> scanning (that's why truncating spam's data did not help).
>
> The problem seemed to go away when I added the -q option to the spamd
> startup; that way it seems to use the correct user id for the MySQL
> connection too. Without it, it was using root.
>
> That's how I understood it, anyway.
>
> Regards,
> jarif


  
Thanks for the tip, but I'm still storing my configuration in regular 
files; it's just the Bayes stuff that's in MySQL (the -q seems to have 
to do with a sql-based configuration).


Re: Bayes Misidentification

2007-06-04 Thread arni

Ben Lentz wrote:
> My bayes configuration is based on a little IMAP-derived user feedback 
> data, but the vast majority is trained by the auto-learning system.

You can't trust your users; they will put newsletters they signed up for but 
don't know how to cancel, and other non-spam, into the spam folder.


arni


Re: Bayes Misidentification

2007-06-04 Thread Jari Fredriksson
Ben Lentz wrote:
> Thanks for the tip, but I'm still storing my configuration in regular
> files; it's just the Bayes stuff that's in MySQL (the -q seems to have
> to do with a sql-based configuration).


Well, another change that I made was removing the -u username option. It was 
-u amavis, but then I looked at the manpage, which said:

  Run as the named user.  If this option is not set, the default behaviour
  is to setuid() to the user running spamc, if spamd is running as root.

Which was what I actually needed. My spamc is called every time with -u spam.

I was a bit confused about which change fixed what, but it seems to work now. 
I added -q even though I do not use SQL preferences, and removed -u from the 
spamd startup.

Anyway, it felt like spamd WAS running as root as far as MySQL was concerned, 
and now it seems to work. After those changes there was no BAYES_99 once the 
database had been cleared with sa-learn --clear, but without the changes there 
was BAYES_99 on every mail, unless I ran sa-learn -u root --clear.
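
The symptom described here (Bayes data piling up under root while only the spam user's data was being cleared) can be checked by listing which usernames own rows in the Bayes tables. The following is only a rough sketch, assuming the stock SQL Bayes schema, in which the bayes_vars table carries username, spam_count and ham_count columns. An in-memory SQLite database stands in for MySQL so the example runs anywhere, and the sample rows and the helper names bayes_users, unexpected_users and demo_connection are invented for illustration.

```python
import sqlite3


def bayes_users(conn):
    """List every username that owns Bayes data, with its learned message counts."""
    cursor = conn.execute(
        "SELECT username, spam_count, ham_count FROM bayes_vars ORDER BY username"
    )
    return cursor.fetchall()


def unexpected_users(rows, expected):
    """Return the rows whose username is not one the site intends to scan as."""
    return [row for row in rows if row[0] not in expected]


def demo_connection():
    """Build an in-memory stand-in for bayes_vars (assumption: only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_vars ("
        "id INTEGER PRIMARY KEY, username TEXT UNIQUE, "
        "spam_count INTEGER, ham_count INTEGER)"
    )
    # Invented sample data: 'spam' was just cleared, 'root' still holds tokens.
    conn.executemany(
        "INSERT INTO bayes_vars (username, spam_count, ham_count) VALUES (?, ?, ?)",
        [("spam", 0, 0), ("root", 412, 3)],
    )
    return conn


if __name__ == "__main__":
    rows = bayes_users(demo_connection())
    for username, spam_count, ham_count in unexpected_users(rows, {"spam"}):
        print(username, spam_count, ham_count)
```

Against a real installation the same SELECT could be run directly in the mysql client. A username that nobody meant to scan as, with a large spam_count, would point to the wrong-user situation described above.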