Bayes Misidentification
Greetings list! Starting Friday, June 1st, every email that passes through my site-wide SpamAssassin system has been coming through with BAYES_99. I've been running with Bayes for months without any accuracy problems, and I can't figure out what has changed. I am storing the Bayes data in a MySQL database. I tried truncating the database on Friday when I first detected this issue, but sure enough, all my external messages are now coming through with BAYES_99 again. I don't trust the Bayes system any more and after many user complaints, I've opted to turn it off. However, setting use_bayes 0 doesn't seem to do anything; messages are still coming through with BAYES_99. Is anyone else having this issue? Is my database just being poisoned over and over again? Thanks for any input anyone can provide.
Re: Bayes Misidentification
I had a similar problem a week or two ago. I have a site-wide system, and I use the user spam to run everything. However, it seemed that user root had somehow got some data under its own account, and spamd was in fact using root's account for all scanning (that's why truncating spam's data did not help). The problem seemed to go away when I added the -q option to the spamd startup; that way it seems to use the correct user id for the MySQL connection too, whereas without it spamd was using root. That's how I think it went.
Regards, jarif
Re: Bayes Misidentification
Just a guess and probably wrong, but if you encrypt your data in MySQL, are you sure your system can read the key file and decrypt the data? If not, Bayes will be fed encrypted mail and will soon become corrupted. Also, have you tried simply deleting everything from your MySQL Bayes tables and retraining?
Dr. Craig Carriere, Cobatco Inc., Technology Development
Re: Bayes Misidentification
Jari Fredriksson wrote: I had a similar problem a week or two ago.

Are you both using autolearn only, or do you also train manually with sa-learn (or similar)? You probably poisoned your Bayes db by learning ham as spam. If you're using autolearning: adjust your scores and generally make sure you don't have false positives, as these are very bad. If you're learning manually: you can't trust your users to classify spam for your global database. Users are users, and 99% of all mistakes happen in front of the keyboard. Solution for now: if you can still find out which ham was learned wrongly, unlearn it; if you can't, you'll have to revert to a Bayes backup. If you don't have one, you'll have to start over.
arni
Re: Bayes Misidentification
Craig Carriere wrote: Just a guess and probably wrong, but if you encrypt your data in MySQL, are you sure your system can read the key file and decrypt the data? If not, Bayes will be fed encrypted mail and will soon become corrupted. Also, have you tried simply deleting everything from your MySQL Bayes tables and retraining?

Yes, that's what I was hoping would happen when I truncated the _seen, _tokens, and _expire tables on Friday. By Saturday afternoon, false positives were being generated again, with BAYES_99 the largest contributing factor. I've since dropped the tables and recreated them (in case the table structure had changed between versions; I upgraded to 3.2.0 recently, when it was released). I'm not sure what you mean when you say I've got encrypted data in MySQL. I didn't set up any keys or anything like that to communicate with MySQL; I just set the bayes_store_module, bayes_sql_dsn, bayes_sql_username, and bayes_sql_password settings. My Bayes setup is based on a little IMAP-derived user feedback data, but the vast majority is trained by the auto-learning system.
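Truncating the _seen, _tokens, and _expire tables would leave any per-user counters untouched if they live in a separate table. The following is a minimal sketch for checking that, not something taken from the thread. It assumes the stock SpamAssassin SQL schema, in which a bayes_vars table holds username, spam_count, ham_count, and token_count for each user; verify the column names against your own schema. The function name bayes_users and the demo rows are hypothetical, and an in-memory sqlite3 table stands in for the real MySQL database. The 200/200 threshold reflects the defaults of bayes_min_spam_num and bayes_min_ham_num.

```python
import sqlite3

# By default SpamAssassin does not apply Bayes scores until this many
# spam and ham messages are learned (bayes_min_spam_num / bayes_min_ham_num).
MIN_SPAM = 200
MIN_HAM = 200


def bayes_users(cursor):
    """Return one dict per bayes_vars row, flagging databases large enough to score."""
    # Assumed columns from the stock schema; adjust if yours differs.
    cursor.execute(
        "SELECT username, spam_count, ham_count, token_count "
        "FROM bayes_vars ORDER BY username"
    )
    report = []
    for username, nspam, nham, ntokens in cursor.fetchall():
        report.append({
            "username": username,
            "nspam": nspam,
            "nham": nham,
            "ntokens": ntokens,
            "active": nspam >= MIN_SPAM and nham >= MIN_HAM,
        })
    return report


def demo_cursor():
    """Build an in-memory stand-in for the real database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE bayes_vars (username TEXT, spam_count INTEGER, "
        "ham_count INTEGER, token_count INTEGER)"
    )
    cur.executemany(
        "INSERT INTO bayes_vars VALUES (?, ?, ?, ?)",
        [("spam", 0, 0, 0), ("root", 950, 310, 48000)],
    )
    return cur


if __name__ == "__main__":
    for row in bayes_users(demo_cursor()):
        print(row)
```

Against a real installation, pass a cursor from your MySQL driver in place of demo_cursor(). A user that still shows as active right after a truncation would be worth a closer look, since it suggests that user's Bayes data was not reset.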
Re: Bayes Misidentification
Jari Fredriksson wrote: The problem seemed to go away when I added the -q option to the spamd startup; that way it seems to use the correct user id for the MySQL connection too, whereas without it spamd was using root.

Thanks for the tip, but I'm still storing my configuration in regular files; it's only the Bayes data that's in MySQL (the -q option seems to relate to SQL-based configuration).
Re: Bayes Misidentification
Ben Lentz wrote: My Bayes setup is based on a little IMAP-derived user feedback data, but the vast majority is trained by the auto-learning system.

You can't trust your users: they will put newsletters they signed up for but don't know how to cancel, and other non-spam, into the spam folder.
arni
Re: Bayes Misidentification
Ben Lentz wrote: Thanks for the tip, but I'm still storing my configuration in regular files; it's only the Bayes data that's in MySQL (the -q option seems to relate to SQL-based configuration).

Well, another change that I made was removing the -u username option from spamd. It was -u amavis, but then I looked at the man page, which says: "Run as the named user. If this option is not set, the default behaviour is to setuid() to the user running spamc, if spamd is running as root." That was what I actually needed. My spamc is called every time with -u spam. I was a bit confused about which change fixed what, but it seems to work now: I added -q even though I do not use SQL preferences, and I removed -u from the spamd startup. Anyway, it felt like spamd WAS running as root as far as MySQL was concerned, and now it seems to work. After those changes there was no BAYES_99 once the database had been cleared with sa-learn --clear; without the changes, there was BAYES_99 for every mail, unless I ran sa-learn -u root --clear.
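Since the fix above hinged on which username the Bayes data was stored under, it can help to inspect each candidate user separately before clearing anything. The following is a minimal sketch under stated assumptions: it relies only on the sa-learn options -u, --dump magic, and --clear, while the helper names sa_learn_argv and dump_all are hypothetical. The user names you pass in (for example root, spam, and amavis, as mentioned in this thread) must match your own site.

```python
import subprocess


def sa_learn_argv(action, username=None):
    """Build the argument list for a single sa-learn call."""
    actions = {
        "dump": ["--dump", "magic"],  # print the counters only, not the tokens
        "clear": ["--clear"],         # wipe the Bayes data for that user
    }
    argv = ["sa-learn"]
    if username is not None:
        # -u overrides the username taken from the runtime environment.
        argv += ["-u", username]
    return argv + actions[action]


def dump_all(usernames):
    """Run 'sa-learn --dump magic' once per user and collect each output."""
    results = {}
    for name in usernames:
        done = subprocess.run(
            sa_learn_argv("dump", name),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one user has no data
        )
        results[name] = done.stdout
    return results
```

Calling dump_all(["root", "spam", "amavis"]) on the mail host returns the counters per user; a user with unexpectedly large spam counts is a candidate for sa_learn_argv("clear", name), which yields the same command line as the sa-learn -u root --clear used above.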