Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?
On 5/5/22 14:28, Dave Wreski wrote:
> No, that's how you train your corpora. If you manually look through the headers of mail that's already been processed by your mail system, the ham should be as close to BAYES_00 as possible, and spam should be at BAYES_99. If that's not the case, then it's been trained incorrectly.
>
> I'd also recommend disabling auto-learn, if you have that enabled.
>
> /etc/mail/spamassassin/local.cf:
> bayes_auto_learn 0
> bayes_auto_expire 0
>
> If you've gone through your corpus manually, and are certain the ham is all good mail and the spam emails are all bad mail, then it might be worth it to dump the existing bayes database and just retrain it with the corresponding mboxes.
>
> I also typically add --progress to sa-learn.
>
> Best,
> Dave

Thanks, I appreciate it. I'll tune it a bit.

Thomas
Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?
>>> That's a great call, thanks. I grepped my mail files and didn't find any SPAM_99 headers in any of them.
>>
>> You should be looking for BAYES_99 and BAYES_999 in your corpus.
>
> Thanks, Dave. I use my various mailboxes (sa-learn --ham --mbox /home/thomas.cameron/mail/INBOX/[mailbox file] and then sa-learn --spam --mbox /home/thomas.cameron/mail/INBOX/spam) to train SA, doesn't that mean that I've already checked my corpora?
>
> Thomas

No, that's how you train your corpora. If you manually look through the headers of mail that's already been processed by your mail system, the ham should be as close to BAYES_00 as possible, and spam should be at BAYES_99. If that's not the case, then it's been trained incorrectly.

I'd also recommend disabling auto-learn, if you have that enabled.

/etc/mail/spamassassin/local.cf:
bayes_auto_learn 0
bayes_auto_expire 0

If you've gone through your corpus manually, and are certain the ham is all good mail and the spam emails are all bad mail, then it might be worth it to dump the existing bayes database and just retrain it with the corresponding mboxes.

I also typically add --progress to sa-learn.

Best,
Dave
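For anyone following along, the "dump the existing bayes database and retrain from the mboxes" suggestion can be sketched as a small Python wrapper around sa-learn. This is only a sketch under assumptions: the function names and the dry_run switch are made up for illustration, the mbox paths are whatever you pass in, and it assumes sa-learn is on the PATH and run as the user that owns the Bayes database. Note that sa-learn --clear wipes the existing database, so the default here only prints the commands.

```python
import subprocess


def retrain_commands(ham_mboxes, spam_mboxes):
    """Build the sa-learn command lines for a wipe-and-retrain run."""
    # --clear wipes the existing Bayes database (destructive).
    cmds = [["sa-learn", "--clear"]]
    for path in ham_mboxes:
        cmds.append(["sa-learn", "--progress", "--ham", "--mbox", path])
    for path in spam_mboxes:
        cmds.append(["sa-learn", "--progress", "--spam", "--mbox", path])
    return cmds


def retrain(ham_mboxes, spam_mboxes, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in retrain_commands(ham_mboxes, spam_mboxes):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first sa-learn failure.
            subprocess.run(cmd, check=True)
```

Calling retrain(["ham.mbox"], ["spam.mbox"]) with placeholder file names prints the three commands it would run; pass dry_run=False only once the ham and spam mboxes have been checked by hand.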
Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?
On 5/5/22 11:59, Dave Wreski wrote:
>>> You should probably check that none of your ham (i.e. non-spam) messages contains SPAM_99 or SPAM_999. It can happen when spammers poison your bayes database, and an increased score in that case might lead to legitimate mail being misclassified as spam.
>>
>> That's a great call, thanks. I grepped my mail files and didn't find any SPAM_99 headers in any of them.
>
> You should be looking for BAYES_99 and BAYES_999 in your corpus.

Thanks, Dave. I use my various mailboxes (sa-learn --ham --mbox /home/thomas.cameron/mail/INBOX/[mailbox file] and then sa-learn --spam --mbox /home/thomas.cameron/mail/INBOX/spam) to train SA, doesn't that mean that I've already checked my corpora?

Thomas
Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?
>> You should probably check that none of your ham (i.e. non-spam) messages contains SPAM_99 or SPAM_999. It can happen when spammers poison your bayes database, and an increased score in that case might lead to legitimate mail being misclassified as spam.
>
> That's a great call, thanks. I grepped my mail files and didn't find any SPAM_99 headers in any of them.

You should be looking for BAYES_99 and BAYES_999 in your corpus.

Best,
Dave
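As a rough illustration of that check, here is a minimal Python sketch that walks a ham mbox and lists messages whose X-Spam-Status header mentions BAYES_99 or BAYES_999. It assumes the mail was already scanned by SpamAssassin and carries the usual X-Spam-Status header with a tests= list; the function names are hypothetical and the mbox path is whatever you pass in.

```python
import mailbox
import re

BAYES_SPAM_RULES = {"BAYES_99", "BAYES_999"}


def bayes_rules(status_header):
    """Return every BAYES_nn rule name found in an X-Spam-Status value."""
    # \b keeps BAYES_99 and BAYES_999 apart, even in folded headers.
    return set(re.findall(r"\bBAYES_\d+\b", status_header))


def suspicious_ham(mbox_path):
    """List subjects of messages in a ham mbox that hit BAYES_99/BAYES_999."""
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            status = str(msg.get("X-Spam-Status", ""))
            if bayes_rules(status) & BAYES_SPAM_RULES:
                hits.append(str(msg.get("Subject", "(no subject)")))
    finally:
        box.close()
    return hits
```

An empty list from suspicious_ham() on each ham mailbox is the result you want before raising the BAYES_99 and BAYES_999 scores; any subjects it returns are good mail that Bayes currently considers near-certain spam.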
Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?
On 5/5/22 11:47, Matija Nalis wrote:
> On Thu, May 05, 2022 at 10:37:40AM -0500, Thomas Cameron wrote:
>> I understand that turning knobs without understanding the consequences can do bad things, but almost all of the spam that gets through SA on my server has SPAM_99 or SPAM_999 set in the headers. It is obviously spam, so I don't really get how it wasn't flagged, but it wasn't. What are the risks of giving more weight to SPAM_99 and/or SPAM_999? Explain it like I'm five, sorry, it's probably something simple that I just don't understand.
>>
>> Thomas
>
> You should probably check that none of your ham (i.e. non-spam) messages contains SPAM_99 or SPAM_999. It can happen when spammers poison your bayes database, and an increased score in that case might lead to legitimate mail being misclassified as spam.

That's a great call, thanks. I grepped my mail files and didn't find any SPAM_99 headers in any of them.

Thomas
Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?
You should probably check that none of your ham (i.e. non-spam) messages contains SPAM_99 or SPAM_999. It can happen when spammers poison your bayes database, and an increased score in that case might lead to legitimate mail being misclassified as spam.

On Thu, May 05, 2022 at 10:37:40AM -0500, Thomas Cameron wrote:
> I understand that turning knobs without understanding the consequences can do bad things, but almost all of the spam that gets through SA on my server has SPAM_99 or SPAM_999 set in the headers. It is obviously spam, so I don't really get how it wasn't flagged, but it wasn't. What are the risks of giving more weight to SPAM_99 and/or SPAM_999? Explain it like I'm five, sorry, it's probably something simple that I just don't understand.
>
> Thomas

--
Opinions above are GNU-copylefted.
Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?
On 5/5/22 10:46, Reindl Harald wrote:
> On 05.05.22 at 17:37, Thomas Cameron wrote:
>> I understand that turning knobs without understanding the consequences can do bad things, but almost all of the spam that gets through SA on my server has SPAM_99 or SPAM_999 set in the headers. It is obviously spam, so I don't really get how it wasn't flagged, but it wasn't. What are the risks of giving more weight to SPAM_99 and/or SPAM_999? Explain it like I'm five, sorry, it's probably something simple that I just don't understand
>
> when your bayes is well trained just raise it
>
> the risk is simple: when your bayes isn't trained well or is poisoned (autolearning is the root of all evil) you risk FPs
>
> we milter-reject at 8.0 points and BAYES_99 + BAYES_999 are 7.5 points since 2014, most junk collects the remaining 0.5 points with other rules and the few FPs typically hit some DNSWL/SPF rules with negative score
>
> well, our bayes has 160k messages

Many thanks! I appreciate the response!

Thomas
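The arithmetic in that reply can be sketched in a few lines of Python. Only the 8.0 reject threshold and the combined 7.5 points for BAYES_99 + BAYES_999 come from the message; the per-rule split, the two placeholder rule names and their scores are assumptions made up for illustration, as is treating "reject at 8.0" as "total >= 8.0".

```python
# Reject threshold quoted in the thread (assumed to mean total >= 8.0).
REJECT_AT = 8.0

SCORES = {
    "BAYES_99": 5.0,            # hypothetical split of the quoted 7.5 ...
    "BAYES_999": 2.5,           # ... only the 7.5 sum comes from the thread
    "OTHER_SPAM_RULE": 0.5,     # hypothetical stand-in for "other rules"
    "DNSWL_OR_SPF_PASS": -1.0,  # hypothetical negative-score rule
}


def total_score(rules):
    """Sum the scores of every rule a message hit."""
    return sum(SCORES[name] for name in rules)


def is_rejected(rules):
    """True when the summed score reaches the reject threshold."""
    return total_score(rules) >= REJECT_AT
```

With these made-up numbers, a message hitting only the two Bayes rules totals 7.5 and is not rejected, one more 0.5-point rule pushes it to 8.0, and a false positive that also hits a negative-score whitelist or SPF rule drops back under the threshold.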