Re: Auto Learn Spam
On Wed, 2010-04-28 at 12:38 -0400, Carlos Mennens wrote:
> I checked /etc/mail/spamassassin/local.cf just now and found only the
> following:
>
> required_hits 5
> report_safe 0
> rewrite_header Subject [SPAM]
>
> However I don't know if Amavisd-new is looking at local.cf because I
> show parameters in my amavisd.conf file for SpamAssassin:
>
> $sa_tag_level_deflt = -999.0; # add spam info headers if at, or above that level
> $sa_tag2_level_deflt = 5.0; # add 'spam detected' headers at that level
> $sa_kill_level_deflt = 8.0; # triggers spam evasive actions (e.g. blocks mail)
> $sa_dsn_cutoff_level = 10; # spam level beyond which a DSN is not sent
> $sa_quarantine_cutoff_level = 12; # spam level beyond which quarantine is off
> $penpals_bonus_score = 8; # (no effect without a @storage_sql_dsn database)
> $penpals_threshold_high = $sa_kill_level_deflt; # don't waste time on hi spam

These settings are for amavisd-new and not spamassassin. Amavisd-new is the glue between your MTA and spamassassin (and virus scanners). Most of the behavior of spamassassin is still controlled through the local.cf (although some settings can be defined in both places and the amavisd.conf file will take precedence).

> $sa_mail_body_size_limit = 400*1024; # don't waste time on SA if mail is larger
> $sa_local_tests_only = 0; # only tests which do not require internet access?
> [...]
> $sa_spam_subject_tag = '***SPAM*** ';
> $defang_virus = 1; # MIME-wrap passed infected mail
> $defang_banned = 1; # MIME-wrap passed mail containing banned name
> # for defanging bad headers only turn on certain minor contents categories:
> $defang_by_ccat{+CC_BADH.",3"} = 1; # NUL or CR character in header
> $defang_by_ccat{+CC_BADH.",5"} = 1; # header line longer than 998 characters
>
> When I get a spam message that was scored by SA, it says ***SPAM***
> and not [SPAM] so that leaves me to believe that SA parameters are
> being fed from amavisd.conf file. Does this make sense to you guys?

This is just the setting in amavisd.conf taking precedence. If you were to comment out $sa_spam_subject_tag I *believe* the value in your local.cf would then be used.
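
To make the split between the two config files a bit more concrete, here is a minimal sketch of how a score could be compared against the three amavisd-new levels quoted above. The function name `amavis_actions` and the returned labels are hypothetical, and the real amavisd-new logic has more cases (DSN and quarantine cutoffs, per-recipient levels); only the threshold values come from the amavisd.conf excerpt.

```python
# Thresholds copied from the amavisd.conf excerpt quoted in this thread.
SA_TAG_LEVEL = -999.0   # add spam info headers at or above this score
SA_TAG2_LEVEL = 5.0     # add 'spam detected' headers at or above this score
SA_KILL_LEVEL = 8.0     # trigger evasive actions (e.g. block) at or above this score


def amavis_actions(score):
    """Return illustrative labels for each level the score reaches (hypothetical helper)."""
    actions = []
    if score >= SA_TAG_LEVEL:
        actions.append("info-headers")
    if score >= SA_TAG2_LEVEL:
        actions.append("spam-detected-headers")
    if score >= SA_KILL_LEVEL:
        actions.append("evasive-action")
    return actions


for s in (2.808, 6.0, 9.5):
    print(s, amavis_actions(s))
```

With these values a message is tagged as spam (and gets the '***SPAM*** ' subject tag from amavisd.conf) from 5.0 upward, regardless of what required_hits says in local.cf.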
Re: Auto Learn Spam
On Wed, 2010-04-28 at 11:53 -0400, Carlos Mennens wrote:
> I noticed when reviewing headers today that there was a section for
> 'autolearn=no' and was wondering what exactly does this mean and
> wouldn't autolearn be a good thing? I use Amavisd-new which calls out
> to SpamAssassin modules but I don't have the spamd daemon running
> physically. The Amavisd-new daemon simply loads the modules for spamd
> and does the scoring directly saving my mail server from running more
> daemon's and system resources that it needs to. So below are the
> headers:

Autolearn kicks in at certain scores. I believe the default is 12.0 for spam and 0.1 for ham. You can customize those settings in your local.cf file.

bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -3.0
bayes_auto_learn_threshold_spam 12.0

I changed the default value for nonspam because the majority of my users don't train bayes and so the default value could cause bayes to learn incorrectly if a spam message scored low (maybe no network rules or URI rules triggered the first few times).

> X-Spam-Status: No, score=2.808 tagged_above=-999 required=5
> tests=[BAYES_50=0.8, HTML_IMAGE_ONLY_24=1.618, HTML_MESSAGE=0.001,
> HTML_MIME_NO_HTML_TAG=0.377, MIME_HTML_ONLY=0.723,
> RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01]
> autolearn=no

This particular message scored a 2.808 so it's not high or low enough for bayes to know which way it should learn the message.

--Dennis
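
As a rough illustration of why this message ends up with autolearn=no, the sketch below sums the rule scores from the quoted header and compares the result with the two thresholds. It is a simplification under stated assumptions: the helper names are hypothetical, the Bayes and AWL hits are left out of the autolearn score (as noted elsewhere in this thread), the boundary comparisons are my assumption, and the extra header/body point requirements SpamAssassin applies before learning spam are ignored.

```python
# Rule hits copied from the X-Spam-Status header quoted in this post.
TESTS = {
    "BAYES_50": 0.8,
    "HTML_IMAGE_ONLY_24": 1.618,
    "HTML_MESSAGE": 0.001,
    "HTML_MIME_NO_HTML_TAG": 0.377,
    "MIME_HTML_ONLY": 0.723,
    "RCVD_IN_DNSWL_LOW": -0.7,
    "SPF_PASS": -0.001,
    "T_RP_MATCHES_RCVD": -0.01,
}


def autolearn_score(tests):
    """Sum rule scores, skipping Bayes and AWL hits (simplified assumption)."""
    return sum(score for name, score in tests.items()
               if not name.startswith("BAYES_") and name != "AWL")


def autolearn_decision(tests, nonspam_threshold=-3.0, spam_threshold=12.0):
    """Return 'ham', 'spam' or 'no' for the given thresholds (hypothetical helper)."""
    score = autolearn_score(tests)
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


print(round(sum(TESTS.values()), 3))      # total score as reported in the header
print(round(autolearn_score(TESTS), 3))   # score used for the autolearn decision
print(autolearn_decision(TESTS))
```

Either way the score sits between -3.0 and 12.0, so nothing is learned.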
Re: multiple instances
On Fri, 2010-04-16 at 10:08 -0700, Gary Smith wrote:
> I have a need to run several different instances of SA on a single box (in
> development). In production, we have 3 different SA environments (with 2+
> servers each) that have different rule sets and specific routing rules
> determine which instance it gets sent to. We need to mimic this in
> development.
>
> Ideally I would like to create all 3 instances (*2 mimicking load balancing)
> on a single development box. We're not worried about the performance or
> memory aspect.
>
> Is this possible, and if so, is there an easy way to do this. I was
> thinking that I could create separate chroot environments for each one if
> necessary and either bind each instance to an IP (which I'm not sure if
> that's possible) or at least a different port.
>
> Any advice (or some sample scripts on doing this) would be greatly
> appreciated.

I'm sure it's possible, but rather than going through all the work of trying to script and set up chroot environments, why not use VMs? You can then quite literally match the production setup. Since you are not worried about performance or memory you could give each VM 128 MB of RAM, and the six VMs would use under 1 GB in total.

--Dennis
Re: Quarantine Management
Quoting Alex:

Hi,

Just wondering what other tools are out there that people like. I use postfix as my MTA right now, but am not completely opposed to using something else if necessary to use a specific quarantine system.

Amavisd-new works well with postfix

maia mailguard using amavisd-new but an old version.

I think he's probably referring to something that would help him manage the quarantine itself, such as to query it for FNs, provide some type of reporting, forward FPs back to the proper recipient, manage expiry, expunging, and scoring, etc?

Yes exactly what I'm referring to. Wishlist would be:

User controllable (i.e. users can release spam messages back into their mailbox)
Whitelist/blacklist management
Domain configurations

maia mailguard has pretty much all of that but hasn't been updated in a while, just looking for other possibilities. Do people just flag the message as spam (maybe in the header) and then let users filter to a spam folder? We are using this as a front end to exchange so I guess we could just flag it and then have exchange deliver it to the users' "Junk E-mail" folder, but then bayes can't learn from its mistakes as easily.

--Dennis
Quarantine Management
What are people using for quarantine management with spamassassin? I've been using maia mailguard and it works decently but hasn't been updated in what seems like forever (svn has been updated, but no formal release). Just wondering what other tools are out there that people like. I use postfix as my MTA right now, but am not completely opposed to using something else if necessary to use a specific quarantine system. Thanks, --Dennis
Re: AWL
> Not that I'm aware of.
>
> Is the AWL score enough to prevent the messages from being marked as
> spam, or are you seeing the negative AWL score on messages that are
> marked as spam? It is normal for AWL to give negative scores to spam
> from time to time, but for the most part, it should not be enough to
> push the score below the spam threshold.

Not usually, but I have seen a few messages that triggered BAYES_99 or BAYES_95 and then a few other rules that pushed the score to just above 5.0 (which is what I block at), and then AWL comes in with, say, -0.35 and drops the overall score to 4.8. I know how AWL works and that it will occasionally lower the score of a spam, but it seems to be happening more often lately.

I store my AWL in mysql so I just deleted all entries that have a count of less than 20. I think pretty much every time this happens the AWL count is low (maybe 3 or 4).

--Dennis
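
For anyone wondering where a figure like -0.35 comes from, here is a minimal sketch of the AWL adjustment as I understand it: the score is pulled toward the sender's historical mean by `auto_whitelist_factor` (0.5 by default). The stored total and count below are made-up values chosen to reproduce the numbers in this post, and `min_count` is a hypothetical guard for illustration, not an existing SpamAssassin option.

```python
def awl_adjustment(total_score, count, current_score, factor=0.5, min_count=0):
    """Return the AWL delta: a pull of the current score toward the sender's mean.

    min_count is a hypothetical guard: senders seen fewer times get no adjustment.
    """
    if count == 0 or count < min_count:
        return 0.0
    mean = total_score / count
    return (mean - current_score) * factor


# Made-up history: sender seen 3 times with a mean score of 4.45.
delta = awl_adjustment(total_score=13.35, count=3, current_score=5.15)
print(round(delta, 2), round(5.15 + delta, 2))

# With a minimum count of 20 the same sender would get no adjustment.
print(awl_adjustment(13.35, 3, 5.15, min_count=20))
```

With only three or four earlier messages behind the mean, a couple of low-scoring spams from the same address are enough to drag a borderline message under the threshold, which matches the low counts seen here.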
AWL
I have AWL enabled and it seems to be ok with helping out legitimate senders that occasionally send a "spammy" type message, but lately I have seen an increase where AWL is adding a negative score to a very blatant spam. So my questions are, do people feel AWL is worth having enabled? Is there a way to have the AWL rule only triggered if there is a minimum number of messages seen by that sender? --Dennis
Re: KHOP_RCVD_TRUST
On Fri, 2010-03-26 at 11:35 -0400, Michael Scheidell wrote:
> On 3/26/10 10:41 AM, Dennis B. Hopp wrote:
> > I received the following e-mail
> >
> > http://pastebin.com/JXr9buxi
> >
> > It had a total score of 4.973 (blocked at 5). Among other rules it hit:
> >
> > KHOP_RCVD_TRUST=-1.75,RCVD_IN_DNSWL_MED=-0.5,SPF_PASS=-0.001
>
> is that an old rule? i just checked SA updates, and I don't see that
> rule in current SA 3.3.1
>
> so, who is KHOP? I looked in rule sets and don't know them. were these
> rules inherited from some outside trusted source?

http://khopesh.com/wiki/Anti-spam#sa-update_channels

Some of his rules I believe have been incorporated into mainline sa. I'm using 3.3.1. I just got an update from some of the KHOP channels yesterday so they appear to be maintained.

--Dennis
KHOP_RCVD_TRUST
I received the following e-mail http://pastebin.com/JXr9buxi It had a total score of 4.973 (blocked at 5). Among other rules it hit: KHOP_RCVD_TRUST=-1.75,RCVD_IN_DNSWL_MED=-0.5,SPF_PASS=-0.001 So is the KHOP_RCVD_TRUST score too low? Should I possibly consider making that -0.75 or something? Is there a way to report FP to KHOP? Thanks, --Dennis
Re: Upgrading to SpamAssassin 3.3
On Wed, 2010-03-17 at 11:35 -0400, Kaleb Hosie wrote:
> Hello,
> I'm running SA 3.2.5 on CentOS 5.4 and I've noticed that a newer major
> release has been released. The server is currently in production so I'm a bit
> leery to upgrade.
>
> Do you feel that it is worth the upgrade to 3.3? Is there anything I should
> know before I go ahead and upgrade?

I upgraded to 3.3.0 on CentOS 5.4 and only ran into one issue, which had nothing to do with spamassassin itself. The upgrade of spamassassin went fine, but I use it with maia-mailguard and the current stable version of maia-mailguard does not work correctly with 3.3.0. There is a patch in the svn for maia that fixes the issue.

--Dennis
Re: My First Spam Mail Today
> My headers look like:
>
> X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on mail.iamghost.com
> X-Spam-Level: *
> X-Spam-Status: No, score=1.0 required=6.3
> tests=EXTRA_MPART_TYPE,HTML_MESSAGE autolearn=no version=3.3.0
>
> *

The message scored a 1.0 (score=1.0) but the X-Spam-Score header apparently wasn't added to the message.

> The above snippet shows no score as I would expect to see below from a
> different server:
>
> X-Spam-Flag: NO
> X-Spam-Score: -1.15
> X-Spam-Level:
> X-Spam-Status: No, score=-1.15 tagged_above=-999 required=5
> tests=[BAYES_00=-2.599, MSGID_MULTIPLE_AT=1.449] autolearn=no
>
> *
>
> Am I missing something in my local.cf that is not properly scoring all
> incoming messages?

In this example you also have "tagged_above=-999" which leads me to believe you are using amavisd-new. Are both servers using amavisd-new?

--Dennis
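
To compare the two header styles side by side, here is a small sketch that pulls the numeric fields out of an X-Spam-Status value. The function name `parse_spam_status` is hypothetical, and treating the presence of `tagged_above` as a hint that amavisd-new wrote the header is only the heuristic used in this reply, not a guarantee.

```python
import re


def parse_spam_status(value):
    """Extract the verdict and numeric fields from an X-Spam-Status header value."""
    def num(field):
        match = re.search(r"\b" + field + r"=(-?\d+(?:\.\d+)?)", value)
        return float(match.group(1)) if match else None

    return {
        "is_spam": value.strip().lower().startswith("yes"),
        "score": num("score"),
        "required": num("required"),
        "tagged_above": num("tagged_above"),  # None when the field is absent
    }


first = parse_spam_status(
    "No, score=1.0 required=6.3 tests=EXTRA_MPART_TYPE,HTML_MESSAGE "
    "autolearn=no version=3.3.0")
second = parse_spam_status(
    "No, score=-1.15 tagged_above=-999 required=5 "
    "tests=[BAYES_00=-2.599, MSGID_MULTIPLE_AT=1.449] autolearn=no")

print(first)
print(second)
```

Both headers carry a score; only the second has the `tagged_above` field, and only the second server adds a separate X-Spam-Score header.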
Re: [sa] Re: Bogus mails from hijacked accounts
On Fri, 2010-03-12 at 12:52 -0600, Dennis B. Hopp wrote:
> > The problem with this is that the !__FORGED_YH2 matches
> > when there is *NO* Reply-To header at all!
> >
> > You need something like this:
> >
> > header __FORGED_YH2 Reply-To =~ /\@([^y]|y[^a]|ya[^h]|yah[^o])/i
> > meta FORGED_YAHOO (__FORGED_YH1 && __FORGED_YH2)
> >
> > (remove the negation from the meta)
> > This directly tests for an existing Reply-To specifically to a domain
> > that does not begin with 'yaho'.
>
> Wouldn't that meta rule trigger when the reply-to contained 'yaho'? I
> want to trigger when the from contains yahoo.com and the reply-to does
> not.

Nevermind... the '^' inside brackets negates... I get it now.
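
The negated character classes are easy to check outside SpamAssassin. The sketch below runs the same pattern through Python's `re` module (the syntax of this particular pattern is the same in Perl and Python); the helper name `reply_to_is_foreign` and the sample addresses are made up for illustration.

```python
import re

# Same pattern as the suggested __FORGED_YH2 rule: an '@' followed by anything
# that does not spell out 'yaho'.
NOT_YAHO = re.compile(r"\@([^y]|y[^a]|ya[^h]|yah[^o])", re.IGNORECASE)


def reply_to_is_foreign(reply_to):
    """True only when a Reply-To exists and its domain does not start with 'yaho'."""
    return reply_to is not None and NOT_YAHO.search(reply_to) is not None


print(reply_to_is_foreign("someone@example.jp"))  # domain starts with 'e'
print(reply_to_is_foreign("someone@yandex.ru"))   # 'yan' diverges from 'yah'
print(reply_to_is_foreign("someone@yahoo.com"))   # spells out 'yaho'
print(reply_to_is_foreign(None))                  # no Reply-To header at all
```

Because the pattern needs an actual '@' followed by a non-'yaho' domain, a message without any Reply-To header can no longer satisfy it, which is the point of dropping the negation from the meta.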
Re: [sa] Re: Bogus mails from hijacked accounts
> The problem with this is that the !__FORGED_YH2 matches
> when there is *NO* Reply-To header at all!
>
> You need something like this:
>
> header __FORGED_YH2 Reply-To =~ /\@([^y]|y[^a]|ya[^h]|yah[^o])/i
> meta FORGED_YAHOO (__FORGED_YH1 && __FORGED_YH2)
>
> (remove the negation from the meta)
> This directly tests for an existing Reply-To specifically to a domain
> that does not begin with 'yaho'.

Wouldn't that meta rule trigger when the reply-to contained 'yaho'? I want to trigger when the from contains yahoo.com and the reply-to does not.

> However, keep in mind that the headers for *this* mailing list would
> trigger your rule. So you will also need to meta this with a rule that
> tests for yahoo mail server being the sending SMTP client

Good point. I didn't think about that.

--Dennis
Re: Bogus mails from hijacked accounts
> describe FORGED_HOTMAIL Hotmail with non-Hotmail Reply-to address
> header __FORGED_HM1 From ~= /\...@hotmail\.com/i
> header __FORGED_HM2 Reply-to ~= /\...@hotmail\.com/i
> meta FORGED_HOTMAIL (__FORGED_HM1 && !__FORGED_HM2)
> score FORGED_HOTMAIL 5.0
>
> and write cookie cutter rules for Yahoo and Gmail.
>
> OTOH if you're happy that a Japanese test won't generate FPs you can
> cover all three ISPs with one rule:
>
> describe FORGED_FROM Hotmail,Yahoo or Google with Japanese Reply-to
> header __FF1 From ~= /\@(hotmail|yahoo|gmail)\.com/i
> header __FF2 Reply-to ~= /\.jp/i
> meta FORGED_FROM (__FF1 && __FF2)
> score FORGED_FROM 5.0
>
> Of course, if its just a few Japanese ISPs being used you can easily
> make _FF2 more specific.

I tried this for yahoo...

describe FORGED_YAHOO Yahoo with non-Yahoo Reply-to address
header __FORGED_YH1 From =~ /\...@yahoo\.com/i
header __FORGED_YH2 Reply-to =~ /\...@yahoo\.com/i
meta FORGED_YAHOO (__FORGED_YH1 && !__FORGED_YH2)
score FORGED_YAHOO 0.25

And it triggered on a message with the following header

http://pastebin.com/qs18DpYn

My best guess is it is using the "In-Reply-To" header... is there a way to differentiate "In-Reply-To" and "Reply-To"?

Thanks,
--Dennis
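
A quick way to see what is probably going on: the sketch below builds a message that has an In-Reply-To header but no Reply-To, and evaluates the same "From matches and Reply-To does not" logic. This is only an illustration in Python, not SpamAssassin itself; the sample message is made up, `@yahoo\.com` stands in for the masked pattern in the rule above, and whether the pastebin message really lacked a Reply-To is an assumption. As far as I know a `header` rule looks up the named header only, so `Reply-To` should not pick up `In-Reply-To`.

```python
import re
from email import message_from_string

# Made-up sample: In-Reply-To is present, Reply-To is not.
RAW = (
    "From: someone@yahoo.com\n"
    "In-Reply-To: <abc123@example.com>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

YAHOO = re.compile(r"@yahoo\.com", re.IGNORECASE)  # stand-in for the masked pattern


def header_matches(msg, name, pattern):
    """True when the named header exists and the pattern matches its value."""
    value = msg[name]  # None when the header is absent
    return value is not None and pattern.search(value) is not None


def forged_yahoo_negated(msg):
    """Mirror of the meta (__FORGED_YH1 && !__FORGED_YH2)."""
    return (header_matches(msg, "From", YAHOO)
            and not header_matches(msg, "Reply-To", YAHOO))


msg = message_from_string(RAW)
print(msg["In-Reply-To"])         # the header that exists
print(msg["Reply-To"])            # a different header, absent here
print(forged_yahoo_negated(msg))  # the negated test fires on the missing header
```

In other words, the negated sub-rule is true for every message without a Reply-To, so the meta fires on any Yahoo sender that simply did not set one.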
Re: Bogus mails from hijacked accounts
> ...and I suppose the same would apply to social networks. I don't use
> either, so am somewhat clueless about what goodies are available if you
> can access their accounts.

I have some free e-mail accounts that I use as throwaway accounts, for when a site just HAS to have a valid e-mail so you can read the news article or whatever. I might log in to the accounts about once a month.

> > The one of these I encountered at $DAYJOB was sent to the account
> > owner's wife's ex-husband-- not my first choice when asking for emergency
> > funds. The email also claimed he was traveling in London-- the guy AFAIK
> > hasn't left Texas, let alone the US, in the past few years-- and used a
> > number of phrases that a native speaker of American so-called-English
> > wouldn't.
>
> OK, looks like I hugely overestimated the intelligence of recipients of
> such scams and hence the care needed to target an attack.

It's a sad thing, but a lot of people fall for stupid scams every day...
Re: Bogus mails from hijacked accounts
> I don't think the accounts were hijacked: the headers showed that the
> messages the OP posted were not sent from the domain hosting the mail
> accounts. It looked to me as if somebody has sold on lists of valid
> hotmail etc. accounts.
>
> I smell an inside job, or at least some careful preparation, because the
> OP reckons that these accounts (forged as sender) were paired with valid
> accounts he hosts that would be used by the owner of the forged account.
> The messages I saw took the form:

We got one owner of the hijacked accounts to admit he got an e-mail that basically said "Hi we are trying to get rid of dead accounts so please click here to verify your information". The site then very nicely asked for his username/password, which he gave, and then voilà, no more access to his account. The message was then sent to every address in his address book (which is why many of my users got the same message).

Sadly, we have had this happen a couple of times with hotmail and yahoo addresses. What can I say, some of our clients aren't exactly the most tech savvy.

--Dennis
Re: Bogus mails from hijacked accounts
> Its not conditional, just using a meta rule and negating the Reply-to
> test in the meta:
>
> describe FORGED_HOTMAIL Hotmail with non-Hotmail Reply-to address
> header __FORGED_HM1 From ~= /\...@hotmail\.com/i
> header __FORGED_HM2 Reply-to ~= /\...@hotmail\.com/i
> meta FORGED_HOTMAIL (__FORGED_HM1 && !__FORGED_HM2)
> score FORGED_HOTMAIL 5.0
>
> and write cookie cutter rules for Yahoo and Gmail.
>
> OTOH if you're happy that a Japanese test won't generate FPs you can
> cover all three ISPs with one rule:
>
> describe FORGED_FROM Hotmail,Yahoo or Google with Japanese Reply-to
> header __FF1 From ~= /\@(hotmail|yahoo|gmail)\.com/i
> header __FF2 Reply-to ~= /\.jp/i
> meta FORGED_FROM (__FF1 && __FF2)
> score FORGED_FROM 5.0

Thanks Martin. This is actually far simpler than I was thinking it would be.

--Dennis
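
For testing the idea against saved samples before touching the live rules, the combined check can be mimicked in a few lines of Python. This is a sketch under assumptions: `forged_from` is a hypothetical helper, the sample messages are made up, and unlike the quoted rules (which match anywhere in the header) the patterns here are anchored to the end of the parsed address.

```python
import re
from email import message_from_string
from email.utils import parseaddr

FREEMAIL = re.compile(r"@(hotmail|yahoo|gmail)\.com$", re.IGNORECASE)
JP = re.compile(r"\.jp$", re.IGNORECASE)


def forged_from(raw):
    """True when From is a freemail address and Reply-To ends in .jp."""
    msg = message_from_string(raw)
    from_addr = parseaddr(msg.get("From", ""))[1]
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1]
    return bool(FREEMAIL.search(from_addr)) and bool(JP.search(reply_addr))


suspicious = "From: Alice <alice@hotmail.com>\nReply-To: bob@mail.example.jp\n\nhi\n"
no_reply_to = "From: alice@hotmail.com\n\nhi\n"
not_freemail = "From: alice@example.org\nReply-To: bob@example.jp\n\nhi\n"

print(forged_from(suspicious), forged_from(no_reply_to), forged_from(not_freemail))
```

Since both conditions are positive tests, a message with no Reply-To header at all does not trigger this variant.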
Re: Bogus mails from hijacked accounts
> 1) Spammers rotate sender addresses and hijacked account info more
> often than most of us change our underwear. An account *may* get
> reused; chances are it'll be months before it does, and the spammers
> will have rotated through hundreds or thousands of others - both
> phish-cracked and those set up just to send their junk. Blacklisting a
> sender is reduced to blocking the persistent friend-of-a-friend who
> refuses to remove you from the endless stream of chain-forwards, and
> legitimate-but-totally-clueless mailing list operators who can't figure
> out how to unsubscribe you from their list. :(
>
> 2) You noted originally that these appear to be fully legitimate
> freemail accounts, legitimately used in the past to correspond with your
> customers/clients, that have been compromised and then used to send
> spam. How do you propose to still allow the legitimate account holders
> to email your clients if you blacklist the sender?

I don't want to blacklist the address, hence the reason why in my original e-mail I said "other than blacklisting". I know blacklisting would block these bogus e-mails as well as legit e-mails as soon as the clients get access back (they currently don't have access to their accounts because their passwords have been changed).

> Martin's suggestion followup should point you in the right direction.
> Sets of phrase rules (how similar are these messages? do you have ten
> or fifteen you can compare sentence-by-sentence?) with low scores will
> likely help some too. Meta rules that bump the score up depending on
> how many phrases hit, or phrase+mismatched-sender/reply also work
> tolerably well on this class of spam... if you can get enough samples to
> build a complete enough set of phrase rules.

I'm going to look at what Martin suggested and compare it to what samples I have.

Thanks,
--Dennis
Re: Bogus mails from hijacked accounts
On Wed, 2010-03-10 at 20:22 +, Martin Gregorie wrote:
> On Wed, 2010-03-10 at 13:37 -0600, Dennis B. Hopp wrote:
> > Obviously we just have to tell the clients that they need to deal with
> > the various e-mail providers, but is there an effective way that I can
> > filter these messages out before my users see them without blacklisting
> > the address?
>
> There's nothing in SA that can blacklist a sending MTA, so blacklisting
> can't happen unless you've added something to your MTA set-up that does
> auto-blacklisting.

I meant blacklisting the sender address, not the MTA.

> The question then comes down to marking the message as spam and dealing
> with it however you normally deal with spam. You'll probably need custom
> rule(s) to handle that. You say the message bodies are quite variable,
> but I notice that the Reply-to: header doesn't remotely match the From:
> header. Is this a common factor?

In the ones that I have seen, the reply-to doesn't match the from, and I think the reply-to addresses have all been something.jp

> If it is, and the body texts have no common features that could also be
> used, the only obvious approach would be a rule for each forged sending
> domain that fires if the sending domain doesn't match the Reply-to
> domain.

There isn't anything in common that I can see that wouldn't be susceptible to false positives. One even left the client's signature intact. I've written fairly simple custom rules before but I'm not sure how to do conditional rules. I'll have to dig into the docs a little more.

> Only you can know if these rules would cause false positives: I can't
> possibly tell from a single sample message.

I wasn't expecting anybody to give me a magic rule that would fix it, just suggestions, since I would only be able to blacklist the sender address after the e-mail had been received and I was notified of the problem. And obviously blacklisting all of gmail/hotmail/yahoo isn't an option.

Thanks,
--Dennis
Bogus mails from hijacked accounts
We seem to be having a problem where clients that we interact with regularly are having their hotmail/gmail/yahoo accounts hijacked. We are receiving e-mails from their accounts that legitimately go through the correct servers (hotmail,yahoo, etc.) and so they get passed through our spam filters. The messages have different bodies but basically say the same thing that they were on vacation and had all their money stolen so they need to have money wire transferred to them. Obviously we just have to tell the clients that they need to deal with the various e-mail providers, but is there an effective way that I can filter these messages out before my users see them without blacklisting the address? In one case I had probably 15 users that received the same message and naturally they freaked out. I have put a sample at: http://pastebin.com/9BDXrxmm Note I did change the real e-mail address in this message but the hotmail address used is valid just masked. The message doesn't hit any rules of significance on my system. BAYES_00=-1.9,FREEMAIL_FROM=0.001,HTML_MESSAGE=0.001,RCVD_IN_DNSWL_NONE=-0.0001,SPF_PASS=-0.001,T_RP_MATCHES_RCVD=-0.01,T_TO_NO_BRKTS_FREEMAIL=0.01 Thanks --Dennis
Re: Bogus Dollar Amounts
Quoting Kai Schaetzl:
> Dennis B. Hopp wrote on Wed, 24 Feb 2010 09:14:58 -0600:
> > Obviously I have something going on with my bayes, but that's a separate
> > issue
>
> Indeed. But it's an important issue. If it is that biased for other spam
> as well you are better off to not use it in this state.
>
> X-Spam-Status: No, score=2.8 required=5.0 tests=BAYES_50,HK_MUCHMONEY,
> T_LOTS_OF_MONEY,UNPARSEABLE_RELAY autolearn=no version=3.3.0
>
> add your RBL score and it's way over 5.

I agree it's an important issue. I had turned off bayes autoexpire in local.cf and at some point taken out the cron job that did a manual force-expire. Once I did a force expire, BAYES_60 triggered rather than BAYES_00.

What is the HK_MUCHMONEY rule that you have? Is that part of the base SA installation?

Thanks,
--Dennis
Re: Bogus Dollar Amounts
> It is common in many parts of the world to use a period instead of a comma
> as a digit group separator, and vice-versa for the decimal separator.
>
> http://en.wikipedia.org/wiki/Thousands_separator#Digit_grouping

I knew it was common in other parts of the world, but for some reason was thinking that when referring to US Dollars it wouldn't be. Now that I think about it I can understand why my original thought was wrong. I guess it doesn't really matter since the message was actually hitting another rule (T_LOTS_OF_MONEY) that I somehow missed.

--Dennis
Re: Bogus Dollar Amounts
Nevermind... it was also hitting T_LOTS_OF_MONEY, and once I expired old bayes tokens it no longer hit BAYES_00. Now I just have to figure out what's up with my bayes db.

--Dennis

Quoting "Dennis B. Hopp":
> I have been seeing a few spam mails slip past that talk about being able
> to get bogus dollar amounts. What I mean by that is it will give a large
> value in the e-mail but where there should be a comma it puts a period.
>
> I put an example of one of these messages at:
>
> http://pastebin.com/SXuGELUS
>
> Are there any rules that can detect this? The only rules this hit on mine
> are:
>
> 1.900 DCC_CHECK
> 1.449 RCVD_IN_BRBL_LASTEXT
> 1.000 RCVD_IN_BRBL
> -0.001 SPF_PASS
> -0.010 T_RP_MATCHES_RCVD
> -1.900 BAYES_00
>
> Obviously I have something going on with my bayes, but that's a separate
> issue
>
> Thanks,
> --Dennis
Bogus Dollar Amounts
I have been seeing a few spam mails slip past that talk about being able to get bogus dollar amounts. What I mean by that is it will give a large value in the e-mail but where there should be a comma it puts a period.

I put an example of one of these messages at:

http://pastebin.com/SXuGELUS

Are there any rules that can detect this? The only rules this hit on mine are:

1.900 DCC_CHECK
1.449 RCVD_IN_BRBL_LASTEXT
1.000 RCVD_IN_BRBL
-0.001 SPF_PASS
-0.010 T_RP_MATCHES_RCVD
-1.900 BAYES_00

Obviously I have something going on with my bayes, but that's a separate issue.

Thanks,
--Dennis
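
As a starting point for such a rule, here is a sketch of a pattern that flags dollar amounts grouped with periods. The pattern and the name `DOTTED_DOLLARS` are my own illustration, not an existing SpamAssassin rule, and the sample strings are made up. It requires at least two dotted groups so that something like $1.500 is left alone. Since the period is a perfectly normal group separator in many locales, anything built on this would need a low score.

```python
import re

# '$', an optional space, 1-3 digits, then two or more '.ddd' groups.
DOTTED_DOLLARS = re.compile(r"\$\s?\d{1,3}(?:\.\d{3}){2,}(?!\d)")

samples = [
    "You have been awarded $2.500.000 in cash",  # period-grouped amount
    "Transfer fee is only $1,500.00",            # normal US formatting
    "Now just $19.99 per month",                 # ordinary decimal price
]

for text in samples:
    print(bool(DOTTED_DOLLARS.search(text)), text)
```

The same expression could go into a SpamAssassin `body` rule, but it should be tried against a ham corpus first.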
Re: mail slipping through
Quoting Gary Smith:
> I've been having a pretty good hit rate on spam until recently (about two
> weeks). Two types of email have been coming through at a good rate. I'm
> receiving at least four per hour from the domains included below. I've
> also been training bayes with them as well, to no avail.

Is it pretty much the same body, just different senders?

> *...@chocolatebearbear .INFO
> *...@biblegame .info
> *...@clickbetterthere .info

If it's just the senders you could easily blacklist the domains; none of these domains look all that legit.

Can you copy a message or two (with full headers) to pastebin so we can have a look?

--Dennis
Re: Number of rules
Quoting Karsten Bräckelmann:
> > If I'm reading that correctly less than 50% of mail is actually being
> > filtered (seems like it should be higher than that). Those stats
>
> Actually, the numbers you gave for the "last couple days" are even lower.
> About one third, <15k out of 45k do have a BAYES_xx hit and thus are
> scanned by SA.
>
> I told you how to train your Bayes, if you're not satisfied with the
> result. Whether you like it or not, there really isn't another way. FWIW,
> blocking the obvious offenders early seems like a proper explanation for
> Bayes not showing a lot of high hitters.

Yes you did, and I'm going to set something up to copy the messages that trigger BAYES_20 through BAYES_80 into a separate mailbox that I can then inspect periodically for a while (while still letting the message be delivered to the user).

> Anyway, considering the back and forth -- IMHO, you *first* should get a
> clear picture how exactly your mail is being processed. I don't feel like
> stabbing in the dark.

And I don't expect you to take a stab in the dark. The 45K messages was the total processed inbound and outbound. I didn't think about the fact that outbound mail is not funneled through SA and so would not show up in the BAYES counts. So I admit, it was a poor analysis on my part.

> > Maybe I'm worried about nothing but given some of the spam that I get
> > forwarded that gets through (some very obvious spam) and then to see what
> > rules it hits just makes me think that something isn't quite right.
>
> Forwarded -- as in reports by your users, or forwarded from external MXs
> to yours? In the latter case, the obvious thing to check is your internal
> and trusted network settings.

Forwarded from internal users asking how it got through the spam filters. I rarely get reports to our abuse/postmaster addresses (with the exception of AOL users who mark messages as spam when they clearly are not spam).
Re: Number of rules
Quoting Karsten Bräckelmann:
> On Fri, 2009-07-31 at 06:07 -0700, John Hardin wrote:
> > On Fri, 31 Jul 2009, Dennis B. Hopp wrote:
> > > I cleared my maia statistics a couple of days ago. Since then BAYES_00 has
> > > triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 (all the other
> > > BAYES_XX are less than 1000 times).
> >
> > Do they all add up to about 45,000?
>
> Doh! Good catch, John. No, they cannot possibly. Do the math. These 3
> rules are less than 10k, remaining 35k. Each less than 1k hits means we
> need another > 35 rules. However, there are merely 6 ones left.
>
> $ grep -c BAYES_ 50_scores.cf
> 9
>
> The stats are incorrect. Well, unless the lion's share is processed with
> Bayes disabled, or otherwise not processed by SA.

I do have sanesecurity rules in clamav which may be filtering messages before spamassassin sees them, which would account for some of the difference between the total BAYES hits and messages received. We also relay all outbound mail through these same servers but do not send outbound mail through spamassassin, which again would account for some of the difference. I should have thought to mention that before.

I couldn't get sa-stats to give me any useful information. I did get amavis-logwatch and I am not sure if I like what it's showing me. I ran it against the last few maillogs I have so it encompasses basically the last month. Here are the relevant parts of the output:

http://pastebin.com/m59ddaf1d

If I'm reading that correctly less than 50% of mail is actually being filtered (seems like it should be higher than that). Those stats don't count the messages we completely reject. We don't reject solely on one RBL but use policy-weightd to reject messages. I guess I could just let all messages through to SA for a few days to see how things change, but I don't see the point of wasting CPU/memory on messages that are pretty much guaranteed spam.

Here are the stats on my postfix:

http://pastebin.com/m15d2533e

Maybe I'm worried about nothing but given some of the spam that I get forwarded that gets through (some very obvious spam) and then to see what rules it hits just makes me think that something isn't quite right.

--Dennis
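
The arithmetic behind "they cannot possibly add up" can be spelled out in a few lines. The sketch below uses only figures from this thread: the three reported hit counts, the 45,000 message total, the nine BAYES_ rules from the grep output, and the statement that every other bucket fired fewer than 1000 times. The variable names are just for illustration.

```python
TOTAL_MESSAGES = 45_000  # messages reported as processed in the same period
KNOWN_HITS = {"BAYES_00": 4510, "BAYES_99": 2366, "BAYES_50": 1568}
BAYES_RULES = 9          # from `grep -c BAYES_ 50_scores.cf` in the quoted reply

known = sum(KNOWN_HITS.values())
remaining_rules = BAYES_RULES - len(KNOWN_HITS)
# Every other bucket was said to have fired fewer than 1000 times.
upper_bound = known + remaining_rules * 999

for name, hits in KNOWN_HITS.items():
    print(f"{name}: {hits / TOTAL_MESSAGES:.1%} of all messages")
print(f"known hits: {known}")
print(f"upper bound on all Bayes hits: {upper_bound} "
      f"({upper_bound / TOTAL_MESSAGES:.1%} of {TOTAL_MESSAGES})")
```

So at most roughly a third of the 45,000 messages can have had any Bayes hit; the rest either never reached SpamAssassin (outbound relay, clamav rejects) or were scanned without Bayes.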
Re: Number of rules
Quoting John Hardin:
> On Fri, 31 Jul 2009, Dennis B. Hopp wrote:
> > I cleared my maia statistics a couple of days ago. Since then BAYES_00
> > has triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 (all the
> > other BAYES_XX are less than 1000 times).
>
> Do they all add up to about 45,000?

No they don't. I see some messages that trigger no rules at all (Bayes or otherwise). I thought that was odd since I thought a bayes rule should trigger pretty much all the time.

> > In those same couple of days we have processed about 45,000 messages
> > (this is the number of messages that actually reached spamassassin and
> > weren't outright rejected). If there is a better way to get sa statistics
> > I'd be happy to know.
>
> sa_stats.pl from the SARE website.
> http://www.rulesemporium.com/programs/

I'll take a look. Will this work with logs that are written by amavisd-new?

Thanks,
--Dennis
Re: Number of rules
Quoting RW:
> On Fri, 31 Jul 2009 03:55:48 +0200 Karsten Bräckelmann wrote:
> > The default of 0.1. It's a default for a reason. But that *really* is
> > not your problem. Your problem is with learning spam, not learning even
> > more ham. Just as you mentioned in your original report. See my previous
> > response for a solution. You want to learn more spam.
>
> What he actually wrote was that 3.7% of _all_messages_ were hitting
> BAYES_00, and 1.7% were hitting BAYES_99. If he actually meant what he
> wrote and doesn't have an extraordinary spam/ham ratio, then he clearly
> has a problem with both spam and ham.

I cleared my maia statistics a couple of days ago. Since then BAYES_00 has triggered 4510 times, BAYES_99 2366 times and BAYES_50 1568 (all the other BAYES_XX are less than 1000 times). In those same couple of days we have processed about 45,000 messages (this is the number of messages that actually reached spamassassin and weren't outright rejected). So my initial percentages were way off (I was going by maia mailguard's sa rule statistics).

So roughly 10% of mail is hitting BAYES_00 and 5% is hitting BAYES_99. It seems to me that BAYES_99 should probably be triggered more often than BAYES_00. If there is a better way to get sa statistics I'd be happy to know.

I know that the bayes success rate comes down to training, but like every other administrator I can't possibly check every message for accuracy and I was hoping to make the auto learn a little better. I thought maybe I just didn't have enough rules (both negative and positive scoring) to trigger the auto learn often enough.

Thanks,
--Dennis
Re: Number of rules
Quoting LuKreme:
> On Jul 30, 2009, at 18:12, "Dennis B. Hopp" wrote:
> > Yeah I knew that. I have a few negative scoring rules but not many
> > (outside of what might be in the misc rules sets I have). What is a good
> > threshold for ham then?
>
> 5.0 is the score SA is designed for. It's a very good number in almost
> all cases.

I meant the threshold for bayes auto learn to learn the message. I'll try switching back to the default values.
Re: Cant Post Message
Quoting twofers:
> I have a post I have tried several times over the last week to post to
> this forum and it never seems to get posted. I don't understand why?
> There is nothing exotic about it, just text, a question and email header
> info I pasted. Any idea what's up?
>
> Thanks,
> Wes

Try putting the header on a site like www.pastebin.com and then put the link in your e-mail rather than the actual header.

--Dennis
Re: Number of rules
Quoting RW:
> Bear in mind that autolearning uses its own version of the score that
> excludes whitelisting and Bayes, which means that very little ham will
> reach the -1 threshold unless you've added your own site-specific rules
> for identifying it.

Yeah I knew that. I have a few negative scoring rules but not many (outside of what might be in the misc rules sets I have). What is a good threshold for ham then?

--Dennis
Number of rules
I'm using maia-mailguard with spamassassin 3.2.5. For the most part it seems to be working ok but I feel like too many messages are hitting BAYES_00 (roughly 3.7% of all messages) and BAYES_99 is only hitting about 1.7%. I have bayes autolearn on, with ham being learned at -1.0 and spam learned at 8.0.

I'm sort of thinking part of my problem is I just don't have enough rules, so I'm curious how many rules other users out there have in their spamassassin setup. I currently have about 2558 rules consisting of stock rules, SOUGHT, KHOP, SARE, some custom rules I wrote and various rules I've seen posted on this list and other sites. I have a few plugins enabled as well (FreeMail, iXhash, Botnet, ASN, Pyzor, Razor2, DCC).

I know some of it is just training of the bayes but I'm wondering if just lack of rules might be causing some of my problems.

Thanks,
--Dennis