__FRAUD_JBU and __FRAUD_TCC
Hello all you happy people,

While debugging a FP on ADVANCE_FEE_3_NEW, I noticed that it included

body __FRAUD_JBU /\bforeign account\b/i

and

body __FRAUD_TCC /foreign (?:offshore )?(?:bank|account)/i

Correct me if I'm wrong, but won't anything matching __FRAUD_JBU also match __FRAUD_TCC? It also means that the phrase "foreign account" has twice the weight of "computer ballot system", "affidavits" or "as the beneficiary", which seems wrong.

https://svn.apache.org/repos/asf/spamassassin/trunk/rules/20_advance_fee.cf is the file in SVN trunk.

Is this worth raising as a bug (or improvement request)? Could someone try variants of the ADVANCE_FEE rules without __FRAUD_JBU and see if it performs better?

Unfortunately, I can't see enough of the original email, and would be unlikely to have permission to supply it if I did.

Thanks, James.
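The overlap between the two sub-rules can be checked mechanically. The following is a minimal sketch in Python rather than SpamAssassin itself: the two patterns are copied from the rules quoted above, while the sample phrases and the helper name `hits` are made up for illustration. It assumes Python's `re` behaves like Perl for these simple patterns.

```python
import re

# Patterns copied from the two sub-rules; re.IGNORECASE stands in for /i.
FRAUD_JBU = re.compile(r"\bforeign account\b", re.IGNORECASE)
FRAUD_TCC = re.compile(r"foreign (?:offshore )?(?:bank|account)", re.IGNORECASE)

# Hypothetical sample phrases, not taken from any real message.
SAMPLES = [
    "please send it to my foreign account today",
    "a foreign offshore bank will hold the funds",
    "transfer to a Foreign Account",
    "no relevant phrase here",
]


def hits(text):
    """Return (jbu_hit, tcc_hit) for one piece of body text."""
    return bool(FRAUD_JBU.search(text)), bool(FRAUD_TCC.search(text))


for sample in SAMPLES:
    jbu, tcc = hits(sample)
    # Whenever JBU fires, TCC is expected to fire as well.
    assert not jbu or tcc
    print(f"JBU={jbu!s:5} TCC={tcc!s:5} {sample}")
```

On these samples every __FRAUD_JBU hit is also a __FRAUD_TCC hit, which is consistent with the double-counting described in the post. It is not a proof for all inputs, only a quick sanity check.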
Re: SORBS bites the dust
mouss wrote (about the PBL): > stop spreading FUD. if you know of false positives, show us so that we > see what you exactly mean. > > a lot of people, including $self, use the PBL at smtp time. As usual, it depends on your definition of “false positive”. If you mean “IP address that should not have been in the PBL but was”, that’s one thing. It’s a consistent definition, but not very useful for stopping spam. If you mean “solicited and/or non-bulk email that would have been stopped by the PBL”, then I’ve seen a number of small Indian and Chinese companies who are unaware of a lot of things, including the existence of the PBL and that it’s a Good Thing to send email through a smart host with a consistent IP address and reverse DNS.¹ Obviously, everyone’s email stream is different. Mine includes a commercially-significant amount of email from small companies in those two countries, and probably doesn’t include email from other countries where this takes place. But by this definition, false positives do occur, and my company’s SpamAssassin installation has to try to handle them. James. ¹ Fortunately, they’re also unaware that signatures should be removed when replying. That, a standard corporate signature including company registration data, a standard domain in each Message-ID that doesn’t appear in public DNS, a few negatively-scored custom rules to detect these, and the AWL mean that once someone has responded to one of our emails, they get automatically whitelisted. So at least existing correspondents don’t get blocked. -- E-mail: james@ | Top Tip: If you are being chased by a police dog, don’t aprilcottage.co.uk | try to get away by crawling through a tunnel, going onto | a little see-saw, and jumping through a hoop of fire. | They are trained for that, you see. | -- “Bystander”, London magistrate
Re: 'anti' AWL
Charles Gregory wrote: > Though again, legit senders that average negative are relatively rare > (well, on my system, anyways). For what it’s worth, I’ve set up SA to identify replies to the organisation’s email. It looks at the In-Reply-To and References headers (our Message-IDs have a distinctive domain that’s not in public DNS and isn’t easily guessable) and looks for the organisation standard signature (again, this is very unlikely to come up in spam). Most replies have one or the other, and it’s fairly common for a correspondent to have an average score of less than -10. It means that AWL really does work as an auto-white-list for us. James. -- E-mail: james@ | ... clueless he is not. He's just selective about which aprilcottage.co.uk | clues to pay attention to. | -- Shmuel (Seymour J.) Metz
Re: Image spam and failing rule
Theo Van Dinter wrote: > It's already been mentioned, but mimeheader is the right way to look > at the headers of MIME parts. Charles Gregory wrote: > Look more closely at my rule. It is checking for TWO headers, > one after the other (separated by \n), identifying a gif with no name. > >>> full /Content-Type: image\/gif;\n[^a-z]+name=""/ I think you’ll find that’s one header on two lines, and mimeheader copes with it. Hope this helps, James. -- E-mail: james@ | “As for Nitel, the state telephone monopoly, the less aprilcottage.co.uk | said the better, which might well be the company’s | motto.” | -- The Economist, about Nigeria
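The "one header on two lines" point can be illustrated outside SpamAssassin. This is a rough sketch using Python's standard `email` package, not the mimeheader plugin. The MIME part below is a hypothetical example built to resemble the one under discussion, and `header_names` is a helper introduced here.

```python
from email import message_from_string

# Hypothetical MIME part: the Content-Type header is folded across two
# physical lines (the continuation line starts with whitespace).
RAW_PART = (
    "Content-Type: image/gif;\n"
    "\tname=\"\"\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "R0lGODlhAQABAAAAACw=\n"
)

part = message_from_string(RAW_PART)


def header_names(msg):
    """List the header field names in the order they appear."""
    return [name for name, _value in msg.items()]


# The folded lines are parsed as a single Content-Type header.
print(header_names(part))
print(part.get_content_type())
print(repr(part.get_param("name")))
```

A parser that understands folding sees two headers here, not three, so a rule that works on the parsed Content-Type value does not need to know where the line break fell.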
Re: Image spam and failing rule
Gary Forrest wrote: > Hi All > > We are receiving the same image spam many times, random text within the > body. > The only common thing is a image attachment, with the filename in the > following format > > DSL1234.png > > I have made the following ' RAWBODY ' rule > > /dsl[0-9]{4}\.png/i > > This rule works if the text appears in the body, when testing with a > hand telnet to port 25, but fails in practice. > I think this is because the RAWBODY rule does not search the text of a > attachment. > > example text of a spam > > --=_NextPart_000_0075_01C9C5DF.A7950570 > Content-Type: image/png; >name="DSL6672.png" > Content-Transfer-Encoding: base64 > Content-ID: > Content-Disposition: inline > > Any ideas ? mimeheader LOCAL_DSL_ATTACHMENT Content-Type =~ /name="dsl[0-9]{4}\.png"/i (Untested.) Hope this helps, James. -- E-mail: james@ | top! to bottom from or backwards read not do I, post top aprilcottage.co.uk | not do Please | -- Jeff Vian
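Since the rule above is marked untested, here is a small sketch that exercises only its regular expression, in Python rather than in SpamAssassin. The header value mimics the sample quoted in the question, and `content_type_hits` is a hypothetical helper. It says nothing about how the mimeheader plugin itself presents the header.

```python
import re

# Pattern from the suggested mimeheader rule; re.IGNORECASE stands in for /i.
DSL_NAME = re.compile(r'name="dsl[0-9]{4}\.png"', re.IGNORECASE)


def content_type_hits(value):
    """True if a Content-Type header value carries a DSLnnnn.png name."""
    return DSL_NAME.search(value) is not None


# Header value modelled on the sample in the question, fold included.
sample = 'image/png;\n\tname="DSL6672.png"'
print(content_type_hits(sample))
```

The pattern matches regardless of the fold because it only looks at the name parameter, and it rejects names with more or fewer than four digits.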
Re: Another bad kind of spams, for Pfizer knockoffs with image
Charles Gregory wrote: > I've been scoring the attachment name pattern with a 'full' test. > But this will only work until they figure ways to randomize the > attachment names The mimeheader plugin can do that and is much cheaper. The Abody Ahead part of the HTML seems to be a good spam sign, too. I can’t come up with a test (other than a full test) that will actually match all of that with 3.2.x: the rawbody rule matches one line at a time. A meta on both Abody and Ahead in the rawbody seems to do a pretty good job. To what extent should Windows Mail be counted as a variant of Outlook/Outlook Express? It’s not caught in __ANY_OUTLOOK_MUA: should it be? Hope this helps, James. -- E-mail: james@ | ... a sign carefully conveying in pictograms the fact aprilcottage.co.uk | that you should not leave wheelchairs on a certain river | bank as they would roll down the hill and the crocs would | eat the passenger.-- Skud
Re: sa-compile
I wrote:
> meta __SEEK_LZH2GT 0 # Microsoft Office 2003 Pro
> meta __SEEK_O1TQTY 0 # aving trouble viewing this e
> meta __SEEK_QGCXIK 0 # lots of dots
> which relies on the names being derived from the string.

Benny Pedersen wrote:
> the above __SEEK_* is random so you disable random seek :=)

I’m sorry, Benny, I *really* don’t understand that sentence. What’s random?

As Justin said, the names are not random, they’re hashes of the string. So the same string will always have the same name. Look at a spamassassin -D run sometime: you’re quite likely to see __SEEK_ and __SEEK_FRAUD_ rules being combined, because the same string crops up in both corpuses. The names of the rules will be identical apart from the _FRAUD.

It is, of course, possible (but very unlikely) that a different string has the same hash as one I’ve disabled. That is a known problem with all hashes, and is inherently present in the naming scheme Justin has chosen.

James.
-- E-mail: james@ | top! to bottom from or backwards read not do I, post top aprilcottage.co.uk | not do Please | -- Jeff Vian
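To make the "hash, not random" point concrete, here is a toy sketch in Python. The hashing scheme (SHA-1, base32, six characters) and the function `seek_rule_name` are invented for illustration. They are not the scheme the real __SEEK_ rules use, and the names produced will not match the real ones.

```python
import base64
import hashlib


def seek_rule_name(phrase, prefix="__SEEK_", length=6):
    """Derive a rule name from a phrase (hypothetical scheme, see above)."""
    digest = hashlib.sha1(phrase.encode("utf-8")).digest()
    tag = base64.b32encode(digest).decode("ascii")[:length]
    return prefix + tag


phrase = "lots of dots"
# The same phrase always yields the same name, whichever corpus it came from.
print(seek_rule_name(phrase))
print(seek_rule_name(phrase, prefix="__SEEK_FRAUD_"))
```

Under any scheme of this kind, a line such as `meta __SEEK_XXXXXX 0` keeps disabling the same phrase across rule updates. The only risk is the rare collision mentioned in the post, where two different phrases share a tag.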
Re: sa-compile
Karsten Bräckelmann wrote: > By looking at the sub-rules' names I got the impression they are just > random. But maybe they actually are somehow based on the rule's content? > Never checked. Justin? Justin Mason replied: > yep, they're derived from a hash of the string. Is that documented, or could it be documented somewhere? I haven’t had to touch it recently, but I do have a small .cf of false positives: meta __SEEK_LZH2GT 0 # Microsoft Office 2003 Pro meta __SEEK_O1TQTY 0 # aving trouble viewing this e meta __SEEK_QGCXIK 0 # lots of dots which relies on the names being derived from the string. Thanks, James. -- E-mail: james@ | Humans are humans. No one grouping is all good or all aprilcottage.co.uk | bad, no one group has a monopoly on honour, decency or | truth. Anyone that claims to have one is a | dishonourable, indecent liar. -- Phil Launchbury
Re: Near capitable punishment for all capitals?
Mark wrote: > Eh, it's no biggie, really, I was just surprised it scores as high as, > say, being listed on DCC. But then again, who actually *does* write in all > caps, except a spammer? :) Quite a few of my employer’s correspondents: and not just in the subject! I know a number of my users who do a lot of bulk data entry have Caps Lock permanently on: much of our company data is in all-caps, and users are strongly encouraged to keep new data looking the same way. And some of them don’t see any reason to turn it off for a quick email. So site config has a much lower score for this rule. I’m not sure it’s worth putting these emails in a corpus: once you start cherrypicking emails to make a point, the automatic score generation is no longer statistically relevant. James. -- E-mail: james@ | “The duke had a mind that ticked like a clock and, like a aprilcottage.co.uk | clock, it regularly went cuckoo.” | -- Terry Pratchett, Wyrd Sisters
Re: quirks with bayes ?
I wrote (about the AWL): > In the absence of any sort of expire mechanism¹ (see, for example, > https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6059) one can do > a crude approximation by periodically resetting it. LuKreme wrote: > But why would you want to ever reset the AWL? To quote that bug report: at this stage I don’t think it’s worth having on by default, given the problems it causes for disk load and out-of-control bloated db files eating lots of disk space and memory, vs the marginal gains in accuracy it provides. let’s set it off by default. One can temporarily get the db files under control by deleting them and letting SpamAssassin recreate them from scratch. I continued: > Yes, you lose a lot of history. But it’s no worse than not having it at > all, which is what is due for 3.3.0. LuKreme wrote: > 3.3.0 has no AWL? Not by default: see http://svn.apache.org/viewvc/spamassassin/trunk/rules/v310.pre?r1=563527&r2=759790&pathrev=759790 James. -- E-mail: james@ | Q. "Why can't I print?" aprilcottage.co.uk | A. "Because you're not a printer." | -- Stephen Judd
Re: quirks with bayes ?
LuKreme wrote: > On 31-Mar-2009, at 11:13, Lucio Chiappetti wrote: >> And even resetting the AWL ... > > > Why would you reset the AWL? I can't see any circumstance on which that > would be a good idea. In the absence of any sort of expire mechanism¹ (see, for example, https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6059) one can do a crude approximation by periodically resetting it. Yes, you lose a lot of history. But it’s no worse than not having it at all, which is what is due for 3.3.0. James. ¹ Unless one uses a real SQL database. -- E-mail: james@ | We're beginning to see the results of bringing a rubber aprilcottage.co.uk | chicken to a nuclear battle. | -- “Whitefang”, on groklaw.net
Re: interesting flash attack in spam
John Hardin wrote: > No reason it shouldn't be. I'd suggest something like a rawbody match on > /]/i meta'd with HTML_MESSAGE should be worth a few (dozen) > points. This would seem to FP on Microsoft HTML generated by certain versions of Word. One example:
Re: Dealing with low scoring spam - tighter MTA integration
Andrzej Adam Filip wrote: > At "RCPT TO:" stage there are available: > * connecting client IP address (last mail hop) > so big part of DNSBL and DNSWL tests *CAN* be used > * envelope sender for SPF based tests > * envelope sender and envelope recipient for auto white/black listing > (producing some kind of grey-listing based for first attempt from > unknown reputation source) Are you thinking that it might be good to tie this in to the SpamAssassin AWL score? So a sender with an existing low AWL might be allowed through even if the sending host gets on one or two DNSBLs? And you’re missing the possibility of doing reverse DNS lookups, too. James. -- E-mail: james@ | A: Because people don’t normally read bottom to top. aprilcottage.co.uk | Q: Why is top-posting such a bad thing? | A: Top-posting. | Q: What is the most annoying thing in e-mail and usenet?
Re: does SBL/XBL have a plugin?
Ricardo Kleemann wrote:
> Are SBL/XBL tests automatically enabled or is there a plugin I need to enable?

SBL and XBL are checked as part of SpamAssassin’s lookups against the Spamhaus ZEN list in 3.2.x, provided you have network tests enabled.

Hope this helps, James.
-- E-mail: james@ | “Sir, they’ve taken Mr. Rimmer!” aprilcottage.co.uk | “Quick, let’s get out of here before they bring him | back!” | -- Kryten and Cat, ‘Red Dwarf’
Re: Spamassassin Upgrade
Kban35 wrote: > I just upgraded my SA to 3.2.5 and now when I look in my /etc/init.d I do not > see spamassassin listed anywhere in there. Which OS? How did you upgrade – cpan? yum? apt-get? From where did you get 3.2.5? Thanks, James. -- E-mail: james@ | “Drums must never stop. Very bad if drums stop.” aprilcottage.co.uk | “Why? What will happen if the drums ever stop?” | “Bass solo.”
Re: Googlegroups related spam
Karsten Bräckelmann wrote: > However, there are some highly abusive patterns sticking out. A google > URI with a ../ in the path? Sure! Score 2. :) Alternating alpha and > numbers might be worth another point. A question mark in a google groups > URI? Punish that. You can eliminate links to Usenet posts, too – make sure that the group name doesn’t contain a dot. James. -- E-mail: james@ | Q. "Why can't I print?" aprilcottage.co.uk | A. "Because you're not a printer." | -- Stephen Judd
Re: not seeing any advantage to sa-learn?
Ricardo Kleemann wrote: > I'm running spamc/spamd 3.2.4 on a Ubuntu 8.04 server, it's the > standard Ubuntu package. I have the default settings for Bayes (with > auto_learn) and I'm using a mysql backend for BayesStore. It’s worth noting that Bayes, by itself, is not allowed to condemn spam. Its maximum score, by default, is 3.5. That means that if you’ve got the standard required_score of 5, the spam will not be marked unless it also hits other rules with a combined score of at least 1.5 – even with perfect training. If you are confident in your training (all spam gets learnt as spam, and all non-spam gets learnt as non-spam) you may wish to try raising the BAYES_99 score. Note that, in theory, 1% of email hit by this rule is non-spam – but also that another theory says that the maths behind that figure are wrong. It can be difficult to stop “narrowcast” spammers that spam a limited mailing list (perhaps industry specific), using standard bulk-mailing tools, and who don’t get onto the various DNSBLs. Their spam looks like solicited bulk email, there’s practically nothing about the spam that distinguishes it from wanted email that isn’t specific to that particular spammer, and they stay out of the notice of antispam organisations. If you want to stop spammers like these, you may have to create specific rules (either sending domains, or sending IP address, or specific phrases in their emails – or best of all, all three, with each rule attracting a few points). The good news is there aren’t that many of these spammers, and they tend to try to look like a legitimate company, so they will usually send using a constant identity, probably use a constant domain, and they’ll either use their own servers or e-mail service providers. So static rules aimed at them can be very effective. Alternatively, you could post sightings on news.admin.net-abuse.sightings and see if Spamhaus pick up on them. That way, more people can be spared their spam. James. 
-- E-mail: james@ | Remember, half-measures can be very effective if all you aprilcottage.co.uk | deal with are half-wits.
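The arithmetic behind "Bayes alone cannot condemn a message" can be sketched as follows. The two constants are the figures quoted in the post (3.5 for BAYES_99, 5 for required_score); the function names are made up, and the sketch assumes a message is tagged once its total reaches the required score.

```python
BAYES_99_SCORE = 3.5   # default top Bayes score quoted in the post
REQUIRED_SCORE = 5.0   # standard required_score quoted in the post


def shortfall(bayes_score, required):
    """Points that other rules must add before the message is tagged."""
    return max(0.0, required - bayes_score)


def is_tagged(bayes_score, other_rules_total, required):
    """True if the combined score reaches the required score."""
    return bayes_score + other_rules_total >= required


print(shortfall(BAYES_99_SCORE, REQUIRED_SCORE))
print(is_tagged(BAYES_99_SCORE, 1.4, REQUIRED_SCORE))
print(is_tagged(BAYES_99_SCORE, 1.5, REQUIRED_SCORE))
```

Raising the BAYES_99 score in the site configuration shrinks that shortfall, which is the trade-off the post describes: fewer missed spams, at the cost of trusting the training more.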
Re: Are custom rules ignored if a white list entry is in playq
Gary Forrest - Netnorth wrote: > Question, are custom rules ignored if a white list entry has the same > email address ? Quick point – if you have short-circuiting turned on, then they may well be… James. -- E-mail: james@ | Which do you consider was the stronger swimmer, aprilcottage.co.uk | (a) The Spanish Armadillo, | (b) The Great Seal? | -- ‘1066 and All That’
Re: sought rules updates
LuKreme wrote: > I read the man page, where there is no mention of how to obtain this > number. In fact, I read many posts, and many webpages and have still not > found that information. I've seen the IDs in others posts, sure, but > where do they originate? > > Even searching the wiki (which just links to the previously linked > http://taint.org/2007/08/15/004348a.html )is merely a "here's the > random-looking digits you pass to --gpgkey" and not a "here's what the > --gpgkey is, means, and how it's generated". These numbers are a way of identifying those keys. They are a cryptographically strong hash: the idea is that it’s easy for users to use numbers that short to confirm that the key they’ve received is the key they thought they were receiving, and very difficult for any attacker to generate another key with the same hash. > Why doesn't sa-learn simply trust the keys that are added to its > keychain without this extra (and at least for me, confusing) step? I'm > starting to think the simplest way to do this is just ignore the gpg > flags entirely and use --nogpg. What's the downside to this (other than > the obvious DNS hijacking to point the URL to some spammer site with bad > data which seems a remote enough chance to ignore). That’s your choice. Hope this helps, James. -- E-mail: james@ | “Right lads, we’ve got 45 minutes to score 37 goals. aprilcottage.co.uk | No problem with that -- the other team just did.”
Re: skew the AWL on spam report
Matt Kettler wrote: > If a spammer is using the same sending address over and over again, > blacklist them entirely. > > That said, I've never seen a spammer re-use the same address twice. Doesn’t mean it doesn’t happen – only that you’re not on any “narrowcast” lists (e.g. “Email 200,000 British business addresses!”) A number of otherwise-legitimate companies will send to those lists. SpamAssassin’s fixed rules will do very little against these spammers – they’re using the same e-mail engines, the same language, and the same sort of Internet connections as other companies with opt-in mailing lists. For some addresses, they represent the vast majority of spam that gets past SpamAssassin. Fortunately, they often do use and re-use the same IP addresses and the same domains, and can be blocked that way. http://groups.google.com/group/news.admin.net-abuse.email/browse_thread/thread/78ac67e56ce35d90 lists a good (bad?) selection of them… James. -- E-mail: james@ | Remember, half-measures can be very effective if all you aprilcottage.co.uk | deal with are half-wits.
Re: Help with bayes
Kai Schaetzl wrote: > well, but how? By auto-learning? In that case you are just multiplying your > problem. It seems a lot of spam gets miscategorized as ham. Auto-learning > that spam as ham means enforcing this miscategorization and that's what you > see as a result. When SpamAssassin decides whether or not to learn a message, it does not take Bayes scores into account. So if you have a message that only hits BAYES_00 (with a score of either -2.3 or -2.6) and another rule with a score of 0.2, that message will not be learnt (unless you change the limits), because 0.2 is greater than 0.1 (the limit). Hope this helps, James. -- E-mail: james@ | ‘Sir, they’ve taken Mr. Rimmer!’ aprilcottage.co.uk | ‘Quick, let’s get out of here before they bring him | back!’ | -- Kryten and Cat, ‘Red Dwarf’
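The worked example in this post can be written out as a small sketch. It is a simplification, not SpamAssassin's code: each hit is flagged by hand as "ignored for autolearn" (as the Bayes rules are), the rule name SOME_OTHER_RULE is hypothetical, and the 0.1 limit is the default of bayes_auto_learn_threshold_nonspam. The real autolearner applies further conditions that are left out here.

```python
HAM_THRESHOLD = 0.1  # default bayes_auto_learn_threshold_nonspam


def autolearn_points(hits):
    """Sum the scores of rules that count towards the autolearn decision.

    hits is a list of (rule_name, score, ignored_for_autolearn) tuples.
    """
    return sum(score for _name, score, ignored in hits if not ignored)


def learnt_as_ham(hits, threshold=HAM_THRESHOLD):
    """True if the message would fall under the ham autolearn limit."""
    return autolearn_points(hits) < threshold


# The message from the post: BAYES_00 plus one small positive rule.
message = [
    ("BAYES_00", -2.6, True),          # ignored: Bayes rules do not count
    ("SOME_OTHER_RULE", 0.2, False),   # hypothetical rule scoring 0.2
]

print(autolearn_points(message))
print(learnt_as_ham(message))
```

Although the message's overall score is well below zero, the autolearn score is 0.2, so it is not learnt as ham under the default limit.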
Re: Problem with learning bayes
Thomas Zastrow wrote: > I have a new server where I installed Spamassassin. Next, I took a > maildir with a lot of spam and learned the filter: > > sa-learn --spam --showdots /path/to/maildir Did you learn some non-spam, too? Bayes needs at least 200 of each before it will work. Hope this helps, James. -- E-mail: james@ | ...a probably apocryphal bilingual sign in darkest North aprilcottage.co.uk | Wales. In English it says "70mph" and in Welsh "slow | down, sharp bend ahead". | -- Peter Corlett
SBL false positives?
mouss wrote: > in which sublist? xbl, sbl or pbl? and when you say "a lot", how many? > can you show an example of an IP that you consider as an FP? Well, since you asked… I’m not the Original Poster, but I consider most of http://www.spamhaus.org/sbl/sbl.lasso?query=SBL60174 to be a FP *when used with SpamAssassin rules*. This is a /19 range of VSNL dynamic addresses, which had (correctly) been put on the PBL. I understand that many smaller Indian companies can only get a dynamic IP, want to run an internal mail server (often Exchange), and forget to relay outgoing e-mail through an appropriate external mailserver. At least one VSNL customer ran into trouble sending e-mail due to the PBL listing, and rather than using a suitable relay, systematically (and repeatedly) removed the entire /19 from the PBL! Spamhaus then stuck the whole range into the SBL. This is fine when the SBL is merely used against the last external relay, but SpamAssassin will test *all* IP addresses in the headers against the SBL. So non-spamming Indian companies get hit even if they relay through a good mailserver. I consider it a stretch putting this range under the SBL: the “Policy & Listing Criteria” says that the range “appear[s] to Spamhaus to be under the control of, or made available for the use of, senders of Unsolicited Bulk Email (“spammers”).” This doesn’t seem to be the reason in this case: there doesn’t seem to be any evidence that the individuals who removed the range from the PBL intended to send unsolicited bulk e-mail. It’s abuse of the Spamhaus web site, not directly abuse of e-mail, and would better be handled by a PBL range which can’t be edited through the website. I wrote to Spamhaus querying the listing, but have heard nothing (probably not surprisingly, since I’m not VSNL. Thank goodness!) I haven’t raised a SpamAssassin bug, since I don’t think it *is* a SpamAssassin bug. James. 
-- E-mail: james@ | 'Short for "Sic Transit Gloria Humanorum", which is Latin aprilcottage.co.uk | for "There goes the neighbourhood!"' | -- Menno Willemse
Re: CPAN Install Fails
Bob Cohen wrote: > I'm running Fedora v9. All of the prerequisite and optional modules > installed with no problem. Suggestions? Well, there’s always “install it with yum”: yum install spamassassin Hope this helps, James. -- E-mail: james@ | “It has taken 24 years to get the Reichstag wrapped. aprilcottage.co.uk | Chancellor Kohl said it would only be wrapped over his | dead body, so sensing an opportunity the Bundestag | outvoted him.” -- The Guardian
Re: Honeypot Email Addresses
jdow wrote: > I believe you could "blacklist_from". That would train SpamAssassin's > Bayes filter - Or not. Both USER_IN_BLACKLIST and USER_IN_BLACKLIST_TO have tflags set to userconf noautolearn (in current 3.2.5 rules), which means that SpamAssassin will ignore their scores when deciding whether to autolearn. > if you use global (shudder) Bayes. Um. There is research ( http://www.ceas.cc/2007/papers/paper-74.pdf ) suggesting that “globally-trained text classification easily outperforms personally-trained classification under realistic settings” (the text-classifier they used was Bayes). James. -- E-mail: james@ | Ah yes. Thingie and thingamagig, and I'll throw in aprilcottage.co.uk | whatchamacallit too. Because in tech support, "button" is | sometimes a bit too technical for the average caller. | -- "mr_scoot"
Re: Fwd: Attn: webmail Subscriber
John Hardin wrote: > Is there any reason the base rules should _not_ contain a > whitelist_from_spf or whitelist_from_rcvd for the list? Larry Nedry wrote: > Would you really want to auto-train your bayes with mail from this list? The whitelist rules are ignored when SpamAssassin decides whether to auto-train a message. Mail::SpamAssassin::Plugin::AutoLearnThreshold says: Note that certain tests are ignored when determining whether a message should be trained upon: · rules with tflags set to ’learn’ (the Bayesian rules) · rules with tflags set to ’userconf’ (user configuration) · rules with tflags set to ’noautolearn’ All the WHITELIST rules have tflags of ‘userconf nice noautolearn’ (at least). Hope this helps, James. -- E-mail: james@ | Just remember: 1 virus 3 viriii aprilcottage.co.uk |2 virii 4 viriv | -- Matt S Trout
Re: Moving ham/spam from Exchange folders to sa-learn?
Henry Kwan wrote: > Thanks for the script but I don't think I can use it as Exchange2K7 > has dropped IMAP support for public folders. Or least this blog post > from MSFT seems to indicate: > > http://msexchangeteam.com/archive/2006/02/20/419994.aspx I don't have any Exchange 2007 experience, but at least on 2003 "public folder" and "normal mailbox into which everyone can copy e-mail and to which no-one can send e-mail" are two separate concepts. And you can use IMAP to read the contents of the latter. Unfortunately, setting that up involves configuring Outlook on each client PC, so depending on the number of users, this may not be practical. Hope this helps, James. -- E-mail: james@ | Never ask, "Oh, why were things so much better in the old aprilcottage.co.uk | days?" It's not an intelligent question. | -- Ecclesiastes 7 v. 10
Re: Support for FC6, F7, F8?
Eric Wood wrote: > Is there a website or repository where I can yum upgrade to the latest > spamassassin from, say, a FC6 system? F7 and F8 (and F9) have version 3.2.4 in the standard Fedora updates repo: just sudo yum update Note that Fedora 7 will go out of support in a month or so, and FC6 is no longer supported: this means you won't get security updates, so these systems should not be exposed to known-malicious traffic (like spam...) Hope this helps, James. -- E-mail: james@ | The Inquirer was set up by Mike Magee (ticker: DODGY), aprilcottage.co.uk | who co-founded well-known IT site The Register seven | years ago after countless years editing and managing all | manner of things which could be Aardvark Today and Fish | Farming Monthly but weren't. -- The Inquirer, 2001
Re: Bayesiam Learning Paths for Spamassassin
> I have SA 3.17 running with amavisd-new, dovecot and Postfix 2.4.3 and > Clama/v on freebsd 6.1 > > I am trying to"teach" sa using the following > > sa-learn /var/mail/vmail/example.com/user/.INBOX.spam/cur/ > > this is a maildir I have put around 175 spam messages in.. find cur new -type f -mtime -60 | xargs sa-learn --showdots --spam works for me on Fedora 8 with SA 3.2.4. Hope this helps, James. -- E-mail: james@ | Users are not like normal particles; they wave when you aprilcottage.co.uk | observe them. | -- Andrew Dalgleish
Re: Canadian Spam - tired of writing rules!
Michael Hutchinson wrote:
> There's been a rise in Canadian Pharmaceutical Spam lately. This spam is
> quite basic, generally only including some text and a link. The link is
> always changing so we can't score against that.
>
> About the only other thing it scores on is the FORGED_HOTMAIL_RCVD rule,
> which doesn't have a big enough score to push the Spam over the 5.0
> points threshold.
>
> Does anyone have some effective rules / rulesets / update channels that
> would help to eliminate this stuff? I've been writing rules against it
> for the past few months. We've just employed our 61st rule against this
> type of Spam. Admittedly a lot of those are just basic phrase matching,
> and aren't complicated rules - but then the Spam changes enough each
> cycle, that it avoids complicated rules that I might write.

I find that a meta rule which fires when the body contains "http://" and has no paragraph longer than a limit of somewhere between 100 and 140 characters¹ will give a few false positives, so you can't score it too highly, but it catches a *lot* of spam. The ham that matches this rule tends to be surprisingly rare, doesn't score highly on anything else, and is from regular correspondents (so the AWL helps).

If any of the SA developers are reading, I'd love to see how rules like this play in the sandbox...

James.

¹ I'd like to do it on body length, but I can't find a suitable way of doing this. body /.{100}/ will match on any e-mail which *has* got a paragraph of > 99 characters...
-- E-mail: james@ | The opinions expressed herein are not necessarily those aprilcottage.co.uk | of my employer, are not necessarily mine, and in fact are | probably not necessary at all...
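The heuristic described above (a link present, and no paragraph reaching the length limit) can be prototyped outside SpamAssassin. This is a rough Python sketch, not the actual meta rule: the function name, the blank-line paragraph split and the default limit of 100 are all assumptions chosen for illustration.

```python
import re


def looks_like_short_link_spam(body, max_paragraph_len=100):
    """True if the body has an http:// link and only short paragraphs.

    Paragraphs are taken to be blocks separated by blank lines, with runs
    of whitespace collapsed before measuring (an assumption of this sketch).
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    longest = max((len(" ".join(p.split())) for p in paragraphs), default=0)
    has_link = "http://" in body
    # Mirrors "contains a link AND NOT (some paragraph of 100+ characters)".
    return has_link and longest < max_paragraph_len


# Hypothetical message bodies.
short_spam = "Cheap pills here\n\nhttp://example.test/x"
long_ham = ("x" * 150) + "\n\nhttp://example.test/"

print(looks_like_short_link_spam(short_spam))
print(looks_like_short_link_spam(long_ham))
```

The negation is the important part: a pattern like /.{100}/ detects that a long paragraph exists, so the combined rule has to fire when that sub-rule does not.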
Re: Logging
Skip wrote:
> I am on a linux, shared hosting site (Bluehost.com). I don't
> know how I can get it into the startup script for that box, and I only have
> access to my own home directory. That may be a showstopper right there.
> I'll have no way of knowing when they reboot the box.

Earlier, Matt Kettler wrote:
> Running from cron is only for things you want to run
> at regular intervals. It is not a valid way for starting daemons (ie:
> something you want to run once and leave running)

Actually, something like this (from man 5 crontab on Fedora 8) might be relevant:

These special time specification "nicknames" are supported, which replace the 5 initial time and date fields, and are prefixed by the '@' character:
@reboot : Run once, at startup.

Skip may have permissions to edit his own crontab (with the crontab command) and set a daemon going at reboot time. There may be CPU time quota constraints, of course.

Hope this helps, James.
-- E-mail: james@ |"Just for once, I wish we would encounter an alien aprilcottage.co.uk | menace that wasn't immune to bullets..." | -- The Brigadier, 'Doctor Who'
Re: are the NORMAL_HTTP_TO_IP scores still valid?
Matt Kettler wrote: > Yes. In fact, IP based URLs occur more commonly in nonspam than spam. Chip M. wrote: > Matt, yes this is correct, however in this particular case "nonspam" is > perhaps a bit broad. It's been my experience that these almost always > occur in mass marketing ham, not person-to-person ham. In my (limited) experience, nonspam IP-based URLs almost always have paths after the IP address, whereas a *lot* of spam just points to the IP address. Does this match anyone else's experience? James. -- E-mail: james@ | How about an Australian-language version? aprilcottage.co.uk | 'Your program just attempted an illegal instruction. | No worries, mate.' | -- Paul Tomblin
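One way to test the observation about paths is to separate IP-based URLs with a path from those without. The sketch below is a hypothetical Python helper, not an existing SpamAssassin rule. The regular expression is a deliberately loose approximation of an IPv4 URL and uses documentation-range addresses as samples.

```python
import re

# Loose IPv4 URL pattern: scheme, dotted quad, optional port, optional path.
# The lookahead rejects hostnames such as 1.2.3.4.example.com.
IP_URL = re.compile(
    r"https?://(\d{1,3}(?:\.\d{1,3}){3})(?!\.?[\w-])(?::\d+)?(/[^\s\"'<>]*)?",
    re.IGNORECASE,
)


def bare_ip_urls(text):
    """Return IP-based URLs in text that have no path beyond an optional '/'."""
    found = []
    for match in IP_URL.finditer(text):
        path = match.group(2) or ""
        if path in ("", "/"):
            found.append(match.group(0))
    return found


print(bare_ip_urls("visit http://198.51.100.7 today"))
print(bare_ip_urls("docs at http://192.0.2.10/docs/index.html"))
```

Run over a ham corpus and a spam corpus, a count of bare versus pathed IP URLs would show whether the pattern holds beyond one site's mail.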
Re: Whitelist_from_rcvd not working
Dan Barker wrote:
> My whitelist_from_rcvd tags don't hit. I believe this has been happening
> since my upgrade from 3.1.7 to 3.2.3.
> Just in case there is something [else] I've done silly, my local.cf is at
> http://www.visioncomm.net/temp/080104Local.txt):

Here's what may be a thoroughly stupid question -- what does your local network look like?

$ host mail.visioncomm.net
mail.visioncomm.net has address 74.254.46.133

Is that server behind a NAT router, or does it actually have that IP address configured? If it is behind NAT, what happens if you add 74.254.46.133 to internal_networks and trusted_networks?

Hope this helps, James.
-- E-mail: james@ | "Right lads, we've got 45 minutes to score 37 goals. aprilcottage.co.uk | No problem with that -- the other team just did."
Re: I have a probleme with my content analysis
lochness wrote: > > I'm running on windows and i'm using one software call "NoSpamtoday" that > software is based on spamassassin I modify local.cf file but in my test I > have this message bellow I put required_hits on 5 but in the message I have > 0 so how can I apply my config NoSpamToday puts its own value of required_hits in the local.cf file -- your setting might be above that, and the last value "wins". Use the admin tool, or look for all values of required_hits in the local.cf file. Hope this helps, James. -- E-mail: james@ | Machine. Unexpectedly, I’d invented a time aprilcottage.co.uk | -- Alan Moore, "Very Short Story" |http://wired.com/wired/archive/14.11/sixwords.html
Outlook-style message-IDs?
Hello, all you happy people,

I have in my possession a legitimate e-mail with

Message-ID: <[EMAIL PROTECTED]>

but no sign that it comes from a Microsoft product. As far as I can see, this one header is causing it to get

2.8 RATWARE_OUTLOOK_NONAME Bulk email fingerprint (Outlook no name) found
1.7 MSGID_DOLLARS Message-Id has pattern used in spam
1.9 RATWARE_MS_HASH Bulk email fingerprint (msgid ms hash) found

for a total of 6.4 points. It appears that these three rules are all scoring on effectively the same thing. Is this intentional?

(For the time being, I intend to add negative rules to score these down when all three are present).

Thanks, James.
-- E-mail: james@ | actor: (n) a piece of scenery that has the audacity to aprilcottage.co.uk | move once lit.