Re: New image spam

2009-05-24 Thread Cedric Knight
Jeremy Morton wrote:
> Recently I've been receiving some new image spams, subtly different
> from the one this rule is designed to mark:
> http://markmail.org/message/zio642mxs5p42kxa
>
> ... in that it actually does have a blank text MIME part.
>
> Here's an example of one such spam:
> http://rafb.net/p/ppyJAS34.html

Actually, there only seems to be one MIME part in what you have pasted,
so amending John's rule and the inbuilt one as follows should do it:

header __CTYPE_MULTIPART_MXD Content-Type =~ /multipart\//i

because I also recall them coming in as multipart/related.  The
inbuilt rule only looked for "image/jpeg", so:

mimeheader __ANY_IMAGE_ATTACH Content-Type =~ /image\/(?:gif|jpe?g|png)/

I don't think image/jpg is a standard MIME type, so we could also create:

mimeheader MIME_IMAGE_JPG Content-Type =~ /image\/jpg/
describe MIME_IMAGE_JPG Contains wrong MIME type image/jpg
score MIME_IMAGE_JPG 1.0
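Since Perl and Python regular expressions agree for a pattern this simple, a quick sanity check is easy (the sample header strings are invented): the non-standard "image/jpg" never occurs inside the valid "image/jpeg", so no extra anchoring is needed.

```python
import re

# The rule's pattern, unchanged.
mime_image_jpg = re.compile(r"image/jpg")

print(bool(mime_image_jpg.search("Content-Type: image/jpg; name=pills.jpg")))  # True
print(bool(mime_image_jpg.search("Content-Type: image/jpeg; name=cat.jpg")))   # False
```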

Also, it's not really a JPEG but a PNG, and one with a consistent header
over the last few weeks:

full PILL_IMAGE_PNG_HEAD /^U29mdHdhcmUAQWRvYmUgSW1hZ2VSZWFkeXHJZTwAAADAUExU/m
describe PILL_IMAGE_PNG_HEAD 2nd line of base64 of autogenerated PNG
score PILL_IMAGE_PNG_HEAD 1.5
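Why that base64 line is so stable: it decodes to the PNG tEXt chunk that Adobe ImageReady writes into every file it saves, so it is identical across the whole campaign.  A quick check (Python used just for the decode):

```python
import base64

# Second base64 line from the spam PNGs, exactly as used in the rule above.
line = "U29mdHdhcmUAQWRvYmUgSW1hZ2VSZWFkeXHJZTwAAADAUExU"
decoded = base64.b64decode(line)

# The stable prefix is the tEXt chunk written by the image editor.
print(decoded[:25])  # b'Software\x00Adobe ImageReady'
```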

I have seen spam come as you say with a minimal body part, as in:

rawbody PILL_IMAGE_HTML /\s\s<\/body>/s
describe PILL_IMAGE_HTML Very simple HTML part, as in image-only spam of May 09
score PILL_IMAGE_HTML 0.1
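For reference, that rawbody pattern only fires when the closing tag is directly preceded by two whitespace characters, typical of an autogenerated image-only part.  A quick Python check (the sample bodies are invented):

```python
import re

# PILL_IMAGE_HTML: two whitespace characters immediately before </body>.
pill_html = re.compile(r"\s\s</body>", re.S)

print(bool(pill_html.search("<body><img src=cid:pill>\n\n</body>")))  # True
print(bool(pill_html.search("<p>Some real text.</p></body>")))        # False
```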

I think an image with no text at all in the body part is pretty rare,
but I might do something like that if I were sending a picture to myself.

Meta rules based on some combination of the above could probably catch
it all at the moment.  Your sample hit DCC_CHECK and BAYES_80 for me,
but not the iXhash rules.

Hope these are of some use.

CK





Re: more mainsleeze spam

2009-06-19 Thread Cedric Knight
Michael Scheidell wrote:
> Main sleaze:  as in DKIM SIGNED, NOT FORGED, SPF RECORDS MATCH, some
> with and some without knowledge and adherence to the US Federal CAN-SPAM
> laws.

> Maybe I am stuck in 1994 when (most) people respected the net.  Maybe I
> react badly when one of these main-sleaze emails makes it past our
> filters, but the good news is that they help us identify third party
> email marketing companies that aren't careful about their clients.

I see similar things, and it annoys me quite a bit too.  In Europe, the
legal situation is somewhat different, as the Privacy & Electronic
Communications Regulations (PECR) outlaw sending unsolicited email to
individuals.  As a result, what I tend to see and get complaints about
is email from valid domains with proper rDNS and SPF which either:

(a) advertise generic scams to consumers, such as draws for shopping
vouchers in UK stores and, more recently, loan and insurance comparisons,
which come from the USA with superficial compliance with CAN-SPAM.
Notably, the postal address identifying the organisation (either in the
US or an accommodation address/mailbox supposedly in the UK such as "56
Gloucester Road #215") is presented as an image.  The servers are rented
from US-based companies.  I have some meta rules based on technical
details that help quarantine most of the crap.

(b) are from UK-based registered companies and ostensibly directed to
other businesses in the UK.  Many are for worthless sales training
webinars - I don't know if they teach more people how to send lots of
spam email.  An anonymous benefactor posts a useful monthly list of
spammers and their hosts called "UK spammers activity report June 2009"
on news:news.admin.net-abuse.email, usually leading with the notoriously
annoying and stupid Communicado/Bitesize/Britain in Business.  The list
can be used to block the ranges (often /24) used by the spammers.

What is notable from that list is that most IP addresses aren't in any
BL, except sometimes APEWS and BRBL, probably because BLs have few
spamtrap addresses that the spammer would want to add - there may be
some human intervention to verify that target domains are real users
(although of course you can't really send junk in bulk unless it is
automated.)

I guess these aren't quite as "vertical" as you describe, but there is
often some attempt at targeting the spam - sometimes it's clear the
spammer has included all email addresses from a web page that mentions,
say, a particular town or industry.  My understanding of mainsleaze is
that it comes from companies you might want to buy something from until
you get their spam - what I'm describing isn't quite like that and often
operates from a PO Box/accommodation address.  There are also, as you
mention, often third-party mailers that may still even be in Habeas or
similar cleanlists, although they increasingly become infiltrated, then
dominated, by clients who abuse the network.

Anyway, here are some suggestions to deal with mainsleaze:

(1) Report to SpamCop and DCC/Pyzor.

(2) Locate the upstream colocation provider (or mailing list provider)
and ask them to enforce their AUP and the maximum contractual penalty.
One or two hosts unfortunately are so negligent that it might be
necessary to go to the backbone provider (not that I've ever done that).

(3) More people should consider legal action based on PECR and improper
processing of personal data without consent.  There have been many cases
here in the UK where a few hundred pounds sterling have been awarded by
a small claims court, but the case should be properly prepared - e.g.
http://www.steveroot.co.uk/2008/02/spam-wars-the-s.html.  I also wonder
why spam, being (often explicitly) unauthorised use of a receiving
server, cannot be prosecuted under anti-cracker legislation.

(4) Contact any postal mailbox provider and again ask them to enforce
ToS and penalty.

(5) Possibly most effective?  If the spam contains a free or cheap sales
number, ring, ask to speak to the director (the name is usually a matter
of public record), and ask why they are wasting people's time (and
bandwidth, and CPU) with UBE.  If they offer to unsubscribe your
address, try to explain the point is that it's an abuse of the network
and they shouldn't have sent anything in the first place: if everyone
thought it was acceptable to send opt-out spam, email would become
unusable.  The objective is simply to get an apology, or some indication
that they are not entirely devoid of moral sense.

In short, I think more anti-spam activists are needed.

CK


Re: A difficult one to weed out?

2009-06-21 Thread Cedric Knight
Jeremy Morton wrote:
> OK, so I just got one of those www medsXX com spams, and even though it
> hit my rule and got 2.0 added to it, it still didn't even get over 3
> points.  Looks like it was sent from quite a legit host.  What rules do
> other people get matching for this e-mail?
> 
> http://pastebin.com/m3b9629b6

The IP and hashes score 21.8 for me.

Besides the standard DCC_CHECK, I'm getting hits on the following
non-standard RBLs:

190.244.172.161 listed in hostkarma.junkemailfilter.com
190.244.172.161 listed in uceprotect-level2.dnsbl
190.244.172.161 listed in bb.barracudacentral.org
190.244.172.161 listed in ix.dnsbl.manitu.net
iXhash found @ ix.dnsbl.manitu.net

Maybe you had a DNS problem when it went through, or you were unlucky
enough to be first on the spammer's list.

Here's a (somewhat unreadable) rule I wrote that doesn't have a great
spam ratio on its own, but can be useful in botnet meta rules:

header NOMATCH_NICK_FROM From =~
/^"?(([A-Z])[a-z][a-z])\w*(?:\s(?:(([A-Z])[a-z][a-z])\w*\s|([A-Z])\.?\s)?(([A-Z])[a-z][a-z])\w*)?"?\s*<(?![a-zA-Z1-9\.\-]*(?:\1|\3|\6))(?!.?\2(?:\4|\5)?.?\7).*?\@(?!.?\2-?(?:\4)?\-?\7)(?![a-zA-Z1-9\.\-]*(?:\1|\3|\6))(?!postmaster\@)(?!mailer-daemon\@)/i
describe NOMATCH_NICK_FROM  From address with no part of name
score NOMATCH_NICK_FROM 1.0

The idea is to catch random real names attached to random valid email
addresses.
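A much-simplified sketch of the same idea in Python (this is not a translation of the rule above, and the helper name is made up): look for any word of the display name whose first three letters appear in the address.

```python
import re

def name_in_address(from_header):
    """Rough sketch of the idea behind NOMATCH_NICK_FROM (not the rule
    itself): does any word of the display name share its first three
    letters with the email address?"""
    m = re.match(r'"?([^"<]+?)"?\s*<([^>]+)>', from_header)
    if not m:
        return True                      # no display name: nothing to test
    name, addr = m.group(1), m.group(2).lower()
    return any(w[:3].lower() in addr for w in name.split() if len(w) >= 3)

print(name_in_address('"Cedric Knight" <cknight@example.org>'))  # True: "Kni(ght)"
print(name_in_address('"Quentin Blake" <xj7rn2@example.com>'))   # False: suspicious
```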

HTH

CK


Re: [NEW SPAM FLOOD] www.shopXX.net

2009-06-22 Thread Cedric Knight
McDonald, Dan wrote:
>>> I'm considering a low-scoring rule like:
>>> body AE_MEDS37 
>>> /\(\s?w{2,4}\s[:alpha:]{4}\d{1,4}\s(?:net|com|org)\s?\)/
>>> describe AE_MEDS37  rule to catch the next wave of spaced domains
>>> score AE_MEDS37  1.0
>
> oops.  Doesn't compile.  should be:
> body   AE_MEDS37 /\(\s?w{2,4}\s[[:alpha:]]{4}\d{1,4}\s(?:net|com|org)\s?\)/

Maybe we can't anticipate which way the spammers are going to go.  The
next wave actually turned out to use punctuation but keep the same
style of domain name.  A straightforward way of catching several possible
ways of presenting these is:

body CK_MEDS50 /\sw\s*w{1,3}\s*(?:[.,]\s*)?(?:meds|shop)\d{1,4}\s*(?:[.,]\s*)?(?:net|c\s?o\s?m|org)\b/i
describe CK_MEDS50 gappy pharma website address in text
score CK_MEDS50 3.0
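A quick way to exercise a candidate pattern like this before deploying it is to run it through Python's re, which is close enough to Perl for this pattern (the sample strings are invented):

```python
import re

# CK_MEDS50, unchanged apart from Python's raw-string quoting.
ck_meds50 = re.compile(
    r"\sw\s*w{1,3}\s*(?:[.,]\s*)?(?:meds|shop)\d{1,4}\s*(?:[.,]\s*)?"
    r"(?:net|c\s?o\s?m|org)\b", re.I)

for text in (" buy at www meds50 com today",   # plain spacing
             " w ww.shop9.net",                # split www, with stops
             " WWW MEDS1 C O M"):              # case and gapped TLD
    print(bool(ck_meds50.search(text)))        # True for all three

print(bool(ck_meds50.search("see www.example.com")))  # False: no meds/shop label
```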

Spam with stops (=periods) would also hit a rule looking for obfuscation like:

body EVADE_URI2B /\b(?:H\s*T\s*T\s*P\s*:(?

Re: [NEW SPAM FLOOD] www.shopXX.net

2009-06-22 Thread Cedric Knight
Cedric Knight wrote:
> full NONLINK_SHORT  
> /^Content-Type:\s*text([^\n]+\n){0,30}\n.{0,300}\b(?:H\s*T\s*T\s*P\s*[:;](??,?\s*)+$/i
describe __TO_NOREAL1   Single recipient, no real name


CK


Re: OT: Website protection

2009-07-11 Thread Cedric Knight
schmero...@gmail.com wrote:
>> One of our client's websites gets hacked frequently - 1x per month -
>> usually with some kind of phishing scam.
>>
>> I understand their first line of defense is to make sure security is
>> tight and systems are up to date, however, it seems to me that there
>> must be some scanning utility that would check their site for
>> unauthorized pages via a search for domain names.
>>
>> So, if our client was google, the utility would search all files on the
>> site looking for domains. If it found microsoft.com within one of the
>> pages and email would be sent to the administrator who could delete the
>> page and look for other evidence of being hacked or add microsoft.com to
>> the whitelist.
>>
>> Any ideas where to look for such a beast &/or a mailing list that deals
>> with this type of issue?

Indeed Google "safe browsing" scans pages it indexes looking for IFRAME
exploits, Gumblar etc.

Phishing pages are harder to recognise than links to malware, meaning
Google has to largely rely on us reporting 'web forgeries'.  From the
evidence that Google doesn't automatically list suspicious pages I
assume that no such utility yet exists.

(Also note that Gumblar and other malware use pretty tedious JavaScript
obfuscation techniques.  So you might want to wget the site or access it
through a browser, rather than just grep through it for suspicious strings.)

I don't know who is working on phishing detection tools: maybe contact
APWG (antiphishing.org) or your local OWASP chapter.

Phishing scams are in my experience often uploaded through insecure
CMSes such as Joomla modules (you can see this when the uri contains
things like 'mambots/content' listed in the rules below).

Despite that, have you done the obvious and checked FTP logs?  In many
cases the website owner or designer may have a keylogger or agent
stealing FTP credentials which are then circulated to a botnet to deface
pages:  http://news.zdnet.com/2100-9595_22-306268.html  I've had to ask
people to run at least two up-to-date spyware scans on the Windows PC
they upload content from before the culprit is found.

Also make sure your correct abuse address is listed at abuse.net (and on
WHOIS if appropriate), so e.g. SpamCop reports about spamvertised sites
come to you without delay.

Terry Carmen wrote:
> If you're getting hacked once a month, I suspect the server contains a
> well-known vulnerability that needs to be located and repaired.
> 
> I'd recommend making all content changes on a *really* secure server, then
> replicating the entire web-root to the public web server with rsync, with the
> --delete option enabled.
> 
> Rsync will overwrite any of the "damaged" content with a fresh copy from the
> secure server and remove any "extras", making any unauthorized content changes
> vanish.

I like that suggestion - provided you're not expecting general visitors
to contribute content, you could rsync every 20 mins or so and by the
time the uri is spammed out the malicious content is gone.  The back-end
would be on a firewalled server that is not public-facing.  However, it
doesn't necessarily help if the FTP/SSH/CMS password is weak or
(particularly) has been compromised by malware on a desktop.

These strings in URIs/filenames have seemed to me to be associated with
phishing:

uri PHISH_CGI
/(\/cgi(?!\.ebay\.)|Login(?:Member)?\.do|mambo\/+components|mambots\/content\/|\/smilies|\/uploads|\/\?siteid=|\/aspnet_client|\/(?:includes|_mem_bin|components|classes)\/)/
describe PHISH_CGI  Common phishing destination
score PHISH_CGI 0.05

uri PHISH_CGI2
/\/(?:uploads|files|includes|components|js|mambots|smilies|images)\/.*(?:\.co\.uk|\.com\b|Log[a-z\.0-9-]+\.(?:php|htm))/i
describe PHISH_CGI2 Looks like exploit with "Logon" file
score PHISH_CGI2 0.2
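Both patterns can be smoke-tested in Python (the slash escapes required by Perl's // delimiters are dropped; the URLs are invented examples):

```python
import re

phish_cgi = re.compile(
    r"(/cgi(?!\.ebay\.)|Login(?:Member)?\.do|mambo/+components|mambots/content/"
    r"|/smilies|/uploads|/\?siteid=|/aspnet_client|/(?:includes|_mem_bin|components|classes)/)")
phish_cgi2 = re.compile(
    r"/(?:uploads|files|includes|components|js|mambots|smilies|images)/.*"
    r"(?:\.co\.uk|\.com\b|Log[a-z.0-9-]+\.(?:php|htm))", re.I)

# Hacked-CMS style path: hits both rules.
u = "http://victim.example/images/mambots/content/paypal.com/Logon.php"
print(bool(phish_cgi.search(u)), bool(phish_cgi2.search(u)))   # True True

# Legitimate eBay CGI path: the (?!\.ebay\.) lookahead keeps it clean.
print(bool(phish_cgi.search("http://cgi.ebay.com/ws/eBayISAPI.dll")))  # False
```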

I hope some of this helps.

CK



Re: OT: Website protection

2009-07-11 Thread Cedric Knight
schmero...@gmail.com wrote:
>>> So, if our client was google, the utility would search all files on the
>>> site looking for domains. If it found microsoft.com within one of the
>>> pages and email would be sent to the administrator who could delete the
>>> page and look for other evidence of being hacked or add microsoft.com to
>>> the whitelist.
>>>
>>> Any ideas where to look for such a beast &/or a mailing list that deals
>>> with this type of issue?

Forgot to mention http://www.unmaskparasites.com/

CK


Re: [NEW SPAM FLOOD] www.shopXX.net

2009-07-13 Thread Cedric Knight
Chris Owen wrote:
> On Jul 13, 2009, at 2:55 PM, Charles Gregory wrote:
> 
 To answer your next post, I don't use '\b' because the next 'trick'
 coming
 will likely be something looking like Xwww herenn comX...  :)
>>> At that point it can be dealt with.
> 
>> Well, they're getting close. I'm seeing non-alpha non-blank crud
>> cozied up to the front of the 'www' now :)

Not forgetting underscores are not word boundaries.  My alternative
rules are badly written but are still hitting with the \b:

rawbody NONLINK_SHORT
/^.{0,500}\b(?:H\s*T\s*T\s*P\s*[:;](? 
> 
> Which of course means we've long since passed the point where any of
> these are going to do the spammers any good.  That's the frustrating part.

You're making the common assumption that spammers send UCE because it
makes them money.  In fact they do it because they are obnoxious
imbeciles who want to annoy people and waste as much time (human and
CPU) as possible.  I don't think it really matters to them that what
they are sending is incomprehensible noise, because noise is their message.

Cheers

CK


Re: forward mails as spam

2009-07-13 Thread Cedric Knight
neroxyr wrote:
> Hope this is the log you wanted
>
> http://www.nabble.com/file/p24471425/block.jpg

It's not possible to see from this whether the first log line that you
have highlighted is necessarily related to the second and third
highlights (the message IDs are different), but I'll assume they are.

What is clear is that USER_IN_BLACKLIST caused 100 of the 103 point
score.  Do you perhaps have
   blacklist_from brennero..e etc
in your local.cf; or some blacklist_from with a * wildcard ?

CK



Re: forward mails as spam

2009-07-14 Thread Cedric Knight
neroxyr wrote:

> I have configured our domain mail to forward messages to a gmail account.
> I did a test sending an email from my gmail account to my domain mail; I
> receive the message sent from my gmail account, but immediately this message
> has to be sent to gmail.

> Mail Delivery Subsystem   13 de julio de 2009
> 17:08
> Para: t...@gmail.com
> The original message was received at Tue, 14 Jul 2009 03:08:52 +0500 (GMT)
> from avx [192.188.xx.xx]
> 
>   - The following addresses had permanent fatal errors -
> t...@gmail.com
>(reason: 550 5.7.1 Blocked by SpamAssassin)
>(expanded from: )

I observe you are sending backscatter.  If your server were set up
better, it would reject spam during the SMTP session (or discard it
later).  The bounce (NDN) would therefore come from the mailer-daemon at
gmail, not from mydomain.com as is shown above.

I use postfix rather than sendmail, but suspect your sendmail milter is
 wrongly configured, or just plain buggy.  What is the name of the
milter you are using?  I understand www.mimedefang.org is more standard
and shouldn't produce backscatter if correctly configured.  And Gmail's
own filtering is supposedly quite good, so you might just want to run spamc
from procmail.

CK


Re: Speeding up SC Ham

2009-08-04 Thread Cedric Knight
Chris wrote:
> I decided last week to finally give the short circuit plug-in a try to
> see how much it sped up detection. Its working great on spam:

> but not so well with ham:
>
> Aug  4 14:22:48 localhost spamd[1023]: spamd: result: . -10 -
>
AWL,BAYES_00,DCC_CHECK,DK_POLICY_TESTING,KHOP_RCVD_UNTRUST,RCVD_IN_DNSWL_HI,RCVD_IN_JMF_W,RDNS_NONE,SPF_PASS,UNPARSEABLE_RELAY
scantime=23.1,size=2682,user=chris
>
> the rules I'm using are straight out of the WiKi:

http://wiki.apache.org/spamassassin/ShortcircuitingRuleset , I presume.

>
> Are there any others I can add to the ham rule to speed things up? For
> instance can BAYES_00 be added or would that tend to cause FN's?

A handful, depending how well Bayes has been trained.  BAYES_00 is the
example given in the 60_shortcircuit.cf file in the rules directory,
which you probably want to read.

You could add RCVD_IN_DNSWL_HI, which is in your example.  From _MED I'd
expect some FNs.

Have you set up whitelist_from_dkim and whitelist_from_spf rules?  The
latter could also be used to shortcircuit your example.

(Any authenticated mail going through the same installation could be
shortcircuited with ALL_TRUSTED.  You can also then add some
trusted_networks.)

Yet another possibility is including some codeword above the cut in your
signature, so that replies are detected by a shortcircuited ham rule.

For general incoming mail, there may not be that much shortcircuiting
that can be done - the rules have to be run to decide if something is
spam.  However, I'd quite like to see a shortcircuit plugin that stops
processing more rules as soon as the running total gets to, say, 12 points.
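That plugin idea can be sketched as an evaluation loop that breaks out once the running total passes the threshold (purely illustrative Python; the rule names and scores are invented, and nothing here is SpamAssassin's real plugin API):

```python
def scan(message, rules, stop_at=12.0):
    """Evaluate rules in priority order, stopping early once the running
    total has already crossed the spam threshold (illustrative only)."""
    total, hits = 0.0, []
    for name, test, score in rules:   # rules assumed sorted by priority
        if test(message):
            total += score
            hits.append(name)
        if total >= stop_at:          # verdict already decided:
            break                     # skip the remaining (costly) rules
    return total, hits

rules = [
    ("DEMO_URIBL",     lambda m: "spamdomain" in m, 7.0),
    ("DEMO_BODY",      lambda m: "pills" in m,      6.0),
    ("DEMO_EXPENSIVE", lambda m: False,             1.0),  # never reached here
]
print(scan("cheap pills from spamdomain", rules))
# (13.0, ['DEMO_URIBL', 'DEMO_BODY'])
```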

> Can
> another rule be added for spam that contains entries like:

>
> SAGREY, RCVD_IN_BRBL_RELAY, URIBL_BLACK and so forth with my highest
> hitting rules. Would it be written similiar to the SC_NET_HAM rule?

It can: give all those and the corresponding meta rule a priority of, say,
-400, give the meta rule a score of 20, and shortcircuit SC_NET_SPAM on.
 But I'd guess you'd get FPs: more perhaps with those rules than with
SpamCop (on first trusted relay) and URIBL_(whatever)_SURBL.

HTH

CK


Re: Again AWL confusion

2009-08-05 Thread Cedric Knight
a...@exys.org wrote:
> exactly. The point is that scores below 2 are never spam, so i avoid
> greylisting. Thats my whitelist (you usually need for greylisting)  at
> the same time, since i whitelist some hosts in SA.

Interesting set-up, although I don't think it would be suitable for a
high-volume server.  So what do you use to do this?  exim-sa and what
greylisting software?

> above 2. The njabl hit would have been enough to hit that. It didn't
> score above 10, because that would have been rejected at smtp time.
> 
> My guess is that it scored 2 on the first try, then later it would have
> scored above 10 due to surbl listings, but awl kicks in and lowers the
> score thinking the greylisted mail was an independent message.

With most greylisting systems, the temporary reject is before the data
section (which helps save bandwidth), so it's hard to know if it's two
attempts to deliver the same message, or two independent messages.  Not
so in your case, however.

What is auto_whitelist_factor set at?

> 
>>> And where else did greylisted mail appear in the log? For the
>>> mail to be logged as rejected by a greylister *after* its been
>>> through SA it must also have been inspected by AWL and therefore it did
>>> affect the AWL database.
>>   
> oh right, i could look at the SA log, but i already know it passed SA 3
> times.

Worth doing.

>> the question is, why it scored hammy?  aep, how did it score before
>> greylisting? Are you sure you do not have bug in your greylisting code?
>>   
> see above. i'm pretty sure the "bug" is passing the same message to SA
> multiple times.

Well, by definition that isn't an SA bug.  Or are you suggesting AWL
should check to see if the same Message-ID has been seen before, and if
it has, not score or learn?  That would be an extra database lookup, and
it would mean AWL would also be disabled for valid mail that had been
delayed by greylisting (maybe OK, because it presumably hasn't been seen
before).

Bayes *shouldn't* allow learning of the same message more than once
(it doesn't if you train it manually), but maybe autolearn doesn't
update bayes_seen (??).

I think the simplest solution for your config is just:
use_auto_whitelist 0
bayes_auto_learn 0

Setting 'tflags URIBL_BLACK noautolearn' etc. on the remote tests would
probably mean the AWL decrease would be less, because AWL is then just
smoothing out the scores from the local tests.  None of this sounds very
efficient with minimising DNS lookups and reducing carbon footprints...

CK


0.001 rules - why?

2009-08-09 Thread Cedric Knight
I'm using Bayes and network tests, and have found a few rules with a
good spam-to-ham ratio that nevertheless score only 0.001 in the default rules.

In some cases, it is presumably because they overlap with other rules or
are detected by remote tests, and so would score double because of a
particular feature in the email.  But in other cases, I wonder if
they've been pegged at that low value for some other reason and are
actually pretty useful and could go up to 1.0 or so?

Here are the ones I'm talking about:

FH_HELO_EQ_D_D_D_D

Overlaps with HELO_DYNAMIC_IPADDR2 and TVD_RCVD_IP, but if you redefine
it as

header FH_HELO_EQ_D_D_D_D X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=(?!(?:[a-z]\S*)?\d+[^\d\s]\d+[^\d\s]\d+[^\d\s]\d+[^\d\s][^\.]*\.\S+\.\S+[^\]]+ auth= )[^ ]{0,15}\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}/

it clearly hits a good number that are missed by the other rules, with a
similar ratio.  Also a rule like
header HELO_MISC_IP X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=[^a-z ]\S{0,30}(?:\d{1,3}[^\d]){4}[^\]]+ auth= /
hits a lot of spam otherwise missed, although the ratio is not quite so
good.
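To sanity-check the redefined FH_HELO_EQ_D_D_D_D, the pattern can be exercised in Python against hand-made strings mimicking the X-Spam-Relays-Untrusted pseudoheader (assuming the line wraps in the quoted rule stand for single spaces; the host data is invented):

```python
import re

# The redefined rule's pattern, joined onto one line.
fh_helo = re.compile(
    r"^[^\]]+ helo=(?!(?:[a-z]\S*)?\d+[^\d\s]\d+[^\d\s]\d+[^\d\s]\d+[^\d\s]"
    r"[^\.]*\.\S+\.\S+[^\]]+ auth= )[^ ]{0,15}\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}")

bare = ("[ ip=85.230.140.204 rdns= helo=85-230-140-204 "
        "by=mx.example.com ident= envfrom= intl=0 id= auth= msa=0 ]")
rdns_style = ("[ ip=85.230.140.204 rdns= helo=85-230-140-204.cust.example.se "
              "by=mx.example.com ident= envfrom= intl=0 id= auth= msa=0 ]")

print(bool(fh_helo.search(bare)))        # True: bare dashed IP in HELO
print(bool(fh_helo.search(rdns_style)))  # False: left to HELO_DYNAMIC_IPADDR2
```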

FH_HOST_EQ_VERIZON_P
Being based in the UK, don't have many dealings with Verizon customers,
so YMMV on this one.  Still, only around 0.2% of hits are ham.

FORGED_OUTLOOK_HTML
part of FORGED_MUA_OUTLOOK, but hits less ham as a proportion, so IMHO
should be >0.

HTML_SHORT_LINK_IMG_1
overlaps with HTML_IMAGE_ONLY_nn but much better ratio.

HTTP_EXCESSIVE_ESCAPES
Not as good a ratio, but still <2% ham for me.

ONLINE_PHARMACY  Does this overlap with something else?  It's not
brilliant, but an obvious phrase to check for.  DC_IMAGE_SPAM_HTML has a
better ratio than, and catches almost as much as, DC_GIF_UNO_LARGO which
scores 2.275.

I was also thinking about raising the score of STOX_REPLY_TYPE, but the
header it detects is increasingly used by Windows Mail, so I left it at
0.001 on my system.

Finally, URIBL_RED is automated and doesn't catch as much as
URIBL_BLACK, but the ratio is as good.  I've set it as 1.5.

Your mileage may vary - it would be interesting to know if it does.

CK


Re: 0.001 rules - why?

2009-08-10 Thread Cedric Knight
Matus UHLAR - fantomas wrote:
> On 09.08.09 11:33, Cedric Knight wrote:
>> I'm using Bayes and network tests, and have found a few rules with a
>> good ratio of ham to spam, but that score only 0.001 in the default
rules.
>
> apparently there's no use for them alone and the score isn't 0 just
because
> that would cause them not to be processed.

OK, but why are they determined to be useless?  Some do have scores in
other scoresets, but not in scoreset 3 (Bayes + network).  Is it a
consequence of score generation, based purely on ratio, or is it set
manually?  (In the cases where the rule scores 0.001 on all scoresets, I
assume it is pegged manually.)

>
>> Here are the ones I'm talking about:
>>
>> FH_HELO_EQ_D_D_D_D
>>
>> Overlaps with HELO_DYNAMIC_IPADDR2 and TVD_RCVD_IP,
>
> this is a big problem Imho, I've even filled a bugreport because of this

It does mean when you get FPs, they are serious FPs.  But if the rules
were reorganised so they didn't overlap (using meta rules or assertions
in the pattern like those below), I think there's something to be gained
from scoring each case separately.

Or, if rule A and rule B overlap such that B hits fewer spam, but has a
better ratio, then I'd say both should be active and score a positive
value: it's just that the score for B should be less if A will also hit.
 (If B hit only a subset of A's spam and had a *worse* ratio, then yes,
it should be scored near 0.)

Does score generation currently take this approach?

>
>> but if you redefine it as
>>
>> header FH_HELO_EQ_D_D_D_D X-Spam-Relays-Untrusted =~ /^[^\]]+
>> helo=(?!(?:[a-z]\S*)?\d+[^\d\s]\d+[^\d\s]\d+[^\d\s]\d+[^\d\s][^\.]*\.\S+\.\S+[^\]]+
>> auth= )[^ ]{0,15}\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}/
>>
>> it clearly hits a good number that are missed by the other rules, with a
>> similar ratio.
>
> it would match for every host send from generic IP address (if they
know the
> address and it's rdns) , which is very common for dsl,cable,dialup etc
> users.

Hmmm... (a) in most cases, hosts on a dynamic address will use a machine
name rather than their rDNS; (b) is it not an assumption that these will
not be connecting directly to the MTA unless they are trusted?   Cf
RCVD_IN_SORBS_DUL, RCVD_IN_PBL, DOS_OE_TO_MX (although these depend on
last external, rather than last untrusted).

Anyway, my point is that *empirically* this rule seems to do well,
testing against a sample over the past week, even if we exclude mail
that also hits HELO_DYNAMIC_IPADDR2 and TVD_RCVD_IP.  If I search
through recent hits, it's all botnet stuff, and the ratio last week was
similar to URIBL_JP_SURBL (~0.1% ham hits).   If it doesn't do as well
for someone else, maybe it's down to some interesting difference in
setup, e.g. using greylisting.  A score of 0.8 or 1.0 seems to work well
for me.

>> FH_HOST_EQ_VERIZON_P
>> Being based in the UK, don't have many dealings with Verizon customers,
>> so YMMV on this one.  Still, only around 0.2% of hits are ham.
>
> you should understand that SA has many users living in a country with many
> verizon customers and the rules should be done tht they could be used
> generally

Viz the US.  Certainly SpamAssassin should be as widely usable as
possible, although there are problems with non-Western character sets...
OK, I withdraw my suggestion about this one as it relies purely on
factor (b) above, but still think the others are worth a go.

CK


Re: Mailbox for auto learning

2009-08-10 Thread Cedric Knight
Stefan wrote:
> Am Sonntag, 9. August 2009 07:36:54 schrieb Luis Daniel Lucio Quiroz:
>> Hi SAs,
>>
>> Well, after reading this link
>> http://spamassassin.apache.org/full/3.2.x/doc/sa-learn.html I'm still
>> looking for an easy-way to let my mortal users to train our antispam.  I
>> was thinking a mailbox such as  h...@antispamserver and s...@antispamserver
>> to let users to forward their false positivos or their false netgatives. 
>> In isde each box (ham or spam), of course a procmail with sa-learn input
>> will be forwarded.
>>
>> My doubts are nexts:
>> 1. Will forwarded mails be usefull for training, I mean if spam was: From:
>> spa...@example.netTo: u...@mydomain,   when forwarding it will be From:
>> mu...@mydomain To: s...@antispamserver.   Change of this and forwarding
>> (getting rid of headers because mail-clients) wont change learning?
> 
> You have to forward the message as an attachment un unpack it after 
> receiving. 
> Have a look at: 
> https://po2.uni-stuttgart.de/~rusjako/sal-wrapper

Yes, I find this approach works well.  It's the simplest way for me to
train Bayes, and most users can cope with it, providing they're not
using Outlook 2003/XP which can't forward as an attachment.  But
Thunderbird, Outlook Express, Squirrelmail and Pine all can easily.
It's not as simple as a 'This Is Spam' button perhaps, and that's a
*good* thing.  Requiring a little bit of thought stops people using it
as an alternative to the delete key for 'OK, perhaps I did subscribe to
this but I don't want it now'.

My script is very similar to sal-wrapper: it uses Postfix
check_recipient_access to ensure only authenticated users can send to
the reporting address; is triggered from procmail; uses MIME::Parser to
extract (possibly multiple) message/rfc822 attachments; feeds them
through sa-learn --ham or spamassassin -r as appropriate; and sends an
acknowledgement back to the user, reminding them to also send
spam/non-spam to the corresponding address and correct any mistakes.

One thing I notice from sal-wrapper however is that it pipes the header
and body to sa-learn without passing a file as parameter.  I found that
although sa-learn didn't complain, this didn't work at all well, and
quite short ham messages were scoring BAYES_99.  You can pipe to
spamassassin -r just like you can to spamassassin in any other mode, but
I think if you pipe to sa-learn, you need to do it as
   sa-learn --ham -

with the '-' as parameter, so it reads the standard input.
Alternatively feed it a temporary message file.  Or am I misreading
something?

CK



Re: 0.001 rules - why?

2009-08-11 Thread Cedric Knight
Henrik K wrote:
> On Tue, Aug 11, 2009 at 04:31:32AM +0100, RW wrote:
>> On Sun, 09 Aug 2009 11:33:29 +0100
>> Cedric Knight  wrote:
>>
>>
>>> header FH_HELO_EQ_D_D_D_D X-Spam-Relays-Untrusted =~ /^[^\]]+
>>> ...
>>> header HELO_MISC_IP X-Spam-Relays-Untrusted =~ /^[^\]]+
>>>
>> Possibly this is down to their running on the wrong boundary, these
>> should be on the internal network boundary.
> 
> All these are fixed to -External in SVN/3.3.

Quite a complicated issue.  I'd posted before
http://www.nabble.com/Understanding-Trusted-and-Internal-to22282224.html#a22292088
wondering why such rules didn't check X-Spam-Relays-External.

However, when I test external equivalents like EXT_HELO_DYNAMIC_IPADDR2,
I find they hit as much ham (still only a little) and about half as much
spam.  In other words, testing the first entry of the -Untrusted
pseudoheader empirically does better for my setup.

I guess this is because (a) greylisting cuts out a lot of botnet spam
that would otherwise be delivered direct to internal_networks; (b) this
system is set to use the 3.2 model, that is, without including general
ISP MTAs in trusted_networks; and setting internal_networks to only
include MXs for the organisation.  What spam does match these rules
often comes via servers that provide MX for a domain that doesn't
greylist or filter and then forwards, and these (often MXs provided by
domain registrars) I include in trusted_networks but not internal, such
that spam delivered to them is tested appropriately by the existing
HELO_DYNAMIC_IPADDR2 and FH_HELO_EQ_D_D_D_D.

BTW, when I did try including servers like Google and ISP MTAs in
trusted_networks on the basis that they are "relay hosts...
considered to not be potentially operated by spammers, open
relays, or open proxies. A trusted host could conceivably relay
spam, but will not originate it, and will not forge header data" I found
not only FPs from EXT_HELO_DYNAMIC_IPADDR2 etc, but also a lot of FNs,
partly because ALL_TRUSTED often triggered.  I imagine that if this is
an issue, it will come out over the course of SA3.3 testing.

BTW (2), maybe I overstated the case for URIBL_RED.  It seems to vary
somewhat in its reliability, and probably shouldn't be scored >1.0.
Still non-zero though, I propose.

CK



Re: Mailbox for auto learning

2009-08-11 Thread Cedric Knight
Luis Daniel Lucio Quiroz wrote:
> Le lundi 10 août 2009 19:15:15, Cedric Knight a écrit :
>> Stefan wrote:
[...]
>>> You have to forward the message as an attachment and unpack it after
>>> receiving. Have a look at:
>>> https://po2.uni-stuttgart.de/~rusjako/sal-wrapper
>> Yes, I find this approach works well.  It's the simplest way for me to
>> train Bayes, and most users can cope with it, providing they're not
>> using Outlook 2003/XP which can't forward as an attachment.  But
>> Thunderbird, Outlook Express, Squirrelmail and Pine all can easily.
>> It's not as simple as a 'This Is Spam' button perhaps, and that's a
>> *good* thing.  Requiring a little bit of thought stops people using it
>> as an alternative to the delete key for 'OK, perhaps I did subscribe to
>> this but I don't want it now'.
[...]

> Yes but problem is that 99% of users are about using some kind of outlook

Well then, tell them not to :)  Outlook Express and Windows Mail are
fine.  Outlook 2003 supposedly needs a special program like
http://www.olspamcop.org/ to forward properly, although if you select
multiple messages to forward, then it will forward them in some kind of
possibly useful digest format.  Outlook 2007 introduces an explicit menu
item called "forward as an attachment" (Ctrl+Alt+F) but still mangles
the headers:
http://forum.spamcop.net/forums/index.php?showtopic=10241&st=0&p=70453&#entry70453

Outlook 2007 also mangles the headers (kind of reconstructing a
misleading semblance of what the original was) when moving between IMAP
folders.  Therefore, I wouldn't use spamassassin -r on spam from Outlook
users, but sa-learn to get tokens from the body text may be OK.

Actually, some users of Outlook 2003 do seem to be able to forward as
intact message/rfc822 attachment.  Not exactly sure how.

Anyway, the 1% using a better e-mail program may be all that's needed to
train Bayes.

CK



Re: SA checking of authenticated users' messages

2010-07-08 Thread Cedric Knight
On 07/07/10 23:26, Greg Troxel wrote:
> 
> Louis Guillaume  writes:
>> I just need to clarify one thing that's not clear to me in re-reading
>> our thread from the other day: Is there a work-around for this?
>>
>> My users are getting restless. Everytime their ISP changes their IP
>> address I have to whitelist them!
> 
> I think there are currently only two viable approaches:
> 
>   arrange not to pass authenticated mail to spamass-milter
> 
>   change postfix and/or spamass-milter to insert a line in the
>   pseudoheader saying the mail was authenticated, so the ALL_TRUSTED
>   test fires and not the RBL checks.  This is some twitchy code to
>   write, but I suspect it isn't really that hard.

I don't think Louis has said what MTA is involved, but if it's Postfix
2.3 or later, you just add the following line to main.cf:

smtpd_sasl_authenticated_header = yes

And SA should then put all relays in X-Spam-Relays-Trusted and add
ALL_TRUSTED (about -1.8 points) and not do any RBL checks.  It's the RBL
checks that could be the major problem because client IPs are naturally
listed in DULs, and look like dynablocks.
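For reference, with that option Postfix records the authentication in
the Received header it adds, roughly like this (hostnames and queue ID
invented):

```
Received: from client.example.net (client.example.net [192.0.2.55])
        (Authenticated sender: alice)
        by mail.example.org (Postfix) with ESMTPSA id 4B2F31A0043
```

SpamAssassin recognises the ESMTPA/ESMTPSA keyword and extends trust
through the authenticated relay.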

However, some other checks may still run with ALL_TRUSTED and I found
the following kind of thing helped:

ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
meta TRUST_SHORTCIRCUIT (ALL_TRUSTED)
score TRUST_SHORTCIRCUIT -5.0
tflags TRUST_SHORTCIRCUIT   nice
priority TRUST_SHORTCIRCUIT -1000
shortcircuit TRUST_SHORTCIRCUIT on
endif

So you don't necessarily need to separate inbound and outbound ports or
IP addresses, although if you're designing a system from scratch you
probably would.

If some people are using pop-before-smtp there's the POPAuth plugin
which adds the authenticated client IP addresses to trusted_networks
(although in that case be careful of mail 2 web services like Google and
now Hotmail).

Were you using amavis with a single MX, an alternative is a postfix
kludge to separate incoming and authenticated mail to run different
amavis policy banks (e.g. authenticated virus check and DKIM signing;
incoming virus and spam check).  See
http://www.ijs.si/software/amavisd/amavisd-new-docs.html#dkim-postfix-dual-path

I don't know about doing this in MTAs other than postfix.

HTH

C


Re: Fwd: Indispensables pour vos vadrouilles…

2010-07-12 Thread Cedric Knight
On 11/07/10 16:04, Karsten Bräckelmann wrote:
> On Sun, 2010-07-11 at 15:53 +0100, Cedric Knight wrote:
> [nothing but 3 spam samples attached]
> 
> Uhm, dude!?  I hope that was an accidental address auto-completion. Do
> NOT send spam samples to the list.

Grovelling apologies.  It was Thunderbird auto-completion choosing a
different address book entry from the expected (SpamCop).

Anyone archiving this list, please do remove the original post if
possible.  Thanks.

CK


Profiling rules with DProf problems

2010-10-24 Thread Cedric Knight
Hello

I'm trying to get some performance data on a customised ruleset using
the instructions at
http://wiki.apache.org/spamassassin/ProfilingRulesWithDprof
and have two problems.

Firstly, I'm not actually getting any *_body_test or *_head_test data in
tmon.out.  Instead, after running dprofpp, all the body tests with
priority 0 (amounting to up to 80% of time) appear as a single item
"Mail::SpamAssassin::Plugin::Check::_body_tests_0"

I've tried it with 4000 spam and 1300 ham; and with just 90 or 3 spam
with similar results.  I do get body_test and head_test data using just the
3.1 branch from svn, but not with 3.2 or 3.3 (all using Perl 5.10).

Is this method still supposed to work in SA >= 3.2 ?   Or is there
something else that can be done to give each rule its own subroutine?

Secondly, and more minor, on an existing installation running 3.2.5
(Debian lenny), if I load the "zoom" plugin with
  loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
through my usual config "-c /etc/spamassassin"

I get:
plugin: eval failed: panic: Devel::DProf inconsistent subroutine return
at
/home/mimo/spamassassin-3.2/masses/../lib/Mail/SpamAssassin/Plugin/Rule2XSBody.pm
line 87.

Thanks for any help.

CK



Re: rule for To: undisclosed-recipients:;

2010-10-24 Thread Cedric Knight
On 25/10/10 04:21, Dennis German wrote:
> Is there? should there be a rule for  a header like:
> To: undisclosed-recipients:;

There was a rule UNDISC_RECIPS in version 3.1, and it scored about 0.8
points.  I don't know why it was removed; presumably it hit too much ham.

It used to go:
header UNDISC_RECIPS To =~ /^undisclosed-recipients?:\s*;$/

but I personally prefer a wider:
header UNDISC_RECIPS ToCc =~ /^(?:undisclosed[_ \-]*recipients?|recipient[_ \-]*list[_ \-]*(?:supressed|not[_\-]*shown)):?\s*;/i
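To sanity-check the wider pattern, here's a quick stand-alone run of the
same regex in Python (the sample header values are invented):

```python
import re

# same pattern as the ToCc rule above, case-insensitive
UNDISC = re.compile(
    r"^(?:undisclosed[_ \-]*recipients?"
    r"|recipient[_ \-]*list[_ \-]*(?:supressed|not[_\-]*shown)):?\s*;",
    re.I,
)

# all of these match; a normal address does not
for value in ("undisclosed-recipients:;",
              "Undisclosed Recipients:;",
              "recipient-list-supressed;"):
    print(value, bool(UNDISC.search(value)))
```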

Might as well get my 2p's worth in to the related question:

On 24/10/10 19:56, Lawrence @ Rogers wrote:
> What I would like to do is take the Envelope-To and run a regex to
check if the To: header contains it.
>
> Is this possible?

Well, it is possible, and here's some hacky code to show one way of
doing it.  Whether it's desirable is indeed a different matter.

All this would have to be put into a user plugin (see
http://search.cpan.org/dist/Mail-SpamAssassin/lib/Mail/SpamAssassin/Plugin.pm
)
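For orientation, the subs below would sit inside a plugin module along
these lines (package name and file path are invented; untested sketch):

```perl
# file: /etc/spamassassin/BccCheck.pm  (hypothetical name)
package BccCheck;

use strict;
use warnings;
use Mail::SpamAssassin::Plugin;
our @ISA = qw(Mail::SpamAssassin::Plugin);

sub new {
  my ($class, $mailsa) = @_;
  my $self = $class->SUPER::new($mailsa);
  bless($self, $class);
  # make is_bcc() callable from an eval: rule
  $self->register_eval_rule("is_bcc");
  return $self;
}

# ... get_rcpt_to() and is_bcc() go here ...

1;
```

and loaded from local.cf with
`loadplugin BccCheck /etc/spamassassin/BccCheck.pm`.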

sub get_rcpt_to {
  my ($self, $permsgstatus) = @_;
  my $line = $permsgstatus->get('X-Envelope-To')
          || $permsgstatus->get('Envelope-To')
          || $permsgstatus->get('Envelope-Recipients')
          || $permsgstatus->get('X-Rcpt-To')
          || $permsgstatus->get('X-Original-To');
  return $permsgstatus->{main}->find_all_addrs_in_line($line);
}

sub is_bcc {
  # call as head rule
  # related to UNDISC_RECIPS
  # hits mailing lists, of course:
  my ($self, $permsgstatus) = @_;
  my $tocc = lc ($permsgstatus -> get('ToCc'));
  my @rcpt_to = $self->get_rcpt_to($permsgstatus);
  my $match = scalar @rcpt_to;
  Mail::SpamAssassin::Plugin::dbg("is_bcc - all_to_addrs @rcpt_to '$match' ToCc '$tocc'");
  return 0 if (! scalar @rcpt_to);
  my $to = lc ($rcpt_to[0]);
  $to =~ s/([?+*{}\[\]\'\/\\\(\)].)/\\$1/g;
  $to =~ s/\@(\w+)\./\\\@(?:$1\\.)?/; # make first subdomain optional
  # using index() would be faster:
  return 0 if ($tocc =~ /\b$to(?:$|,|>|\s)/im);
  return 1;
}

I agree it's not useful on its own, but think it may be as part of a
meta rule for 419 spams and so on.  You'd at least want to limit it like:

header __IS_BCC eval:is_bcc()
meta IS_BCC (__IS_BCC && !__DOS_HAS_LIST_ID && !__DOS_HAS_LIST_UNSUB && !__DOS_HAS_MAILING_LIST)
describe IS_BCC Envelope recipients not in To or Cc, prob not list
score IS_BCC 0.1

HTH

CK


Re: Full circle DNS test?

2010-10-30 Thread Cedric Knight
On 30/10/10 07:42, Henrik K wrote:
> On Fri, Oct 29, 2010 at 10:02:56PM -0400, dar...@chaosreigns.com wrote:
>> I see there's a RDNS_NONE rule for when the sending IP address has no DNS
>> PTR (reverse DNS) record.  But no rule for when that PTR record doesn't
>> have a matching A (forward DNS) record that matches the sending IP?
>>
>> Is this something that would be accepted into spamassassin if I created a
>> module?  Or a feature that would be added if I didn't do it?
> 
> I doubt SA will incorporate it:
> 
> http://marc.info/?l=spamassassin-users&m=122268554723430
> 
> Make it if you need it. Share it if you want. People will use it if they
> find it useful.

For information, Postfix does a "full-circle" test of rDNS and puts
"unknown" in the Received headers if there is a PTR record, but the
value of that PTR record does not resolve, or if it resolves but does
not match.  And since SpamAssassin examines the hostname from the
Received headers, an email from a (last untrusted) IP address with an
unverified rDNS will hit RDNS_NONE.

So for Postfix and sendmail, what Darxus is suggesting is already
happening.

For other MTAs this may be different.  This means RDNS_NONE may be
assigned different scores from the scoring process, depending on whether
the email corpora checked have had no rDNS added to headers, had
unverified rDNS added, or only verified rDNS added.  That inconsistency
could be an argument for creating a module.  One other advantage of an
rDNS lookup module would be that having unverified rDNS available to
SpamAssassin separately could make it easy to write rules to catch
unverified rDNS values of a type like dsl-189-180-xxx-xxx.

IMHO configuring Postfix to reject all email without verified rDNS (even
with a 450 temporary error) would result in wrongly bouncing a lot of
email from some organisational mail servers.  By the way, another way of
doing this might be to put the line "smtp: PARANOID" into /etc/hosts.deny.

C


HELO_DYNAMIC false positives on a UK web host

2010-12-09 Thread Cedric Knight
I noticed some bad false positives on email sent from certain web
servers that haven't (yet) been properly configured.  For example, a
trusted header line starting:

Received: from 94.229.160.4.srvlist.ukfast.net
(94.229.160.4.srvlist.ukfast.net [94.229.160.4])

looks to SpamAssassin like the dynamic IP address of a botnet, when it's
actually a perfectly valid mailout or form submission.  It hits
HELO_DYNAMIC_IPADDR2, HELO_DYNAMIC_SPLIT_IP, RCVD_NUMERIC_HELO and
TVD_RCVD_IP.

On the Bayes+network scores for SpamAssassin 3.3, this totals 8.948, and
on 3.2.5 it's 11.886.

IP addresses have been changed to protect the innocent, but the netblock
affected is 94.229.160.0/20, excluding some servers where the hostname
has been set to something descriptive.

I've emailed UKFast, but don't know when or if they will fix the
problem, so here are some workaround rules for anyone who might be affected:

header __HELO_DYNAMIC_UKFAST X-Spam-Relays-Untrusted =~ /^[^\]]+ helo=\d+\.\S+\d+[^\d\s]\d+[^\d\s]\d+\.srvlist\.ukfast\.net /

meta COMPENSATE_BAD_HELO (HELO_DYNAMIC_IPADDR2 && HELO_DYNAMIC_SPLIT_IP && __HELO_DYNAMIC_UKFAST)
describe COMPENSATE_BAD_HELO HELO_DYNAMIC_* hit hard on poorly-chosen static rDNS/hostname
score COMPENSATE_BAD_HELO -5.0

and also RDNS_DYNAMIC triggers on the reverse DNS, which in these cases
is identical with the hostname, so I've rewritten one subrule:

header __RDNS_STATIC X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=\S*(?:static|fixip|srvlist\.ukfast\.net)/i

-- 
All best wishes,

Cedric Knight
GreenNet

GreenNet supports and promotes groups and individuals working for
peace, human rights and the environment through the use of
information and communication technologies.

GreenNet, Development House, 56-64 Leonard Street, London EC2A 4LT
Tel: UK 0845 055 4011 (Intl +44) 20 7065 0935 Fax: 020 7253 0936
Registered in England No. 02070438 VAT Reg GB 473 0262 65



Re: Odd yahoo spam

2010-12-09 Thread Cedric Knight
On 09/12/10 14:33, Randy Ramsdell wrote:
> I have been receiving bounces to my yahoo account for email I did not
> send. From the pastebin, you see the email did originate from the yahoo
> servers but is not in my sent directory. This is an interesting case and
> I cannot determine how this happened. One thing could be my account was

Have you checked your Yahoo options to see whether the spammer has
turned off saving of outgoing mail to the 'Sent' folder?  The 'hijacker'
presumably had access to everything, and could have just deleted mail.
In other cases the spammers have been known to insert spam links into
signatures, change secret questions, and so on.

By the way, I believe your Yahoo username should be decipherable from
the DKIM headers in theory if the DKIM checks out.

> compromised, but I really doubt that given the password I chose and the
> fact they did not change it to lock me out. I did change the password
> however. Each address in this e-mail are people I have sent to from
> yahoo, but these people are not connected to each other except for the
> work accounts. The "common thread" is me. of course.
> 
> Also not that sending e-mail from my yahoo account does not appear to
> route the same way. I was thinking someone used an API to interface with
> yahoo which would show different received headers. I know that yahoo has
> many servers so this point may be moot.
> 
> Can anyone add insight as to how this is happening?
> 
> http://pastebin.com/WYYLpEJh

Well, Hotmail is a bigger source of compromised accounts (I've had spam
appearing to come from many friends and contacts), but Microsoft still
seem fairly unsure about it themselves:

Either it's phishing, or it's a keylogger on a PC you used, they say.

Initially I saw hacked Hotmail accounts with Chinese
electronics/shopping scams, then the pharma spam gangs worked out the
same technique, and my impression is they've started using Yahoo a bit
more as well in the last week.  I'm sure someone must have more
authoritative information on this than me, but my own personal theory
goes...

...have you ever given your email address and password to a social
networking site?

Personally, I don't think it's responsible to encourage users to give up
credentials to a fairly open system, besides sending unsolicited
invitations to contacts who may very well not be "friends".
Unfortunately running NoScript on Facebook makes a lot of the Ajax
unworkable.

From the recipient's point of view, do they want this blocked, or do
they want to notify the sender who they may well know personally?  I
would go for rejection, but accepting such mail may be better than
discarding.

CK


Re: HELO_DYNAMIC false positives on a UK web host

2010-12-09 Thread Cedric Knight
On 09/12/10 20:30, Karsten Bräckelmann wrote:
> On Thu, 2010-12-09 at 20:18 +0000, Cedric Knight wrote:
>> I noticed some bad false positives on email sent from certain web
>> servers that haven't (yet) been properly configured.  For example, a
>> trusted header line starting:
>
> Ah, so they are operational, just poorly configured. That's what you
> just said in other words, right? :)

Yes, I was trying to think of a tactful way of putting it without
showing exasperation :).  It appears that a client can easily set up
hosting using cPanel or something without ever setting the rDNS or
hostname to anything other than the numeric default.

I don't actually know if rDNS or hostname are directly under client
control, but I've advised senders to ask their hosting company to deal
with it.

>
> Anyway, why are *web* servers sending out mail at all? Other than maybe
> cron junk and friends, which would warrant bypassing SA or extending
> your internal network. If they are indeed intended to send out mail to
> third-parties, they better be configured properly first.

In the case that actually caused me to write, orders from a shop.  Or it
might be running PHPList or CiviCRM or any CMS that authenticates users
by email.

>
>> Received: from 94.229.160.4.srvlist.ukfast.net
>> (94.229.160.4.srvlist.ukfast.net [94.229.160.4])
>
> Looks like a dynamic hostname indeed.

The "srv" might raise suspicions.  In fact, I suppose it's not a totally
unreasonable form of rDNS for a large server farm, but personally I give
all the cows on my farm names.

-- 
All best wishes,

Cedric Knight
GreenNet

GreenNet supports and promotes groups and individuals working for
peace, human rights and the environment through the use of
information and communication technologies.

GreenNet, Development House, 56-64 Leonard Street, London EC2A 4LT
Tel: UK 0845 055 4011 (Intl +44) 20 7065 0935 Fax: 020 7253 0936
Registered in England No. 02070438 VAT Reg GB 473 0262 65



Re: HELO_DYNAMIC false positives on a UK web host

2010-12-09 Thread Cedric Knight
On 09/12/10 22:43, John Hardin wrote:
> On Thu, 9 Dec 2010, Cedric Knight wrote:
>
>> It appears that a client can easily set up hosting using cPanel or
>> something without ever setting the rDNS or hostname to anything other
>> than the numeric default.
>
> Is there anything in the headers that indicates cpanel is in use?

Not really, unfortunately.  The only way I know is that the host's
public webpages mention both cPanel and Plesk as available features.  To
clarify a little more, both the false positive samples I have are from
small organisations, with apparently just one dedicated or virtual
server apiece at UKFast, used primarily as a web server.  One sample is from
  X-Mailer: Drupal
and the other is
  X-Mailer: PHPMailer [version 1.72]

The only commonality is that the last Received line is of the form:
  Received: (qmail \d+ invoked by uid \d+); [$DATE]
which might also hit anything that had been through Yahoo or Messagelabs.

> Perhaps a meta on cpanel + dynamic-looking-rDNS would be worth a
> negative point or two...

This is about 9-11 points to offset, though.  Maybe there's no way of
doing a negative rule that spammers couldn't abuse.  The exclusion could
be generalised by having certain HELO strings stop the HELO_DYNAMIC_*
firing: HELO_STATIC_HOST currently only has provision for "rogers.com"
and only neutralises HELO_DYNAMIC_IPADDR, HELO_DYNAMIC_DHCP,
HELO_DYNAMIC_HCC.  That could be extended to the rules involved in this
case, and certain strings like "static|fixip|\bse?rv|mx" (and not
"pool|dsl", although some people even unwisely run their office exchange
server on something with "dsl" and a string of numbers in the rDNS).

These are valuable rules, and hosts should indeed ensure they or their
users set authentic-looking HELOs.  How about scanning through mail or
logs for messages that hit at least 2 of the HELO_DYNAMIC rules and
RCVD_NUMERIC_HELO, but are otherwise hammy?
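As a rough sketch of that sort of audit, something like the following
could tally ham that hits two or more of these rules (the X-Spam-Status
line here is invented; real input would come from your logs or folders):

```python
import re

# rules whose combined firing we want to audit
RULES = ("HELO_DYNAMIC_IPADDR2", "HELO_DYNAMIC_SPLIT_IP",
         "HELO_DYNAMIC_IPADDR", "RCVD_NUMERIC_HELO")

def suspicious_ham(status_line):
    """True if the message scored as ham (< 5.0) yet hit >= 2 of RULES."""
    m = re.search(r"score=(-?[\d.]+)", status_line)
    if not m or float(m.group(1)) >= 5.0:
        return False
    # word boundaries so IPADDR doesn't also count as IPADDR2
    hits = sum(bool(re.search(r"\b%s\b" % rule, status_line))
               for rule in RULES)
    return hits >= 2

line = ("X-Spam-Status: No, score=3.1 tests=HELO_DYNAMIC_IPADDR2,"
        "HELO_DYNAMIC_SPLIT_IP,RCVD_NUMERIC_HELO")
print(suspicious_ham(line))  # True
```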

Looking at HELO_DYNAMIC_SPLIT_IP closely, I'm pretty sure it was never
intended to overlap with RCVD_NUMERIC_HELO.  I'll file a bug.

CK


Two newish RBLs; NXDOMAIN question

2010-12-13 Thread Cedric Knight
There seem to be an abundance of DNSBLs out there nowadays.  Here are
my observations on two, and an implementation question.  The Good, the
Bad and the Ugly:

GBUdb.com's truncated list (http://www.gbudb.com/truncate/) went public
in May and seems to work very well, catching a lot of things missed by
other RBLs, with <1% false positives.  YMMV.  It's related to Message
Sniffer, a commercial anti-spam package, and is fully automated (no user
submissions).  I'd hazard that a starting score of 2.0 might be appropriate.

header RCVD_IN_GBUDB_TRUNC eval:check_rbl('trunc-firsttrusted', 'truncate.gbudb.net.')
describe RCVD_IN_GBUDB_TRUNC Connecting IP in truncate.gbudb.net.
score RCVD_IN_GBUDB_TRUNC 1.5
tflags RCVD_IN_GBUDB_TRUNC  net
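For anyone unfamiliar with how check_rbl forms its query: it reverses
the octets of the connecting IP and prepends them to the zone, along
these lines (plain string manipulation, no DNS involved):

```python
def dnsbl_query_name(ip, zone):
    # e.g. 127.0.0.2 against truncate.gbudb.net
    #  ->  2.0.0.127.truncate.gbudb.net
    octets = ip.split(".")
    return ".".join(reversed(octets)) + "." + zone

print(dnsbl_query_name("127.0.0.2", "truncate.gbudb.net"))
# 2.0.0.127.truncate.gbudb.net
```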

I've also had good experiences with bl.mailspike.net and the dynamic and
noptr lists on spamrats (all.spamrats.com 127.0.0.3[67]).  WPBL seems to
be less useful than two years ago, with too many FPs.

QUORUM.TO is a recent project of Julian Haight, creator of SpamAssassin.
Some data is commercial, but there's also a public list, described at
http://www.quorum.to/publicbl.html.  I'm disappointed by my tests so
far, showing its "base rule" as very fuzzy in distinguishing between ham
and spam.  In addition, quorum.to doesn't follow RFC 5782 convention,
for example in its negative response to 2.0.0.127.list.quorum.to.  In
fact, I was most hopeful of using it as a cleanlist of hosts that had
deliberately validated themselves, as it gives a positive response
(127.0.0.0) for IPv4 addresses it has no data for.  I've been trying it
out with the following rules, but I wouldn't suggest anyone else does at
the moment:

header __RCVD_IN_QUORUM eval:check_rbl('quorum-firsttrusted', 'list.quorum.to.')
describe __RCVD_IN_QUORUM Connecting IP in list.quorum.to.
tflags __RCVD_IN_QUORUM net

header RCVD_IN_QUORUM_BLOCK eval:check_rbl_sub('quorum-firsttrusted', '^127\.0\.0\.2$')
describe RCVD_IN_QUORUM_BLOCK Connecting IP failed quorum.to tests
score RCVD_IN_QUORUM_BLOCK 0.1
tflags RCVD_IN_QUORUM_BLOCK net

header RCVD_IN_QUORUM_REJECT eval:check_rbl_sub('quorum-firsttrusted', '^127\.0\.0\.[45]$')
describe RCVD_IN_QUORUM_REJECT quorum.to has been asked to dirtylist connecting IP
score RCVD_IN_QUORUM_REJECT 0.01
tflags RCVD_IN_QUORUM_REJECT net

meta RCVD_IN_QUORUM_GOOD (! __RCVD_IN_QUORUM)
describe RCVD_IN_QUORUM_GOOD Connecting IP not listed on quorum.to (or servfail)
score RCVD_IN_QUORUM_GOOD -0.1
tflags RCVD_IN_QUORUM_GOOD net

As you can see, the only way to use it as a list of clean IPs is to
negate any response.  My problem, rather theoretical, is that as well as
catching an NXDOMAIN response, the cleanlisting (RCVD_IN_QUORUM_GOOD)
will trigger on bad configuration (SERVFAIL) or network problems, and it
appears SA's Dns module treats all those conditions the same.
Intuitively, I'd like to be able to do something like (invalid code
follows):

  header RCVD_IN_QUORUM_GOOD2 eval:check_rbl_sub('quorum-firsttrusted', 'NXDOMAIN')

or maybe

  header RCVD_IN_QUORUM_GOOD2 eval:check_rbl_sub('quorum-firsttrusted', '^$')

Does anyone else think work on providing such a facility for negative
cleanlists might be worthwhile?

CK


Re: Two newish RBLs; NXDOMAIN question

2010-12-13 Thread Cedric Knight
On 13/12/10 15:06, Karsten Bräckelmann wrote:
>> [...] is a recent project of Julian Haight, creator of Spam
> 
> Cop. SpamCop.
> 
>> Assassin.

Oh no, did I type that?  Dratted absent-minded fingers.

Apologies.

C


Re: Two newish RBLs; NXDOMAIN question

2010-12-13 Thread Cedric Knight
On 13/12/10 15:44, RW wrote:
> On Mon, 13 Dec 2010 13:47:14 +
> Cedric Knight  wrote:
...
>> header RCVD_IN_GBUDB_TRUNC  eval:check_rbl('trunc-firsttrusted',
>> 'truncate.gbudb.net.')
>
> That should be "-lastexternal"  -  assuming that the list contains
> a lot of dynamic addresses.

And assuming that you've populated trusted_networks with the SMTP
servers for ISPs/freemailers that don't put any authentication
information in the header.  I haven't, personally.

The list doesn't contain a lot of dynamic addresses in the sense of
RCVD_IN_SORBS_DUL, but I'm not sure it excludes them the way
deep-parsing lists are supposed to.

> Blacklists run on either the last external address or run deep,
> whitelists run on first-trusted.

I guess this one could run deep, and will try that for a period.

This is a confusing issue. The way I think of it goes roughly as follows:

* I prefer not to do unnecessary DNS lookups, ideally at most one per
message (per RBL).

* If I know another party's server forwards a domain it MXes to me, but
is also a MSA (SMTP server) for users, then I don't want to put it in
internal_networks, to prevent possible DUL FPs from the SMTP users.

* So, assuming my spam-checking is better than on that server, I put it
in trusted_networks.  Now -firsttrusted rules will catch spam sent to
the MX, but will not hit the other server's users, unless they are on a
dynamic IP address that has unfortunately recently been used by spammers
and there is no SA-recognised authentication such as ESMTPA (a risk I
can take).  This is a conservative approach to trusted_networks.  (And a
bit lazy, perhaps: I do have a list of mx-like servers that have sent
ham, but I don't want to keep updating it.)

* If I put all major ISPs and freemailers into firsttrusted, and some of
them do have authentication that is recognised by SpamAssassin (such as
Yahoo), then the RBL won't catch exploited servers or botnets or gangs
in West Africa who have freemail accounts or are on the ISP's network;
in fact ALL_TRUSTED might hit (btw I shortcircuit if so, to stop
CPU-intensive rules running).

* With this approach, -lastexternal still catches botnet spam coming
from dynamic and dsl IP addresses direct to my MXs (and any MXs I list
that are not also MSAs), by using the dial-up RBLs like SORBS_DUL.  For
a more general RBL (ie just including addresses known to send spam, not
based on whether they are dynamic), -firsttrusted hits botnet spam
sources, but also stuff from a compromised server.

* Suppose I do add Hotmail: "trusted_networks 65.52.0.0/14".  Hotmail's
auth mechanism is not recognised by SA (unlike Yahoo's).  Now if I set
an RBL to run -lastexternal, it will check whether Hotmail is in the RBL
(or maybe shouldn't check at all since it's trusted).  That Hotmail
server may send a mixture of ham and spam, and I don't expect it to be
listed as a spam source.  If I have the list (which is mostly of
addresses that are known spam sources, not just dynamic or IPs or ones
which are known not to run mailservers) set to -firsttrusted, it stands
a fair chance of discriminating between spammy and non-spammy Hotmail users.

* DNSWLs stay on -firsttrusted too.

I know this goes against received wisdom, but empirically it seems to
work well for me.

CK


Re: DNSBL for email addresses?

2010-12-14 Thread Cedric Knight
On 14/12/10 14:28, Marc Perkel wrote:
> Are there any DNSBLs out there based on email addresses? Since you can't
> use an @ in a DNS lookup

Actually, you can use '@' in a lookup.  You just can't use it in a hostname.

Or you could convert the '@' to a '.' as is the format still used in SOA
records.

But both of these would have privacy issues: say you've received an
email via TLS, your anti-spam system suspects it might be a 419, so you
look up a Reply-To address or body email address, and you send a query
to the RBL via DNS.  But it turns out the message was private ham, and
you've lost the protection of TLS.

So a hash is best, and I'd suggest SHA1 over MD5.  And I do think the
idea is worth trying; although freemail identities are cheap, there is
still some time and effort and risk of detection involved in creating
and checking them.
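To illustrate, a hashed lookup name might be formed like this (the zone
name is invented, and the lowercase-then-SHA1 convention is my
assumption; a real list would need to standardise it):

```python
import hashlib

def hashed_lookup_name(addr, zone):
    # normalise (strip/lowercase) so one address always maps to one name
    digest = hashlib.sha1(addr.strip().lower().encode("utf-8")).hexdigest()
    return digest + "." + zone

name = hashed_lookup_name("Someone@Example.com", "addr.bl.example.net")
print(len(name.split(".")[0]))  # 40: SHA-1 hex digest length
```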

CK


Re: DNSBL for email addresses?

2010-12-16 Thread Cedric Knight
On 15/12/10 00:43, RW wrote:
> On Tue, 14 Dec 2010 15:52:28 -0800 (PST)
> John Hardin  wrote:
> 
>> On Tue, 14 Dec 2010, Cedric Knight wrote:
>>
>>> So a hash is best,
>>
>> Agreed.
>>
>>> and I'd suggest SHA1 over MD5.
>>
>> Just out of curiosity, why? An MD5 hash is shorter than an SHA hash
>> (an important consideration when you're making lots of DNS queries of
>> the hash), MD5 is computationally lighter than SHA, and MD5 is robust
>> enough for this purpose, even though artificial collision scenarios
>> exist.

Maybe I was being over-cautious, based on articles (which I can't find
online any more) suggesting MD5 is likely to become trivial to crack in
future owing to mathematical shortcuts.  It's not as if you can recover
the data from a hash, or even (as I read it) that you can create a
collision for any given hash yet, but there may be a problem in any
context with assuming something is secure when it's only semi-secure.

I am not a mathematician or security expert, therefore I am swayed by
pronouncements from US-CERT:
"Do not use the MD5 algorithm
Software developers, Certification Authorities, website owners, and
users should avoid using the MD5 algorithm in any capacity. As previous
research has demonstrated, it should be considered cryptographically
broken and unsuitable for further use."
http://www.kb.cert.org/vuls/id/836068

OK, so this isn't a cryptographic application.  I'm just thinking
future-proofing.  Some background for non-experts like me:
http://www.maa.org/devlin/devlin_02_06.html

SHA1 is 40 hex characters, as against MD5's 32, which isn't such a great
difference, considering an IPv6 lookup is 64 nibbles under RFC 5782.

>> Granted I wouldn't sign a legal document with it any more, but for a 
>> private perfect hash of an email address, why not?
> 
> I don't see that there's all that much added security anyway. 
> 
> I don't think spammers are likely to intercept dns as a way of
> harvesting addresses.  
> 
> As far as general privacy is concerned, without a shared-secret anyone
> can generate the hash and look for known addresses. And if you don't add
> salt to the hash, it's going to be fairly easy to perform an efficient
> dictionary attack, in which case the choice of hash function makes
> little difference.

I wasn't thinking of harvesting by spammers, but by (say) a government
authority that does not already have a dictionary of addresses that is
known to be complete.  This is information in non-spam bodies that might
be looked up (well it would be if you want to use it to block 419
scams).  Also, possibly people might want to use the same hashing
standard for a DNSWL of (maybe DKIM-verified) email addresses, meaning
that list would be abusable by spammers who are able to create a hash
collision.

CK


Re: lots of freemail spam

2011-01-02 Thread Cedric Knight
On 30/12/10 19:15, Lawrence @ Rogers wrote:
> Lately, I notice we are getting a fair amount (10-12 per day per client)
> of spam coming from freemail users (FREEMAIL_FROM triggers). Usually the
> Subject is non-existent or empty, and the message is always just an URL

I see a fair amount matching that description, and corresponding
complaints.  In the past few weeks there seems to be a shift from
Hotmail/MSN/Live to also use cracked Yahoo and AOL/AIM accounts.
Someone at the freemail providers should know if passwords are obtained
by phishing (such as tabnabbing) or a keylogger or even by a dictionary
attack.

There's no text to match Bayes or body rules; because the URL is on a
cracked site, URIBL_* isn't usually appropriate; because it's from a
cracked account, the headers are fine and it may even reach users who've
chosen to only accept email from friends/contacts.  More of the
originating IPs should hit deep-parsing RBLs than actually do.

So it could be argued that the best response is not to block, but to let
owners of cracked accounts know they need to change their password and
secret questions (or close the account if it can't be recovered), and
also to report the cracked sites and originating IPs, possibly by
educating users about SpamCop.

> Is there a good rule for flagging these as possible spam? I understand
> that there may be some legit e-mails that would hit all 3 factors, so I
> would score the rule low.
>
> Thoughts?

Something like:

meta FREEMAIL_PHARM_PROB ((FREEMAIL_FROM + MISSING_SUBJECT + LINK_NR_TOP) >= 3)
describe FREEMAIL_PHARM_PROB Looks like simple link from cracked account
score FREEMAIL_PHARM_PROB 2.5

LINK_NR_TOP is the only additional element needed, to indicate message
length:

rawbody LINK_NR_TOP /^.{0,20}http:(?…)/

iXhash <http://sourceforge.net/projects/ixhash/> seems to hit a greater
percentage than other body checksums (the body being empty or very
short).  Also there are short-lived patterns in the abusive file uploaded:

uri FREEMAIL_PHARM1 /\/mtxtsx\.htm/
describe FREEMAIL_PHARM1    Particular link on cracked site, Jan 2011
score FREEMAIL_PHARM1   8.0

uri FREEMAIL_PHARM2
/\/(?:2011\.php\?\w+=\w+$|foto2011\.php|clickhere\.php|important\.php|mywork\.html)/
describe FREEMAIL_PHARM2    Particular link on cracked site, Jan 2011
score FREEMAIL_PHARM2   4.0

uri FREEMAIL_PHARM3
/\/\/[a-z0-9A-Z.-]+\/images\/[A-Za-z0-9\-]+\.(?:php|htm)/
describe FREEMAIL_PHARM3    Top-level images folder, php or htm extension
score FREEMAIL_PHARM3   0.1
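As a quick sanity check of that last pattern (the sample URLs below are my
own invention, not from real spam), it can be exercised in Python:

```python
import re

# Same pattern as the FREEMAIL_PHARM3 uri rule, minus SA's rule syntax
pharm3 = re.compile(r'//[a-z0-9A-Z.-]+/images/[A-Za-z0-9\-]+\.(?:php|htm)')

# Hits: a php/htm file directly inside a top-level images/ folder
print(bool(pharm3.search('http://example.com/images/promo-x1.php')))  # True

# Misses: an ordinary image file in the same folder
print(bool(pharm3.search('http://example.com/images/logo.png')))      # False
```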

HTH

CK


Re: HEADS UP: DBSL.org is returning positive replies

2012-08-10 Thread Cedric Knight
On 10/08/12 17:26, Geert Mak wrote:
> positive or negative?
> 
> we had lists.dsbl.org on (obviously forgotten) and about 24 hours ago all 
> mail was rejected, based on lists.dsbl.org - quite unpleasant…

Positive.  A fair number of servers rejected mail with 554s.  It was like:
  1.0.0.127.list.dsbl.org has address 74.92.59.67

The registrar removed a delegation
  list.dsbl.org. 3600 IN NS stop-using-dsbl.dsbl.org.
which has now been put back in.

Unfortunately people don't seem to notice the open connections and mail
delays from RBL timeouts (to show any in SA, run it with -D flag).  So
if you have a really old config, might be worth searching for any
reference to dsbl.org.  And removing it, obviously.

> 
> On 10.08.2012, at 13:46, Axb wrote:
> 
>> DSBL.org was shut down 4 years ago but apparently there's still ppl sending 
>> lookups.
>>
>> As of today, dsbl.org is returning positive replies
>>
>> Enjoy the support case party!
>>
>> https://twitter.com/#!/search/?q=DSBL&src=typd
>>
>>
>> Axb

-- 
All best wishes,

Cedric Knight



spameatingmonkey.net down?

2013-01-25 Thread Cedric Knight
Does anyone have any more information on spameatingmonkey.net, which
doesn't seem to have been resolving since  UTC today (20130125)?
It looks like ns1.urmombl.com is down.

Spam Eating Monkey provides or provided RBL, RHSBL and iXhash of what
seemed to me to be fairly good quality, and particularly RHSBLs of
domains less than 15 days old.

It probably only affects a few SA users, those who have included it
manually, and was removed from SA sandboxes last year.

-- 
All best wishes,

Cedric Knight
GreenNet



Re: spameatingmonkey.net down?

2013-01-25 Thread Cedric Knight
On 25/01/13 13:20, Tom Kinghorn wrote:
> On 25/01/2013 15:12, Cedric Knight wrote:
>> Does anyone have any more information on spameatingmonkey.net, which
>> doesn't seem to have been resolving since  UTC today (20130125)?
>> It looks like ns1.urmombl.com is down.
>>
>> Spam Eating Monkey provides or provided RBL, RHSBL and iXhash of what
>> seemed to me to be fairly good quality, and particularly RHSBLs of
>> domains less than 15 days old.
>>
>> It probably only affects a few SA users, those who have included it
>> manually, and was removed from SA sandboxes last year.
>>
> *http://is.spameatingmonkey.com.downorblocked.net/*
> 
> Status shows as: OFFLINE

Thanks for the confirmation, but by "more information" I meant any news
from the maintainer of SEM about what his/her intentions were and
whether service is likely to be restored, or whether it's permanently
offline or whatever.

Anyway, its loss, temporary or otherwise, doesn't seem to be affecting
too many people.  It caused some slow mail checking for me.

C


Re: spameatingmonkey.net down?

2013-01-27 Thread Cedric Knight
On 25/01/13 13:12, Cedric Knight wrote:
> Does anyone have any more information on spameatingmonkey.net, which
> doesn't seem to have been resolving since  UTC today (20130125)?
> It looks like ns1.urmombl.com is down.
> 
> Spam Eating Monkey provides or provided RBL, RHSBL and iXhash of what
> seemed to me to be fairly good quality, and particularly RHSBLs of
> domains less than 15 days old.
> 
> It probably only affects a few SA users, those who have included it
> manually, and was removed from SA sandboxes last year.

For info, the monkey is back up now, with http://dns.squish.net/
reporting 60% DNS health, but a normal-looking status page at
http://spameatingmonkey.com/status.html/

Someone offlist pointed me to Warren Togami's evaluation of the related
RBL (SEM-BLACK) in 2011
<http://www.spamtips.org/2011/05/dnsbl-safety-report-5142011.html>.  It
concluded from the masschecks: "false positives on as much as 5-6% of
ham... high overlap of 83% with RCVD_IN_PBL... outright avoid".  I'm not
saying it's safe for a high score, and would recommend that anyone
trying it uses considerable caution, but /for me/ that RBL hits more
spam than z.mailspike.net (though still not a lot), and has a lower FP
rate than NiX Spam or UCEPROTECT (or in fact, SpamCop).

C


Re: PatioDeals@****** how to get high score

2015-08-15 Thread Cedric Knight
On 14/08/15 02:19, Alex wrote:
>>>> in the .cf file I added blacklist_from *.review
>>>> blacklist_from *.work blacklist_from *.date
>>> 
>>> I would use the following:
>>> 
>>> blacklist_uri_host review blacklist_uri_host work 
>>> blacklist_uri_host date
>> 
>> you want both: a bad sender using the domain as well a URI to the
>> domain and without having tested it at my own: make sure it does
>> only match when the domain ends with "review", "work", "date" to
>> prevent FP
> 
> Are you talking about it somehow matching "123review", for example?
> It appears that it refers to only the rhs of the address. For
> example "blacklist_from *.review" catches user@123test.review but
> not u...@123review.com or user@123review.123review or
> 123test.review.com. Are there any other variations to be concerned
> with, or could someone else confirm?

That looks right, checking Conf/Parser.pm.  blacklist_from internally
adds a "$" so it must match the rightmost part of any address.

> So while blacklist_from requires the wildcard match, 
> blacklist_uri_host does not.

Indeed blacklist_uri_host does not permit wildcards.  It must be an
exact match with the top 1-10 parts (labels).
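That behaviour is easy to illustrate (this is my own re-implementation of
the glob handling for demonstration purposes, not SA's actual code):

```python
import re

def blacklist_from_matches(glob, addr):
    """Approximate blacklist_from semantics: '*' becomes '.*' and the
    pattern is anchored with '$', so it must match the rightmost part
    of the address."""
    rx = re.escape(glob).replace(r'\*', '.*') + '$'
    return re.search(rx, addr, re.IGNORECASE) is not None

print(blacklist_from_matches('*.review', 'user@123test.review'))       # True
print(blacklist_from_matches('*.review', 'user@123review.com'))        # False
print(blacklist_from_matches('*.review', 'user@123review.123review'))  # False
```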

> Also, at some time, Axb had posted a list of the new TLDs that are
> a significant source of spam and included domains like xxx and xyz.
> Does anyone have an updated list that might be helpful?

Try http://rss.uribl.com/tlds/index.html (it's percentages per domain,
rather than per email)
.uno, .red, .black, .blue, .pink, .click, .xyz all seem significantly
abused.
.asia and .link seem to have cleaned up a bit in the last few months,
.science less so.  .xxx probably isn't very useful to spammers.

Also 20_aux_tlds.cf contains a link to the full IANA gTLD list.

If you want to be less severe, maybe a meta rule using Paul's
BODY_NEWDOMAIN_14_FMBLA with enlist_uri_host setting a range of scores
as described at https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6458#c3

CK


Re: new(ish) malware: RTF with MIME payload

2016-03-18 Thread Cedric Knight
On 17/03/16 19:31, Chip M. wrote:
> Starting about two hours ago, more than 80% of my real-time
> honeypot spam is a new malware campaign.
>
> Full spample (with redacted/munged email addresses and
> Message-ID):
> http://puffin.net/software/spam/samples/0039_mal_rtf_mime.txt

[snips]
> So far, they all have these headers:
> X-Interface: IDSMail OLE Server v6.12 (32)
> X-Mailer: Everest CRM Studio
> Which feel too helpful to last long. :)

They're not reliable, no, but might be useful in a quick meta rule that
lets you know when it mutates.

> Question: What other file extensions / Content Types would be
> viable for this payload? 

Anything that opens in MS Word, eg do[ct][mx]?, asd, wbk, wll, and might
run VBA, so that includes Excel too, xl.*  - whether it launches depends
on how the MUA handles the Content-Type, I think - so "application/.*".

There's some analysis here:
https://isc.sans.edu/forums/diary/XML+A+New+Vector+For+An+Old+Trick/19423/
(mentions analysis tools)
https://blogs.mcafee.com/mcafee-labs/banking-malware-dridex-arrives-via-phishing-email/
(mentions abused for Dridex)

On 17/03/16 19:46, Reindl Harald wrote:
> 
> /var/www/uploadtemp/8044012e4e9b882b3c7643489c05df73e5cf6dcf.eml:
> Sanesecurity.Malware.26034.XmlHeurGen.AM.UNOFFICIAL FOUND

Yes, Sanesecurity is great... this detects Russian XML attachments with
no content, containing VBA and the ActiveMIME header.  It's possible the
language will change, but you could write your own ClamAV .ndb sig to
stop any ActiveMIME.

>  1.5 HTML_IMAGE_ONLY_16 BODY: HTML: images with 1200-1600 bytes of
> words
>  1.5 BAYES_50   BODY: Bayes spam probability is 40 to 60%
> [score: 0.4999]

You score 1.5 for unrecognised?

>  2.0 MISSING_MIMEOLEMessage has X-MSMail-Priority, but no X-MimeOLE

That one could be useful.

>  0.5 INVALID_MSGID Message-Id is not valid, according to RFC 2822
>  3.8 MSGID_NOFQDN1  Message-ID with no domain name

Those two are just from Chip's munging, unfortunately.

CK


Re: new(ish) malware: RTF with MIME payload

2016-03-20 Thread Cedric Knight
On 18/03/16 08:39, Cedric Knight wrote:
> On 17/03/16 19:31, Chip M. wrote:
>> Starting about two hours ago, more than 80% of my real-time 
>> honeypot spam is a new malware campaign.
>> 
>> Full spample (with redacted/munged email addresses and 
>> Message-ID): 
>> http://puffin.net/software/spam/samples/0039_mal_rtf_mime.txt
> 
> [snips]

>> Question: What other file extensions / Content Types would be 
>> viable for this payload?
> 
> Anything that opens in MS Word, eg do[ct][mx]?, asd, wbk, wll and
> might run VBA, so includes Excel too, xl.*  - whether it launches
> depends how the MUA handles the Content-Type I think - so
> "application/.*" .

On second thoughts, that's nowhere near safe to catch everything that
might run VBA.  Need someone with more knowledge of a Windows system
with Office installed, but quite likely it will also try to open xml,
prn, csv, od[st], dif, slk, wp, rt.*, ppt, p[op].* as a document and
run any macros.

And regardless of extension, very likely, text/csv, text/rtf will do
so as well, and according to this old page
https://msdn.microsoft.com/en-us/library/ms775147%28v=vs.85%29.aspx
even text/plain could open as an application data file.

CK


Re: freemail spam

2016-03-25 Thread Cedric Knight
On 25/03/16 00:55, Alex wrote:
> Hi,
> 
> First, I'm wondering why parking.ru isn't among the freemail domains?

Probably because the FreeMail plugin is designed to detect the
right-hand side of email addresses for providers like Gmail and AOL, and
parking.ru looks like a general-purpose web host.  Does it offer free
email service @parking.ru?

> Perhaps it should be added?

You could do that in your config with
  freemail_domains parking.ru

> Received: from mail05.parking.ru (mail05.parking.ru [195.128.120.25])
> by mail02.example.com (Postfix) with ESMTP id 6ED82347D26
> for ; Wed, 23 Mar 2016 17:42:50 -0400 (EDT)
> 
> I'm reading through the FREEMAIL_* rules, and wondered, how can I
> build a rule that looks to see if email was passed through a freemail
> domain?
> 
> I realize there's FREEMAIL_FROM, etc. I'm interested in something like
> FREEMAIL_RECVD or something similar.

There's no man page for Mail::SpamAssassin::Plugin::FreeMail, but the
comments include
# header FREEMAIL_HDRX eval:check_freemail_header('header' [, 'regex'])
#
# Searches defined header for freemail address. Optional regex to match
# the found address (like in check_freemail_from).

So you could do
  eval:check_freemail_header('Received')

However, this looks for full email addresses, so I don't think it's of
use to you, unless you want to catch 'example.com'.

[BTW I wrote an incomplete patch to this function in bug 6664 so it
could be used as:
header FREEMAIL_FORGED_REPLYTO4
eval:check_freemail_header('Reply-To','\@','From')
describe FREEMAIL_FORGED_REPLYTO4 Any Reply-To freemail not in From
and then exclude __HAS_IN_REPLY_TO __DOS_HAS_LIST_UNSUB etc, which
improves accuracy in picking up 419s.
I still mean to upload a correct patch.]

So isn't what you want something like this?
  header RCVD_DIRTY_SERVERS   Received =~ /\.parking\.ru/
or
  header RCVD_DIRTY_SERVERS   X-Spam-Relays-Untrusted =~ /
helo=\S+\.(?:parking\.ru|dirty\.tld)/
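To see what that second rule would match, here is a rough Python
equivalent run against an invented approximation of the
X-Spam-Relays-Untrusted pseudo-header (the exact field layout may differ,
and dirty.tld is just the placeholder from the rule):

```python
import re

dirty = re.compile(r' helo=\S+\.(?:parking\.ru|dirty\.tld)')

# Invented example, loosely imitating SA's relay pseudo-header format
relays = ('[ ip=195.128.120.25 rdns=mail05.parking.ru '
          'helo=mail05.parking.ru by=mail02.example.com ]')
print(bool(dirty.search(relays)))  # True

clean = '[ ip=203.0.113.7 rdns=mx.example.org helo=mx.example.org ]'
print(bool(dirty.search(clean)))   # False
```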

> We're experiencing a higher than normal level of spoofing attempts,
> and don't have the ability to implement DKIM/DMARC at the moment. SPF
> is being worked on.
> 
> Having knowledge that a freemail sender was used in a spoof/phish
> attempt I believe would be helpful.

I'm seeing some 419s from parking.ru, but not what I'd call phish.  Do
you mean you're getting a lot of spam that comes from your own domain?
IMHO it's usually a mistake to focus on that characteristic, as it's
incidental.  It's better to check the first-level checks are working,
like RBLs.  Maybe pastebin some full samples?

HTH

CK


MIME header false positives (was Rule to score word documents)

2016-04-06 Thread Cedric Knight
On 30/03/16 21:11, @lbutlr wrote:
> On Wed Mar 30 2016 13:34:23 Alex   said:
>>
>> /^(Content-(Type|Disposition)\:|[[:space:]]+).*(file)?name="?.*\.doc"?;?$/
>> REJECT
> 
> /^\s*Content-(Disposition|Type).*name\s*=\s*"?(.*\.(ade|adp|bas|bat|chm|cmd|com|cpl|crt|dll|exe|hlp|hta|inf|ins|isp|js|jse|lnk|mdb|mde|mdt|mdw|msc|msi|msp|mst|nws|ops|pcd|pif|prf|reg|scf|scr\??|sct|shb|shs|shm|swf|vb[esx]?|vxd|wsc|wsf|wsh))(\?=)?"?\s*(;|$)/x
> REJECT Attachment name "$2" may not end with ".$3"

I'd like to take the opportunity to warn that regexes like this (and the
version in the Postfix documentation as "man header_checks") have
started blocking email from iPhones.

This is because some Apple email client adds a parameter to Content-Type
that may end in ".com".  The ".*\." can span between those parameters.
If you block extensions in Postfix, check your logs for
"x-apple-part-url" and you may see something like:

server postfix/cleanup[1234]: 123412341234: reject: header Content-Type:
 application/vnd.ms-publisher;??name="redacted
redacted.pub";??x-apple-part-url="abcd1234-1234-5678--123412341...@yahoo.com"

("??" is the CRLF line break.)

For postfix the rule can be rewritten to specify the parameter value to
avoid this type of false positive:

/^Content-(Disposition|Type).*name\s*=\s*
("(?:[^"]|\\")*|[^();:,\/<>\@\"?=<>\[\]\ ]*)
((?:\.|=2E)(
ade|adp|asp|bas|bat|chm|cmd|com|cpl|crt|dll|exe|
hlp|ht[at]|
inf|ins|isp|jse?|lnk|md[betw]|ms[cipt]|nws|
\{[[:xdigit:]]{8}(?:-[[:xdigit:]]{4}){3}-[[:xdigit:]]{12}\}|
ops|pcd|pif|prf|reg|sc[frt]|sh[bsm]|swf|
vb[esx]?|vxd|ws[cfh])(\?=)?"?)\s*(;|$)/x
REJECT Attachment name $2$3 may not end with ".$4"
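To convince myself the rewritten pattern behaves as intended, I translated
it to Python (re.VERBOSE standing in for pcre's /x, [0-9a-fA-F] for
[[:xdigit:]], and the folded header flattened to one line); both sample
headers are invented:

```python
import re

attach_rx = re.compile(r'''^Content-(Disposition|Type).*name\s*=\s*
("(?:[^"]|\\")*|[^();:,/<>@"?=\[\]\ ]*)
((?:\.|=2E)(
ade|adp|asp|bas|bat|chm|cmd|com|cpl|crt|dll|exe|
hlp|ht[at]|
inf|ins|isp|jse?|lnk|md[betw]|ms[cipt]|nws|
\{[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}|
ops|pcd|pif|prf|reg|sc[frt]|sh[bsm]|swf|
vb[esx]?|vxd|ws[cfh])(\?=)?"?)\s*(;|$)''', re.VERBOSE)

# A genuinely dangerous attachment name is still caught:
bad = 'Content-Disposition: attachment; name="invoice.exe"'
print(bool(attach_rx.search(bad)))   # True

# The iPhone false positive is not: ".com" appears only inside the
# x-apple-part-url parameter, which name= can no longer reach across to.
ok = ('Content-Type: application/vnd.ms-publisher; name="some doc.pub"; '
      'x-apple-part-url="abcd1234-5678@yahoo.com"')
print(bool(attach_rx.search(ok)))    # False
```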

So far as I can see, no standard SpamAssassin rule checks for .com, so it
shouldn't cause a false positive, but some rules that are intended to
just check filename extensions and might hit other parts of the header
include OBFU_TEXT_ATTACH, T_OBFU_DOC_ATTACH and __TVD_MIME_ATT_AOPDF.

> Just add the MS Office file extensions to that.
> 
> Then, when your users revolt and are banging on your door with pitchforks and 
> torches, take them out again.

:) or staff the machicolations because you know best.

Some that I seriously would add are .mso, .xl, .ocx and .jar.

CK



FPs on RCVD_IN_SORBS_WEB

2017-03-09 Thread Cedric Knight
On 11/09/16 22:10, Alex wrote:
>> COMMIT/trunk/rules/50_scores.cf
>>
>> Committed revision 1760066.
>>
>> score RCVD_IN_SORBS_SPAM 0 0.5 0 0.5
>>
>> should show up after next SA update
> 
> Has RCVD_IN_SORBS_WEB been considered for adjustment as well? It's
> hitting a lot more ham than spam here, including mail from facebook.

Over the last four months I've seen a fair number of false positives
from RCVD_IN_SORBS_WEB, including Facebook, Google, HaveIBeenPwned and
various legit servers.  A Facebook example:

  145.144.220.66.dnsbl.sorbs.net. 3600 IN TXT "Exploitable Server See:
http://www.sorbs.net/lookup.shtml?66.220.144.145";

The rule scored 3.253 in November, which has fallen to 2.034 now.  This
still seems high for a RBL, particularly one that does deep-parsing,
i.e. isn't -lastexternal, and hits end users (not servers) listed in the
x-originating-ip header.  To be fair, it is hitting some malware and
carder spam too, but not much that would otherwise be missed.  The list
is described as:

web.dnsbl.sorbs.net - List of web (WWW) servers which have spammer
  abusable vulnerabilities (e.g. FormMail scripts)
  Note: This zone now includes non-webserver
  IP addresses that have abusable vulnerabilities.

I've reduced the score on my installation to 0.5.  Would this kind of
thing be prevented by more people contributing to the mass checks?  Or
could it be adjusted downwards as Alex suggested?

CK


Re: FPs on RCVD_IN_SORBS_WEB

2017-03-09 Thread Cedric Knight
On 09/03/17 13:26, Kevin A. McGrail wrote:
> On 3/9/2017 8:22 AM, Cedric Knight wrote:
>> I've reduced the score on my installation to 0.5.  Would this kind of
>> thing be prevented by more people contributing to the mass checks?  Or
>> could it be adjusted downwards as Alex suggested?
> 
> I don't know if it's a floating rule but it sounds like it needs manual
> adjustment down.  How has 0.5 been working for you?

Well, not based on mass checks or any advanced analysis or anything, it
just stops obvious Facebook etc ham being marked as spam, so working
much better than the previous score of 3.253.

Compared to RCVD_IN_SORBS_SPAM, which I think Axb manually adjusted down
to 0.5 back in September, RCVD_IN_SORBS_WEB hits about a tenth as much,
but with a hit similarly being about a 25% risk of being a FP.  I could
write some local rules to try separating out the lastexternal hits and
see if it eliminates some FPs, but I doubt it will.  There was some
other experience upthread of RCVD_IN_SORBS_WEB (eg from Steve Zinski)
being a problem.

CK


Re: Bayes - one database per user or one for everybody?

2007-10-24 Thread Cedric Knight, GreenNet
Hi

I've a possibly related enquiry to an old one below, and would be
grateful for advice or pointers.

We haven't actually *needed* Bayes thanks to greylisting, remote URI
lookups and lots of custom rules.  While a few users are interested in
a filter they can manually train, most wouldn't bother, and most
receive similar types of ham mail, which makes me wonder whether a
single group-writeable database is best, currently
/var/amavis/.spamassassin/bayes, probably without bayes_auto_learn.

However, as some tech-savvy users do want their own Bayes db, one
thought was to use the default user .spamassassin folders but have
symbolic links to the central database for most users.  Is this crazy?
Has anyone tried it?  What are the implications on disk I/O of the
various options, including several GB worth of individual databases?
Is there anything I particularly need to look out for in terms of
performance on the live server?
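The symlink idea itself is mechanically trivial; a sketch (throwaway temp
directories stand in for /var/amavis/.spamassassin and a user's home, and
a DBM Bayes db is the bayes_toks/bayes_seen file pair) - though whether SA
and amavis handle locking sanely across the shared files is exactly the
open question:

```shell
#!/bin/sh
# Sketch: point a user's per-user Bayes files at a shared database.
shared=$(mktemp -d)   # stands in for /var/amavis/.spamassassin
home=$(mktemp -d)     # stands in for /home/someuser

touch "$shared/bayes_toks" "$shared/bayes_seen"
mkdir -p "$home/.spamassassin"
for f in bayes_toks bayes_seen; do
    ln -sf "$shared/$f" "$home/.spamassassin/$f"
done

ls -l "$home/.spamassassin"
```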

The basic problem is that AFAIK bayes_path can't be set as a user
preference (global and then overridden by say a user preferences
database), as would be needed to have some users use a communal
database, and some their own.  I can see bayes_sql_override_username
could achieve a similar function, but that necessarily rules out
having DBM databases.  Users here do have their own home directories,
and would have ability to train via sending as MIME attachment, but no
shell access.  I realise as I write this that my wish is even more
difficult because amavis doesn't extract or pass user information to
SA in any case, and it would presumably mean running spamc in
procmailrc...  Is there any way of checking two dbs, one global and
one per-user?

A lot of questions, and any pointers or experience is appreciated.

One further one: are per-user databases important for accuracy of
auto-whitelisting?

Thanks

Ced

On 11 July 2007, Michał Jęczalik <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm migrating to SQL Bayes storage method. I have plenty of email
> accounts. By this time, all of them had their own database in their
> home directories. Such approach unfortunately consumes a lot of disk
> space, so now I'm thinking about bayes_sql_override_username option,
> which allows me to have one single database for all.
>
> I wonder if it's better to have a single database (which probably
> could be larger than the size of 8MB per user I allowed with DBM
> storage method) or keep per-user ones?
>
> So, what are the advantages of a single database? And does it make any
> sense to make it larger? Maybe 8MB of tokens is simply enough and it
> doesn't pay to use more resources to seek in a larger base? Are there
> any security or privacy problems with this setup?
>
> BTW, users don't have access to their databases, they are unable to
> feed any spam/ham manually, so losing this ability is not a problem
> for me.
>
> Regards,
> --
> Michal Jeczalik, +48.603.64.62.97



Re: Intermediate Relay checked against RBL

2008-11-21 Thread Cedric Knight, GreenNet
Oliver Welter <[EMAIL PROTECTED]> wrote:
>   2.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in
> bl.spamcop.net [Blocked - see
> ]
>   1.1 RCVD_IN_SORBS_WEB  RBL: SORBS: sender is a abuseable web
>  server [82.113.121.16 listed in
> dnsbl.sorbs.net]

In this situation, I'd add
  trusted_networks 82.113.121.16/32
to local.cf.  It looks like the O2 gateway has genuinely been abused.

If you are using POP-before-SMTP authentication,
http://wiki.apache.org/spamassassin/POPAuthPlugin can add to
trusted_networks automatically.

>   1.3 MISSING_SUBJECTMissing Subject: header
>   0.1 RDNS_NONE  Delivered to trusted network by a host
> with no rDNS
>   1.5 MSGID_FROM_MTA_HEADER  Message-Id was added by a relay

These look like some problem with the MUA.  You might want to check
why the client isn't adding Message-Id and Subject headers.

HTH

CK