__FRAUD_JBU and __FRAUD_TCC

2019-08-13 Thread James Wilkinson
Hello all you happy people,

While debugging a FP on ADVANCE_FEE_3_NEW, I noticed that it included
body __FRAUD_JBU /\bforeign account\b/i
and
body __FRAUD_TCC /foreign (?:offshore )?(?:bank|account)/i

Correct me if I'm wrong, but won't anything matching __FRAUD_JBU also
match __FRAUD_TCC? It also means that the phrase "foreign account" has
twice the weight of "computer ballot system", "affidavits" or "as the
beneficiary", which seems wrong.
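
The overlap can be checked mechanically. The sketch below is a minimal
Python illustration using the two patterns exactly as quoted; the
hits() helper and the sample phrases are made up for the example, and
Python's re module stands in for Perl's regex engine (an assumption
that holds for patterns this simple).

```python
import re

# Patterns copied from the two sub-rules quoted above.
FRAUD_JBU = re.compile(r"\bforeign account\b", re.IGNORECASE)
FRAUD_TCC = re.compile(r"foreign (?:offshore )?(?:bank|account)", re.IGNORECASE)


def hits(text):
    """Return the names of the sub-rules whose pattern matches text."""
    names = []
    if FRAUD_JBU.search(text):
        names.append("__FRAUD_JBU")
    if FRAUD_TCC.search(text):
        names.append("__FRAUD_TCC")
    return names


# Made-up sample phrases, for illustration only.
for sample in (
    "into a foreign account",
    "a foreign offshore bank",
    "several foreign accounts",
    "as the beneficiary",
):
    print(f"{sample!r}: {hits(sample)}")
```

Every phrase that hits __FRAUD_JBU also hits __FRAUD_TCC, while the
reverse does not hold ("foreign accounts" and "foreign offshore bank"
only hit __FRAUD_TCC).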

https://svn.apache.org/repos/asf/spamassassin/trunk/rules/20_advance_fee.cf
is the file in SVN trunk.

Is this worth raising as a bug (or improvement request)? Could someone
try variants of the ADVANCE_FEE rules without __FRAUD_JBU and see if
it performs better?

Unfortunately, I can't see enough of the original email, and would be
unlikely to have permission to supply it if I did.

Thanks,

James.


Re: SORBS bites the dust

2009-06-24 Thread James Wilkinson
mouss wrote (about the PBL):
> stop spreading FUD. if you know of false positives, show us so that we
> see what you exactly mean.
> 
> a lot of people, including $self, use the PBL at smtp time.

As usual, it depends on your definition of “false positive”.

If you mean “IP address that should not have been in the PBL but was”,
that’s one thing. It’s a consistent definition, but not very useful for
stopping spam.

If you mean “solicited and/or non-bulk email that would have been
stopped by the PBL”, then I’ve seen a number of small Indian and Chinese
companies who are unaware of a lot of things, including the existence of
the PBL and that it’s a Good Thing to send email through a smart host
with a consistent IP address and reverse DNS.¹

Obviously, everyone’s email stream is different. Mine includes a
commercially-significant amount of email from small companies in those
two countries, and probably doesn’t include email from other countries
where this takes place.

But by this definition, false positives do occur, and my company’s
SpamAssassin installation has to try to handle them.

James.

¹ Fortunately, they’re also unaware that signatures should be removed
when replying. That, a standard corporate signature including company
registration data, a standard domain in each Message-ID that doesn’t
appear in public DNS, a few negatively-scored custom rules to detect
these, and the AWL mean that once someone has responded to one of our
emails, they get automatically whitelisted. So at least existing
correspondents don’t get blocked.

-- 
E-mail: james@ | Top Tip: If you are being chased by a police dog, don’t
aprilcottage.co.uk | try to get away by crawling through a tunnel, going onto
   | a little see-saw, and jumping through a hoop of fire.
   | They are trained for that, you see.
   | -- “Bystander”, London magistrate


Re: 'anti' AWL

2009-05-02 Thread James Wilkinson
Charles Gregory wrote:
> Though again, legit senders that average negative are relatively rare  
> (well, on my system, anyways).

For what it’s worth, I’ve set up SA to identify replies to the
organisation’s email. It looks at the In-Reply-To and References headers
(our Message-IDs have a distinctive domain that’s not in public DNS and
isn’t easily guessable) and looks for the organisation standard
signature (again, this is very unlikely to come up in spam).

Most replies have one or the other, and it’s fairly common for a
correspondent to have an average score of less than -10.

It means that AWL really does work as an auto-white-list for us.
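
The reply check described above might look roughly like this. It is
only a Python sketch of the logic, not the SpamAssassin rules
themselves; the Message-ID domain and the signature text are
placeholders, since the real values are not given here.

```python
from email import message_from_string

# Placeholders: the real Message-ID domain and signature are not given above.
OUR_MSGID_DOMAIN = "msgid.example.invalid"
OUR_SIGNATURE = "Example Widgets Ltd, registered in England"


def looks_like_reply_to_us(raw_message):
    """True if the message cites one of our Message-IDs or quotes our signature."""
    msg = message_from_string(raw_message)
    cited = " ".join(
        value
        for value in (msg.get("In-Reply-To"), msg.get("References"))
        if value
    )
    cites_our_id = f"@{OUR_MSGID_DOMAIN}>" in cited
    payload = msg.get_payload()
    # Sketch only: multipart messages are not unpacked here.
    body = payload if isinstance(payload, str) else ""
    return cites_our_id or OUR_SIGNATURE in body
```

In SpamAssassin terms each of the two checks would be its own
negatively-scored rule, so a reply carrying both gets both scores.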

James.

-- 
E-mail: james@ | ... clueless he is not. He's just selective about which
aprilcottage.co.uk | clues to pay attention to.
   | -- Shmuel (Seymour J.) Metz


Re: Image spam and failing rule

2009-05-02 Thread James Wilkinson
Theo Van Dinter wrote:
> It's already been mentioned, but mimeheader is the right way to look
> at the headers of MIME parts.

Charles Gregory wrote:
> Look more closely at my rule. It is checking for TWO headers,
> one after the other (separated by \n), identifying a gif with no name.
>
>>> full /Content-Type: image\/gif;\n[^a-z]+name=""/

I think you’ll find that’s one header on two lines, and mimeheader copes
with it.

Hope this helps,

James.

-- 
E-mail: james@ | “As for Nitel, the state telephone monopoly, the less
aprilcottage.co.uk | said the better, which might well be the company’s
   | motto.”
   | -- The Economist, about Nigeria


Re: Image spam and failing rule

2009-04-25 Thread James Wilkinson
Gary Forrest wrote:
> Hi All
>
> We are receiving the same image spam many times, random text within the  
> body.
> The only common thing is a image attachment, with the filename in the  
> following format
>
>   DSL1234.png
>
> I have made the following ' RAWBODY ' rule
>
> /dsl[0-9]{4}\.png/i
>
> This rule works if the text appears in the body, when testing with a  
> hand telnet to port 25, but fails in practice.
> I think this is because the  RAWBODY rule does not search the text of a  
> attachment.
>
> example text of a spam
>
> --=_NextPart_000_0075_01C9C5DF.A7950570
> Content-Type: image/png;
>         name="DSL6672.png"
> Content-Transfer-Encoding: base64
> Content-ID: 
> Content-Disposition: inline
>
> Any ideas ?

mimeheader LOCAL_DSL_ATTACHMENT Content-Type =~ /name="dsl[0-9]{4}\.png"/i
(Untested.)
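
To see why matching on the parsed MIME header works where a
line-by-line match does not, here is a small Python sketch. Python's
email package stands in for SpamAssassin's MIME parser, which is an
assumption made purely for illustration. The folded Content-Type
header from the sample is read as one header, and its name parameter
can then be tested with the same pattern.

```python
import re
from email import message_from_string

# Headers of the attachment part, with Content-Type folded over two lines.
PART_HEADERS = (
    "Content-Type: image/png;\n"
    '\tname="DSL6672.png"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-Disposition: inline\n"
    "\n"
)
NAME_RE = re.compile(r"dsl[0-9]{4}\.png", re.IGNORECASE)

part = message_from_string(PART_HEADERS)
filename = part.get_param("name")  # parameter of the unfolded Content-Type
print(filename)
print(bool(NAME_RE.fullmatch(filename)))
```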

Hope this helps,

James.

-- 
E-mail: james@ | top! to bottom from or backwards read not do I, post top
aprilcottage.co.uk | not do Please
   | -- Jeff Vian


Re: Another bad kind of spams, for Pfizer knockoffs with image

2009-04-24 Thread James Wilkinson
Charles Gregory wrote:
> I've been scoring the attachment name pattern with a 'full' test.
> But this will only work until they figure ways to randomize the 
> attachment names

The mimeheader plugin can do that and is much cheaper.

The

Abody
Ahead

part of the HTML seems to be a good spam sign, too. I can’t come up with
a test (other than a full test) that will actually match all of that
with 3.2.x: the rawbody rule matches one line at a time. A meta on both
Abody and Ahead in the rawbody seems to do a pretty good job.

To what extent should Windows Mail be counted as a variant of
Outlook/Outlook Express? It’s not caught in __ANY_OUTLOOK_MUA: should it
be?

Hope this helps,

James.

-- 
E-mail: james@ | ... a sign carefully conveying in pictograms the fact
aprilcottage.co.uk | that you should not leave wheelchairs on a certain river
   | bank as they would roll down the hill and the crocs would
   | eat the passenger.-- Skud


Re: sa-compile

2009-04-23 Thread James Wilkinson
I wrote:
> meta  __SEEK_LZH2GT  0 # Microsoft Office 2003 Pro
> meta  __SEEK_O1TQTY  0 # aving trouble viewing this e
> meta  __SEEK_QGCXIK  0 # lots of dots
> which relies on the names being derived from the string.

Benny Pedersen wrote:
> the above __SEEK_* is random so you disable random seek :=)

I’m sorry, Benny, I *really* don’t understand that sentence. What’s
random? As Justin said, the names are not random, they’re hashes of the
string. So the same string will always have the same name.

Look at a spamassassin -D run sometime: you’re quite likely to see
__SEEK_ and __SEEK_FRAUD_ rules being combined, because the same string
crops up in both corpuses. The names of the rules will be identical
apart from the _FRAUD.
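
The exact hashing scheme behind the __SEEK_ names is not described in
this thread. The following Python sketch uses SHA-1 and a six-character
suffix purely as stand-ins, to illustrate the property being relied on:
the same string always yields the same suffix, whichever prefix it is
given.

```python
import hashlib


def illustrative_name(text, prefix="__SEEK_"):
    """Derive a stable rule name from text (stand-in scheme, not the real one)."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return prefix + digest[:6].upper()


phrase = "aving trouble viewing this e"
print(illustrative_name(phrase))
print(illustrative_name(phrase, prefix="__SEEK_FRAUD_"))
```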

It is, of course, possible (but very unlikely) that a different string
has the same hash as one I’ve disabled. That is a known problem with all
hashes, and is inherently present in the naming scheme Justin has chosen.

James.

-- 
E-mail: james@ | top! to bottom from or backwards read not do I, post top
aprilcottage.co.uk | not do Please
   | -- Jeff Vian


Re: sa-compile

2009-04-21 Thread James Wilkinson
Karsten Bräckelmann wrote:
> By looking at the sub-rules' names I got the impression they are just
> random. But maybe they actually are somehow based on the rule's content?
> Never checked.  Justin?

Justin Mason replied:
> yep, they're derived from a hash of the string.

Is that documented, or could it be documented somewhere?

I haven’t had to touch it recently, but I do have a small .cf of false
positives:
meta  __SEEK_LZH2GT  0 # Microsoft Office 2003 Pro
meta  __SEEK_O1TQTY  0 # aving trouble viewing this e
meta  __SEEK_QGCXIK  0 # lots of dots
which relies on the names being derived from the string.

Thanks,

James.

-- 
E-mail: james@ | Humans are humans. No one grouping is all good or all
aprilcottage.co.uk | bad, no one group has a monopoly on honour, decency or
   | truth. Anyone that claims to have one is a
   | dishonourable, indecent liar.   -- Phil Launchbury


Re: Near capitable punishment for all capitals?

2009-04-06 Thread James Wilkinson
Mark wrote:
> Eh, it's no biggie, really, I was just surprised it scores as high as,
> say, being listed on DCC. But then again, who actually *does* write in all
> caps, except a spammer? :)

Quite a few of my employer’s correspondents: and not just in the
subject!

I know a number of my users who do a lot of bulk data entry have Caps
Lock permanently on: much of our company data is in all-caps, and users
are strongly encouraged to keep new data looking the same way. And some
of them don’t see any reason to turn it off for a quick email. So site
config has a much lower score for this rule.

I’m not sure it’s worth putting these emails in a corpus: once you start
cherrypicking emails to make a point, the automatic score generation is
no longer statistically relevant.

James.

-- 
E-mail: james@ | “The duke had a mind that ticked like a clock and, like a
aprilcottage.co.uk | clock, it regularly went cuckoo.”
   | -- Terry Pratchett, Wyrd Sisters


Re: quirks with bayes ?

2009-03-31 Thread James Wilkinson
I wrote (about the AWL):
> In the absence of any sort of expire mechanism¹ (see, for example,
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6059) one can do
> a crude approximation by periodically resetting it.

LuKreme wrote:
> But why would you want to ever reset the AWL?

To quote that bug report:
at this stage I don’t think it’s worth having on by default, given
the problems it causes for disk load and out-of-control bloated db
files eating lots of disk space and memory, vs the marginal gains in
accuracy it provides.  let’s set it off by default.

One can temporarily get the db files under control by deleting them and
letting SpamAssassin recreate them from scratch.

I continued:
> Yes, you lose a lot of history. But it’s no worse than not having it at
> all, which is what is due for 3.3.0.

LuKreme wrote:
> 3.3.0 has no AWL?

Not by default: see
http://svn.apache.org/viewvc/spamassassin/trunk/rules/v310.pre?r1=563527&r2=759790&pathrev=759790

James.

-- 
E-mail: james@ | Q. "Why can't I print?"
aprilcottage.co.uk | A. "Because you're not a printer."
   | -- Stephen Judd


Re: quirks with bayes ?

2009-03-31 Thread James Wilkinson
LuKreme wrote:
> On 31-Mar-2009, at 11:13, Lucio Chiappetti wrote:
>> And even resetting the AWL ...
>
>
> Why would you reset the AWL? I can't see any circumstance on which that 
> would be a good idea.

In the absence of any sort of expire mechanism¹ (see, for example,
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6059) one can do
a crude approximation by periodically resetting it.

Yes, you lose a lot of history. But it’s no worse than not having it at
all, which is what is due for 3.3.0.

James.

¹ Unless one uses a real SQL database.
-- 
E-mail: james@ | We're beginning to see the results of bringing a rubber
aprilcottage.co.uk | chicken to a nuclear battle.
   | -- “Whitefang”, on groklaw.net


Re: interesting flash attack in spam

2009-03-19 Thread James Wilkinson
John Hardin wrote:
> No reason it shouldn't be. I'd suggest something like a rawbody match on  
> /]/i meta'd with HTML_MESSAGE should be worth a few (dozen)  
> points.

This would seem to FP on Microsoft HTML generated by certain versions of
Word. One example:


Re: Dealing with low scoring spam - tighter MTA integration

2009-03-05 Thread James Wilkinson
Andrzej Adam Filip wrote:
> At "RCPT TO:" stage there are available:
> * connecting client IP address (last mail hop)
>   so big part of DNSBL and DNSWL tests *CAN* be used
> * envelope sender for SPF based tests
> * envelope sender and envelope recipient for auto white/black listing
>   (producing some kind of grey-listing based for first attempt from
>   unknown reputation source)

Are you thinking that it might be good to tie this in to the
SpamAssassin AWL score? So a sender with an existing low AWL might be
allowed through even if the sending host gets on one or two DNSBLs?

And you’re missing the possibility of doing reverse DNS lookups, too.

James.

-- 
E-mail: james@ | A: Because people don’t normally read bottom to top.
aprilcottage.co.uk | Q: Why is top-posting such a bad thing?
   | A: Top-posting.
   | Q: What is the most annoying thing in e-mail and usenet?


Re: does SBL/XBL have a plugin?

2009-03-01 Thread James Wilkinson
Ricardo Kleemann wrote:
> Are SBL/XBL tests automatically enabled or is there a plugin I need to enable?

SBL/XBL are tested as part of the SpamAssassin ZEN list for 3.2.x if you
have network tests enabled.

Hope this helps,

James.

-- 
E-mail: james@ | “Sir, they’ve taken Mr. Rimmer!”
aprilcottage.co.uk | “Quick, let’s get out of here before they bring him
   | back!”
   | -- Kryten and Cat, ‘Red Dwarf’


Re: Spamassassin Upgrade

2009-03-01 Thread James Wilkinson
Kban35 wrote:
> I just upgraded my SA to 3.2.5 and now when I look in my /etc/init.d I do not
> see spamassassin listed anywhere in there.

Which OS? How did you upgrade – cpan? yum? apt-get? From where did you
get 3.2.5?

Thanks,

James.

-- 
E-mail: james@ | “Drums must never stop. Very bad if drums stop.”
aprilcottage.co.uk | “Why? What will happen if the drums ever stop?”
   | “Bass solo.”


Re: Googlegroups related spam

2009-02-24 Thread James Wilkinson
Karsten Bräckelmann wrote:
> However, there are some highly abusive patterns sticking out. A google
> URI with a ../ in the path? Sure! Score 2. :)  Alternating alpha and
> numbers might be worth another point. A question mark in a google groups
> URI? Punish that.

You can eliminate links to Usenet posts, too – make sure that the group
name doesn’t contain a dot.
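
As a rough Python sketch of that check (the function name and the
second sample URL in the usage note are invented for illustration; the
/group/NAME/ path layout is taken from Google Groups links of this
period):

```python
import re
from urllib.parse import urlparse


def is_google_only_group_link(url):
    """True for a groups.google.com link whose group name has no dot in it."""
    parts = urlparse(url)
    if parts.hostname != "groups.google.com":
        return False
    match = re.match(r"/group/([^/]+)", parts.path)
    # Usenet hierarchies (comp.*, news.*, ...) always contain a dot.
    return bool(match) and "." not in match.group(1)
```

A link to a Usenet group such as
http://groups.google.com/group/news.admin.net-abuse.email/ is skipped,
whereas a made-up http://groups.google.com/group/examplegroup/web/page
would be a candidate for scoring.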

James.

-- 
E-mail: james@ | Q. "Why can't I print?"
aprilcottage.co.uk | A. "Because you're not a printer."
   | -- Stephen Judd


Re: not seeing any advantage to sa-learn?

2009-02-08 Thread James Wilkinson
Ricardo Kleemann wrote:
> I'm running spamc/spamd 3.2.4 on a Ubuntu 8.04 server, it's the
> standard Ubuntu package. I have the default settings for Bayes (with
> auto_learn) and I'm using a mysql backend for BayesStore.

It’s worth noting that Bayes, by itself, is not allowed to condemn spam.
Its maximum score, by default, is 3.5. That means that if you’ve got the
standard required_score of 5, the spam will not be marked unless it also
hits other rules with a combined score of at least 1.5 – even with
perfect training.
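
The arithmetic, as a minimal Python sketch using the two default
figures quoted above (the function and variable names are made up for
the example):

```python
REQUIRED_SCORE = 5.0  # default required_score, as quoted above
BAYES_MAX = 3.5       # default top Bayes score, as quoted above


def is_marked_spam(other_rule_scores, bayes_score=BAYES_MAX, required=REQUIRED_SCORE):
    """A message is marked once its total reaches the required score."""
    return bayes_score + sum(other_rule_scores) >= required


print(is_marked_spam([]))                   # Bayes alone
print(is_marked_spam([1.0]))                # Bayes plus 1.0 from other rules
print(is_marked_spam([0.5, 1.0]))           # Bayes plus 1.5 from other rules
print(is_marked_spam([], bayes_score=5.0))  # a locally raised Bayes score
```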

If you are confident in your training (all spam gets learnt as spam, and
all non-spam gets learnt as non-spam) you may wish to try raising the
BAYES_99 score. Note that, in theory, 1% of email hit by this rule is
non-spam – but also that another theory says that the maths behind that
figure are wrong.

It can be difficult to stop “narrowcast” spammers that spam a limited
mailing list (perhaps industry specific), using standard bulk-mailing
tools, and who don’t get onto the various DNSBLs. Their spam looks like
solicited bulk email, there’s practically nothing about the spam that
distinguishes it from wanted email that isn’t specific to that
particular spammer, and they stay out of the notice of antispam
organisations.

If you want to stop spammers like these, you may have to create specific
rules (either sending domains, or sending IP address, or specific
phrases in their emails – or best of all, all three, with each rule
attracting a few points). The good news is there aren’t that many of
these spammers, and they tend to try to look like a legitimate company,
so they will usually send using a constant identity, probably use a
constant domain, and they’ll either use their own servers or e-mail
service providers. So static rules aimed at them can be very effective.

Alternatively, you could post sightings on
news.admin.net-abuse.sightings and see if Spamhaus pick up on them. That
way, more people can be spared their spam.

James.

-- 
E-mail: james@ | Remember, half-measures can be very effective if all you
aprilcottage.co.uk | deal with are half-wits.


Re: Are custom rules ignored if a white list entry is in playq

2009-02-04 Thread James Wilkinson
Gary Forrest - Netnorth wrote:
> Question, are custom rules ignored if a white list entry has the same 
> email address ?

Quick point – if you have short-circuiting turned on, then they may well
be…

James.

-- 
E-mail: james@ | Which do you consider was the stronger swimmer,
aprilcottage.co.uk | (a) The Spanish Armadillo,
   | (b) The Great Seal?
   | -- ‘1066 and All That’


Re: sought rules updates

2008-12-10 Thread James Wilkinson
LuKreme wrote:
> I read the man page, where there is no mention of how to obtain this  
> number. In fact, I read many posts, and many webpages and have still not 
> found that information.  I've seen the IDs in others posts, sure, but 
> where do they originate?
>
> Even searching the wiki (which just links to the previously linked 
> http://taint.org/2007/08/15/004348a.html )is merely a "here's the 
> random-looking digits you pass to --gpgkey" and not a "here's what the 
> --gpgkey is, means, and how it's generated".

These numbers are a way of identifying those keys. They are a
cryptographically strong hash: the idea is that it’s easy for users to
use numbers that short to confirm that the key they’ve received is the
key they thought they were receiving, and very difficult for any
attacker to generate another key with the same hash.

> Why doesn't sa-learn simply trust the keys that are added to its  
> keychain without this extra (and at least for me, confusing) step? I'm  
> starting to think the simplest way to do this is just ignore the gpg  
> flags entirely and use --nogpg.  What's the downside to this (other than 
> the obvious DNS hijacking to point the URL to some spammer site with bad 
> data which seems a remote enough chance to ignore).

That’s your choice.

Hope this helps,

James.
-- 
E-mail: james@ | “Right lads, we’ve got 45 minutes to score 37 goals.
aprilcottage.co.uk | No problem with that -- the other team just did.”


Re: skew the AWL on spam report

2008-12-04 Thread James Wilkinson
Matt Kettler wrote:
> If a spammer is using the same sending address over and over again,
> blacklist them entirely.
> 
> That said, I've never seen a spammer re-use the same address twice.

Doesn’t mean it doesn’t happen – only that you’re not on any
“narrowcast” lists (e.g. “Email 200,000 British business addresses!”)

A number of otherwise-legitimate companies will send to those lists.
SpamAssassin’s fixed rules will do very little against these spammers –
they’re using the same e-mail engines, the same language, and the same
sort of Internet connections as other companies with opt-in mailing
lists. For some addresses, they represent the vast majority of spam that
gets past SpamAssassin.

Fortunately, they often do use and re-use the same IP addresses and the
same domains, and can be blocked that way.

http://groups.google.com/group/news.admin.net-abuse.email/browse_thread/thread/78ac67e56ce35d90
lists a good (bad?) selection of them…

James.

-- 
E-mail: james@ | Remember, half-measures can be very effective if all you
aprilcottage.co.uk | deal with are half-wits.


Re: Help with bayes

2008-11-18 Thread James Wilkinson
Kai Schaetzl wrote:
> well, but how? By auto-learning? In that case you are just multiplying your 
> problem. It seems a lot of spam gets miscategorized as ham. Auto-learning 
> that spam as ham means enforcing this miscategorization and that's what you 
> see as a result.

When SpamAssassin decides whether or not to learn a message, it does not
take Bayes scores into account.

So if you have a message that only hits BAYES_00 (with a score of either
-2.3 or -2.6) and another rule with a score of 0.2, that message will
not be learnt (unless you change the limits), because 0.2 is greater
than 0.1 (the limit).
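
That example, as a small Python sketch. It covers only the ham side of
the decision and only the one exclusion discussed here; the second rule
name is a placeholder, and treating the limit as a strict "less than"
is an assumption.

```python
HAM_LEARN_LIMIT = 0.1  # the limit mentioned above


def autolearn_as_ham(hits, limit=HAM_LEARN_LIMIT):
    """hits is a list of (rule_name, score); BAYES_* scores are left out."""
    considered = sum(score for name, score in hits if not name.startswith("BAYES_"))
    return considered < limit


# BAYES_00 plus one other rule scoring 0.2: total is negative, but not learnt.
print(autolearn_as_ham([("BAYES_00", -2.6), ("LOCAL_OTHER_RULE", 0.2)]))
```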

Hope this helps,

James.

-- 
E-mail: james@ | ‘Sir, they’ve taken Mr. Rimmer!’
aprilcottage.co.uk | ‘Quick, let’s get out of here before they bring him
   | back!’
   | -- Kryten and Cat, ‘Red Dwarf’


Re: Problem with learning bayes

2008-11-05 Thread James Wilkinson
Thomas Zastrow wrote:
> I have a new server where I installed Spamassassin. Next, I took a
> maildir with a lot of spam and learned the filter:
> 
> sa-learn --spam --showdots /path/to/maildir

Did you learn some non-spam, too?

Bayes needs at least 200 of each before it will work.

Hope this helps,

James.

-- 
E-mail: james@ | ...a probably apocryphal bilingual sign in darkest North
aprilcottage.co.uk | Wales. In English it says "70mph" and in Welsh "slow
   | down, sharp bend ahead".
   | -- Peter Corlett


SBL false positives?

2008-09-25 Thread James Wilkinson
mouss wrote:
> in which sublist? xbl, sbl or pbl? and when you say "a lot", how many?  
> can you show an example of an IP that you consider as an FP?

Well, since you asked…

I’m not the Original Poster, but I consider most of
http://www.spamhaus.org/sbl/sbl.lasso?query=SBL60174 to be a FP *when
used with SpamAssassin rules*.

This is a /19 range of VSNL dynamic addresses, which had (correctly)
been put on the PBL. I understand that many smaller Indian companies can
only get a dynamic IP, want to run an internal mail server (often
Exchange), and forget to relay outgoing e-mail through an appropriate
external mailserver.

At least one VSNL customer ran into trouble sending e-mail due to the
PBL listing, and rather than using a suitable relay, systematically (and
repeatedly) removed the entire /19 from the PBL! Spamhaus then stuck the
whole range into the SBL.

This is fine when the SBL is merely used against the last external
relay, but SpamAssassin will test *all* IP addresses in the headers
against the SBL. So non-spamming Indian companies get hit even if they
relay through a good mailserver.
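
The difference between the two ways of using the list, as a Python
sketch. The addresses come from documentation-only ranges and the
"listed" set is invented; nothing here queries a real DNSBL.

```python
# Hypothetical listing; 203.0.113.0/24 is a documentation-only range.
LISTED = {"203.0.113.7"}


def hit_on_last_external(relays):
    """relays: sending IPs from the Received headers, most recent first."""
    return bool(relays) and relays[0] in LISTED


def hit_on_any(relays):
    """True if any relay in the chain is listed."""
    return any(ip in LISTED for ip in relays)


# A listed dynamic address that relayed through an unlisted smart host.
relays = ["198.51.100.25", "203.0.113.7"]
print(hit_on_last_external(relays))
print(hit_on_any(relays))
```

Only the second check fires, which is the case described above: the
sender did the right thing by relaying, and is still penalised.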

I consider it a stretch putting this range under the SBL: the “Policy &
Listing Criteria” says that the range “appear[s] to Spamhaus to be under
the control of, or made available for the use of, senders of Unsolicited
Bulk Email (“spammers”).” This doesn’t seem to be the reason in this
case: there doesn’t seem to be any evidence that the individuals who
removed the range from the PBL intended to send unsolicited bulk e-mail.
It’s abuse of the Spamhaus web site, not directly abuse of e-mail, and
would better be handled by a PBL range which can’t be edited through the
website.

I wrote to Spamhaus querying the listing, but have heard nothing
(probably not surprisingly, since I’m not VSNL. Thank goodness!) I
haven’t raised a SpamAssassin bug, since I don’t think it *is* a
SpamAssassin bug.

James.

-- 
E-mail: james@ | 'Short for "Sic Transit Gloria Humanorum", which is Latin
aprilcottage.co.uk | for "There goes the neighbourhood!"'
   | -- Menno Willemse


Re: CPAN Install Fails

2008-09-03 Thread James Wilkinson
Bob Cohen wrote:
> I'm running Fedora v9.  All of the prerequisite and optional modules  
> installed with no problem.  Suggestions?

Well, there’s always “install it with yum”:
yum install spamassassin

Hope this helps,

James.

-- 
E-mail: james@ | “It has taken 24 years to get the Reichstag wrapped.
aprilcottage.co.uk | Chancellor Kohl said it would only be wrapped over his
   | dead body, so sensing an opportunity the Bundestag
   | outvoted him.”  -- The Guardian


Re: Honeypot Email Addresses

2008-08-19 Thread James Wilkinson
jdow wrote:
> I believe you could "blacklist_from". That would train SpamAssassin's
> Bayes filter -

Or not. Both USER_IN_BLACKLIST and USER_IN_BLACKLIST_TO have tflags set
to userconf noautolearn (in current 3.2.5 rules), which means that
SpamAssassin will ignore their scores when deciding whether to
autolearn.

> if you use global (shudder) Bayes.

Um. There is research ( http://www.ceas.cc/2007/papers/paper-74.pdf )
suggesting that “globally-trained text classification easily outperforms
personally-trained classification under realistic settings” (the
text-classifier they used was Bayes).

James.

-- 
E-mail: james@ | Ah yes. Thingie and thingamagig, and I'll throw in
aprilcottage.co.uk | whatchamacallit too. Because in tech support, "button" is
   | sometimes a bit too technical for the average caller.
   | -- "mr_scoot"


Re: Fwd: Attn: webmail Subscriber

2008-08-15 Thread James Wilkinson
John Hardin wrote:
> Is there any reason the base rules should _not_ contain a
> whitelist_from_spf or whitelist_from_rcvd for the list?

Larry Nedry wrote:
> Would you really want to auto-train your bayes with mail from this list?

The whitelist rules are ignored when SpamAssassin decides whether to
auto-train a message.

Mail::SpamAssassin::Plugin::AutoLearnThreshold says:
   Note that certain tests are ignored when determining whether a message
   should be trained upon:

   ·   rules with tflags set to ’learn’ (the Bayesian rules)

   ·   rules with tflags set to ’userconf’ (user configuration)

   ·   rules with tflags set to ’noautolearn’

All the WHITELIST rules have tflags of ‘userconf nice noautolearn’ (at
least).
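
A minimal Python sketch of that filtering step, assuming each hit is
given as (name, score, tflags). The second rule and its score are
hypothetical, and the -100 whitelist score is assumed to be the stock
value.

```python
# The three tflags listed in the documentation quoted above.
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_score(hits):
    """Sum the scores of hits whose tflags include none of the ignored flags."""
    return sum(score for _name, score, tflags in hits if not (tflags & IGNORED_TFLAGS))


hits = [
    ("USER_IN_WHITELIST", -100.0, {"userconf", "nice", "noautolearn"}),
    ("LOCAL_EXAMPLE_RULE", 1.5, set()),  # hypothetical rule and score
]
print(sum(score for _n, score, _t in hits))  # score used for the spam verdict
print(autolearn_score(hits))                 # score used for the learning decision
```

So a whitelisted message is delivered, but whether it is learnt depends
only on the rules that remain after the filter.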

Hope this helps,

James.

-- 
E-mail: james@ | Just remember: 1 virus 3 viriii
aprilcottage.co.uk |2 virii 4 viriv
   | -- Matt S Trout


Re: Moving ham/spam from Exchange folders to sa-learn?

2008-06-19 Thread James Wilkinson
Henry Kwan wrote:

> Thanks for the script but I don't think I can use it as Exchange2K7
> has dropped IMAP support for public folders.  Or least this blog post
> from MSFT seems to indicate:
>
> http://msexchangeteam.com/archive/2006/02/20/419994.aspx

I don't have any Exchange 2007 experience, but at least on 2003 "public
folder" and "normal mailbox into which everyone can copy e-mail and to
which no-one can send e-mail" are two separate concepts. And you can use
IMAP to read the contents of the latter.

Unfortunately, setting that up involves configuring Outlook on each
client PC, so depending on the number of users, this may not be
practical.

Hope this helps,

James.
-- 
E-mail: james@ | Never ask, "Oh, why were things so much better in the old
aprilcottage.co.uk | days?" It's not an intelligent question.
   | -- Ecclesiastes 7 v. 10


Re: Support for FC6, F7, F8?

2008-05-19 Thread James Wilkinson
Eric Wood wrote:
> Is there a website or repository where I can yum upgrade to the latest 
> spamassassin from, say, a FC6 system?

F7 and F8 (and F9) have version 3.2.4 in the standard Fedora updates
repo: just
sudo yum update
Note that Fedora 7 will go out of support in a month or so, and FC6 is
no longer supported: this means you won't get security updates, so these
systems should not be exposed to known-malicious traffic (like spam...)

Hope this helps,

James.

-- 
E-mail: james@ | The Inquirer was set up by Mike Magee (ticker: DODGY),
aprilcottage.co.uk | who co-founded well-known IT site The Register seven
   | years ago after countless years editing and managing all
   | manner of things which could be Aardvark Today and Fish
   | Farming Monthly but weren't.  -- The Inquirer, 2001


Re: Bayesiam Learning Paths for Spamassassin

2008-04-20 Thread James Wilkinson
> I have SA 3.17 running with amavisd-new, dovecot and Postfix 2.4.3 and
> Clama/v on freebsd 6.1
> 
> I am trying to"teach" sa using the following
> 
> sa-learn /var/mail/vmail/example.com/user/.INBOX.spam/cur/
> 
> this is a maildir I have put around 175 spam messages in..

find cur new -type f  -mtime -60 | xargs sa-learn --showdots --spam

works for me on Fedora 8 with SA 3.2.4.

Hope this helps,

James.

-- 
E-mail: james@ | Users are not like normal particles; they wave when you
aprilcottage.co.uk | observe them.
   | -- Andrew Dalgleish


Re: Canadian Spam - tired of writing rules!

2008-04-20 Thread James Wilkinson
Michael Hutchinson wrote:
> There's been a rise in Canadian Pharmaceutical Spam lately. This spam is
> quite basic, generally only including some text and a link. The link is
> always changing so we can't score against that.
> 
> About the only other thing it scores on is the FORGED_HOTMAIL_RCVD rule,
> which doesn't have a big enough score to push the Spam over the 5.0
> points threshold.
> 
> Does anyone have some effective rules / rulesets / update channels that
> would help to eliminate this stuff? I've been writing rules against it
> for the past few months. We've just employed our 61st rule against this
> type of Spam. Admittedly a lot of those are just basic phrase matching,
> and aren't complicated rules - but then the Spam changes enough each
> cycle, that it avoids complicated rules that I might write.

I find that a meta rule where the body contains "http://" and has no
paragraphs above 100 to 140 characters¹ will give a few false positives,
so you can't score it too highly, but it catches a *lot* of spam.

The ham that matches this rule tends to be surprisingly rare, doesn't
score highly on anything else, and is from regular correspondents (so
the AWL helps).

If any of the SA developers are reading, I'd love to see how rules like
this play in the sandbox...

James.

¹ I'd like to do it on body length, but I can't find a suitable way of
doing this. body /.{100}/ will match on any e-mail which *has* got a
paragraph of > 99 characters...
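
The logic of that meta rule, as a Python sketch rather than
SpamAssassin syntax. It assumes the message body is already split into
paragraphs, one string each; the function name and the sample text in
the usage note are invented.

```python
def link_with_only_short_paragraphs(paragraphs, limit=100):
    """True if some paragraph has a link and none reaches limit characters."""
    has_link = any("http://" in p for p in paragraphs)
    has_long = any(len(p) >= limit for p in paragraphs)  # what /.{100}/ detects
    return has_link and not has_long
```

For example, ["Cheap meds here", "http://example.com/x"] is flagged,
whereas a message with a link plus one 120-character paragraph is not.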

-- 
E-mail: james@ | The opinions expressed herein are not necessarily those
aprilcottage.co.uk | of my employer, are not necessarily mine, and in fact are
   | probably not necessary at all...


Re: Logging

2008-04-02 Thread James Wilkinson
Skip wrote:
> I am on a linux, shared hosting site (Bluehost.com).  I don't 
> know how I can get it into the startup script for that box, and I only have 
> access to my own home directory.  That may be a showstopper right there.  
> I'll have no way of knowing when they reboot the box.

Earlier, Matt Kettler wrote:
> Running from cron is only for things you want to run 
> at regular intervals. It is not a valid way for starting daemons (ie: 
> something you want to run once and leave running)

Actually, something like this (from man 5 crontab on Fedora 8) might be
relevant:

   These special  time  specification  "nicknames"  are  supported, which
   replace the 5 initial time and date fields, and are prefixed by the '@'
   character:
   @reboot: Run once, at startup.

Skip may have permissions to edit his own crontab (with the crontab
command) and set a daemon going at reboot time.

There may be CPU time quota constraints, of course.

Hope this helps,

James.

-- 
E-mail: james@ |"Just for once, I wish we would encounter an alien
aprilcottage.co.uk | menace that wasn't immune to bullets..."
   | -- The Brigadier, 'Doctor Who'


Re: are the NORMAL_HTTP_TO_IP scores still valid?

2008-01-16 Thread James Wilkinson
Matt Kettler wrote:
> Yes. In fact, IP based URLs occur more commonly in nonspam than spam. 

Chip M. wrote:
> Matt, yes this is correct, however in this particular case "nonspam" is 
> perhaps a bit broad.  It's been my experience that these almost always
> occur in mass marketing ham, not person-to-person ham.

In my (limited) experience, nonspam IP-based URLs almost always have
paths after the IP address, whereas a *lot* of spam just points to the
IP address.
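
One way to tell the two cases apart, as a Python sketch. The pattern is
an illustration of the idea, not an existing SpamAssassin rule, and the
addresses in the usage note are from a documentation range.

```python
import re

# An http(s) URL that is just a dotted-quad host, optional port, optional "/".
BARE_IP_URL = re.compile(r"^https?://\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?/?$", re.IGNORECASE)


def is_bare_ip_url(url):
    """True if the URL points at an IP address with no path after it."""
    return BARE_IP_URL.match(url) is not None
```

http://192.0.2.10 and http://192.0.2.10/ count as bare, while
http://192.0.2.10/shop/item?id=3 does not.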

Does this match anyone else's experience?

James.

-- 
E-mail: james@ | How about an Australian-language version?
aprilcottage.co.uk | 'Your program just attempted an illegal instruction.
   | No worries, mate.'
   | -- Paul Tomblin


Re: Whitelist_from_rcvd not working

2008-01-04 Thread James Wilkinson
Dan Barker wrote:
> My whitelist_from_rcvd tags don't hit. I believe this has been happening
> since my upgrade from 3.1.7 to 3.2.3.



> Just in case there is something [else] I've done silly, my local.cf is at
> http://www.visioncomm.net/temp/080104Local.txt):

Here's what may be a thoroughly stupid question -- what does your local
network look like?

$ host mail.visioncomm.net
mail.visioncomm.net has address 74.254.46.133

Is that server behind a NAT router, or does it actually have that IP
address configured? If so, what happens if you add 74.254.46.133 to
local_networks and trusted_networks?

Hope this helps,

James.

-- 
E-mail: james@ | "Right lads, we've got 45 minutes to score 37 goals.
aprilcottage.co.uk | No problem with that -- the other team just did."


Re: I have a probleme with my content analysis

2007-08-02 Thread James Wilkinson
lochness wrote:
> 
> I'm running on windows and i'm using one software call "NoSpamtoday" that
> software is based on spamassassin I modify local.cf file but in my test I
> have this message bellow I put required_hits on 5 but in the message I have
> 0 so how can I apply my config

NoSpamToday puts its own value of required_hits in the local.cf file --
your setting might be above that, and the last value "wins".  Use the
admin tool, or look for all values of required_hits in the local.cf
file.
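
To find which value is actually in effect, something like this Python
sketch can help. It assumes a plain "name value" layout with "#"
comments and simply reports the last setting it sees; the sample text
in the usage note is invented.

```python
def effective_setting(cf_text, name="required_hits"):
    """Return the last value given for name in the config text, or None."""
    value = None
    for line in cf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenise
        if len(fields) >= 2 and fields[0] == name:
            value = fields[1]  # a later line overrides an earlier one
    return value
```

Given "required_hits 5" followed further down by "required_hits 0", it
returns "0", which is the situation described above.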

Hope this helps,

James.

-- 
E-mail: james@ | Machine. Unexpectedly, I’d invented a time
aprilcottage.co.uk | -- Alan Moore, "Very Short Story"
   |http://wired.com/wired/archive/14.11/sixwords.html


Outlook-style message-IDs?

2007-04-04 Thread James Wilkinson
Hello, all you happy people,

I have in my possession a legitimate e-mail with 
Message-ID: <[EMAIL PROTECTED]>
but no sign that it comes from a Microsoft product.

As far as I can see, this one header is causing it to get
 2.8 RATWARE_OUTLOOK_NONAME Bulk email fingerprint (Outlook no name) found
 1.7 MSGID_DOLLARS          Message-Id has pattern used in spam
 1.9 RATWARE_MS_HASH        Bulk email fingerprint (msgid ms hash) found
for a total of 6.4 points.

It appears that these three rules are all scoring on effectively the
same thing. Is this intentional?

(For the time being, I intend to add negative rules to score these down
when all three are present).

Thanks,

James.

-- 
E-mail: james@ | actor: (n) a piece of scenery that has the audacity to
aprilcottage.co.uk | move once lit.