Re: please help, getting hammered with snowshoe spam

2009-02-12 Thread Chip M.
Rob McEwen wrote:
>(2) ivmSIP/24 is attempting a very dangerous mission... which is to
>preemptively block snowshoe spam by listing entire /24 blocks when
>only a handful of IPs on that block have sent spam so far. But keep
>in mind that (a) specifically--ivmSIP is going to block some spam
>where that snowshoer hadn't sent from enough IPs to possibly be
>listed (yet!) on ivmSIP/24 AND (b) the reason I call ivmSIP/24's
>mission "dangerous" is that there is a high risk of FPs
>whereby spammers and legit senders share blocks of IPs within the
>*same* /24 block. I've taken steps to greatly minimize the amount
>of time that happens... but it is almost impossible to prevent this
>altogether. Therefore, both medium-to-large ISPs and those who are
>extremely concerned about FPs should use ivmSIP/24 for scoring
>instead of blocking--in spite of my continued attempts to get
>ivmSIP/24 to have just as few FPs as ivmSIP. (and I'm still working
>on that!)

Rob, yes, I'm with you there. :)

I'm also sympathetic to your lawsuit concerns.
There are abundant horror stories, here and elsewhere, about unskilled
sysadmins improperly implementing an RBL, and outright blocking on
DNS data that was meant to be ADVISORY only.

However, the snowshoe problem has gotten so bad, I've started
"labelling" all ranges of any host when I find enough "pure" snowshoe
blocks in their space.

I do NOT score on these merely "labelled" ranges, but use them as
the equivalent of an SA "meta", in combination with the other tests
I mentioned previously (i.e. on Barracuda, has an unsubscribe phrase,
has a "teaser" phrase in From/Subject).

I'm finding that is extremely effective, and some combos
(reliable "teaser" + any other single test) have zero FPs (so far).

My own IP-to-Nation data file (both real and hand-classified
"virtual" nations) is only used by my own people (all somewhat
cautiously screened).  I write all the "base" rules, and we have a
kick-butt FP pipeline, so I don't have to worry about a random user
misunderstanding what a particular IP block classification is for.
I can be far more aggressive than most. :)

What I want to do is expand my merely-labelled IP ranges, and was
hoping I could do a straight import of your /24 list into its own
unique country code, then run some MassChecks, and see how that goes.
Ideally, that should be helpful to both us and you.
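
A minimal sketch of the kind of import described above, assuming the list arrives as plain text with one CIDR per line. The virtual nation code "SS", the function names, and the sample ranges (RFC 5737 documentation addresses) are hypothetical and only illustrate the idea; they are not the actual data file format.

```python
import ipaddress

def load_cidr_list(lines, label):
    """Parse one-CIDR-per-line text into (network, label) pairs."""
    ranges = []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        ranges.append((ipaddress.ip_network(entry, strict=False), label))
    return ranges

def classify(ip, ranges):
    """Return the label of the first range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, label in ranges:
        if addr in network:
            return label
    return None

# Documentation-only example ranges, not real listings.
sample = ["192.0.2.0/24", "# comment line", "", "198.51.100.0/24"]
ranges = load_cidr_list(sample, "SS")  # "SS" = hypothetical virtual nation code

print(classify("192.0.2.77", ranges))   # inside a listed /24
print(classify("203.0.113.5", ranges))  # not listed
```

A linear scan is fine for a few thousand ranges in a MassCheck run; a larger list would probably want a sorted structure or a trie instead.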


>(4) And I'm about to implement a large improvement to ivmSIP. I
>found a bug in the programming (that had been there all along) which
>was preventing some deserving IPs from getting into ivmSIP. So ivmSIP
>is about to get better. Therefore, substantial improvements are
>about to happen to BOTH ivmSIP and ivmSIP/24, so I'd prefer
>that any publicly available stats/testing be done in a week or two
>from now--AFTER these improvements are made.

I understand about you wanting to review your data first, so no
pressure. :)

I would be happy to do a non-published "quick" look if you like,
then send you any FPing-IPs I see, and wait until you're happy with
your own data before I shared any public results.


>(5) regarding the "shared hosting environment"... if ALL of these
>mail servers resolve their queries using the *same* locally hosted
>DNS server for resolving queries, then there is only need for a
>single setup of the lists, for that one DNS server--and then there'd
>be a single price based on the cumulative total number of
>mailboxes--and, therefore, many quantity discounts would apply (or,
>am I not understanding you? Aren't these all hosted at the same
>physical location?... or multiple datacenters owned by the same
>company?)

I should have been clearer:
I am _NOT_ a sysadmin/mailadmin.  I'm justaprogrammer. :)
About five years ago, a volunteer-written filter at the main host I
was using broke.  I ended up fixing it, which started me down the
path of filter programming. :)

Initially, my goal was merely to "fill in the holes" that exist in a
shared hosting environment (where SA's full potential is limited by
the need to target the lowest-common-denominator).

It turned into a much larger project when I realized the data
analysis potential of hand-classified data from a diverse group of
smallish domains.  All my volunteers grasp that they're helping each
other, and are very motivated & enthusiastic.

The project is still rather small (about forty domains, with about
half a million spams per month), however it's a nice size and quality
for doing serious research. :)

We're split among several different hosts, so the only way it would
be viable to use your lists in real-time, would be to set up our own
DNS server, only known to project members.  Since most of us are only
receiving a trickle of snowshoe spam, that's not viable at this time.

The ones who receive more than a trickle, receive a FLOOD.  As I
mentioned, in some cases 80% of their FNs are from snowshoers.
- "Chip"





Re: please help, getting hammered with snowshoe spam

2009-02-12 Thread Chip M.
While reading the "html picture spam" thread, it occurred to me to
check the sizes of Ham hitting Barracuda.

The largest one was 113,351 bytes.

I then checked the nation-of-origin for all Barracuda hitting
"large" spams (msg size >= 256 kb), and (during the 3-week period
I checked) only 4 out of 190 were from non-snowshoe IP ranges.

Actually, it was a bit more, but a quick review of them resulted
in me moving a few into my snowshoe "virtual" nations. :)

I've just added that as an extra test (i.e. on Barracuda plus
"large" message size), currently scored at the equivalent of
about 1 SA point.

I forgot to mention another combo test:
if it's on both Barracuda and the Day-Old-Bread list, I add the
equivalent of about 1 SA point.  Zero FPs so far.
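
A rough sketch of the two combo tests above. The boolean inputs stand in for DNSBL lookups assumed to have been done elsewhere; the function name is made up, "256 kb" is read here as 256 * 1024 bytes, and treating the two bonuses as additive at exactly 1.0 each is an assumption (the post only says "about 1 SA point").

```python
LARGE_MSG_BYTES = 256 * 1024  # "large" threshold from the post, read as KiB (assumption)

def combo_bonus(on_barracuda, on_day_old_bread, msg_size_bytes):
    """Return the extra score contributed by the two Barracuda combo tests."""
    bonus = 0.0
    if on_barracuda and msg_size_bytes >= LARGE_MSG_BYTES:
        bonus += 1.0  # Barracuda + "large" message
    if on_barracuda and on_day_old_bread:
        bonus += 1.0  # Barracuda + Day-Old-Bread
    return bonus

# The largest Barracuda-listed ham mentioned above (113,351 bytes) gets no size bonus.
print(combo_bonus(True, False, 113_351))
print(combo_bonus(True, False, 300_000))
```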

I'll review all those scores and tests in a few more weeks.
- "Chip"




Re: please help, getting hammered with snowshoe spam

2009-02-04 Thread Rob McEwen
Chip M. wrote:
> *** Rob McEwen: ***
> Would you be willing to provide your /24 list, for even a short period,
> in some sort of plain text format (maybe one CIDR per line?), so those
> of us with good hand-classified corpora could try out your data?
>
> Most of my users are in a shared hosting environment, so they can't use
> your list suite as-is.  Based on what reliable people have posted, some
> of my users should probably benefit from your /24 list.  I'd be very
> glad to provide you with a list of any FPs I find. :)

Chip,

Here are some thoughts:

(1) if you are discussing hostkarma and barracuda's lists, then ivmSIP
is probably a more equivalent list to compare to rather than ivmSIP/24.
And they both work together VERY well for blocking snowshoe spam.
Moreover, I contend that the combination of my three lists (ivmSIP,
ivmSIP/24, and ivmURI), working together (and even if using ivmSIP/24 in
scoring mode), is the best and most cost-effective solution specifically
for blocking hard-to-catch snowshoe spam.

(2) ivmSIP/24 is attempting a very dangerous mission... which is to
preemptively block snowshoe spam by listing entire /24 blocks when only
a handful of IPs on that block have sent spam so far. But keep in mind
that (a) specifically--ivmSIP  is going to block some spam where that
snowshoer hadn't sent from enough IPs to possibly be listed (yet!) on
ivmSIP/24 AND (b) the reason I call ivmSIP/24's mission "dangerous"
is that there is a high risk of FPs whereby spammers and legit
senders share blocks of IPs within the *same* /24 block. I've taken
steps to greatly minimize the amount of time that happens... but it is
almost impossible to prevent this altogether. Therefore, both
medium-to-large ISPs and those who are extremely concerned about FPs
should use ivmSIP/24 for scoring instead of blocking--in spite of my
continued attempts to get ivmSIP/24 to have just as few FPs as ivmSIP.
(and I'm still working on that!)

(3) Along these lines, I'm just about to make substantial changes to
ivmSIP/24--so that (a) in many cases, it will list subranges instead of
the whole /24 block and (b) that way, when I'm faced with a decision
about removing an ivmSIP/24 listing so as to not hurt an innocent sender
sharing a block with an egregious spammer... I can then "have my cake and
eat it too"--I can avoid more innocent IPs... but then NOT have to give
the spammers a pass by delisting the whole /24 block--as I'm sometimes
having to do now. (I often use this to put pressure on hosters to
remove the spammers FIRST--but I can only do so much of that--playing
that game takes tremendous time and resources--and has large lawsuit risks!)

(4) And I'm about to implement a large improvement to ivmSIP. I found a
bug in the programming (that had been there all along) which was
preventing some deserving IPs from getting into ivmSIP. So ivmSIP is
about to get better.  Therefore, substantial improvements are about to
happen to BOTH ivmSIP and ivmSIP/24, so I'd prefer that any
publicly available stats/testing be done in a week or two from
now--AFTER these improvements are made.

(5) regarding the "shared hosting environment"... if ALL of these mail
servers resolve their queries using the *same* locally hosted DNS server
for resolving queries, then there is only need for a single setup of the
lists, for that one DNS server--and then there'd be a single price based
on the cumulative total number of mailboxes--and, therefore, many
quantity discounts would apply (or, am I not understanding you? Aren't
these all hosted at the same physical location?... or multiple
datacenters owned by the same company?)

-- 
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032




Re: please help, getting hammered with snowshoe spam

2009-02-04 Thread Chip M.
This snowshoe stuff has been a PITA for a while.

For most of my users (particularly the Geeks), it's not even on their
radar.

For others (including my most complex domain), 80% of their FNs are
from snowshoers.

As well as the usual battery of anti-spam tests,
I'm using a layered/meta approach of tests:
1. "teaser" header word checks (see below)
2. sender IP checking against large hosts that have been known
   to host snowshoers (hand-maintained)
3. unsubscribe phrase(s) in the body
4. Barracuda

If you look at several snowshoe samples, you'll note that the "From"
and/or "Subject" pretty much ALWAYS contain some sort of "teaser"
word(s).

Those are the two headers that are (always?) displayed to the potential
victim, so the spammer has a strong incentive to continue using those
to try to lure in the victim.  They're a VERY good target for new
rules.

I've broken these "teasers" down into three general groups (and score
accordingly):
A. specific product names (e.g. "pedi paws") which are
   high-quality/low-risk spam signs
B. generic product names (e.g. "green tea") which are
   medium-quality/medium-risk spam signs
C. general terms (e.g. many variations on "insurance") which are
   medium-quality/higher-risk spam signs

I've never had an FP on the first group, and they're really easy to spot
and add to my rules.  I've even begun pre-emptively listing anything
I notice while watching TV.  The Weather Channel is particularly useful
for that. :)

The last group is the tricky one, and pretty much has to be used in
metas with the other rule groups listed above.

I regularly update my list of "active" snowshoe IP ranges, which catches
most of these.  That's my single most time intensive non-coding task, in
all of my anti-spam work.  I've gotten to the point where, if I notice
more than a few /24s in any one webhost's IP space, I re-classify _ALL_
of their blocks with a generally non-scoring code, then use that as a
meta at run-time.  The main problem is that I need more data to expand
these.

Anything which is sent from any of those IP blocks, then gets a HUGE
bonus if there is either a weak "teaser" and/or an unsubscribe term in
the body.
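
As a rough illustration of the layered approach in this post, the sketch below combines a labelled IP range with a teaser match and an unsubscribe phrase. The word lists reuse only the examples quoted above; the weights, the unsubscribe pattern, and all function names are invented for illustration and are not the actual rules.

```python
import re

# Teaser groups from the post; only the quoted examples are known.
TEASERS = {
    "A": ["pedi paws"],  # specific product names: low risk
    "B": ["green tea"],  # generic product names: medium risk
    "C": ["insurance"],  # general terms: higher risk, only useful in metas
}
UNSUB_RE = re.compile(r"\bunsubscribe\b", re.IGNORECASE)  # hypothetical phrase check

def teaser_group(from_hdr, subject):
    """Return the strongest teaser group found in From/Subject, or None."""
    text = f"{from_hdr} {subject}".lower()
    for group in ("A", "B", "C"):
        if any(word in text for word in TEASERS[group]):
            return group
    return None

def snowshoe_score(from_hdr, subject, body, in_labelled_range):
    """Score one message; every weight here is an illustrative placeholder."""
    group = teaser_group(from_hdr, subject)
    has_unsub = bool(UNSUB_RE.search(body))
    score = {"A": 3.0, "B": 1.5}.get(group, 0.0)  # group C never scores alone
    if in_labelled_range and (group is not None or has_unsub):
        score += 4.0  # the big bonus: labelled range plus one other sign
    return score

print(snowshoe_score("Deals", "Cheap insurance quotes", "Click to unsubscribe", True))
print(snowshoe_score("Deals", "Cheap insurance quotes", "Hello", False))
```

The point of the design is that the labelled range contributes nothing by itself; it only amplifies other evidence, which is what keeps a broad host-wide label from causing FPs.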

I'm planning to add another meta bonus rule for anything that's on
Barracuda.

I've found that HostKarma's blocklist is about as efficacious as
Barracuda, however I've experienced some timeouts, and some hinky
whitelist results, so I'm only using it in my FP pipeline, where it has
been extremely useful (Mark, if you're reading this, I'd be very happy
to send you more details and any specific data that would be helpful to
you - feel free to contact me off-list).

Some snowshoers have started putting the unsub link in a GIF, so I'll be
adding some rules for that, soonish.



*** Rob McEwen: ***
Would you be willing to provide your /24 list, for even a short period,
in some sort of plain text format (maybe one CIDR per line?), so those
of us with good hand-classified corpora could try out your data?

Most of my users are in a shared hosting environment, so they can't use
your list suite as-is.  Based on what reliable people have posted, some
of my users should probably benefit from your /24 list.  I'd be very
glad to provide you with a list of any FPs I find. :)

Contact me off-list, if you'd prefer.





Re: please help, getting hammered with snowshoe spam

2009-02-04 Thread Chip M.
Dennis Hardy wrote:
>Do people generally have good non-FP experience with BRBL? I am
>thinking of bumping up the score, but I get so much spam per day
>it is hard to check for FPs with it enabled.

Dennis, it depends on what sort of ham your people receive.

For evaluation purposes, I've been running Barracuda on three diverse
domains (one Geek, one "pure business", one mixed business&family).
Each maintains a decent-to-excellent hand-classified corpus (they share
summary data with me).

The Geek domain had 2 Barracuda FPs (between 28-Sep-2008 and today).
Both were from the same IP, so I've merely skip listed it.
Unfortunately, that particular sender was a webhost, with one of the
FPs being critically time-sensitive, so I consider those FPs to be
completely unacceptable (albeit easily avoided).  Both of those emails
received an SA score of "0.0", so the mentioned score of "3.0" would
NOT have stopped them (that particular webhost is extremely Geeky, and
doesn't commit any HTML atrocities).

The "pure business" domain had a zero Barracuda FP rate (note it's only
been running since 24-Jan-2009).

The other domain (running Barracuda since 16-Jan-2009) receives a LOT
of requested mailing list traffic (Constant Contact, Cheetah, etc), and
has had a significant number of FPs.

Here is the number of Barracuda hits for the last two weeks, for the
domain with FPs:
spam  5005
ham     38

Of the ham IPs, 22 had been previously classified as (generally) legit
bulk mailers (i.e. "ESP"s).  Visual inspection of the rest showed that
_ALL_ were some sort of mailing list, mostly business oriented, the rest
charitable or social.  When I sorted that data by SA score, it was
uniformly distributed across the different types of senders.

Here is the breakdown of SA scores for the ham:
  SA range    Hits  Percent
  < 0.0          2     5.3%
  0.0 - 0.6      7    18.4%
  1.0 - 1.8     13    34.2%
  2.0 - 2.4      3     7.9%
  3.0 - 3.5      7    18.4%
  4.0 - 4.2      6    15.8%

During that same period, there were 16 hams that had SA scores above
the cutoff threshold.  If I had scored Barracuda at "3.0", the
potential FPs would have doubled.
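
The "doubled" estimate can be reproduced from the SA score breakdown above. This sketch assumes the cutoff is SpamAssassin's default of 5.0 (the post does not state the threshold), treats the first row as "below 0.0", and uses each bucket's lower bound as its score.

```python
CUTOFF = 5.0           # assumed threshold; not stated in the post
BARRACUDA_SCORE = 3.0  # the score under discussion
EXISTING_FPS = 16      # hams already above the cutoff in the same period

# (bucket lower bound, hits) for the 38 Barracuda-listed hams.
# The first bucket is assumed to be "below 0.0".
BUCKETS = [(float("-inf"), 2), (0.0, 7), (1.0, 13), (2.0, 3), (3.0, 7), (4.0, 6)]

total = sum(hits for _, hits in BUCKETS)
new_fps = sum(hits for low, hits in BUCKETS if low + BARRACUDA_SCORE >= CUTOFF)

print(total, new_fps, EXISTING_FPS + new_fps)  # prints: 38 16 32
```

Under those assumptions, every ham scoring 2.0 or higher would cross the cutoff, adding 16 potential FPs to the existing 16.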

Note that I am not currently running Barracuda via SA
(I'm doing the testing in a different filter which runs right after SA).

Bottom-line:
Depending on the nature of your ham, you are likely to get some FPs,
even at the mentioned score of "3.0".

If you have a weak FP pipeline, then be very cautious.
Consider scoring Barracuda weakly, and using it in a "meta" context.

If anyone wants it, I can dump the specific SA tests for those FPs, as
well as a separate list of the spam hits (should be useful for creating
meta rules).

I will also update those stats in a few more weeks/months.





RE: please help, getting hammered with snowshoe spam

2009-02-02 Thread Faris Raouf
> Do people generally have good non-FP experience with BRBL?  I am
> thinking of bumping up the score, but I get so much spam per day it is
> hard to check for FPs with it enabled.  It seems like a great resource,
> will it be pushed out with "sa-update" soon?  I believe it is enabled
> in svn, from what I've read.

On one of the systems we run, we set it to 0.1 initially to see how it went.
After three months of monitoring we upped it to 3.0 and have never had any
problems. However, you have to take this in the context of the other settings
and mail throughput for this particular system: a tagging score of 4 and a
drop score of 12 (yes, this is a bit high), on roughly 4000 emails per day
(after zen.spamhaus.org DNSBL blocking).

Faris.



Re: please help, getting hammered with snowshoe spam

2009-02-02 Thread Dennis Hardy

Yes, it has been a problem as there are so many domains used.  However, I
took everyone's earlier suggestions, including training Bayes against FN
snowshoe spam and adding the Barracuda RBL (BRBL), and this appears to
almost completely take care of the problem!!  So far I have been able to
remove all of my custom rules except for BRBL of course, and only a few of
these snowshoe spams get through now.  Nice!

Do people generally have good non-FP experience with BRBL?  I am thinking of
bumping up the score, but I get so much spam per day it is hard to check for
FPs with it enabled.  It seems like a great resource, will it be pushed out
with "sa-update" soon?  I believe it is enabled in svn, from what I've read.

Also I am using policyd-weight to do front-end greylisting if the DNSBL
checks trigger as this reduces load on the server.  Can anyone suggest how
to enable the BRBL in policyd-weight?  I'm not sure what values to use.

Again thank you for your help with this problem!  It is great to see SA
working so well now against it :-)


-- 
View this message in context: 
http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21792616.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: please help, getting hammered with snowshoe spam

2009-02-01 Thread Kai Schaetzl
Karsten Bräckelmann wrote on Fri, 30 Jan 2009 20:25:52 +0100:

> Dennis clearly stated a *week* ago that the "domains change too
> quickly" (actual quote). Getting them listed will not help him. Oh, and
> don't you think he would have created a trivial uri rule already, if
> that would get them caught?

Obviously they are caught for others ;-) Either by Bayes, rules, network 
checks or other measures. It's never a "one hits them all" solution, so 
adding a spam domain to uribl is always good.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com





Re: please help, getting hammered with snowshoe spam

2009-01-30 Thread Karsten Bräckelmann
On Fri, 2009-01-30 at 18:28 +0100, Benny Pedersen wrote:
> On Fri, January 23, 2009 17:36, Dennis Hardy wrote:
> 
> > Yes already done:  http://pastebin.com/m4400a74d
> 
> why not get it listed on http://uribl.com/ ?

Benny, this is going to help how?

Dennis clearly stated a *week* ago that the "domains change too
quickly" (actual quote). Getting them listed will not help him. Oh, and
don't you think he would have created a trivial uri rule already, if
that would get them caught?


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: please help, getting hammered with snowshoe spam

2009-01-30 Thread Rob McEwen
Benny Pedersen wrote:
> On Fri, January 23, 2009 17:36, Dennis Hardy wrote:
>   
>> Yes already done:  http://pastebin.com/m4400a74d
>> 
> why not get it listed on http://uribl.com/?
>   

Both uribl and ivmURI listed this domain back on January 23rd. But it is
unclear exactly *when* this spam sample was sent because the person who
started this thread didn't include full headers. So it is unclear if the
message hit this guy's server before these two URI blacklists listed
that domain? or after? (I'm guessing after?)

-- 
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032




Re: please help, getting hammered with snowshoe spam

2009-01-30 Thread Benny Pedersen

On Fri, January 23, 2009 17:36, Dennis Hardy wrote:

> Yes already done:  http://pastebin.com/m4400a74d

why not get it listed on http://uribl.com/ ?

-- 
http://localhost/ 100% uptime and 100% mirrored :)



Re: please help, getting hammered with snowshoe spam

2009-01-24 Thread mouss
Dennis Hardy wrote:
>> Is this spam for snowshoes or some "spam term"?
> 
> "Like a snowshoe spreads the load of a traveler across a wide area of snow,
> some spammers use many frequently-changing IP addresses and domains to
> spread out the spam load in order to dilute recipient reputation metrics and
> evade filters."
> 
> see http://www.spamhaus.org/faq/answers.lasso?section=Glossary#233
> 
>> If the former, put some example up on a pastebin (not here!).
> 
> Yes already done:  http://pastebin.com/m4400a74d

you need to show full headers. there are generally patterns in the
envelope sender and in few headers.

Also, consider using BRBL:

header   RCVD_IN_BRBL  eval:check_rbl('brbl-lastexternal', 'bb.barracudacentral.org.')
describe RCVD_IN_BRBL  Received via a relay in Barracuda BRBL
tflags   RCVD_IN_BRBL  net
score    RCVD_IN_BRBL  3.0

adjust the score of course.








Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Dennis Hardy

Everyone has given very helpful feedback!  At present it definitely sounds
like I should tweak my rules and train my bayes.  I will try taking steps
here and see how it goes.

Thank you all so very much!





Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Derek Harding

Dennis Hardy wrote:
> Hi, I'm getting hammered by snowshoe spam :-(
>
> Any thoughts/advice are appreciated :-)

When this started happening to us the only solution I found was manual 
CIDR blocks.


Yeah, I know, very last millennium, but I didn't find anything else to work 
with. Some particular snowshoers had patterns I could use, but it seemed 
the addresses under attack were rapidly passed around among a large number 
of different outfits, each with different styles. Sadly, Bayes did not help.


Derek



Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Kai Schaetzl
Dennis Hardy wrote on Fri, 23 Jan 2009 08:36:59 -0800 (PST):

> see http://www.spamhaus.org/faq/answers.lasso?section=Glossary#233

Ah. I know a lot of spam terms, but this is certainly new to me ;-)

> 
> > If the former, put some example up on a pastebin (not here!).
> 
> Yes already done:  http://pastebin.com/m4400a74d

As it doesn't contain any headers I don't know if I wouldn't have rejected 
it at MTA, anyway. I get:

X-Spam-Report: 
*  5.0 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*  [score: 1.]
*  3.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
*  [URIs: twolumpsofcoal.net]
*  0.1 DIET_1 BODY: Lose Weight Spam

It may not have been in URIBL_BLACK at the time you got it. But there are 
two other good rules that hit on it. As you are getting BAYES_05 there's 
something wrong with your Bayes I'd say.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com





Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread John Hardin

On Fri, 23 Jan 2009, Dennis Hardy wrote:

> Here is what I have been using (from previous help from this mail list!):
>
>    uri   SSS_URI30 /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/\w{30}\b/
>    score SSS_URI30 1.5
>
> this uri rule does work very well.  but they change the length 
> sometimes, so I have a few rules that handle different lengths.  Maybe I 
> should use 29,31 instead of just 30 for example?
>
> Am I being too conservative?  Should I consider bumping the score of 
> this up more?  And my meta up more perhaps?


Again, I'd have to see more examples to comment meaningfully. I would be 
especially interested in whether or not the part after the domain name is 
indeed free from punctuation.


A long string of unpunctuated letters is less likely to FP than a long 
string of letters, numbers and underscores.


You might want to anchor your rule with a $ as it may FP if there is stuff 
in the URI following the string of gibberish. Try it against this very 
legitimate looking (if overly verbose) URI:


  http://fnord.com/retrieve_document_as_pdf3_file.php?123456

And the rule I suggested makes an attempt to detect gibberish by looking 
for a "q" that is not followed by a "u", which is rare in English words.
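
The false-positive risk is easy to check. The sketch below ports the SSS_URI30 pattern to Python's `re` module, which is only an approximation of how SpamAssassin applies Perl regexes to URIs, and runs it against the example URI with and without a trailing `$`.

```python
import re

URI = "http://fnord.com/retrieve_document_as_pdf3_file.php?123456"

# SSS_URI30 as posted, and the same pattern anchored at the end of the string.
unanchored = re.compile(r"\bhttp://[^./]+\.(?i:com|net|info|biz)/\w{30}\b")
anchored = re.compile(r"\bhttp://[^./]+\.(?i:com|net|info|biz)/\w{30}$")

# "retrieve_document_as_pdf3_file" is exactly 30 word characters.
print(bool(unanchored.search(URI)))  # True: the file name alone satisfies \w{30}\b
print(bool(anchored.search(URI)))    # False: ".php?123456" follows the 30 characters
```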


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Vista: because the audio experience is *far* more important than
  network throughput.
---
 4 days until Wolfgang Amadeus Mozart's 253rd Birthday


Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Dennis Hardy

> your BAYES is misfiring. The difference between BAYES_05 and BAYES_99 is
> 4.6 so you could have a score of 5.7 if you had a well-trained BAYES.

Yes, that would be great.  I will look at trying this.  I do get tens of
thousands of e-mails a day through this system though so it is hard to do
manual processes.  I need to play conservative and can't afford FPs at
all...





Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Dennis Hardy

> Can you repost that with full headers?

Yes, I have to wait for more to come through though as I have gotten into
the habit of just deleting the FNs.

> No DNSBL hits on the URI domain?

No, the domains change too quickly, so I almost never get DNSBL hits for
these.  I have DNSBL greylisting front-ending SA as well, and I get no hits
there either.  It is really annoying.  Usually someone will submit and
URIBL_BLACK will hit after a few though.  I've added a meta for the URL
check (below) and URIBL_BLACK and DCC_CHECK, maybe all I really need to do
is bump up the meta score for this combination?

> We'd need more than one sample URI to do a good job. Have you been
> collecting a corpus?

Not of a FN set.  I should collect this.

> I notice that this URI has a format that may be a good spam sign: the 
> domain name, followed by a long string of unpunctuated text gibberish.

Here is what I have been using (from previous help from this mail list!):

uri   SSS_URI30 /\bhttp:\/\/[^\.\/]+\.(?i:com|net|info|biz)\/\w{30}\b/
score SSS_URI30 1.5

this uri rule does work very well.  but they change the length sometimes, so
I have a few rules that handle different lengths.   Maybe I should use 29,31
instead of just 30 for example?

Am I being too conservative?  Should I consider bumping the score of this up
more?  And my meta up more perhaps?





Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Matus UHLAR - fantomas
> > why are those scores low? What gives them negative score?
> > those rules have quite high score...

On 23.01.09 08:26, Dennis Hardy wrote:
> Here is an example (without my rules):  http://pastebin.com/m4400a74d

X-Spam-Status: No, score=1.1 required=5.0 tests=BAYES_05,DCC_CHECK,DIET_1,
SPF_HELO_PASS,SPF_PASS autolearn=no version=3.2.5

Your BAYES is misfiring. The difference between BAYES_05 and BAYES_99 is 4.6,
so you could have a score of 5.7 if you had a well-trained BAYES.
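
Worked out explicitly, using only the numbers in this message: the 1.1 total and the required score of 5.0 from the X-Spam-Status header, plus the 4.6 difference between the two Bayes rules.

```python
observed_score = 1.1  # total from the X-Spam-Status header above
bayes_delta = 4.6     # BAYES_99 minus BAYES_05, as stated above
required = 5.0        # "required=5.0" from the same header

# Round to one decimal to avoid floating-point noise in the sum.
corrected = round(observed_score + bayes_delta, 1)
print(corrected, corrected >= required)  # prints: 5.7 True
```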

> The ones that get through are relatively short and simple, and many are very
> "clean".  This example is just one that focuses on weight loss, some are
> regarding tea or satellite companies or coffee makers or the like.  I worry
> about increasing FPs of real e-mails by training of "clean" spams as spam,
> when they are short and sweet and many times look like they could be
> legitimate e-mails.

just train on them, and remember to train on clean mails (especially those
which will start getting higher BAYES score).

> Also would training bayes on this sort of e-mail help if many things are
> different between each e-mail, and if the e-mail is so short and relatively
> "clean"?  Addresses change, company names change, sender domains are always
> different, etc

If you trained with enough mail, it would help. However, the result says
similar mails were trained as ham, which is what you should investigate and
fix.

On some mailboxes I keep trained ham/spam in special folders so I can
re-train or forget at any time if anything was classified incorrectly.

> I've been thinking about maybe writing an SA plugin that counts the three
> repeated URL patterns that are always present in all of these spams, but I
> don't know where to start in trying to do that.  I was hoping I could just
> handle this with SA rules or something (like using another RBL or
> something).

More mails could give an idea of what should be hit. Maybe a rule would be
enough, with no need to create a plugin. But I'm sure BAYES training should be
enough for this mail...

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Support bacteria - they're the only culture some people have. 


Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread John Hardin

On Fri, 23 Jan 2009, Dennis Hardy wrote:

> > why are those scores low? What gives them negative score?
> > those rules have quite high score...
>
> Here is an example (without my rules):  http://pastebin.com/m4400a74d

Can you repost that with full headers?

> The ones that get through are relatively short and simple, and many are 
> very "clean".

No DNSBL hits on the URI domain?

> I've been thinking about maybe writing an SA plugin that counts the 
> three repeated URL patterns that are always present in all of these 
> spams, but I don't know where to start in trying to do that.


We'd need more than one sample URI to do a good job. Have you been 
collecting a corpus?


I notice that this URI has a format that may be a good spam sign: the 
domain name, followed by a long string of unpunctuated text gibberish.


Just off the top of my head and untested, how does this do against your 
corpus?


  uri GIBBERISH ;://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$;i
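
For anyone who wants to try the idea outside SpamAssassin, here is the same pattern in Python's `re` syntax. The first two URIs are invented for the test (they are not real samples); the third is just a legitimate-looking URI with punctuation in the path.

```python
import re

# The GIBBERISH pattern; the ;...;i delimiters become re.IGNORECASE.
GIBBERISH = re.compile(
    r"://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$",
    re.IGNORECASE,
)

samples = {
    # made-up 30-letter path containing a "q" that is not followed by "u"
    "http://example.com/abcdefghijklmnoqrstvwxyzabcdef": True,
    # long unpunctuated path, but every "q" is followed by "u"
    "http://example.com/quickquestionaboutquotasandqueues": False,
    # punctuated path: the lookahead for 25-80 letters up to the end fails
    "http://fnord.com/retrieve_document_as_pdf3_file.php?123456": False,
}

for uri, expected in samples.items():
    hit = GIBBERISH.search(uri) is not None
    print(hit == expected, uri)
```

The lookahead enforces "letters only, 25 to 80 of them, up to the end of the URI", and the `q[^u]` part is the gibberish heuristic, so both conditions have to hold for a hit.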

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Gun Control is nothing more than an attempt to return to feudalism,
  where the peasants are helpless and must humbly petition their lord
  and master to protect them from bandits and thieves (when they can
  get around to it), and where the lords and masters can abuse the
  peasants whenever they like without fear of effective resistance.
---
 4 days until Wolfgang Amadeus Mozart's 253rd Birthday


Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Dennis Hardy

> I've been using this rule to knock some of these down:
>   [...]
> Highly unusual to have a url like that in ham...
> I'm running a meta to bump up the score...

Yes, I've actually been doing the very same thing (URI detection and metas,
and then string matching in the tail part of the e-mail)!  However, it has
been getting tedious maintaining the string list manually, because the "
Marketing" and " Media" etc. targets and addresses have been changing
far more frequently now.  They'll use them for a few days, then they
disappear completely and new ones appear.  This type of spam is such an
incredible pain...  Is there some more general way that this sort of thing
could be handled?


-- 
View this message in context: 
http://www.nabble.com/please-help%2C-getting-hammered-with-snowshoe-spam-tp21627042p21628143.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Daniel J McDonald
On Fri, 2009-01-23 at 07:56 -0800, Dennis Hardy wrote:
> Hi, I'm getting hammered by snowshoe spam :-(  I've added rules to try to
> catch common formats of included URLs in the spam, but I'm wary of scoring
> these rules too high because of the potential for false positives.  It's
> hard to come up with other rules as the spam e-mail content is so generic. 
> By default these spams score incredibly low (bayes, etc.)  In many cases,
> the low bayes values are scoring negative, which completely offsets the few
> positive scoring rules that I have added.

I've been using this rule to knock some of these down:
uri AE_ASM  /\/[[:alpha:]]{28,40}$/
describe AE_ASM long gibberish path used by ASM Marketing
score AE_ASM 1

Highly unusual to have a url like that in ham...
I'm running a meta to bump up the score...
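
A quick way to sanity-check what the AE_ASM pattern will and won't hit is to run it over a few URLs offline. The sketch below is a Python approximation, not the SA rule itself: Python's re has no POSIX [[:alpha:]] class, so ASCII letters are assumed instead, and the function name is invented for illustration.

```python
import re

# Approximation of /\/[[:alpha:]]{28,40}$/ with ASCII letters spelled out,
# because Python's re module does not support POSIX character classes.
AE_ASM = re.compile(r"/[A-Za-z]{28,40}$")


def ae_asm_hit(uri: str) -> bool:
    """Return True if the URI ends in a slash followed by 28 to 40 ASCII letters.

    Hypothetical helper for offline testing only.
    """
    return AE_ASM.search(uri) is not None
```

Because the slash must sit directly in front of the letter run, a final segment longer than 40 letters does not match, and neither does one with an extension such as ".html".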

-- 
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com



Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Dennis Hardy

> Is this spam for snowshoes or some "spam term"?

"Like a snowshoe spreads the load of a traveler across a wide area of snow,
some spammers use many frequently-changing IP addresses and domains to
spread out the spam load in order to dilute recipient reputation metrics and
evade filters."

see http://www.spamhaus.org/faq/answers.lasso?section=Glossary#233

> If the former, put some example up on a pastebin (not here!).

Yes already done:  http://pastebin.com/m4400a74d





Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Kai Schaetzl
Dennis Hardy wrote on Fri, 23 Jan 2009 07:56:44 -0800 (PST):

> Hi, I'm getting hammered by snowshoe spam

Is this spam for snowshoes or some "spam term"? If the former, put an
example up on a pastebin (not here!).

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com





Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Dennis Hardy

> why are those scores low? What gives them negative score?
> those rules have quite high score...

Here is an example (without my rules):  http://pastebin.com/m4400a74d

The ones that get through are relatively short and simple, and many are very
"clean".  This example happens to focus on weight loss; others are about
tea, satellite companies, coffee makers, or the like.  I worry that training
these "clean" spams as spam will increase FPs on real e-mails, since they
are short and sweet and often look like they could be legitimate e-mails.

Also, would training bayes on this sort of e-mail help if so much differs
from one e-mail to the next, and if each e-mail is so short and relatively
"clean"?  Addresses change, company names change, sender domains are always
different, etc.

I've been thinking about maybe writing an SA plugin that counts the three
repeated URL patterns that are always present in all of these spams, but I
don't know where to start in trying to do that.  I was hoping I could just
handle this with SA rules or something (like using another RBL or
something).
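
One possible starting point for the counting idea, sketched in Python rather than as a real SA plugin (which would be written in Perl). The thread does not say what the three repeated URL patterns are, so the "shape" used here (host plus the path with letter runs and digit runs collapsed) is purely a stand-in assumption, as are all of the function names.

```python
import re
from collections import Counter

# Deliberately simple URL matcher; a real plugin would use SA's own URI list.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def url_shape(url: str) -> str:
    """Reduce a URL to a coarse shape: host, then path with letter runs
    replaced by 'a' and digit runs replaced by '9'. Hypothetical heuristic."""
    rest = url.split("://", 1)[1]
    host, _, path = rest.partition("/")
    shape = re.sub(r"[A-Za-z]+", "a", path)
    shape = re.sub(r"[0-9]+", "9", shape)
    return f"{host.lower()}/{shape}"


def count_url_shapes(body: str) -> Counter:
    """Count how many URLs in the message body share each shape."""
    return Counter(url_shape(u) for u in URL_RE.findall(body))


def has_repeated_shape(body: str, min_repeats: int = 3) -> bool:
    """True if any single shape occurs at least min_repeats times."""
    return any(n >= min_repeats for n in count_url_shapes(body).values())
```

The default of three repeats mirrors the "three repeated URL patterns" mentioned above; whether that generalizes beyond these particular spams would need checking against a corpus.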

Thank you!




Re: please help, getting hammered with snowshoe spam

2009-01-23 Thread Matus UHLAR - fantomas
On 23.01.09 07:56, Dennis Hardy wrote:
> Hi, I'm getting hammered by snowshoe spam :-(  I've added rules to try to
> catch common formats of included URLs in the spam, but I'm wary of scoring
> these rules too high because of the potential for false positives.  It's
> hard to come up with other rules as the spam e-mail content is so generic. 
> By default these spams score incredibly low (bayes, etc.)  In many cases,
> the low bayes values are scoring negative, which completely offsets the few
> positive scoring rules that I have added.

Train bayes properly; it's the first thing you should do for such mail.

> Are there other RBLs or domain checks or something that could be used to
> possibly get more indication that a spam is a snowshoe spam from a "bogus"
> domain?  I've also added a meta rule that combines URIBL_BLACK, DCC_CHECK,
> and my rules...but spam still gets by many times because it scores so
> low/negative otherwise.  Maybe I just need to score everything higher...?

why are those scores low? What gives them negative score?
those rules have quite high score...

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Quantum mechanics: The dreams stuff is made of. 


please help, getting hammered with snowshoe spam

2009-01-23 Thread Dennis Hardy

Hi, I'm getting hammered by snowshoe spam :-(  I've added rules to try to
catch common formats of included URLs in the spam, but I'm wary of scoring
these rules too high because of the potential for false positives.  It's
hard to come up with other rules as the spam e-mail content is so generic. 
By default these spams score incredibly low (bayes, etc.)  In many cases,
the low bayes values are scoring negative, which completely offsets the few
positive scoring rules that I have added.

Are there other RBLs or domain checks or something that could be used to
possibly get more indication that a spam is a snowshoe spam from a "bogus"
domain?  I've also added a meta rule that combines URIBL_BLACK, DCC_CHECK,
and my rules...but spam still gets by many times because it scores so
low/negative otherwise.  Maybe I just need to score everything higher...?
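
To make the arithmetic concrete, here is a small Python sketch of how a negative bayes score can cancel out the positive rules, and how a meta can lift the total back over the threshold. Every score below is a made-up illustrative value, not an SA default, and LOCAL_URI_PATTERN and LOCAL_SNOWSHOE_META are hypothetical rule names. The meta is assumed to fire only when all three component rules hit; the only real value is the stock required score of 5.0.

```python
# Illustrative scores only; these are NOT SpamAssassin's shipped values.
SCORES = {
    "BAYES_00": -2.5,            # message looks "clean" to bayes
    "URIBL_BLACK": 2.0,
    "DCC_CHECK": 2.0,
    "LOCAL_URI_PATTERN": 1.0,    # hypothetical custom URI rule
    "LOCAL_SNOWSHOE_META": 3.0,  # hypothetical meta rule
}
REQUIRED_SCORE = 5.0             # SA's default required_score

# Assumption: the meta requires all three components to hit.
META_COMPONENTS = {"URIBL_BLACK", "DCC_CHECK", "LOCAL_URI_PATTERN"}


def total_score(hits, use_meta=True):
    """Sum the scores of the rules that hit, adding the meta when it applies."""
    hits = set(hits)
    if use_meta and META_COMPONENTS <= hits:
        hits.add("LOCAL_SNOWSHOE_META")
    return sum(SCORES[name] for name in hits)


def is_spam(hits, use_meta=True):
    """True if the total reaches the required score."""
    return total_score(hits, use_meta) >= REQUIRED_SCORE
```

With these numbers, a message hitting BAYES_00 and all three component rules totals 2.5 without the meta and 5.5 with it, which is the effect the meta is meant to have.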

Any thoughts/advice are appreciated :-)

