Effectiveness of Bayes poisoning (was Re: Spam Pattern)

2014-02-12 Thread David F. Skoll
On Wed, 12 Feb 2014 13:11:19 -0800 (PST)
John Hardin jhar...@impsec.org wrote:

 That only works if your hammy mail stream contains text that looks
 like the random garbage they put in to try to spoof bayes.

Indeed.  Just for kicks, I ran the OP's pastebin example through our
Bayes database and it scored 99.99% likelihood of spam.  The word
"Wopsle", for example, was a dead giveaway... that never appears in
our ham stream, but has appeared in 93 spams in our database.

Bayes poisoning, in our experience, is only occasionally effective.

Regards,

David.
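
To illustrate why a token like the one above is such a strong spam indicator, here is a minimal sketch of a Robinson-style per-token probability. It is a simplification and not SpamAssassin's exact computation; the corpus sizes (n_spam, n_ham) and the smoothing constants s and x are assumptions chosen for illustration, and only the 93-spam / 0-ham counts come from the message.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed probability that a message containing this token is spam.

    spam_hits / ham_hits: number of spam / ham messages containing the token.
    n_spam / n_ham: total messages learned as spam / ham (hypothetical here).
    s, x: strength and prior for tokens with little evidence (assumed values).
    """
    n = spam_hits + ham_hits
    if n == 0:
        return x  # token never seen: fall back to the neutral prior
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    p = spam_freq / (spam_freq + ham_freq)
    # Pull the raw estimate toward the prior when evidence is thin.
    return (s * x + n * p) / (s + n)


# Token seen in 93 spams and never in ham; corpus sizes are made up.
print(token_spam_probability(93, 0, n_spam=10000, n_ham=10000))
```

Random filler only helps the spammer if it lands on tokens that already lean towards ham in the recipient's own database, which is the point John makes above.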



Re: Trouble with bayes poisoning spam

2012-12-02 Thread Alex
Hi,

 Actually, that's a Snowshoe IP.
 Which, on balance, can be a good thing, slaying-wise. :)

You mean that it's more likely to be added to the SBL with the other
IPs in the same range sooner?

 Almost four years ago, I posted my approach to snowshoe slaying:
 
 http://mail-archives.apache.org/mod_mbox/spamassassin-users/200902.mbox/%3c20090204.0...@iowahoneypot.com%3e

 It has continued to evolve since then.
 Both IP block tracking and identity (Subject & From.Realname)
 header token checking are still the two most useful approaches.

I read your email from four years ago. How has it evolved?

We have created a few scripts that let you paste a phrase from a
FN into a text file, from which a rule is then generated. So "Olde
Brooklyn Lantern" in the body would get a score, etc.

Combined with ZEN and/or SBL, I think this is similar to what
you're doing, correct?

 I see you have hits on RELAYCOUNTRY.  If you maintain your own
 virtual snowshoe nations, and merge them into your real nations,
 while building a list of snowshoe tokens, you'll have very good
 success catching these.

At one point I hoped I could exclude certain countries, or score some
higher than others, but too much legitimate mail is received from all
over the world. Got burned too many times.

 For example, that IP is in root eSolutions space, and they have
 had a snowshoe problem for at least a year and a half.

 Here's their ranges that I have in my small scale database:
 94.242.192.0 - 94.242.255.255
 188.42.0.0 - 188.42.127.255
 212.117.160.0 - 212.117.191.255

Do you list them all as class C's, or is there a CIDR mask that matches
these? I've found many spamming class C's in 41/8, and I'd really like
to know which legitimate companies use that class A, or how to better
isolate the class C's so I can block them.

 About two years ago, I hit a tipping point with my snowshoe IP
 data, and can now _VERY_ rapidly identify new blocks.

I would really be interested in that, especially if it's beyond what
is already available in the SBL.

 Both of these phrases are in my snowshoe tokens database:
 Classic Lantern
 Incredible Light

How do these phrases relate to a snowshoe IP range? And one that isn't
already part of the SBL?

You would have to at least catch that phrase on two IPs in the same
class C before you could consider it a snowshoe, correct?

 I checked, and one of my best data feeds was hit by the same
 IP block in your sample.  Here are quick dumps of the contents of
 the identity headers:

 frequency and contents of Field [Subject], filtered by [all  IP 
 w/188.42.11.]
 A unique christmas gift for the kids
 A variety of medigap options explained and simplified

Perhaps not to the same degree as you do, but I also have these
phrases in my local database from which rules are created. Do you have
a mechanism to auto-generate them? Shouldn't this be incorporated into
Justin's SOUGHT rules?

 As soon as I've finished a couple of timesink projects, I'll start
 on those.
 - Chip

Thanks,
Alex
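
On the CIDR question in the message above: the three ranges each collapse to a single prefix. A minimal sketch using Python's standard ipaddress module is shown here; the helper name range_to_cidrs is made up for illustration, and the ranges are the ones quoted from Chip's message.

```python
import ipaddress

# Ranges quoted in the thread: (first address, last address).
RANGES = [
    ("94.242.192.0", "94.242.255.255"),
    ("188.42.0.0", "188.42.127.255"),
    ("212.117.160.0", "212.117.191.255"),
]


def range_to_cidrs(first, last):
    """Return the smallest list of CIDR blocks covering first..last inclusive."""
    nets = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    )
    return [str(net) for net in nets]


for first, last in RANGES:
    print(first, "-", last, "=>", ", ".join(range_to_cidrs(first, last)))
```

A range that does not start and end on a prefix boundary would come back as several blocks rather than one.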


re: Trouble with bayes poisoning spam

2012-11-30 Thread Chip M.
Hi Alex!

Actually, that's a Snowshoe IP.
Which, on balance, can be a good thing, slaying-wise. :)

Almost four years ago, I posted my approach to snowshoe slaying:

http://mail-archives.apache.org/mod_mbox/spamassassin-users/200902.mbox/%3c20090204.0...@iowahoneypot.com%3e

It has continued to evolve since then.
Both IP block tracking and identity (Subject & From.Realname)
header token checking are still the two most useful approaches.

I see you have hits on RELAYCOUNTRY.  If you maintain your own
virtual snowshoe nations, and merge them into your real nations,
while building a list of snowshoe tokens, you'll have very good 
success catching these.

For example, that IP is in root eSolutions space, and they have
had a snowshoe problem for at least a year and a half.

Here's their ranges that I have in my small scale database:
94.242.192.0 - 94.242.255.255
188.42.0.0 - 188.42.127.255
212.117.160.0 - 212.117.191.255

About two years ago, I hit a tipping point with my snowshoe IP
data, and can now _VERY_ rapidly identify new blocks.

Both of these phrases are in my snowshoe tokens database:
Classic Lantern
Incredible Light

I checked, and one of my best data feeds was hit by the same
IP block in your sample.  Here are quick dumps of the contents of
the identity headers:

frequency and contents of Field [Subject], filtered by [all  IP 
w/188.42.11.]
A unique christmas gift for the kids 
A variety of medigap options explained and simplified
Burn off that belly while you're sleeping 
Compensation information for those that suffered from mesh patch 
complications 
Endless inventory of electronics at 1/5th of what you'd pay for retail 
Ever wondered what it would be like to fly in a private jet?
It's time you chopped that home payment in half 
Learn a new tongue in days
Simple solutions for Medicare and Medigap
Speak Japanese in two weeks
Stop wasting time, start saving on your home payment 
We have your guide to being prepared in the event of a crisis or 
natural disaster 
You can get a Kindle Fire HD for around thirty bucks 
Your guide to being prepared in the event of a crisis or natural 
disaster 

frequency and contents of Field [RealnameFrom], filtered by [all  IP 
w/188.42.11.]
Adorable Santa Letters
Become Multilingual
Better Rates Today
Gain Kowledge
House Payment Halfer
Lose Pounds No Gym
MacBooks From 150.00
Medicare Made Simple
Medigap/Medicare Explained
Mesh Patch Patient Alert
Private Jet Share Packages
Samsung Galaxy Sold 28.54
Surgical Mesh Patch Patient Alert
Your Crisis Preparation Guide

When I get the time, AND some volunteers to help, I plan to publish
the most statistically significant data from BOTH databases. :)

Rob's Invalument data is supposed to be very helpful for snowshoe
detection.  Eventually, I'll get around to trying it. :)


*** John:
How practical would it be to create some metas that hinged off a
snowshoe nation hit on RelayCountry?  We'd have to define some
virtual nation codes, but that's easy.  I'm using a letter + number
combo, since none of the official two-letter country codes contain
a number.

That way, you and others could come up with some very nifty 
snowshoe focused tests, and they would ONLY trigger if the sender
used a known snowshoe negligent host, AND the recipient server
chose to use IP-to-Nation tests.  Win-win. :)

I have the naively optimistic notion that some snowshoe hosts
simply do not have anti-spam expertise, and if there was a reliable
library of snowshoe patterns, they might test the outgoing mail of
new customers. :)

This week, I posted a list of proposed 2013 projects to my
volunteers, and at the top is exporting our MassCheck data for SA.
Also on the list are phish and snowshoe data sharing. :)

As soon as I've finished a couple of timesink projects, I'll start
on those.
- Chip




Trouble with bayes poisoning spam

2012-11-29 Thread Alex
Hi,

I have an example of spam that I just can't reliably detect:

http://pastebin.com/YuuLuA1x

It's basically some HTML with a URL to an ad for Lantern with 9 LED
bulbs. I've trained hundreds of these, and they still report
BAYES_50. I've just tested it now, a few hours after having first
received it, and it's already being flagged by several URIBLs and is
hitting BAYES_99 since I've now trained it.

I was just wondering if there was something else that could be
triggered on in the header to catch these sooner? I'm assuming the
sending IP is part of a botnet? I'm using v3.3.2 on fc15 with amavisd.

Thanks,
Alex


Re: Trouble with bayes poisoning spam

2012-11-29 Thread John Hardin

On Thu, 29 Nov 2012, Alex wrote:


I have an example of spam that I just can't reliably detect:

http://pastebin.com/YuuLuA1x

I was just wondering if there was something else that could be
triggered on in the header to catch these sooner? I'm assuming the
sending IP is part of a botnet? I'm using v3.3.2 on fc15 with amavisd.


I'm wondering why this didn't hit any rules:

   font-size:4px;

That's too small to read and should be a good indicator of bayes poison, 
just like setting the font to white.



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Bother, said Pooh as he struggled with /etc/sendmail.cf, it never
  does quite what I want. I wish Christopher Robin was here.
   -- Peter da Silva in a.s.r
---
 26 days until Christmas


Bayes Poisoning

2011-10-18 Thread Daniel McDonald
One of my users submitted a spam for analysis, and I was amazed at the
efforts this troglodyte expended to poison bayes.
Is it worth the effort to try to find huge html comments hiding junk like
this?

Maybe something like

Rawbody OBFU_HTML_LONG_COMMENT /<!--.{1024,}?-->/
Describe OBFU_HTML_LONG_COMMENT contains a ridiculously long html comment



-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281



Re: Bayes Poisoning

2011-10-18 Thread Bowie Bailey
On 10/18/2011 8:53 AM, Daniel McDonald wrote:
 One of my users submitted a spam for analysis, and I was amazed at the
 efforts this troglodyte expended to poison bayes.
 Is it worth the effort to try to find huge html comments hiding junk
 like this?

 Maybe something like

 Rawbody OBFU_HTML_LONG_COMMENT /<!--.{1024,}?-->/
 Describe OBFU_HTML_LONG_COMMENT contains a ridiculously long html comment

It may be worthwhile trying to find overly-long comments, but
unfortunately, it's not quite as easy as that.  The problem is making
sure the beginning and ending markers are part of the same comment. 
Your example would be tripped up if there was a small comment at the
beginning of the message and another small comment at the end.  It would
count characters between the beginning of the first comment and the end
of the second one.

As far as Bayes Poisoning, I'm not sure there is any such thing.  Any
random text that a spammer dumps into his emails is unlikely to match
the pattern of your normal emails.  So just feed it to Bayes and let it
do its job.  Bayes works amazingly well if trained properly.  :)

-- 
Bowie
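
The pitfall described above can be reproduced in a few lines. This is a sketch using Python's re module rather than Perl, and it ignores how SpamAssassin splits rawbody text into chunks, so treat it as an illustration of the regex behaviour only. The TEMPERED pattern is one possible way to keep the match inside a single comment, not a tested SpamAssassin rule.

```python
import re

# Pattern from the thread, with the HTML comment delimiters written out.
NAIVE = re.compile(r"<!--.{1024,}?-->", re.S)

# Tempered variant: each consumed character must not begin a "-->" closer,
# so the match cannot run past the end of the comment it started in.
TEMPERED = re.compile(r"<!--(?:(?!-->).){1024,}-->", re.S)

# Two short comments with a lot of ordinary text between them.
two_small = "<!-- a -->" + "x" * 2000 + "<!-- b -->"
# One genuinely long comment.
one_long = "<p>hi</p><!--" + "junk " * 400 + "-->"


def hits(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print("naive on two small comments:   ", hits(NAIVE, two_small))
print("tempered on two small comments:", hits(TEMPERED, two_small))
print("tempered on one long comment:  ", hits(TEMPERED, one_long))
```

Even with the match confined to one comment, legitimate bulk mailers do send very long comments, so a rule like this would still need a cautious score.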


Re: Bayes Poisoning

2011-10-18 Thread Joseph Brennan

Daniel McDonald dan.mcdon...@austinenergy.com wrote:



Rawbody OBFU_HTML_LONG_COMMENT /<!--.{1024,}?-->/
Describe OBFU_HTML_LONG_COMMENT contains a ridiculously long html comment



Tried with exactly that limit, 1 kb.

TargetX, which is used by universities in recruiting, uses a long comment
in its generated mail (I did not keep a note of how many kb).

Travelocity puts a 28 kb comment in confirmation messages.

We were scoring 1.0 for it, and we gave up after a few more fp cases,
rather than keep whitelisting.


It has to do with email generated from scripts written by web designers.
They're as good at email as I am at designing web pages :-)


Joseph Brennan
Lead Email Systems Engineer
Columbia University Information Technology




Re: Bayes Poisoning

2011-10-18 Thread Karsten Bräckelmann
On Tue, 2011-10-18 at 07:53 -0500, Daniel McDonald wrote:
 One of my users submitted a spam for analysis, and I was amazed at the
 efforts this troglodyte expended to poison bayes.
 Is it worth the effort to try to find huge html comments hiding junk
 like this?

Hmm, wait -- Bayes and HTML comments in the same thought. Are you trying
to imply the malicious Bayes tokens are inside the comment?

While this kind of attack might work with other Bayesian Classifier
implementations out there, it does NOT fool SA. The (body) Bayes tokens
SA uses are gathered from the *rendered* body text. All HTML dropped,
including comments.

If you want to find out why that message has a low Bayes score, you'll
have to use Template Tags to extract and investigate the tokens.
Pointing at the HTML comment is a red herring.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
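
The point about rendered text can be illustrated with a crude stand-in for an HTML renderer. This sketch uses Python's standard html.parser and is not SpamAssassin's own renderer or tokenizer; the class and function names are invented for the example. It only shows that words hidden inside an HTML comment never reach the token list.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes only; comments go to handle_comment, which is a no-op."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_tokens(html):
    """Return whitespace-separated tokens from the text outside tags and comments."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).split()


sample = "<p>Cheap <!-- Wopsle Pumblechook Havisham -->lanterns</p>"
print(rendered_tokens(sample))
```

Filler that is rendered but made hard to see (tiny or white text) is a different case: it is still body text, so it would still be tokenized.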



Re: Bayes Poisoning

2011-10-18 Thread Daniel McDonald



On 10/18/11 12:12 PM, Karsten Bräckelmann guent...@rudersport.de wrote:

 On Tue, 2011-10-18 at 07:53 -0500, Daniel McDonald wrote:
 One of my users submitted a spam for analysis, and I was amazed at the
 efforts this troglodyte expended to poison bayes.
 Is it worth the effort to try to find huge html comments hiding junk
 like this?
 
 Hmm, wait -- Bayes and HTML comments in the same thought. Are you trying
 to imply the malicious Bayes tokens are inside the comment?
 
 While this kind of attack might work with other Bayesian Classifier
 implementations out there, it does NOT fool SA. The (body) Bayes tokens
 SA uses are gathered from the *rendered* body text. All HTML dropped,
 including comments.

Fair enough.  I see that the url's in this message have been picked up by
invaluement and razor, so we probably have enough points to toss it in the
quarantine now anyway.


-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281



RetrunPath and Bayes Poisoning

2010-02-23 Thread Jason Bertoch


Are there any internal checks that disable Bayes autolearn when these 
artificial whitelist rules match?  I'd disabled these rules in versions 
prior to 3.3.0 but, with all the discussion on the matter, I thought I'd 
leave them in to see the new and improved version.  Unfortunately, I'm 
still seeing false positives and am concerned that they are pushing the 
scores low enough to poison my Bayes database.



/Jason





Re: RetrunPath and Bayes Poisoning

2010-02-23 Thread Michael Scheidell

On 2/23/10 9:03 AM, Jason Bertoch wrote:


Are there any internal checks that disable Bayes autolearn when these 
artificial whitelist rules match?  I'd disabled these rules in 
versions prior to 3.3.0 but, with all the discussion on the matter, I 
thought I'd leave them in to see the new and improved version.  
Unfortunately, I'm still seeing false positives and am concerned that 
they are pushing the scores low enough to poison my Bayes database.



you can edit the tflags and add noautolearn

example:
72_active.cf:tflags __RCVD_IN_DNSWL nice net


becomes:
72_active.cf:tflags __RCVD_IN_DNSWL nice net noautolearn




--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
 *| *SECNAP Network Security Corporation

   * Certified SNORT Integrator
   * 2008-9 Hot Company Award Winner, World Executive Alliance
   * Five-Star Partner Program 2009, VARBusiness
   * Best Anti-Spam Product 2008, Network Products Guide
   * King of Spam Filters, SC Magazine 2008


__
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/

__  

Re: RetrunPath and Bayes Poisoning

2010-02-23 Thread Bowie Bailey
Michael Scheidell wrote:
 On 2/23/10 9:03 AM, Jason Bertoch wrote:

 Are there any internal checks that disable Bayes autolearn when these
 artificial whitelist rules match?  I'd disabled these rules in
 versions prior to 3.3.0 but, with all the discussion on the matter, I
 thought I'd leave them in to see the new and improved version. 
 Unfortunately, I'm still seeing false positives and am concerned that
 they are pushing the scores low enough to poison my Bayes database.

 you can edit the tflags and add noautolearn

 example:
 72_active.cf:tflags __RCVD_IN_DNSWL nice net


 becomes:
 72_active.cf:tflags __RCVD_IN_DNSWL nice net noautolearn

Are these settings cumulative?  The man page doesn't specify.

If I do this:

tflags RULENAME nice net
tflags RULENAME noautolearn

what happens?  Does everything get set or do I only get 'noautolearn'?

-- 
Bowie


Re: RetrunPath and Bayes Poisoning

2010-02-23 Thread Jason Bertoch

On 2/23/2010 9:20 AM, Michael Scheidell wrote:

Unfortunately, I'm still seeing false positives and am concerned that
they are pushing the scores low enough to poison my Bayes database.


you can edit the tflags and add noautolearn

example:
72_active.cf:tflags RCVD_IN_RP_CERTIFIED net nice
72_active.cf:tflags RCVD_IN_RP_SAFE net nice

becomes:
72_active.cf:tflags RCVD_IN_RP_CERTIFIED net nice noautolearn
72_active.cf:tflags RCVD_IN_RP_SAFE net nice noautolearn


Nice, I didn't realize it worked like that.  To make this permanent, do 
I need to set the score to zero and copy the rules to a different name 
in local.cf, or will a second tflags declaration in local.cf simply 
override the one in 72_active.cf?


/Jason





Re: RetrunPath and Bayes Poisoning

2010-02-23 Thread Michael Scheidell

On 2/23/10 9:28 AM, Bowie Bailey wrote:

Michael Scheidell wrote:
   

On 2/23/10 9:03 AM, Jason Bertoch wrote:
 

Are there any internal checks that disable Bayes autolearn when these
artificial whitelist rules match?  I'd disabled these rules in
versions prior to 3.3.0 but, with all the discussion on the matter, I
thought I'd leave them in to see the new and improved version.
Unfortunately, I'm still seeing false positives and am concerned that
they are pushing the scores low enough to poison my Bayes database.

   

you can edit the tflags and add noautolearn

example:
72_active.cf:tflags __RCVD_IN_DNSWL nice net


becomes:
72_active.cf:tflags __RCVD_IN_DNSWL nice net noautolearn
 

Are these settings cumulative?  The man page doesn't specify.

If I do this:

tflags RULENAME nice net
tflags RULENAME noautolearn

what happens?  Does everything get set or do I only get 'noautolearn'?

   

why not just do tflags RULENAME nice net noautolearn

(oh.. and to find them, grep '^tflags.*RCVD_IN' *.cf

some interesting ones. not sure why they rate a net nice:
RCVD_IN_IADB_OPTOUTONLY net nice?
describe is: IADB: Scrapes addresses, pure opt-out only

or
describe RCVD_IN_IADB_NOCONTROL IADB: Has absolutely no mailing 
controls in place


I would think a POSITIVE score would fit someone who we know violates federal 
can-spam laws (scraping addresses is a violation of US federal can-spam laws).






Re: RetrunPath and Bayes Poisoning

2010-02-23 Thread Jason Bertoch

On 2/23/2010 9:35 AM, Michael Scheidell wrote:


why not just do tflags RULENAME nice net noautolearn

(oh.. and to find them, grep '^tflags.*RCVD_IN' *.cf

some interesting ones. not sure why they rate a net nice:


Grepping for 'autolearn' turns up the built-in whitelist and blacklist 
rules.  I wonder, why wasn't it applied to the RP and DNSWL rules as 
well?  Perhaps I should request a rule change.  Thoughts?


/Jason






Re: RetrunPath and Bayes Poisoning

2010-02-23 Thread Bowie Bailey
Michael Scheidell wrote:
 On 2/23/10 9:28 AM, Bowie Bailey wrote:
 Michael Scheidell wrote:
   
 On 2/23/10 9:03 AM, Jason Bertoch wrote:
 
 Are there any internal checks that disable Bayes autolearn when these
 artificial whitelist rules match?  I'd disabled these rules in
 versions prior to 3.3.0 but, with all the discussion on the matter, I
 thought I'd leave them in to see the new and improved version.
 Unfortunately, I'm still seeing false positives and am concerned that
 they are pushing the scores low enough to poison my Bayes database.


 you can edit the tflags and add noautolearn

 example:
 72_active.cf:tflags __RCVD_IN_DNSWL nice net


 becomes:
 72_active.cf:tflags __RCVD_IN_DNSWL nice net noautolearn
  
 Are these settings cumulative?  The man page doesn't specify.

 If I do this:

 tflags RULENAME nice net
 tflags RULENAME noautolearn

 what happens?  Does everything get set or do I only get 'noautolearn'?


 why not just do tflags RULENAME nice net noautolearn

 (oh.. and to find them, grep '^tflags.*RCVD_IN' *.cf 

If I can just add 'noautolearn' in my local.cf, then I don't have to
worry about what is currently set in the distributed rules.  And if an
update adds or removes a setting, it will happen automatically without
me having to mess with it.

-- 
Bowie


Re: RetrunPath and Bayes Poisoning

2010-02-23 Thread Karsten Bräckelmann
On Tue, 2010-02-23 at 09:28 -0500, Bowie Bailey wrote:
 Michael Scheidell wrote:
  you can edit the tflags and add noautolearn

 Are these settings cumulative?  The man page doesn't specify.

Nope. tflags is of type CONF_TYPE_HASH_KEY_VALUE, so there's exactly one
tflags value per rule name.


 tflags RULENAME nice net
 tflags RULENAME noautolearn
 
 what happens?  Does everything get set or do I only get 'noautolearn'?

The latter wins and overwrites the former.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
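
The "one value per rule name, last one wins" behaviour described above can be modelled with a plain dictionary. This is a toy model written for illustration, not SpamAssassin's parser; effective_tflags is an invented helper and RULENAME is the placeholder used in the thread.

```python
def effective_tflags(config_lines):
    """Map each rule name to its flags, letting a later tflags line replace an earlier one."""
    flags = {}
    for line in config_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "tflags":
            # Plain assignment: the previous value for this rule is discarded.
            flags[parts[1]] = set(parts[2:])
    return flags


split_lines = ["tflags RULENAME nice net", "tflags RULENAME noautolearn"]
one_line = ["tflags RULENAME nice net noautolearn"]

print(effective_tflags(split_lines))  # only noautolearn survives
print(effective_tflags(one_line))     # all three flags are kept
```

Under this model a local override has to repeat the distributed flags it wants to keep, which is why the single-line form was suggested earlier in the thread.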



Solution to Bayes poisoning, high load levels, image spam, and botnet spam

2007-06-21 Thread Marc Perkel
I'm seeing a lot of people saying that bayes isn't working like it used 
to, that load levels are high, and that they are getting a lot of image 
and botnet spam. There are a few simple tricks you can do to get rid of 
90% of it.


First - use dummy MX records. Real mail retries. Botnets and most 
spammers don't. It's easier for them to try to spam someone else than to 
fight your filter. MX config is as follows:


dummy - 10
real - 20
real-backups - 30
dummy - 40
dummy - 50
dummy - 60
...

All dummy IP addresses are dead IPs. Port 25 closed. Don't do a 4xx on 
the lowest-numbered MX because QMail is brain dead and won't retry the 
higher numbered servers. The upper MX can return 4xx if you want to log 
botnet traffic. This will eliminate 75%-90% of your spam with no false 
positives, just by making this change.


Second - use blacklists in a way that blocks the spam, not just scores 
it. If you use the Spamhaus list you'll get rid of about 1/3 of what's left.


Then - you just let SA process the rest. What you'll find is that most 
all botnet spam will be gone, Bayes will start working again. Load 
levels will drop dramatically.


Another thing - I don't know what everyone else uses but Exim is my MTA 
and it has the power to be easily configured to do just about anything 
you can imagine. If you are unhappy with your MTA, Exim is what I 
think is the right choice.


Another solution is to just have me get rid of your spam for you and 
make the problem go away. If anyone is tired of all this and just wants 
it done you can email me privately and I'll set you up.


Re: Solution to Bayes poisoning, high load levels, image spam, and botnet spam

2007-06-21 Thread arni

Marc Perkel schrieb:
I'm seeing a lot of people saying that bayes isn't working like it 
used to, that load levels are high, and that they are getting a lot of 
image and botnet spam. There are a few simple tricks you can do to get 
rid of 90% of it.



56th reinvention of the square wheel

You might wanna search this lists archive for further comments ...

arni


Re: Solution to Bayes poisoning, high load levels, image spam, and botnet spam

2007-06-21 Thread Matthias Häker



Marc Perkel schrieb:
I'm seeing a lot of people saying that bayes isn't working like it 
used to, that load levels are high, and that they are getting a lot of 
image and botnet spam. There are a few simple tricks you can do to get 
rid of 90% of it.




ah nice
can you tell me how to implement this in SpamAssassin



Re: Solution to Bayes poisoning, high load levels, image spam, and botnet spam

2007-06-21 Thread Matt

First - use dummy MX records. Real mail retries. Botnets and most
spammers don't. It's easier for them to try to spam someone else than to
fight your filter. MX config is as follows:

dummy - 10
real - 20
real-backups - 30
dummy - 40
dummy - 50
dummy - 60


Currently I have mail.mydomain.com as 10.  Can I just change that to
20 and add mail5.mydomain.com as 10 but not have an IP associated with
mail5.mydomain.com or will that cause trouble?

Matt


Re: Solution to Bayes poisoning, high load levels, image spam, and botnet spam

2007-06-21 Thread Craig Carriere


Matt wrote:
 First - use dummy MX records. Real mail retries. Botnets and most
 spammers don't. It's easier for them to try to spam someone else than to
 fight your filter. MX config is as follows:

 dummy - 10
 real - 20
 real-backups - 30
 dummy - 40
 dummy - 50
 dummy - 60

 Currently I have mail.mydomain.com as 10.  Can I just change that to
 20 and add mail5.mydomain.com as 10 but not have an IP associated with
 mail5.mydomain.com or will that cause trouble?

 Matt


Are you sure about this approach?  Most of what hits our backup server,
listed at a higher MX record, is spam.  I was, and am, under the
impression that many spambots are set to fire at higher MXs under the
assumption that admins might not spend as much time on the anti-spam
set-up of these servers.


Re: Solution to Bayes poisoning, high load levels, image spam, and botnet spam

2007-06-21 Thread Marc Perkel



Craig Carriere wrote:

Matt wrote:
  

First - use dummy MX records. Real mail retries. Botnets and most
spammers don't. It's easier for them to try to spam someone else than to
fight your filter. MX config is as follows:

dummy - 10
real - 20
real-backups - 30
dummy - 40
dummy - 50
dummy - 60
  

Currently I have mail.mydomain.com as 10.  Can I just change that to
20 and add mail5.mydomain.com as 10 but not have an IP associated with
mail5.mydomain.com or will that cause trouble?

Matt




Are you sure about this approach?  Most of what hits our backup server,
listed at a higher MX record, is spam.  I was, and am, under the
impression that many spambots are set to fire at higher MXs under the
assumption that admins might not spend as much time on the anti-spam
set-up of these servers.

  


Yes - the trick works two ways. If the spambots hit the high server then 
there's nothing there and they go on. If they hit the lowest numbered 
server they also get nothing and go on. A real server will hit the 
lowest number MX and get nothing and then retry and get the second 
lowest one which is real.


The trick relies on the idea that spambots, unlike real servers, won't walk 
the MX order looking for the real server. If I were a spammer I would 
think it easier to move on to the next email address than to try to 
fight a good spam filter.
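
The reasoning above can be sketched as a small simulation. The host names and the three sender strategies are hypothetical, made up to mirror the dummy/real/backup layout from the original post; real senders and real spambots are more varied than this, so the sketch only shows the logic of the argument.

```python
# Hypothetical MX set: (preference, host, accepts_mail).
MX_RECORDS = [
    (10, "dummy1.example.com", False),
    (20, "mail.example.com", True),
    (30, "backup.example.com", True),
    (40, "dummy2.example.com", False),
    (50, "dummy3.example.com", False),
]


def walk_all(records):
    """A well-behaved MTA: try hosts in ascending preference until one accepts."""
    for _pref, host, accepts in sorted(records):
        if accepts:
            return host
    return None


def lowest_only(records):
    """A sender that tries only the lowest-numbered MX and then gives up."""
    _pref, host, accepts = min(records)
    return host if accepts else None


def highest_only(records):
    """A sender that targets only the highest-numbered (assumed weakest) MX."""
    _pref, host, accepts = max(records)
    return host if accepts else None


print("walks the list:", walk_all(MX_RECORDS))
print("lowest only:   ", lowest_only(MX_RECORDS))
print("highest only:  ", highest_only(MX_RECORDS))
```

A spambot that does walk the full MX list would get through just like a real server, so this approach filters by sender behaviour rather than by content.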




Re: bayes poisoning

2007-01-16 Thread Chris Purves

maillist wrote:
I see a few emails every-now-and-then about bayes poisoning, and am 
wondering what it means.  From what I understand, it is some message 
that gets learned (only through autolearn?) that has certain 
characteristics that throw the bayes system off.




From what I've seen there are generally two ways it is referred to:

1. random text or phrases thrown into spam to make spam and 
ham look more alike:  This is an imagined problem.


2. spam incorrectly learned as ham or ham incorrectly learned as spam: 
Enough of these (either from manual or auto learning) and your Bayes 
database will be useless.


--
Chris



Bayes Poisoning

2006-10-09 Thread Marc Perkel
I've been having problems with bayes as of late, with it marking nonspam 
as spam and spam as nonspam. I think it's the damn gif file spam causing 
it. Anyone else having this problem? Any solutions?


Bayes poisoning (was Re: your mail)

2006-09-27 Thread Peter Smith

 The messages are simply a random stream of words, with punctuation
 scattered in them. No HTML, no URLs being advertised, no excessive
 capitalisation, just meaningless text.

 Technically, then, it's not spam. Spam requires a commercial message
 of some sort. :)

Yeah, I think I said 'junk' rather than spam. I wonder if such mail has a name?

 I would agree that it's an attempt to poison your bayes database,
 assuming that you have autolearn turned on, either by skewing the
 scores towards ham or by bloating the database.

Do you think the perpetrators are poisoning the bayes db with a view to
sending spam at a later date? We aren't a big organisation - few hundred
mail boxes - so it seems rather long lengths for a spammer to go to. Another
suggestion was that the spammer had intended to attach an image, which
hadn't got through. Given the technical competence of many spammers, it
seems more likely they screwed up and forgot to attach the image. But I'm
just guessing here.

 Any thoughts on what I can do about these messages? Even with
 bayes turned off, they would still fail to score more than say 2
 or 3. Each message contains a different paragraph of random text,
 so it's not possible to pick out keywords; and the messages are
 coming from dialup machines, so blocking IP isn't going to be very
 effective.

 Look for punctuation? A good deal of the random bayes poison at one
 time was totally without punctuation.

I'm cautious about feeding these messages to sa-learn as spam, in case it
has a negative impact on genuine messages. The punctuation is pretty good -
full stops every dozen words or so, the odd comma. In fact, it's probably
better punctuation than most of my users use :) At the moment I'm just
black-listing host or netblocks which this junk is coming from.

Apologies for not setting a subject in my original mail by the way

Peter Smith



RE: Bayes poisoning (was Re: your mail)

2006-09-27 Thread Bowie Bailey
Peter Smith wrote:
   The messages are simply a random stream of words, with punctuation
   scattered in them. No HTML, no URLs being advertised, no excessive
   capitalisation, just meaningless text.
 
 I'm cautious about feeding these messages to sa-learn as spam, in
 case it has a negative impact on genuine messages. The punctuation is
 pretty good - full stops every dozen words or so, the odd comma. In
 fact, it's probably better punctuation than most of my users use:) At
 the moment I'm just black-listing host or netblocks which this junk
 is coming from. 

As long as you learn the messages as spam, they will have no negative
impact.  The only way these messages could cause problems is if they
get autolearned as ham instead of spam.

-- 
Bowie


Bayes poisoning ?

2005-07-22 Thread Ramprasad A Padmanabhan
Hi
  We are using Spamassassin + Postfix + Mailscanner on our SMTP servers.
Of late I have noticed that a lot of ham mails are getting a high BAYES
score.

I have overridden bayes with lower scores in order to avoid false
positives (and possibly mail loss).

How do I de-poison the bayes database, and are there any ways to avoid
bayes poisoning ? 


Thanks
Ram




--
Netcore Solutions Pvt. Ltd.
Website:  http://www.netcore.co.in
Spamtraps: http://cleanmail.netcore.co.in/directory.html
--


Re: Bayes poisoning ?

2005-07-22 Thread Loren Wilton
The best thing to do is probably throw the current database away and start
over.  As you seem to have several users, you should have bayes working
again within a very few hours, or less.

You should delete the current database, reset the scores to normal (and
increase the bayes_99 score to something around 4 if you aren't using
3.0.4), and then manually train Bayes on a few hundred known ham and spam
before letting autolearning take over.

The other thing you should do is decrease bayes autolearn ham threshold to 0
or even -0.1 or so.  By default it is too high, and will far too often lead
to bayes poisoning if the state of the database isn't watched carefully.
You may also want to take the bayes autolearn spam threshold up to a higher
value than it has by default; although this usually isn't required.

Loren
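
The effect of the two thresholds can be sketched as a simple decision function. This is a deliberately simplified model, not SpamAssassin's actual autolearn logic (which applies further conditions); the function name, the rule names, and the threshold values are assumptions for illustration. It also models rules flagged noautolearn being left out of the learning score, as discussed in the RetrunPath thread.

```python
def autolearn_decision(hits, ham_threshold=0.0, spam_threshold=12.0):
    """Decide how a message would be auto-learned under this simplified model.

    hits: list of (rule_name, score, tflags) tuples for the rules that matched.
    Rules carrying the 'noautolearn' flag are ignored for the learning score.
    Threshold values are placeholders, not claimed defaults.
    """
    learn_score = sum(
        score for _name, score, tflags in hits if "noautolearn" not in tflags
    )
    if learn_score < ham_threshold:
        return "ham"
    if learn_score >= spam_threshold:
        return "spam"
    return "skip"


# Hypothetical spam that also hits a whitelist-style rule worth -5.0.
plain = [("SOME_WHITELIST", -5.0, {"nice", "net"}), ("SOME_BODY_RULE", 1.5, set())]
flagged = [("SOME_WHITELIST", -5.0, {"nice", "net", "noautolearn"}), ("SOME_BODY_RULE", 1.5, set())]

print(autolearn_decision(plain))    # the whitelist hit drags it below the ham threshold
print(autolearn_decision(flagged))  # with noautolearn it is neither learned as ham nor spam
```

Lowering the ham threshold narrows the window in which a mis-scored spam can be learned as ham, which is the kind of poisoning the advice above is meant to avoid.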