Re: Are spammers finally feeling some pain? (update)

2005-01-10 Thread snowjack
On Mon, 03 Jan 2005 11:06:22 -0800, [EMAIL PROTECTED] said:
 Over the past month I've seen a ~25% dropoff in the amount of spam we're
 receiving on a daily basis. Anyone else seeing a significant drop in
 spam recently? 

(Replying to self)
The amount of spam we're receiving did go back up a little after the
holidays, as some of you suggested it would. But it didn't go back up to
pre-December levels. We are seeing a consistent decrease in the amount
of spam addressed to recipients in our domain. 

Back in March, when we started rejecting high-scoring messages with a
550 at our internet gateway, we were getting 20 messages per week on
average (over 90% spam), including rejected messages. After we started
rejecting, over a period of about a month it rapidly dropped to 13,
and for most of the summer it stayed around 11-13. Through
October and early November it was steady at 11 messages per week.
Then it started dropping again:
Week ending   Total received messages
Dec 5         104217
Dec 12        103839
Dec 19        103748
Dec 26         89378
Jan 2          80315
Jan 9          89175
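
A quick way to check the size of the drop is to compare each week against the Dec 5 week. This is only a minimal Python sketch: the counts are the ones in the table above, while the names `weekly_totals` and `percent_drop` are made up for illustration.

```python
# Weekly totals copied from the table above (messages received per week).
weekly_totals = {
    "Dec 5": 104217,
    "Dec 12": 103839,
    "Dec 19": 103748,
    "Dec 26": 89378,
    "Jan 2": 80315,
    "Jan 9": 89175,
}


def percent_drop(baseline, value):
    """Return how far `value` is below `baseline`, as a percentage."""
    return (baseline - value) / baseline * 100.0


# Use the first week in the table as the reference point.
baseline = weekly_totals["Dec 5"]
drops = {
    week: round(percent_drop(baseline, total), 1)
    for week, total in weekly_totals.items()
}

for week, drop in drops.items():
    print(f"{week:7s} {drop:5.1f}% below the Dec 5 week")
```

By this measure the Jan 2 week is about 23% below the Dec 5 week, and the Jan 9 week is still about 14% below it.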

I don't know whether the 550s are getting our users removed from spammer
lists, or if there's some other factor. I was thinking maybe overall
Internet spam was dropping, but from your replies last week, that
doesn't seem to be the case. Anyway, I guess we're doing *something*
right. :-)
--
  
  snowjack(a)fastmail.fm



Re: maintaining the 2.6 branch (was: [2.64] FORGED_MUA_OUTLOOK buggy)

2005-01-07 Thread snowjack
Whoops, forgot to cc the list. Sorry for the dupe, Per.

On Thu, 06 Jan 2005 09:54:32 +0100, Per Jessen [EMAIL PROTECTED]
said:
 Ron Johnson wrote:
 
  Per Jessen wrote:
Show of hands,
   who's still on 2.64 with no exact plans to upgrade?
 
 Alright, so far I've seen 4-5, maybe 6 people saying they intend to stick
 to 2.64 for the foreseeable future.  Is that really all? I'm quite
 willing myself to put an effort in in maintaining 2.64, and I'll
 probably be doing it on a personal level anyway, but to work to produce
 actual releases for others, I think a bit more of an interest is needed. 

Me too. I'm a Debian user, so I'm sticking with 2.64 as long as it's
working well. Unless 3.X goes into Sarge, which I suspect is unlikely. 
--
  
  snowjack(a)fastmail.fm



Re: maintaining the 2.6 branch (was: [2.64] FORGED_MUA_OUTLOOK buggy)

2005-01-07 Thread snowjack

On Thu, 6 Jan 2005 21:33:34 -0700, Bob Proulx [EMAIL PROTECTED] said:
 [EMAIL PROTECTED] wrote:
Per Jessen wrote:
 who's still on 2.64 with no exact plans to upgrade?
  
  Me too. I'm a Debian user, so I'm sticking with 2.64 as long as it's
  working well. Unless 3.X goes into Sarge, which I suspect is unlikely. 
 
 I am also a Debian user, running Debian woody stable, running the
 www.backports.org spamassassin-3.0.2 version and am very happy with
 it.  Running Debian stable is not a good reason to avoid upgrading
 spamassassin to the best available version.

Thus my conditional, as long as it's working well. 2.64 is working for
me, and VERY well: ~99% spam hits. I see no reason to upgrade unless the
spammers start getting around it somehow. What makes you say 3.0.2 is
the best version? Will I suddenly get an accuracy boost to 99.999%? 
 
 Running stable systems with unchanging versions of software is fine
 when you are behind firewalls and isolated from the changing internet.
 It is okay to run appliances there.  But I would go so far as to claim
 that if you are interacting with the quite hostile Internet then you
 must keep the software that is doing the interacting up to date.

You must keep on top of security vulnerabilities, yes. Asserting that
new software == more secure software is a fallacy. Remember that
security problems can be caused both by problems with the code, and
problems with your configuration. If you keep up with the security
patches, then changing your configuration all the time as the upstream
source changes can only increase your chances of introducing a
configuration error.

 Many times people are simply thinking security updates only.  But when
 talking email it also includes virus checking filters and spam
 checking filters too.
 
 Your system may be stable but the Internet is not.

Which is why good spam filtration and virus checking software gets
dynamic information from pattern update servers, RBLs, SURBL, Razor,
DCC, etc. etc. etc.

In a nutshell: if it ain't broke, don't fix it.
--
  
  snowjack(a)fastmail.fm



Are spammers finally feeling some pain?

2005-01-03 Thread snowjack
Over the past month I've seen a ~25% dropoff in the amount of spam we're
receiving on a daily basis. Anyone else seeing a significant drop in
spam recently? 
--
  
  snowjack(a)fastmail.fm



Re: Any way to block really bad SPAMs?

2005-01-03 Thread snowjack

On Mon, 03 Jan 2005 13:09:02 -0800, [EMAIL PROTECTED] said:
 On Mon, 3 Jan 2005 15:49:33 -0500, Gustafson, Tim [EMAIL PROTECTED]
 said:
  Hello
  
  I know that it's generally frowned upon to actually block SPAMs (as
  opposed to marking them as SPAM and letting the user decide) but my
  company has some instances where we get things that are blatantly,
  absolutely, unequivocally SPAM (think scores in excess of 100 points
  without BAYES or any white/blacklisting) and I wonder if there is a way
  I can configure SpamAssassin to actually block (as in, return a 550 SMTP
  error code) SPAMs that exceed some ludicrous SPAM score? 

By the way, we reject messages that score above 10 with a 550. We found
that almost 95% of spam scores over 10, and almost zero ham scores above
five. Messages scoring between 5 and 10 are accepted, tagged, and
relayed to their recipient. We definitely don't frown on rejecting
messages with high scores. I believe rejecting most spam with a 550 at
the internet gateway has been responsible for the amount of spam
addressed to our domain dropping by over 50% since we started rejecting
in March 2004. We have not had a single complaint from our users, even
about any false positives in the 5-10 range tagged by SA.
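
The policy described above boils down to two thresholds. Here is a minimal Python sketch of it, assuming the score has already been computed by SpamAssassin; the function name `gateway_action` and the returned labels are hypothetical, and how the MTA actually issues the 550 depends on your setup.

```python
def gateway_action(score, tag_at=5.0, reject_at=10.0):
    """Map a SpamAssassin score to what the gateway does with the message.

    Thresholds follow the post: above 10 is rejected with a 550,
    5 to 10 is accepted but tagged, anything lower is delivered as-is.
    """
    if score > reject_at:
        return "reject"   # MTA answers 550, message is never accepted
    if score >= tag_at:
        return "tag"      # accepted, marked as spam, relayed to recipient
    return "accept"       # treated as ham


for sample in (1.2, 7.0, 10.0, 23.5):
    print(sample, gateway_action(sample))
```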

I posted earlier today about a recent 25% drop over the past month. In
January 2004 we averaged more than 200,000 messages a week, over 90%
spam. Last week we got 80,000 messages. Some people suggested that it
was due to the holidays, but it was a fairly steady decline starting in
late November, so it may have been something else. I hope it doesn't
ramp up again, but who knows, maybe it was the holidays after all. I'll
post an update next week unless anyone objects.
--
  
  snowjack(a)fastmail.fm



RE: Any way to block really bad SPAMs?

2005-01-03 Thread snowjack

On Mon, 3 Jan 2005 16:45:41 -0500, Gustafson, Tim [EMAIL PROTECTED]
said:
 Thanks for all the help everyone.  I guess the real question for me is
 how do I make spamass-milter block e-mails of a certain score, because
 that's how I integrate SpamAssassin into Sendmail.

I don't use spamass-milter, so I can't vouch for how well it works, but
a quick Google search revealed that using the -r command-line option
to spamass-milter in the /etc/init.d script will set the rejection
level.
--
  
  snowjack(a)fastmail.fm



Re: Unsubscribe?

2004-12-21 Thread snowjack
William Holman wrote:
> I've been over-ruled by those who pay the bills, so I can't use
> SpamAssassin since it's open source
> How do I unsubscribe from the lists?
> Thank-you!

I have an anti-spam product I, ah, we, ah, my COMPANY, (yeah, that's the
ticket) call Snowjack Scanner which is at least as effective as any 
other... commercial solution. It has a Bayes filter, score averaging by 
sender, whitelisting capabilities, many effective individual rules which 
are weighted by genetic algorithm, and the ability to incorporate 
information from RBL and SURBL lookups. For only $5000, much less than 
many competing commercial solutions, I will send you this amazing 
package as soon as I can gimp a logo and slap it on a CD. I'll even set 
it up to auto-install on a bare-bones PC, with automated security 
updates, and for only $200 per hour I... uh, one of our CONSULTANTS... 
will help you integrate it into your mail systems.

--
Snowjack Consulting Services Inc. LLC. TBS. OMFG.


Re: Unsubscribe?

2004-12-21 Thread snowjack
snowjack wrote:
> I have an anti-spam product I, ah, we, ah, my COMPANY, (yeah, that's the
> ticket) call Snowjack Scanner which is at least as effective as any
> other... commercial solution. It has a Bayes filter, score averaging by
> sender, whitelisting capabilities, many effective individual rules which
> are weighted by genetic algorithm, and the ability to incorporate
> information from RBL and SURBL lookups. For only $5000, much less than
> many competing commercial solutions, I will send you this amazing
> package as soon as I can gimp a logo and slap it on a CD. I'll even set
> it up to auto-install on a bare-bones PC, with automated security
> updates, and for only $200 per hour I... uh, one of our CONSULTANTS...
> will help you integrate it into your mail systems.
>
> --
> Snowjack Consulting Services Inc. LLC. TBS. OMFG.

<reply to self>
<alert>For the humor-impaired, that was a joke. OMFG.</alert>
</reply>


Re: can any body help me understand this

2004-12-16 Thread snowjack
Kang, Joseph S. wrote:
> As for the dump output..
> 0.000  0108 1103190407  N:H*i:sk:NNfNNNc
>
> [snipped for brevity]
> The fourth is the token itself. SA uses some prefix characters for
> encoding things, but without any prefix, a token is a word in the
> body of the message.
>
> I think you meant the FIFTH column is the token itself, right?
> -Joe K.

Agreed, I think the fourth field is the timestamp at which the token was
last seen in a message, and the fifth field is the token. The 
timestamp's used for auto-expiry runs where tokens that haven't been 
seen in a while are removed from the db if other expiration requirements 
are met.
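
To make the column layout concrete, here is a minimal Python sketch that splits one dump line into named fields. It assumes the five whitespace-separated columns are probability, spam count, ham count, last-seen timestamp and token, which is my reading of the dump output rather than something confirmed in this thread. The sample line and the `DumpEntry` / `parse_dump_line` names are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class DumpEntry(NamedTuple):
    probability: float
    spam_count: int
    ham_count: int
    last_seen: datetime   # fourth column: when the token was last seen
    token: str            # fifth column: the token itself


def parse_dump_line(line):
    """Parse one token line, assuming five whitespace-separated columns."""
    # Split into at most five fields so the token is kept in one piece.
    prob, nspam, nham, atime, token = line.split(None, 4)
    return DumpEntry(
        probability=float(prob),
        spam_count=int(nspam),
        ham_count=int(nham),
        last_seen=datetime.fromtimestamp(int(atime), tz=timezone.utc),
        token=token.rstrip(),
    )


# Hypothetical sample line, not taken from a real database.
entry = parse_dump_line("0.987  12  0  1103190407  hypothetical-token")
print(entry.token, entry.last_seen.isoformat())
```

The timestamp is a plain Unix epoch value, so the one in the quoted dump line (1103190407) corresponds to 2004-12-16 UTC.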




Re: sa-learn on a 15,000 email mbox file?

2004-11-29 Thread snowjack

On Mon, 29 Nov 2004 12:01:02 -0900, Andy Firman [EMAIL PROTECTED] said:
 I just started using SpamAssassin 3.0 and am very
 impressed with it.  Recently, on an old server that I 
 just started to manage,  I just found a spam
 infested mbox spool file with 15,000 spams in it. (52MB)
 Nobody had checked the mailbox in about 10 months.
 
 Is it a good idea to run sa-learn on this giant spam
 mbox file on other servers that I get SA 3.0 installed on?
 
 Or no?
 

Unless the address has never been used by a real person, you should
manually check each message to see whether it's spam. Personally, I
never have the endurance to check more than about 500 messages at a
shot. So I'd just cut it into files of a size I could manually verify
without bleeding from the eyes, delete any hammy-looking stuff I find in
each file as I go through it, and then save the verified files and use
those for bayes training.

It would be safe to do what you propose if the account is one that you
are certain will never receive legit mail, but old mail accounts *will*
still get the occasional legit message: "Hey Bob, why haven't I heard
from you in the past eight months? Here's all our new customer info..."

For ongoing Bayes training, I have two IMAP folders that I copy messages
into, one for ham and one for spam. Any spams scoring less than 10 get
manually copied into the spam folder (the rest of the spam is rejected
at the mail gateway). Periodically I run through a bunch of recent ham
and copy it into the ham folder. A nightly script cleans out those IMAP
folders, runs sa-learn on the messages, and copies them into ham/spam
folders on the server, so I can use those if I need a corpus of manually
verified messages.
--
  
  snowjack(a)fastmail.fm



Re: feeding spam messages for training

2004-11-20 Thread snowjack
Richard Harding wrote:
> I am looking at getting messages together to train spamassassin and told
> users to forward me messages that are spam that still get through. Is
> this an ok method of collecting or will the fact that so many are
> forwarded messages throw off the training?

In short, yes, forwarding will throw off the training. It will not work
as well as training on the original messages, because forwarding a
message usually blows away all the header goodness and replaces it with
new headers. But there are ways.

This is a FAQ:
http://wiki.apache.org/spamassassin/ResendingMailWithHeaders
http://wiki.apache.org/spamassassin/SiteWideBayesFeedback


Re: Rules List

2004-11-09 Thread snowjack

On Tue, 9 Nov 2004 11:53:13 -0800, Greg Earle
[EMAIL PROTECTED] said:
SNIP
 Ergo, are there 2.63-friendly .cf files out there with SURBL/SpamCopURI
 functionality in them?

Here's my surbl.cf. Edit the scores at the bottom to your taste.

# checks to do network lookups of URLs found within spam messages
# using the most excellent DNS database at surbl.org

uri       SPAMCOP_URI_RBL  eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+2')
describe  SPAMCOP_URI_RBL  URI's domain appears in spamcop database at sc.surbl.org
tflags    SPAMCOP_URI_RBL  net

uri       WS_URI_RBL       eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+4')
describe  WS_URI_RBL       URI's domain appears in ws.surbl.org
tflags    WS_URI_RBL       net

uri       PH_URI_RBL       eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+8')
describe  PH_URI_RBL       URI's domain appears in ph.surbl.org
tflags    PH_URI_RBL       net

uri       OB_URI_RBL       eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+16')
describe  OB_URI_RBL       URI's domain appears in ob.surbl.org
tflags    OB_URI_RBL       net

uri       AB_URI_RBL       eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+32')
describe  AB_URI_RBL       URI's domain appears in ab.surbl.org
tflags    AB_URI_RBL       net

uri       JP_URI_RBL       eval:check_spamcop_uri_rbl('multi.surbl.org','127.0.0.0+64')
describe  JP_URI_RBL       URI's domain appears in jp.surbl.org
tflags    JP_URI_RBL       net


score   SPAMCOP_URI_RBL  2.4
score   WS_URI_RBL       2.0
score   PH_URI_RBL       2.4
score   OB_URI_RBL       2.0
score   AB_URI_RBL       2.4
score   JP_URI_RBL       2.4
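
In case the +2, +4, ... suffixes look cryptic: each rule tests one bit in the last octet of the address that multi.surbl.org returns for a listed domain. Below is a minimal Python sketch of that decoding. It does no DNS lookup, the bit-to-list mapping is simply the one used in the rules above, and `SURBL_BITS` and `lists_hit` are hypothetical names.

```python
# Bit values as used by the rules above ('127.0.0.0+2', '+4', ...).
SURBL_BITS = {
    2: "sc",
    4: "ws",
    8: "ph",
    16: "ob",
    32: "ab",
    64: "jp",
}


def lists_hit(answer):
    """Return the SURBL sub-lists encoded in a 127.0.0.x style answer."""
    last_octet = int(answer.split(".")[-1])
    # A list is hit when its bit is set in the last octet.
    return [name for bit, name in SURBL_BITS.items() if last_octet & bit]


# 6 = 2 + 4, so a domain answering 127.0.0.6 is on both sc and ws.
print(lists_hit("127.0.0.6"))
```

A domain on several lists therefore trips several of the rules at once, and their scores add up.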
--
  
  snowjack(a)fastmail.fm



Re: Rules List

2004-11-09 Thread snowjack

On Tue, 09 Nov 2004 15:23:08 -0500, Kris Deugau [EMAIL PROTECTED]
said:
SNIP
 Snag mine from http://www.deepnet.cx/~kdeugau/spamtools/ 

Nice meta rules (BAYES_vs_SURBL) -- I like those a lot!!!
--
  
  snowjack(a)fastmail.fm



Re: Frustration...

2004-11-05 Thread snowjack

On Fri, 05 Nov 2004 13:31:27 +0100, Kai Schaetzl
[EMAIL PROTECTED] said:
 Doing this after a spamassassin scan is useless, nevertheless. See, my 
 other reply of today.

People with high accuracy requirements would disagree with you. For some
people, including me, false positive rates for straight RBL rejection
are unacceptable. I simply can't use straight RBL rejection. Not an
option.

Rejecting the mail after the spamassassin scan is MUCH more accurate
than rejecting based on RBLs alone. And it's definitely a lot better
than letting all the spam through. Sure, you don't get any bandwidth
advantage, but when a false positive could cost you $thousands, the
bandwidth is a lot less important than the accuracy.

So, it's not useless.
--
  
  snowjack(a)fastmail.fm



Re: Frustration...

2004-11-05 Thread snowjack

On Fri, 05 Nov 2004 22:31:26 +0100, Kai Schaetzl
[EMAIL PROTECTED] said:
  wrote on Fri, 05 Nov 2004 10:09:53 -0800:
 
  People with high accuracy requirements would disagree with you. For some 
  people, including me, false positive rates for straight RBL rejection 
  are unacceptable. I simply can't use straight RBL rejection. Not an 
  option. 
 
 You didn't get my point. It's useless if not harmful to bounce a message 
 after you already got it in full. Full stop.

You can reject with 5XX without bouncing even after you receive the
message in full. I'm not talking about bouncing. Rejecting! Rejecting!

For me, the cost of a false positive FAR exceeds what the extra
bandwidth costs me to download the body of the message before making the
decision of whether to accept or reject the message. 

 I agree that combining the both mathematically should be more accurate,
 but it needs a *multitude* of system resources and traffic in exchange for
 an accuracy increase which is almost not measurable. No good deal in my
 eyes. 

The accuracy increase is very measurable. If a false positive isn't that
big a deal for you, fine. All I'm saying is that for some people, it
really is that important, and for us the accuracy increase is worth far
more than the cost of bandwidth and processing the extra data. 

 Anyway, if you like to go this way, fine, but that doesn't change the
 fact that bouncing stuff you already have taken in full is bad. It's *no* 
 advantage to you but maybe a burden to others. I don't see any 
 rectification for that.

1) It's a *huge* advantage for me. Our e-mail filtering accuracy is very
important to us, and using SA to scan the message body makes a big
difference.

2) I don't see how sending a 5XX after receiving the message body is any
more of a burden on others than sending a 5XX before
receiving the message body. Please enlighten me. I'm paying for the
extra bandwidth that's used. My ISP pays their upstream for the extra
bandwidth used, and so on. It's just extra business for them, I'm sure
they're happy to see it. 
 
--
  
  snowjack(a)fastmail.fm



Re: Frustration...

2004-11-04 Thread snowjack
On Thu, 04 Nov 2004 17:39:43 -0500, Rick Macdougall [EMAIL PROTECTED]
said:
 If you don't bounce, what do you do ?  /dev/nulling the message is not a 
 real option since mail should never just vanish, and in the case of 
 false positives, the sender would never get the rejection message.

Some definitions relating to MTA behavior:

Bounce: Your MTA accepts the message, then generates a Delivery Status
Notification message (aka DSN, aka bounce message) explaining why the
message was not delivered, and sends it to the sender address of the
undelivered message, which in the case of spam is almost certainly not
the real sender.

Reject: Your MTA does not accept the message, sending a 5XX to the
sending MTA, and generates no DSN.
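
The practical difference is who ends up receiving mail as a result. Here is a minimal Python sketch of the two definitions; it models only the outcome and is not real MTA code. The `Outcome` class, the function names, the reply texts and the example address are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    smtp_reply: str             # what the sending MTA sees during the session
    dsn_sent_to: Optional[str]  # who receives a bounce message, if anyone


def reject(envelope_sender: str) -> Outcome:
    """Refuse the message during the SMTP session; our MTA generates no DSN."""
    return Outcome(smtp_reply="550 Message refused", dsn_sent_to=None)


def bounce(envelope_sender: str) -> Outcome:
    """Accept the message, then mail a DSN to the (possibly forged) sender."""
    return Outcome(smtp_reply="250 OK", dsn_sent_to=envelope_sender)


forged_sender = "innocent.victim@example.com"
rejected = reject(forged_sender)
bounced = bounce(forged_sender)
print(rejected)
print(bounced)
```

With a reject, nothing is mailed to the forged address by our server. With a bounce, the innocent address gets the DSN.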
--
  
  snowjack(a)fastmail.fm



Re: ver 3.0 opinions

2004-10-29 Thread snowjack
On Thu, 28 Oct 2004 16:19:13 -0700, Bart Schaefer
[EMAIL PROTECTED] said:
 On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey [EMAIL PROTECTED]
 wrote:
  Is version 3 really any better at stopping spam than 2.63?
 
 Version 3 stops different spam than 2.63, in my experience so far. 
 E.g. it's better at catching the drug spam but not as good at the
 earn cash for making phone calls spam.

I would say that the *default* config of 3.X is significantly more
effective than the default config of 2.63 or 2.64. But I think 2.64 is
probably just as effective as 3.0 after some tweaking, once you've added
the SpamCopURI patch, antidrug.cf, and some of the other SARE custom
rulesets. Both of them depend to a large extent on how much care you put
into setting them up, training BAYES properly, etc.

 Using it in local only mode, though, I've found it not very different.
  The spams that get through 3.x that do not get through 2.6x are
 generally (a) those that match BAYES_99, which by itself in the
 default configuration is no longer a large enough score to make me
 happy, or (b) would have been tagged as spam except that the AWL
 smoothed them down to just below the threshold.

Yup. But I wouldn't turn off AWL if I were you. I think it's a very nice
feature, and has probably prevented a few false positives for me. Yes,
occasionally it will pull the score the wrong way across the threshold,
but if it's doing that, you're better off figuring out why this person's
messages get an *average* score on the wrong side of your threshold
anyway. If you get that fixed, then the AWL will stop pulling messages
the wrong way across your threshold. 

I do customize my BAYES scores because I'm not very happy with the
defaults. I find that a significant portion of spam manages to reduce
its Bayes probability to 40-60% by including large chunks of innocent
text at the end of the message. So I add the lines below to my
local.cf, because most of my ham scores below 10%, while the messages
that hit Bayes' 40-60% range are more than 95% spam. This still catches
those spams which hit BAYES_99, and for that rare ham that hits BAYES_99
(I've never seen one, but I suppose they must be out there), the AWL or
another whitelist rule will hopefully pull it back under five points.

score BAYES_00 -4.9
score BAYES_01 -2.0
score BAYES_10 -1.5
score BAYES_20 -1.0
score BAYES_30 -0.5
score BAYES_40 0.1
score BAYES_44 0.7
score BAYES_50 1.0
score BAYES_56 1.5
score BAYES_60 2.1
score BAYES_70 3.1
score BAYES_80 4.2
score BAYES_90 4.9
score BAYES_99 5.4
--
  
  snowjack(a)fastmail.fm



Re: [sa-list] Re: slightly OT: sudden rise in Rumplestiltskin attacks?

2004-10-27 Thread snowjack
On Wed, 27 Oct 2004 15:05:44 -0400 (EDT), Dan Mahoney, System Admin
[EMAIL PROTECTED] said:
 On Tue, 26 Oct 2004, Jeff Chan wrote:
 
  Pedantic nit-pick of the day:
  I'm sure you meant reject instead of bounce, right?
 
  Bounce to some means reject.  Bounce to others means forward.
 
 The distinction between forwarding and bouncing for me is the same as in 
 pine -- bounce means to simply re-email the mail to the final
 destination, 
 whereas forward means to encapsulate the original email as an attachment, 
 or enclose the email inside a new email with a (fwd) subject line, and 
 additional headers.
 
 -Dan

Well, to some, Bounce means to hit a hard surface and then spring back
the other way. But I don't think there was any ambiguity in the original
discussion, because we were talking about MTA behavior. If you want to
talk about basketball or mail clients or Majordomo or forwarding, then
the term 'bounce' might have another meaning. In MTA context, a bounce
is well defined:
http://www.wordiq.com/definition/Bounce_message

I'm picky about this one because it can make a big difference for
joe-job victims. If your server bounces messages addressed to invalid
users (which was suggested by the person I originally replied to), then
you're creating a problem by generating DSN messages which flood joe-job
victims. If your server's rejecting the messages, then the remote MTA
software might still create a DSN, but hopefully most ratware running
through a proxy doesn't bother to try notifying the bogus sender address
that the spam wasn't delivered.

It's easy to set up a mail system that bounces instead of rejects
messages with invalid addressees, especially if you have SpamAssassin
installed on a system which relays the mail to another server for final
delivery. Not an uncommon setup, especially on this list, and there are
people who aren't aware of the distinction. Apologies for the pedantry,
hopefully someone will find this educational and configure their gateway
MTA to reject mail addressed to invalid users, instead of trying to
relay to the final delivery server and then bouncing when it is refused.
--
  
  snowjack(a)fastmail.fm



Re: [OFFTOPIC] Opinions on DSPAM

2004-10-18 Thread snowjack

On Mon, 18 Oct 2004 14:32:10 -0400, Mathieu Nantel said:
 As I've read a few articles on DSPAM claiming that it's
 better/faster/sexier than spamassassin, I would appreciate having this
 list's comment on DSPAM. 
 I'm sure quite a few of you have tried it and might have some interesting 
 experiences to share. My understanding is that DSPAM relies solely on 
 algorithms (Bayes, CHI^2), and that complications arise when you have to 
 teach your users on how to train the system (which SA doesn't require as
 it's based on other things aside from Bayes).
 

Hi Mathieu,
I haven't done any carefully controlled studies, but I've been very
successful with SpamAssassin. I think SA has the better approach.
Spammers have gotten quite good at fooling the pure algorithm methods.
Some get their messages' spam probabilities down to 50% or so if not
lower, mainly by including a lot of innocuous text in their messages. If
you can also use DNS-based databases to look up IP addresses and domains
associated with known spammers, plus other rules such as known ratware
patterns in headers, keywords like 'viagra', and use that information in
combination with the Bayes and other algorithms to determine messages'
spamminess, you will have better accuracy. Blacklisting domains that are
included in the content of spam messages has been a very successful
technique for us.

Unfortunately, SA is a real memory hog and has higher hardware
requirements than DSPAM to handle the same message load. But that's not
a big issue for us. We average about 25,000 messages per day from the
Internet with ~400 users. SpamAssassin is running on a dedicated Athlon
1.5 GHz machine with about 750MB of RAM, and we haven't had any
problems. Peak RAM usage is about 600MB.

--
  
  snowjack(a)fastmail.fm



Re: Antidrug.cf

2004-10-11 Thread snowjack
How completely rude. What are you, twelve years old?
jdow wrote:
It seems anabolic steroids are flat out missed by antidrug.cf. Of course,
I observe the idiot Apache spam trap on the spamassassin list does catch
the message sample when I attach it. Somebody needs to apply a clue bat
to the Apache mail manager to get it to have this and the dev lists
bypass his antispam bazoola. What's the mail manager got, spit for brains?
{^_^}



Re: AWL auto_expire?

2004-10-09 Thread snowjack
Nate Schindler wrote:
> Just a curiosity question for now - is auto-expiring the AWL a planned
> feature?
> My auto-whitelist is about 3x the size of bayes_toks.  I imagine it'll
> become problematic eventually, since it's only growing.
>
> ...or is there already some way to expire old entries from the AWL, and
> i'm just a 'tard? or both?

I use this successfully with SA 2.64. I run it automatically once per
month. (Thanks, Kris!)

http://www.deepnet.cx/~kdeugau/spamtools/trim_whitelist


Re: score changes in local.cf not recognized.

2004-10-07 Thread snowjack
[EMAIL PROTECTED] wrote:
> I am running Postfix 2.1 with a content_filter (latest amavisd-new) which sends
> all mail through SA 2.64.  I understand *some* variables that are defined
> explicitly in amavisd-new are special, and thus have no effect when defined
> (differently) in local.cf.  AFAIK, scores are not included in this restriction.
> I will ask on the amavis list, but in case I'm experiencing another pitfall or
> screwing up the syntax, here goes:
> I set score RCVD_IN_BL_SPAMCOP_NET 5.000 in local.cf and noticed incoming spam
> still tags such emails with a score of 2.2.  Is there anything else I should
> check before assuming this is an external, non-SA issue?

...Did you restart amavisd-new after making the change?


Re: score changes in local.cf not recognized.

2004-10-07 Thread snowjack
[EMAIL PROTECTED] wrote:
> Quoting snowjack [EMAIL PROTECTED]:
> > ...Did you restart amavisd-new after making the change?
> Yes. :-)

Is there any evidence that local.cf is getting read at all?


Re: Memory usage spikes ...

2004-10-04 Thread snowjack

On Mon, 04 Oct 2004 10:29:38 -0700, Justin Mason [EMAIL PROTECTED] said:
 Please note that pretty much *all* our documentation notes that this is
 the case.   You should NOT scan very large messages.

We configure our spamd client to only pass spamd up to the first 50KB of
a message. Definitely helps keep memory usage under control, and doesn't
seem to hurt effectiveness at all.
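
The idea is simply to cap what gets handed to the scanner. Here is a minimal Python sketch of that cap, assuming the client has the raw message as bytes; `MAX_SCAN_BYTES` and `head_for_scanning` are hypothetical names, and this is not how any particular spamd client implements it.

```python
MAX_SCAN_BYTES = 50 * 1024  # the 50KB limit mentioned above


def head_for_scanning(raw_message: bytes, limit: int = MAX_SCAN_BYTES) -> bytes:
    """Return at most the first `limit` bytes of a message for scanning."""
    # Headers and the start of the body come first, so they survive the cut.
    return raw_message[:limit]


big_message = b"x" * 200_000
print(len(head_for_scanning(big_message)))
```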
-- 
  
  [EMAIL PROTECTED]



Re: SA 3.0 is eating up all my memory!!!

2004-10-02 Thread snowjack
Loren Wilton wrote:
> 80M doesn't strike me as unusual for spamd if you have any of the addon
> rulesets.

[EMAIL PROTECTED]@#sputter...! Yes, that is too unusual unless you're using
ALL the addon rulesets, including BigEvil, which, I hear, eats pets and 
small children when nobody's looking, and should be avoided. And also 
probably several non-SARE rulesets too.


Re: 2.6 - 3.0 migration questions

2004-10-01 Thread snowjack
Kelson wrote:
> How about ROSS: Real Open Source Software?

Bitchin' Open Source Software: BOSS
:-)


Re: Preferred DNSBL

2004-09-28 Thread snowjack

On Tue, 28 Sep 2004 08:57:28 -0500, Bob Apthorpe
[EMAIL PROTECTED] said:
 Hi,

Hello.

 On Mon, 27 Sep 2004 15:10:30 -0700 [EMAIL PROTECTED] wrote:
 
  On Mon, 27 Sep 2004 12:52:41 -0400 (EDT), Dan Mahoney, System Admin
  [EMAIL PROTECTED] said:
   Hey guys, as a quick survey, if you're blocking ips at the MTA level, 
   which are you using?
  
  I think it's a bad idea and don't do it at all. Much better to configure
  your MTA to reject mail based on a SpamAssassin score which nicely
  combines the RBLs and other spam indicators. Our MTA returns a 550 after
  the DATA is received on any message that SpamAssassin scores higher than
  10, which blocks about 90% of all spam we get (that's about 70% of all
  incoming mail, lately). 
 
 I'll counter that rejecting before DATA saves on bandwidth and CPU, and
 can be done safely with a judicious choice of DNSBLs. 

I like your choice of RBL's, but your definition of 'safely' doesn't
match up with what my users consider an acceptable number of false
positives.
-- 
  
  [EMAIL PROTECTED]



Re: Bayes keeps forgetting learned messages

2004-09-28 Thread snowjack

On Tue, 28 Sep 2004 12:46:17 -0700, Erik Wickstrom
[EMAIL PROTECTED] said:
 Hi all,
 
 2 problems.
 
 First, when I train SA on ham or spam, it seems to forget the
 counterpart.
 
 Example:
 sa-learn --mbox --showdots --ham inbox
 
 Would add say 300 hams to the Bayes DB, but turns the spam count to 0
 or a very small number and vice versa (sa-learn --dump magic)

You're not somehow accidentally using the same messages with --ham as
with --spam, are you? If SA has learned a message with --ham, then you
feed it the same message with --spam, it will un-learn the --ham tokens
it got from the first 'learning experience'...
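
Here is a toy Python model of why the counts can swing like that. It is only a sketch of the behaviour described above (one classification remembered per message, with relearning replacing the old one), not SpamAssassin's actual storage code; `ToyBayesStore` and the message ids are hypothetical.

```python
class ToyBayesStore:
    """Remembers one classification per message id."""

    def __init__(self):
        self.learned_as = {}  # message id -> "ham" or "spam"

    def learn(self, message_id, kind):
        if kind not in ("ham", "spam"):
            raise ValueError("kind must be 'ham' or 'spam'")
        # Learning a known message the other way replaces the old entry,
        # which is the "un-learn" step.
        self.learned_as[message_id] = kind

    def counts(self):
        kinds = list(self.learned_as.values())
        return {"ham": kinds.count("ham"), "spam": kinds.count("spam")}


store = ToyBayesStore()
mailbox = ["msg-1", "msg-2", "msg-3"]

for message_id in mailbox:
    store.learn(message_id, "spam")
for message_id in mailbox:  # same messages fed again, this time as ham
    store.learn(message_id, "ham")

after_relearn = store.counts()
print(after_relearn)
```

After the second pass the spam count is back to zero, which matches the symptom in the question.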
-- 
  
  [EMAIL PROTECTED]



Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

2004-09-23 Thread snowjack
Juhapekka Tolvanen wrote:
> Myth 4: PERL is designed for language processing, so
> SpamAssassin is written in a more appropriate language.
> Let me preface this with the fact that I've had about 10
> years of experience coding PERL. While PERL is very useful
> for language processing and web applications, it is also an
> extremely slow, interpreted language.

Process startup is slow. Perl is pretty efficient once the process is
running, and a well-set-up SpamAssassin 3 configuration will already 
have the processes started before a spam is even received.

> The average overhead
> for a single PERL process is around 2MB of RAM.

Yeah, and it is true that SpamAssassin uses lots of RAM (20M per
process?) So what, RAM is cheap!

> I really don't care about attitudes of author of DSPAM. I just want to
> know, how much faster SpamAssassin will be, if its Bayesian engine is
> replaced with something else, for example with DSPAM. It does not hurt,
> if we try it out and see what happens. And it does not hurt, if people
> have more alternatives.

I had a little single-processor 1GHz Athlon machine with 256MB RAM using
SpamAssassin to scan about 30,000 e-mails per day for a while. That was 
pushing the RAM usage a little, but it worked fine. I've since upgraded 
to about 750MB RAM just to be safe, and our load has dropped to about 
25,000 mails per day since I started rejecting (550) the high-scoring 
messages. The DSPAM authors are making it sound like SpamAssassin is 
more of a performance problem than it really is.

> If you want to know, what kind of computer I used, here are its specs:
> http://iki.fi/juhtolv/eng/tietokone.eng.html

Your biggest problem on this computer is only having 64M RAM and having
all kinds of other software (Gnome? Enlightenment? Those will use a lot 
of your 64M all by themselves!) running when you're trying to load 
SpamAssassin. Your problem is that you need more RAM, not that there's 
something wrong with SpamAssassin! Yes, DSPAM will possibly use quite a 
bit less RAM, so it might be a decent choice for you. But I doubt that 
it's really as effective as SpamAssassin.

> BTW Creating SA-plugin that runs crm114 may be good thing to try
> out, too. And I don't mind, if some people create bogofilter- and
> SpamProbe-plugins for SA. Just do it, if you feel so. But DSPAM seems
> more interesting for me. I haven't been able to try it out, because it
> is not yet available as Debian-package and I haven't yet bothered to
> compile it myself. SpamAssassin is packaged in Debian already, but
> version 3.0 is not yet available as Debian-package.
> I reiterate: It does not hurt, if we try out and see what happens.

Having SpamAssassin call some other program like DSPAM will make your
performance much worse, because you will already have SA loaded, taking 
up a chunk of RAM, and then it is trying to load another program, which 
will use even _more_ RAM.

Your options are:
1) buy more RAM
-or-
2) quit using Gnome, Enlightenment, and SpamAssassin on that box, find a 
nice thin window manager (IceWM?) and use some low-memory-friendly spam 
scanner. There are mail clients out there that have Bayes filters built 
in. Bogofilter may also be an option.


Re: Auto White List

2004-09-23 Thread snowjack
Rick Macdougall wrote:
Yup, I understand how the whole AWL works but my problem is that border 
line spam is being dropped to ham.  Example: A normal markup of 5.6 and 
an AWL score of -0.8 drops it below the average user required_hits of 5 
and does not get marked as spam.
Right, but it's an averaging system, so if you're seeing negative AWL 
scores, that means that future mails from the same sender will be 
averaged higher, eventually auto-blacklisting that address.

I think a longer example would make it clearer. Let's say the first 
message from that sender scored 4.8, and the second and third messages 
scored 5.6, and so on. Let's see what happens:

Message 1:  4.8  (first message from this sender and IP, no AWL hit)
Message 2:  5.6  AWL: -0.80  final score: 4.80
Message 3:  5.6  AWL: -0.40  final score: 5.20
Message 4:  5.6  AWL: -0.27  final score: 5.33
Message 5:  9.8  AWL: -4.40  final score: 5.40
Message 6:  3.5  AWL: +2.78  final score: 6.28
Message 7:  4.8  AWL: +1.02  final score: 5.82
Message 8:  3.5  AWL: +2.17  final score: 5.67
Message 9: 15.5  AWL:-10.10  final score: 5.40
Message 10: 3.5  AWL: +3.02  final score: 6.52
Note how every time you see a large negative number in the AWL score, 
the very next message has a significantly higher final score. In this 
example the AWL sets each message's final score to the average raw score 
of all the earlier messages received from that sender. Note how 
message #8 originally scored 3.5, but AWL gives it +2.17, while message 
#10 (also originally 3.5) gets an AWL adjustment of +3.02 because of the 
high score message #9 received.
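
The averaging in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SpamAssassin's actual code: the function name awl_adjust and its factor parameter are made up for this sketch. factor=1.0 moves the score all the way to the sender's mean, as the example does; SpamAssassin exposes this as auto_whitelist_factor, and as far as I know its default is 0.5, which moves the score only halfway.

```python
def awl_adjust(raw_scores, factor=1.0):
    """Return (raw, adjustment, final) for each message from one sender.

    Hypothetical helper: the adjustment pulls each raw score toward the
    mean of the sender's earlier raw scores. factor=1.0 is an assumption
    chosen to match the worked example in this post.
    """
    rows = []
    total = 0.0   # running sum of raw (pre-AWL) scores
    count = 0     # number of earlier messages from this sender
    for raw in raw_scores:
        if count == 0:
            adjustment = 0.0          # first message: no history, no AWL hit
        else:
            mean = total / count
            adjustment = (mean - raw) * factor
        rows.append((raw, round(adjustment, 2), round(raw + adjustment, 2)))
        total += raw
        count += 1
    return rows


scores = [4.8, 5.6, 5.6, 5.6, 9.8, 3.5, 4.8, 3.5, 15.5, 3.5]
rows = awl_adjust(scores)
for n, (raw, adj, final) in enumerate(rows, start=1):
    print(f"Message {n}: {raw:5.1f}  AWL:{adj:+7.2f}  final score: {final:5.2f}")
```

Calling awl_adjust(scores, factor=0.5) shows the gentler behaviour: each message moves only halfway toward the sender's mean, so a single spammy-looking mail from a regular correspondent is damped less.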

This is great for reducing false positives from a person you correspond 
with often, who sends you something spammy-looking once in a while. Also 
note how the resulting final scores are much less erratic than they 
would be without AWL. In this example, we would have half the messages 
scoring below the spam threshold of 5 without AWL, but with AWL enabled, 
only the first two (both at 4.8) get through.

SA developers: feel free to add this example to the wiki if you think it 
would be helpful.


Re: [sa-list] Re: DSPAM-plugin for SpamAssassin 3.* ?

2004-09-23 Thread snowjack
David Brodbeck wrote:
On Wed, 22 Sep 2004 17:26:12 -0700, snowjack wrote
Yeah, and it is true that SpamAssassin uses lots of RAM (20M per 
process?) So what, RAM is cheap!
If I'm not mistaken, some of that 20M is actually shared amongst all the
spamd processes, so it's not as much memory usage as you'd think.  Five
spamd processes that each claim to be using 20M may not actually be
consuming a total of 100M.  *nix is tricky that way. ;)
Hmm
# top
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
31162 spamdata   9   0 25108  17M  6680 S    17.5  2.3   0:00 spamd
31163 spamdata  14   0 24596  16M  7284 S     7.7  2.2   0:00 spamd
25108 - 6680 = 18428 KB physical RAM usage
24596 - 7284 = 17312 KB physical RAM usage
Am I missing something?
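
For reference, here is the same subtraction as a small Python sketch, plus a rough total that counts the shared part only once. It rests on two loud assumptions: that SIZE minus SHARE is a fair stand-in for each process's private memory, and that the shared pages are the same ones in every spamd process. Neither is guaranteed, and the function name estimate_total_kb is made up for this sketch.

```python
def estimate_total_kb(procs):
    """procs: list of (size_kb, share_kb) pairs from top's SIZE and SHARE columns.

    Returns (per-process private KB, estimated total KB). The total assumes
    all processes share the same pages, so the largest SHARE is counted once.
    """
    private = [size - share for size, share in procs]
    shared_once = max(share for _, share in procs)
    return private, sum(private) + shared_once


spamd = [(25108, 6680), (24596, 7284)]      # the two spamd rows above
private, total = estimate_total_kb(spamd)
naive = sum(size for size, _ in spamd)      # simply adding up SIZE

print(private)  # [18428, 17312]
print(total)    # 43024
print(naive)    # 49704
```

Under those assumptions the two processes together need roughly 43 MB rather than 50 MB, so sharing helps, but most of each process is still private.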


Re: What the Hell? Fw: Mail delivery failed: returning message to sender

2004-09-21 Thread snowjack
jdow wrote:
My understanding is that Earthlink servers are open so that people
who are mobile can still send mail through their Earthlink accounts.
The way they handle the spam issue is a tarpit operation. The more
mails you send in a given interval the slower the mail processes. So
Earthlink mailers can be used for spam - in very small quantities.
And you're blaming DSBL?
If what you say is true, do you suppose Earthlink has ever heard of 
SMTP-AUTH for their mobile users? I wonder how Earthlink would prevent a 
spammer from spoofing his/her IP address to a different address for each 
message.

I have a hard time believing that Earthlink doesn't use SMTP-AUTH. It's 
more likely that some legitimate (if somewhat clueless) Earthlink 
customer accidentally used DSBL software to list their own SMTP relay.


Re: Speak to me of Bayes and scoring in SA 3.0

2004-09-16 Thread snowjack
On 16 Sep 2004 13:39:30 -0700, Daniel Quinlan [EMAIL PROTECTED]
said:
 Bart Schaefer [EMAIL PROTECTED] writes:
 
  Feeding the Bayes rules through the scoring algorithm seems to imply a
  lack of trust in the accuracy of the classifier.
 
 Mostly not.  It's needed to map from the 0 to 1.0 probability to the
 SpamAssassin threshold-based scoring method.  Even in more pure Bayesian
 systems, users still have to figure out where to put stuff into the spam
 bucket and it's often not at 0.50.  Our technique avoids the problem of
 people having two different calibrations.  Plus, there's the lack of
 trust thing, but that's a lesser factor.
 
 I think we could use a better way to merge Bayesian results into the
 SpamAssassin score, though.

I thought so too... I added the following to my local.cf based on Bayes
scores of spam we receive. Spammers are really trying hard to make their
spams look hammy, but regular users are (hopefully) not trying to make
their hams look spammy. So I weighted the scores in that direction since
my Bayes engine seems much more likely to give my ham a very low score
than to give my spam a very high score. Spammers can fairly easily get
their Bayes scores down to about 50% probability, but it's much more
difficult to get them down below 40% probability since they would have
to know your particular organization's 'hammy' tokens (which would not
remain hammy for long if you're training regularly).

score BAYES_00 -4.9
score BAYES_01 -2.1
score BAYES_10 -1.5
score BAYES_20 -1.0
score BAYES_30 -0.5
score BAYES_40 0.1
score BAYES_44 0.7
score BAYES_50 1.0
score BAYES_56 1.5
score BAYES_60 2.1
score BAYES_70 3.1
score BAYES_80 4.2
score BAYES_90 4.9
score BAYES_99 5.4
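
To show how those lines get applied, here is a rough Python sketch that maps a Bayes probability to one of the rule names and the score listed above. The bucket boundaries are an assumption: I've simply read each rule's suffix as the lower bound of its range (BAYES_44 starts at 0.44, and so on), which may not match the exact ranges in the stock rules. The names BUCKETS and bayes_rule are made up for this sketch.

```python
from bisect import bisect_right

# (assumed lower bound, rule name, score from the local.cf lines above)
BUCKETS = [
    (0.00, "BAYES_00", -4.9),
    (0.01, "BAYES_01", -2.1),
    (0.10, "BAYES_10", -1.5),
    (0.20, "BAYES_20", -1.0),
    (0.30, "BAYES_30", -0.5),
    (0.40, "BAYES_40", 0.1),
    (0.44, "BAYES_44", 0.7),
    (0.50, "BAYES_50", 1.0),
    (0.56, "BAYES_56", 1.5),
    (0.60, "BAYES_60", 2.1),
    (0.70, "BAYES_70", 3.1),
    (0.80, "BAYES_80", 4.2),
    (0.90, "BAYES_90", 4.9),
    (0.99, "BAYES_99", 5.4),
]
LOWER_BOUNDS = [lower for lower, _, _ in BUCKETS]


def bayes_rule(probability):
    """Return (rule name, score) for a Bayes spam probability in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Pick the last bucket whose lower bound is <= probability.
    index = bisect_right(LOWER_BOUNDS, probability) - 1
    _, name, score = BUCKETS[index]
    return name, score


for p in (0.005, 0.39, 0.45, 0.50, 0.999):
    print(p, bayes_rule(p))
```

With these numbers a spam that a spammer has pushed down to about 50% still picks up a full point, while anything below 40% needs tokens that are hammy for your particular site.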

-- 
  
  [EMAIL PROTECTED]



RE: Subject line

2004-09-14 Thread snowjack
On Tue, 14 Sep 2004 14:14:17 -0700, Bret Miller [EMAIL PROTECTED]
said:
 Maybe it wouldn't make *your* life easier. But because it's visual, it
 allows me to more easily discern relevance when I put more than one list
 together in a mailbox. A certain subject in one list would be more
 relevant to me than the same subject in another.

First of all, not only does it NOT make my life easier, I'm actively
annoyed by that 'feature', especially on lists with
[StupidlyLongIdentifiers]. Even the shorter names are just visual
clutter; they're the same on every message in a given thread. So, on
behalf of the mostly-silent group of people who don't care enough about
such trivialities to add to this already overlong thread, I want to
thank the list admins for keeping Subject line clutter to a minimum.

 I use Outlook so I don't have a lot of options for sorting like some
 other apps do. The question was asked; I answered it.

I don't use Outlook, but I'm pretty sure I've helped one of my users
define a rule to filter messages based on a header that wasn't
pre-defined in Outlook...
-- 
  
  [EMAIL PROTECTED]