Re: FuzzyOCR only runs when specifying spamassassin -D

2009-04-28 Thread Matt Kettler
Andrew Bruce wrote:
>
> I've been looking at some of the spam emails I've received lately with
> images attached and noticed that FuzzyOCR wasn't running against them.
>
>  
>
> The same seems to be true when I take these messages and run them with:
>
> spamassassin -t < img-email.eml
>
>  
>
> However if I run them through as follows, I get FuzzyOCR showing up in
> the results:
>
> spamassassin -t -D < img-email.eml
>
Well, the rule that tripped was FUZZY_OCR_KNOWN_HASH. I'm no FuzzyOCR
expert, but I'm guessing that's related to it storing the hashes of
images attached to previous spam in a SQL database. So, in that case, it
would have fired the second time regardless of -D being enabled. It's
just firing because it has already seen the image once before and
cataloged it as belonging to spam.

Glancing at FuzzyOCR's code for the first time, I think this is related
to the focr_enable_image_hashing option.
>
>  
>
> I also get substantially different AWL results between the two
> (although I guess that maybe part of the debug procedure).
>
-D does not change the AWL.

The AWL score change is a function of two things:

1) scanning the message multiple times. Every time you process it, the
AWL will change, because every scanned message gets factored into the
AWL's historical average score.

2) FuzzyOCR fired, raising the pre-AWL score, which is going to
drive down the AWL score. (Remember, the AWL score is based on the
difference between this message and the past average.) Adding +10 to the
pre-AWL score (which FuzzyOCR did) should change the AWL score by -5.0,
assuming the default AWL factor of 0.5.

You saw a total swing of -7, so it looks like the first run raised the
average by 4.0, in turn affecting the AWL score by -2.0, and then
FuzzyOCR caused another -5.0 change in the AWL.

In both cases the AWL still "thought" the message was spam, but in the
second case it noted it had a much higher spam score than the previous
spam, so it brought it back down a bit to split the difference. That's
what the AWL does.
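
The arithmetic above can be sketched in a few lines of Python. This is a
simplified model, not SpamAssassin's actual code: it assumes the AWL
adjustment is the configured factor times the difference between the
sender's historical mean and the current pre-AWL score. The function name
awl_adjustment and the sample mean of 20.0 are made up for illustration.

```python
def awl_adjustment(historical_mean, pre_awl_score, factor=0.5):
    """Simplified AWL model: pull the score toward the sender's past average."""
    return (historical_mean - pre_awl_score) * factor


# Hypothetical sender whose past messages averaged 20.0 points.
mean = 20.0
without_ocr = awl_adjustment(mean, 20.0)  # message scores exactly the average
with_ocr = awl_adjustment(mean, 30.0)     # same message, +10 from FuzzyOCR

# With the 0.5 factor, the +10 rule hit moves the AWL adjustment by -5.0.
print(with_ocr - without_ocr)
```

In this model the mean is held fixed between the two calls; in practice each
scan also updates the stored average, which is the first effect described
above.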

See also:
http://wiki.apache.org/spamassassin/AwlWrongWay
http://wiki.apache.org/spamassassin/AutoWhitelist






Re: FuzzyOCR only runs when specifying spamassassin -D

2009-04-28 Thread René Berber
Andrew Bruce wrote:

> I've been looking at some of the spam emails I've received lately with
> images attached and noticed that FuzzyOCR wasn't running against them.
> 
[snip]
> However if I run them through as follows, I get FuzzyOCR showing up in
> the results:
> 
> spamassassin -t -D < img-email.eml
> 
[snip]
> Does anyone know why this might be happening?  I seem to recall
> experiencing this before, but can't remember what I did to fix it.

That's the way FuzzyOCR works: if a message already has scored above a
configurable threshold it doesn't scan it, if you run in debug mode the
threshold is ignored.
-- 
René Berber



FuzzyOCR only runs when specifying spamassassin -D

2009-04-28 Thread Andrew Bruce


I've been looking at some of the spam emails I've received lately with
images attached and noticed that FuzzyOCR wasn't running against them. 

The same seems to be true when I take these messages and run them with: 

spamassassin -t < img-email.eml 

However if I run them through as follows, I get FuzzyOCR showing up in the
results: 

spamassassin -t -D < img-email.eml 

I also get substantially different AWL results between the two (although I
guess that maybe part of the debug procedure). 

Does anyone know why this might be happening? I seem to recall
experiencing this before, but can't remember what I did to fix it. 

spamassassin -t: 

Content analysis details: (22.2 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
                            [68.186.154.187 listed in zen.spamhaus.org]
 3.0 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
 0.9 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [68.186.154.187 listed in dnsbl.sorbs.net]
 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.]
 1.0 FH_HELO_EQ_CHARTER     Helo is d-d-d-d charter.com
 4.3 HELO_DYNAMIC_HCC       Relay HELO'd using suspicious hostname (HCC)
 4.4 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP addr
                            2)
 0.0 FH_HELO_EQ_D_D_D_D     Helo is d-d-d-d
 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see ]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
                            dynamic-looking rDNS
 1.8 AWL                    AWL: From: address is in the auto white-list

spamassassin -t -D: 

Content analysis details: (25.7 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.0 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [68.186.154.187 listed in zen.spamhaus.org]
 1.2 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
 0.9 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [68.186.154.187 listed in dnsbl.sorbs.net]
 3.5 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.]
 1.0 FH_HELO_EQ_CHARTER     Helo is d-d-d-d charter.com
 4.3 HELO_DYNAMIC_HCC       Relay HELO'd using suspicious hostname (HCC)
 4.4 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP addr
                            2)
 0.0 FH_HELO_EQ_D_D_D_D     Helo is d-d-d-d
 2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see ]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
                            dynamic-looking rDNS
  10 FUZZY_OCR_KNOWN_HASH   BODY:
-5.2 AWL                    AWL: From: address is in the auto white-list


Re: 'anti' AWL

2009-04-28 Thread LuKreme

On 28-Apr-2009, at 20:14, Matt Kettler wrote:

> The AWL uses the LAST non-private..
>
> This is, IMO, completely broken.



Yep, have to agree.  This is seriously retarded.


--
I love as only I can, with all my heart



Re: 'anti' AWL

2009-04-28 Thread Matt Kettler
Matt Kettler wrote:
> LuKreme wrote:
>   
>> On 28-Apr-2009, at 15:38, RW wrote:
>> 
>>> It's based on the first routable IP address,
>>>   
>> Well, that's a very silly thing for it to be looking at.  It should be
>> looking at the LAST routable IP address outside of the trusted
>> network. Looking at the first routable address is completely worthless.
>> 
> It's actually based on the last IP not matching your internal_networks.
> If you haven't declared internal_networks or trusted_networks manually,
> then the auto-guesser is going to set it to be the second-to-last
> routable IP (it assumes the last routable is your MX, which may or may
> not be correct depending on how you route/firewall your DMZ.)
>
> Of course, first, or last depends on your perspective. I assume RW was
> thinking of "first" from a "starting at the inside, working backwards in
> time" approach. This is backwards, if you think about the chronology of
> the headers, like SA does. However, it makes sense from a "I'm at my
> server looking outward at the world" point of view that most folks work
> from when thinking about network topologies.
>   

Darnit, I should have checked before sending.

The AWL uses the LAST non-private..

This is, IMO, completely broken. Why are we allowing folks to declare
internal_networks if we're not going to use it, and assume the last
non-private is "external". (which, mind you, is different from what the
trust-path guesser does. It assumes that IP is your MX.)


Relevant code:

foreach my $rly (reverse (@{$pms->{relays_trusted}},
                          @{$pms->{relays_untrusted}}))
{
  next if ($rly->{ip_private});
  if ($rly->{ip}) {
    $origip = $rly->{ip}; last;
  }
}
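
For readers who don't speak Perl, the loop above can be mirrored in Python
roughly as follows. This is only a sketch: the function name pick_origin_ip
and the relay data are invented, and it assumes each relay list is ordered
newest hop first, so walking the combined list in reverse starts from the
oldest hop.

```python
def pick_origin_ip(relays_trusted, relays_untrusted):
    """Mirror the Perl loop: walk the combined relay list in reverse and
    return the first IP that is not flagged as private."""
    for relay in reversed(relays_trusted + relays_untrusted):
        if relay.get("ip_private"):
            continue  # same as: next if ($rly->{ip_private});
        if relay.get("ip"):
            return relay["ip"]
    return None


# Made-up relay data, listed newest hop first (an assumption).
trusted = [{"ip": "192.168.1.5", "ip_private": True}]
untrusted = [
    {"ip": "203.0.113.7", "ip_private": False},
    {"ip": "198.51.100.9", "ip_private": False},
    {"ip": "10.0.0.2", "ip_private": True},
]
print(pick_origin_ip(trusted, untrusted))
```

Note that nothing in the loop consults internal_networks or
trusted_networks; only the ip_private flag decides which hop is skipped.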









Re: 'anti' AWL

2009-04-28 Thread Matt Kettler
LuKreme wrote:
> On 28-Apr-2009, at 15:38, RW wrote:
>> It's based on the first routable IP address,
>
>
> Well, that's a very silly thing for it to be looking at.  It should be
> looking at the LAST routable IP address outside of the trusted
> network. Looking at the first routable address is completely worthless.
It's actually based on the last IP not matching your internal_networks.
If you haven't declared internal_networks or trusted_networks manually,
then the auto-guesser is going to set it to be the second-to-last
routable IP (it assumes the last routable is your MX, which may or may
not be correct depending on how you route/firewall your DMZ.)

Of course, first, or last depends on your perspective. I assume RW was
thinking of "first" from a "starting at the inside, working backwards in
time" approach. This is backwards, if you think about the chronology of
the headers, like SA does. However, it makes sense from a "I'm at my
server looking outward at the world" point of view that most folks work
from when thinking about network topologies.












Re: 'anti' AWL

2009-04-28 Thread LuKreme

On 28-Apr-2009, at 15:38, RW wrote:

> It's based on the first routable IP address,



Well, that's a very silly thing for it to be looking at.  It should be  
looking at the LAST routable IP address outside of the trusted  
network. Looking at the first routable address is completely worthless.



--
Adolescence is the period between childhood and adultery



Re: Physician List

2009-04-28 Thread Karsten Bräckelmann
On Tue, 2009-04-28 at 19:43 -0400, Casartello, Thomas wrote:
> Has anyone else noticed these messages as a problem? I have had a few
> complaints about messages getting through my spam filter involving
> “Physicians List in the USA” or something like that usually talking

I have seen quite a few myself. Unfortunately, they tend to slip by.
Made a first attempt at catching them, which helped -- though I do see
new variants going under the radar of a few of my meta's.

I'd be interested in getting more samples (contact me off-list first!)
by anyone, to tighten and broaden (yes, both) my local rules and drop
them publicly.

Interestingly, I seem to ever get them only on list role accounts and
non-published OSS forwarder addresses.

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



my emailBL is live!

2009-04-28 Thread Adam Katz
This was actually rather simple to set up.  I'll publish the code
(AGPL) that runs it in a bit (I need to clean it up to withstand the
heavy-handed criticism on this list ...).  Note, I'm using ZoneEdit's
free NS mirroring, which has limited bandwidth.  I'm willing to pay
their minimum threshold if it gets that popular, but any more than
that and I'll be looking for other options.  (NOT PRODUCTION GRADE!)

A SpamAssassin plugin will be needed to get it working, too ... I
suspect there are gurus here who can do that part as easily as I did
the scraper and BIND code.  If nobody bites, I'll get to it in time.

For now, we have a functional proof-of-concept.  I'll post the code, a
more formal announcement, and more documentation to my blog and
website in a few days ("a few" might be a large number).  The emailBL
syncs with the upstream every 4h (I'd reduce the TTL and increase the
syncing frequency, but I'd risk running out of bandwidth).

(Note, the DNS will take another 1-4 hours to propagate.)


The structure of the upstream list:

ADDRESS,TYPE[TYPE...],DATE

ADDRESS is an email address like 
TYPE is one or more letters of A B C D as follows:
A (reply-to)
B (from, !reply-to)
C (msg body has ADDRESS)
D (msg body has ADDRESS obfuscated)
DATE is the last time it was seen, formatted MMDD, in UTC(?).

The structure of domains in my emailBL index:

USER.DOMAIN.emailbl.khopesh.com  TXT  DATE
USER.DOMAIN.emailbl.khopesh.com  A    127.0.0.N_TYPE

USER is the ADDRESS's username, altered as follows:
  s/^([...@+]{1,16})[...@]*@.*/$1/;  # truncate to 16 characters
  s/^[^a-z0-9]*|[^a-z0-9]*$//g;  # fix leading/trailing chars
  s/[^-a-z.0-9]/-/g; # fix illegal chars
DOMAIN is the ADDRESS's domain
N_TYPE is a numerical version of TYPE above (A=1, B=2, C=3, D=4)

Main test points (with no space after the at sign, obviously):

test@ example.com
-> test.example.com.emailbl.khopesh.com
test@ emailbl.khopesh.com
-> test.emailbl.khopesh.com.emailbl.khopesh.com

Alternate test point (mimicking DNSBLs):

2.0.0.127.emailbl.khopesh.com
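
The mapping from an address to a lookup name can be sketched in Python as
below. This is an approximation under stated assumptions, not the code that
runs the list: the first substitution above is garbled in the archive, so the
16-character truncation is done here with a plain slice of the local part,
the address is assumed to be lowercased first, and the name emailbl_name is
made up.

```python
import re

ZONE = "emailbl.khopesh.com"


def emailbl_name(address, zone=ZONE):
    """Build the DNS name to query for an email address (sketch)."""
    local, _, domain = address.lower().partition("@")
    user = local[:16]                                    # truncate to 16 characters
    user = re.sub(r"^[^a-z0-9]*|[^a-z0-9]*$", "", user)  # fix leading/trailing chars
    user = re.sub(r"[^-a-z.0-9]", "-", user)             # fix illegal chars
    return f"{user}.{domain}.{zone}"


print(emailbl_name("test@example.com"))
```

The printed name matches the first test point listed above.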


Let's pretend we're in a shell (I've spaced all emails):


# Look up TXT record (last-seen DATE) for 
$ host -t txt test.example.com.emailbl.khopesh.com.
test.example.com.emailbl.khopesh.com descriptive text "20090328"
$

# Look up A record (inclusion TYPE[s]) for 
$ host test.example.com.emailbl.khopesh.com.
test.example.com.emailbl.khopesh.com has address 127.0.0.3
test.example.com.emailbl.khopesh.com has address 127.0.0.4
test.example.com.emailbl.khopesh.com has address 127.0.0.1
test.example.com.emailbl.khopesh.com has address 127.0.0.2
$
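
A client would then need to turn the returned A records back into TYPE
letters. A minimal Python sketch of that step, using the A=1, B=2, C=3, D=4
mapping described above (the function name decode_types is made up, and no
DNS query is performed here):

```python
TYPE_BY_OCTET = {1: "A", 2: "B", 3: "C", 4: "D"}


def decode_types(a_records):
    """Map 127.0.0.N answers back to the TYPE letters they stand for."""
    letters = set()
    for addr in a_records:
        octets = addr.split(".")
        if len(octets) == 4 and octets[:3] == ["127", "0", "0"] and octets[3].isdigit():
            letter = TYPE_BY_OCTET.get(int(octets[3]))
            if letter:
                letters.add(letter)
    return "".join(sorted(letters))


# The four answers from the example lookup above.
print(decode_types(["127.0.0.3", "127.0.0.4", "127.0.0.1", "127.0.0.2"]))
```

Answers outside 127.0.0.1 through 127.0.0.4 are ignored in this sketch.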




More comments in-line:

Jesse Thompson (developer of anti-phishing-email-reply) wrote me:
> Yes, I and others have thought of it.  But I don't need it since we
> only use the list to scan log files and populate mapping tables.  I
> don't have time or money to do any of this, and I'm kept pretty
> busy just updating the list...on top of my other bazillion other
> responsibilities.
> 
> You are welcome to use the list to create your own URIBL of course.

(Jesse is BCC'd.)  And so I did.  Thanks for keeping the list updated.
 Hopefully this emailBL will open your list to new horizons.  Clearly,
credit for the real work goes to you and the other APER developers.

Rob McEwen wrote:
>>> Personally, I think the obfuscation is overkill. Instead, I'd
>>> prefer to change the "@" symbol to an underscore (and any other
>>> minor change that might be needed to work with dns queries) and
>>> be done with it. This would also make the implementation easier,
>>> and research by ISPs easier.

Mike Cardwell contended:
>> It would definitely require a hashing algorithm, like MD5. IIRC
>> there is a maximum length for a hostname, and that is 255
>> characters. What if the hostname in your email address is 255
>> characters long on its own...?

When MD5sums were first proposed (in place of my wild escaping), it
seemed like a great idea.  However, a voice in the back of my head,
now spoken (typed?) by Rob, has been growing louder.  My
implementation now merely truncates email usernames to 16 characters
(plus the noted defanging, which makes it complicated again ...) and
replaces the @ with a dot (not an underscore, that's not a legal
character).

In fact, collisions here could be regarded as good, as usernames that
long can include tracking strings (e.g. the mailer for our list,
users-return-12345-joe=bob.com@ spamassassin.apache.org, becomes
users-return-123.spamassassin.apache.org), which should help.

I did fully implement my proposed latter 16 characters (of MD5's 32)
plus dot plus the domain, complete with hash lookups, but I just
removed it (which is why non-test lookups will fail for the next ~4h).

>> Having access to the plain text email address would only make it
>> easier for ISPs to do anything if they had access to the zone file.
>> In which case, you could just give the

Physician List

2009-04-28 Thread Casartello, Thomas
Has anyone else noticed these messages as a problem? I have had a few
complaints about messages getting through my spam filter involving
"Physicians List in the USA" or something like that usually talking about
dentists too. I made this to target it (someone on the list showed me how to
do things like this which really seems to be helping to block EDU Spear
attacks)

 

body     WSC_DENTISTSCAM /Dent ists|Send an email to Slater|Directory in the United States|have won a prize money|D.entists|Reach Dentists|Physician Mailing List|receive money|you will have your email taken off|Physicians in the US|Pharmaceutical Company List|List of US Hospitals|Directory of US Dentists/i
describe WSC_DENTISTSCAM Dentist scam.
score    WSC_DENTISTSCAM 15

body     WSC_DENTIST_D /dentist/i
describe WSC_DENTIST_D Email contains dentist
score    WSC_DENTIST_D 0.1

body     WSC_DENTIST_P /physician|MD/i
describe WSC_DENTIST_P Email contains physician
score    WSC_DENTIST_P 0.1

body     WSC_DENTIST_L /list|directory/i
describe WSC_DENTIST_L Email contains directory/list
score    WSC_DENTIST_L 0.1

body     WSC_DENTIST_U /United States/i
describe WSC_DENTIST_U Email contains United States
score    WSC_DENTIST_U 0.1

meta     WSC_DENTIST_1 WSC_DENTIST_D && WSC_DENTIST_P && WSC_DENTIST_L
describe WSC_DENTIST_1 Likely dentist/physician list spam: contains physician, dentist, and list or directory
score    WSC_DENTIST_1 7

meta     WSC_DENTIST_2 WSC_DENTIST_D && WSC_DENTIST_P && WSC_DENTIST_L && WSC_DENTIST_U
describe WSC_DENTIST_2 Very likely dentist/physician list spam
score    WSC_DENTIST_2 10
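
To see how the two meta rules combine, here is a rough Python model of the
four sub-rules and the metas built on them. It is a simplification for
illustration only: SpamAssassin matches body rules against its own rendering
of the message rather than one raw string, and the function name meta_hits
is invented.

```python
import re

# The four sub-rule patterns from the rules above, case-insensitive.
SUBRULES = {
    "D": re.compile(r"dentist", re.I),
    "P": re.compile(r"physician|MD", re.I),
    "L": re.compile(r"list|directory", re.I),
    "U": re.compile(r"United States", re.I),
}


def meta_hits(body):
    """Return which of the two meta rules would fire for this text."""
    hit = {name: bool(rx.search(body)) for name, rx in SUBRULES.items()}
    meta1 = hit["D"] and hit["P"] and hit["L"]
    meta2 = meta1 and hit["U"]
    return {"WSC_DENTIST_1": meta1, "WSC_DENTIST_2": meta2}


print(meta_hits("Directory of dentists and physicians in the United States"))
print(meta_hits("A humdrum dentist mailing list"))
```

Two things stand out in this model. Whenever WSC_DENTIST_2 fires,
WSC_DENTIST_1 fires as well, so their scores add up. And because /MD/i is
case-insensitive and unanchored, it also matches the "md" inside ordinary
words such as "humdrum", which is why the second sample trips WSC_DENTIST_1
without mentioning physicians at all.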

 

Has anyone else been seeing these types of messages? 

 

Thomas E. Casartello, Jr.

Staff Assistant - Wireless Technician/Linux Administrator

Information Technology

Wilson 105A

Westfield State College

(413) 572-8245

 

Red Hat Certified Technician (RHCT)

 





Re: How can I tell if the rules are being read?

2009-04-28 Thread Karsten Bräckelmann
On Tue, 2009-04-28 at 14:44 -0700, Adam Harrison wrote:
> I’m seeing a lot of mail with Viagra in the subject coming through,
> even though there is the drugs rules file(20_drugs.cf) in the upgrades
> directory(/var/lib/spamassassin/3.002004/updates_spamassassin_org).

That doesn't necessarily suffice, regardless of that name. ;)  Usually
there are lots of other things to tweak, third-party rule-sets and
plugins...

If you upload one or two full samples including all headers somewhere,
providing a link, someone is likely to have a look and can give hints.

> Is there a simple way to see what rules files are being read?

Sure. :)  Watch out for "config: read file" in the full debug stuff.
And of course any errors, warnings, strange stuff.

  $ spamassassin -D --lint 2>&1 | less

Alternatively, limit it to the config only.

  $ spamassassin -D config --lint 2>&1 | less
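
If the debug output is long, the "config: read file" lines can be pulled out
with a small filter. The Python sketch below only assumes that each such line
contains that marker followed by a path; the sample lines are illustrative,
not captured from a real run, and the function name files_read is made up.

```python
import re

READ_FILE = re.compile(r"config: read file (\S+)")


def files_read(debug_lines):
    """Collect the paths named in 'config: read file' debug lines."""
    paths = []
    for line in debug_lines:
        match = READ_FILE.search(line)
        if match:
            paths.append(match.group(1))
    return paths


# Illustrative debug lines (format assumed).
sample = [
    "[1234] dbg: config: read file /etc/mail/spamassassin/local.cf",
    "[1234] dbg: config: using \"/var/lib/spamassassin/3.002004\" for sys rules pre files",
    "[1234] dbg: config: read file /var/lib/spamassassin/3.002004/updates_spamassassin_org/20_drugs.cf",
]
for path in files_read(sample):
    print(path)
```

If 20_drugs.cf does not show up in the collected paths, the rules file is not
being loaded at all, which narrows the problem down considerably.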





RE: sa-compile command-line?

2009-04-28 Thread Mark
Never mind, it works. :) Just calling it without any parameters
has it do The Right Thing™ by default.

 

-  Mark

 

 

From: Mark [mailto:ad...@asarian-host.net] 
Sent: dinsdag 28 april 2009 23:24
To: users@spamassassin.apache.org
Subject: sa-compile command-line?

 

Ok, finally got re2c compiled. :) But now sa-compile doesn't seem to
output anything. I run:

/usr/local/bin/sa-compile --config-file=/etc/mail/spamassassin --updatedir=/var/db/spamassassin/

But no rules are being generated anywhere (that I can find). A single
command-line example in the sa-compile docs wouldn't have hurt. :) So, can
someone give me an example of a working command-line for sa-compile?

 

Thanks,

 

- Mark



How can I tell if the rules are being read?

2009-04-28 Thread Adam Harrison
I'm seeing a lot of mail with Viagra in the subject coming through, even
though there is the drugs rules file(20_drugs.cf) in the upgrades
directory(/var/lib/spamassassin/3.002004/updates_spamassassin_org).

 

Is there a simple way to see what rules files are being read?

 

Thanks,

-Adam



Re: 'anti' AWL

2009-04-28 Thread RW
On Tue, 28 Apr 2009 11:13:56 -0600
LuKreme  wrote:

> On 28-Apr-2009, at 08:56, Matus UHLAR - fantomas wrote:
> > We have more servers users send mail through. Users can't choose
> > which server will they connect.
> 
> That already happens now.

I think his point is that that doesn't currently cause a problem, but
would with your scheme. 

>  The AWL has a confidence based on number of
> messages received, right? If I get messages from b...@example.com that
> come from a variety of servers, the confidence is much lower than if
> they all come from the same server, so the adjustment is lower.

I'm not aware that it has any such concept. AFAIK the AWL score is a
configurable fraction of (average score - current score).


> No, if they get spam from the SAME senders on DIFFERENT servers, the  
> AWL would go up even faster.

It's based on the first routable IP address, not the last-hop into the
trusted network, so someone using other people's wireless networks could
go through a huge number of addresses even with the same
outgoing smtp-server.

Note also that the email address and IP address used by AWL are
both forgeable by spammers.


Re: Procmail Setup NOT Working

2009-04-28 Thread Theo Van Dinter
2009/4/28 Robert Ober :
> It was global and I want it to stay global.  The old procmailrc is:
>
> DROPPRIVS=yes
>
> :0fw
> | /usr/bin/spamc

That's a global config, but you're running it per-user due to the
DROPPRIVS line.  fyi.

> All I want to do now is have all the identified spam(X-Spam-Status: Yes ?)
> go to a global file instead of delivered to the users.  The global spam file
> will be readable by only myself and management.

Just create a file and set the permissions to be globally writable,
then point procmail at it.
You can set the read perms however you want.

This makes it hard for users to figure out that some of their mail is
missing though, and makes it harder for them to recover it.


sa-compile command-line?

2009-04-28 Thread Mark
Ok, finally got re2c compiled. :) But now sa-compile doesn't seem to
output anything. I run:

/usr/local/bin/sa-compile --config-file=/etc/mail/spamassassin --updatedir=/var/db/spamassassin/

But no rules are being generated anywhere (that I can find). A single
command-line example in the sa-compile docs wouldn't have hurt. :) So, can
someone give me an example of a working command-line for sa-compile?

 

Thanks,

 

- Mark



Re: Procmail Setup NOT Working

2009-04-28 Thread John Hardin

On Tue, 28 Apr 2009, Robert Ober wrote:

> All I want to do now is have all the identified spam(X-Spam-Status: Yes
> ?) go to a global file instead of delivered to the users.  The global
> spam file will be readable by only myself and management.  Company owned
> systems, so no privacy implied nor should be expected.


Do you really want that mailbox file to be world-writable?

Alternative: create a spam user, and have procmail _forward_ spams to that 
user. Procmail would have to skip SA scoring and forwarding if it was 
running as that user, of course.


Then you don't need to worry about access permissions on the spam box.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Ignorance doesn't make stuff not exist.   -- Bucky Katt
---
 10 days until the 64th anniversary of VE day


Re: Procmail Setup NOT Working

2009-04-28 Thread Robert Ober

On 4/28/09 3:00 PM, Karsten Bräckelmann wrote:

On Tue, 2009-04-28 at 13:32 -0500, Robert Ober wrote:

On 4/28/09 11:34 AM, Karsten Bräckelmann wrote:




It was global and I want it to stay global.  The old procmailrc is:

DROPPRIVS=yes

:0fw
| /usr/bin/spamc


No .procmailrc for the users.  And Spamassassin is set to rewrite the 
subject with *Possible SPAM*


All I want to do now is have all the identified spam(X-Spam-Status: Yes 
?) go to a global file instead of delivered to the users.  The global 
spam file will be readable by only myself and management.  Company owned 
systems, so no privacy implied nor should be expected.


I appreciate the responses.

Thanks,
Robert A. Ober
PS: If not, how else?








Re: sa-compile problem

2009-04-28 Thread Karsten Bräckelmann
> > > I was just doing an update and compile and ran into this problem which is
> > > new, as I never had trouble before. Error is token exceeds limit, as
> > > below. Any help would be appreciated. 
>  
> > What's your re2c version?
> 
> as below, you are correct, re2c.0.13.3 

> > > re2c: error: line 159, column 2: Token exceeds limit
   ^^^
> > > command failed! at /usr/bin/sa-compile line 288, <$fh> line 6173.
> > 
> > May I take a guess?  re2c 0.13.3 -- if so, update to 0.13.5 or newer.
> 
> many thanks for your input and quick reply.. 

No problem, glad it helped...

Always happy to do someone else's googling. ;)  Pretty clear picture, if
you ask for sa-compile and the error message.

  guenther





Re: Procmail Setup NOT Working

2009-04-28 Thread Karsten Bräckelmann
On Tue, 2009-04-28 at 13:32 -0500, Robert Ober wrote:
> On 4/28/09 11:34 AM, Karsten Bräckelmann wrote:
> 
> >> DROPPRIVS=yes
> >
> > procmail is being run on behalf of the recipient.
> 
> Makes sense,  any way to make sure the log is writeable other than to
> put all the users in a group?

Ah, just answered the same question at the very end. ;)

> >> LOGFILE=/var/log/procmail.log
> >> VERBOSE=yes
> >> LOGABSTRACT=all
> >
> > MAILDIR is not set, so it defaults to $HOME.
> 
> How does this apply for doing Spamassassin globally?

It doesn't. I mentioned it to point out where mail will be delivered to
by procmail. Or rather would, if the $HOME would exist...

However, there *is* a point here that matters to SA. It's not the
delivering, which is important only to your IMAP server, or whatever
else you plan to access the "spam" folders procmail delivers to.

The point that matters to SA is the existence of a $HOME. Since you told
procmail to drop privs, and do the filtering on behalf of the recipient
user, spamc will be invoked as that user, too -- and spamd will attempt
to access per-user configs, and maybe even attempt to create it.

How exactly did you do the SA filtering before?

Site-wide config and dedicated SA or mail processing user? Are these
email users real system users, or virtual? Sounds like you have been
using some site-wide setup before -- and now you just switched to a
per-user config.  Do you really want that?


> > Does your "main offsite user" even have a $HOME? What user is this being
> > run as? Check its home...
> 
> Yes, but all mail goes to /var/spool/mail.  Each user has a file there 
> under their name.

So?  See my post again, about the setting of MAILDIR and where procmail
will deliver according to your recipes. Which, BTW, does not impact the
default folder, when procmail reaches the end of the recipes. It most
likely will be the same as it currently is -- given you're doing
*per-user* processing with procmail...

Which might not be what you want to switch to. Humm...

Site-wide SA integration with procmail using a single, site-wide
quarantine folder. Anyone? :)


Did you check the SA site and wiki for some hints?


> >> SPAMFOLDER=spam
> >> :0:
> >> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> >> #/dev/null
> >> almost-certainly-spam
> >
> > This would deliver in *mbox* format into $MAILDIR/almost-certainly-spam
> >
> >> :0 w :$SPAMFOLDER/.lock

That lock file likely isn't writable either.

> >> * ^X-Spam-Status: Yes
> >> $SPAMFOLDER/.
> >
> > Here you specify *MH* format, delivering into $MAILDIR/spam/
> 
> Well I just copied from an article.  How do I change it for mbox?

You'd better carefully review the source you copy from. That's quite a
gross mis-configuration. Oh, and also carefully check if the source
actually applies to your case.

As for changing to mbox, see man procmailrc, last paragraph of the
section "Recipe action line".  Spoiler: mbox format will be used if you
specify a regular *file*, that's no / or /. suffix.
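
The suffix rule can be summed up in a few lines of Python. This is only a
sketch of the convention described here and in man procmailrc: it looks at
the destination string alone and assumes a name without a suffix is a
regular file. The function name delivery_format is made up.

```python
def delivery_format(destination):
    """Guess the mailbox format procmail will use from the action line."""
    if destination.endswith("/."):
        return "MH"        # e.g. $SPAMFOLDER/.
    if destination.endswith("/"):
        return "maildir"   # e.g. spam/
    return "mbox"          # plain file name, no suffix


for dest in ("almost-certainly-spam", "$SPAMFOLDER/.", "spam/"):
    print(dest, "->", delivery_format(dest))
```

So for mbox delivery the action line of the second recipe would name a plain
file rather than ending in "/.".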


> >> No spam is going to the spam file in /var/spool/mail although the main
> >> offsite user did have a .lock . I even dropped the level from 8 to 5 .
> >> The main offsite user is being flooded and sees all the spam on his
> >> phone.  I even rebooted the server (Fedora Linux Core 6) last night.
> >> Also, what ownership should the logfile(procmail.log) have?  I did 660
> >> and tried mail.mail and it still complains in the maillog that it cannot
> >> write to the logfile.
> >
> > procmail is not being run as user mail. See DROPPRIVS in man procmailrc.
> 
> Will do.
> 
> > You should sort out *where* to deliver, and what *format* to use. Also
> > it seems the user procmail runs as is not allowed to write to the
> > delivery destinations -- and/or does not have a $HOME.
> 
> Sendmail with mbox.  As I stated, it was working just for rewriting the

Well, *how* was it working before? How did you integrate SA? (see above)

> subject.  How do I set procmail to run as mail or whatever.  This is 
> unclear to me.  I want this to work globally, all spam to the same file.

Hmm, never done such a stunt, but this *could* work.  NOTE: I did NOT
try it, use at your own risk!

In the global procmailrc file, first do the filtering through spamc/d,
deliver spam to dedicated, system mbox files -- and then set DROPPRIVS
for default mail spool delivery.

Again, this is untested!

And I really don't like the idea of a global quarantine anyway, possibly
containing sensitive and private data. Who will review the spam !?


> > You will see the failed delivery attempts and falling through to the
> > next recipe / default mailbox in the procmail logs, once they are
> > writable...
> 
> Still do not understand how to do that.

Add the user to the group? Or even make it world-writable, just for
debugging purposes. But without a log, you're stabbing in the dark.
Procmail can't even complain to you, which it would loudly.



Re: sa-compile problem

2009-04-28 Thread Gary
On Tue, Apr 28, 2009 at 07:44:08PM +0200 or thereabouts, Karsten Bräckelmann 
wrote:

> On Tue, 2009-04-28 at 11:16 -0500, Gary wrote:
> > I was just doing an update and compile and ran into this problem which is
> > new, as I never had trouble before. Error is token exceeds limit, as
> > below. Any help would be appreciated. 
 
> What's your re2c version?

as below, you are correct, re2c.0.13.3 
 
> > SA ~ # sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org
> > --channel updates.spamassassin.org
> > SA ~ # sa-compile
> ...
> > re2c -i -b -o scanner15.c scanner15.re
> > re2c: error: line 159, column 2: Token exceeds limit
> > command failed! at /usr/bin/sa-compile line 288, <$fh> line 6173.
> 
> May I take a guess?  re2c 0.13.3 -- if so, update to 0.13.5 or newer.

many thanks for your input and quick reply.. 
 

-- 
Gary



Re: Procmail Setup NOT Working

2009-04-28 Thread Robert Ober

On 4/28/09 11:34 AM, Karsten Bräckelmann wrote:


DROPPRIVS=yes


procmail is being run on behalf of the recipient.



Makes sense,  any way to make sure the log is writeable other than to
put all the users in a group?



LOGFILE=/var/log/procmail.log
VERBOSE=yes
LOGABSTRACT=all


MAILDIR is not set, so it defaults to $HOME.


How does this apply for doing Spamassassin globally?


Does your "main offsite user" even have a $HOME? What user is this being
run as? Check its home...


Yes, but all mail goes to /var/spool/mail.  Each user has a file there 
under their name.





:0fw
| /usr/bin/spamc


# Mail that is very likely spam (>15) can be dropped on the floor.
# Move the # down one line to drop it.
# Note that dropping mail on the floor is a *bad*
# idea unless you really, really believe no false positives will
# have a score greater than 15.
SPAMFOLDER=spam
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
#/dev/null
almost-certainly-spam


This would deliver in *mbox* format into $MAILDIR/almost-certainly-spam


:0 w :$SPAMFOLDER/.lock
* ^X-Spam-Status: Yes
$SPAMFOLDER/.


Here you specify *MH* format, delivering into $MAILDIR/spam/



Well I just copied from an article.  How do I change it for mbox?


No spam is going to the spam file in /var/spool/mail although the main
offsite user did have a .lock . I even dropped the level from 8 to 5 .
The main offsite user is being flooded and sees all the spam on his
phone.  I even rebooted the server (Fedora Linux Core 6) last night.
Also, what ownership should the logfile(procmail.log) have?  I did 660
and tried mail.mail and it still complains in the maillog that it cannot
write to the logfile.


procmail is not being run as user mail. See DROPPRIVS in man procmailrc.


Will do.


You should sort out *where* to deliver, and what *format* to use. Also
it seems the user procmail runs as is not allowed to write to the
delivery destinations -- and/or does not have a $HOME.


Sendmail with mbox.  As I stated, it was working just for rewriting the
subject.  How do I set procmail to run as mail or whatever?  This is
unclear to me.  I want this to work globally, all spam to the same file.



You will see the failed delivery attempts and falling through to the
next recipe / default mailbox in the procmail logs, once they are
writable...




Still do not understand how to do that.

Thanks for the help,
Robert:-)


Re: sa-compile problem

2009-04-28 Thread Karsten Bräckelmann
On Tue, 2009-04-28 at 11:16 -0500, Gary wrote:
> I was just doing an update and compile and ran into this problem which is
> new, as I never had trouble before. Error is token exceeds limit, as
> below. Any help would be appreciated. 

What's your re2c version?


> SA ~ # sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org
> --channel updates.spamassassin.org
> SA ~ # sa-compile
...
> re2c -i -b -o scanner15.c scanner15.re
> re2c: error: line 159, column 2: Token exceeds limit
> command failed! at /usr/bin/sa-compile line 288, <$fh> line 6173.

May I take a guess?  re2c 0.13.3 -- if so, update to 0.13.5 or newer.
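
A quick way to check whether an installed re2c falls below the version suggested above, as a minimal Python sketch. It assumes that `re2c --version` prints a banner such as `re2c 0.13.3`; the helper names are hypothetical and not part of sa-compile.

```python
import re
import subprocess

MIN_RE2C = (0, 13, 5)  # first version suggested in this thread


def parse_re2c_version(output):
    """Extract a (major, minor, patch) tuple from a banner like 're2c 0.13.3'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("no version number found in %r" % output)
    return tuple(int(part or 0) for part in match.groups())


def needs_upgrade(output, minimum=MIN_RE2C):
    """True if the reported re2c version is older than the suggested minimum."""
    return parse_re2c_version(output) < minimum


def installed_re2c_banner():
    """Run re2c and return its version banner (assumes re2c is on PATH)."""
    result = subprocess.run(
        ["re2c", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Uses the version string reported in this thread rather than a live call.
    print(needs_upgrade("re2c 0.13.3"))
```

Comparing tuples rather than strings avoids mis-ordering versions such as 0.13.10 and 0.13.5.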


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: 'anti' AWL

2009-04-28 Thread LuKreme

On 28-Apr-2009, at 08:56, Matus UHLAR - fantomas wrote:

We have more servers users send mail through. Users can't choose which
server will they connect.


That already happens now.


It can also happen when user switched ISP, mail provider, or the mail
provider changes IP address, DNS names or what is used there.
This would require much more logic than is currently in AWL.


No it wouldn't.  The AWL has a confidence based on number of messages  
received, right? If I get messages from b...@example.com that come from  
a variety of servers, the confidence is much lower than if they all  
come from the same server, so the adjustment is lower.



This would even be useful if the original AWL entry is spammish since
multiple servers might be a sign of a botnet or host hopping, so
applying a little spammish nudge to these messages is probably going to
help out a lot, especially if spam...@fakedoamin.tld is sending mails
from, say, 10 different server then all those AWL mismatches are going to
feed each other into moving that AWL up very very fast.


The question is if users tend to repeatedly get spam from the same sender
through the same servers.


No, if they get spam from the SAME senders on DIFFERENT servers, the  
AWL would go up even faster.



On 28-Apr-2009, at 09:07, Jeff Mincy wrote:


Your idea will FP anytime anybody adds a new email device or the ISP
changes (etc).



That's why the adjustment would be, initially, small.

f...@example.com sends me lots of mail.  Say it's over 100.  It's all
ham and it all comes from mail.example.com. The AWL for this email
couplet is, say, -2.1.  An email comes in from f...@example.com but
sent from spam.spammer.tld and scores 7.0.  It gets an additional,
say, .42 (20% of the AWL) to score 7.42 instead. Now, another mail
from f...@example.com comes in from mail.spam2.tld, this one scores
4.3. It gets a +.42 for missing the match on mail.example.com, and
gets a +.288 for missing the match on spam.spammer.tld (1% of the AWL,
doubled for being positive, doubled again for being over 5), for a
total score of 4.3+.288+.42 = 5.008, pushing it over the spam threshold.


Now, say example.com adds a second mail server, mail2.example.com. It  
will start off with a 'penalty' of +0.708 for being an unknown  
sender.  But, if the message scores under 0, we don't adjust the AWL  
at all. If the message is over 0, yes it will have an initial penalty  
but the AWL is pretty darn good at adjusting.


Now, say another AWL entry is based on only 20 emails, instead of  
adjusting by 20% of the awl, we adjust only 4%.  (or something.  the  
point is, the more emails the AWL is based on, the more confident it  
is, and that confidence should count AGAINST messages that don't match  
the AWL).
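
The confidence scaling described above can be sketched in Python. This is only an illustration of the proposal, not anything that exists in SpamAssassin's AWL: the function names, the 100-message ceiling, and the linear scaling (which happens to give 20% at 100 messages and 4% at 20) are assumptions read off the numbers in this message. The second penalty of .288 is taken verbatim from the text rather than derived.

```python
FULL_CONFIDENCE_COUNT = 100   # assumed: 100+ messages give full confidence
MAX_FRACTION = 0.20           # 20% of the AWL value at full confidence


def mismatch_fraction(message_count):
    """Fraction of the AWL value applied as a penalty, scaled by history size."""
    confidence = min(message_count, FULL_CONFIDENCE_COUNT) / FULL_CONFIDENCE_COUNT
    return MAX_FRACTION * confidence


def mismatch_penalty(awl_value, message_count):
    """Positive score added when the sending server does not match the entry."""
    return abs(awl_value) * mismatch_fraction(message_count)


# Known-good pair: AWL of -2.1 built from 100 messages.
first_penalty = mismatch_penalty(-2.1, 100)

# Penalty for also missing the spam.spammer.tld entry, as stated in the text.
second_penalty = 0.288

# The third message scored 4.3 before any penalty.
total = 4.3 + first_penalty + second_penalty
print(round(first_penalty, 3), round(total, 3))
```

With these inputs the first penalty comes out at about 0.42 and the total at about 5.008, just over a 5.0 threshold.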


--
When we woke up that morning we had no way of knowing that in a
matter of hours we'd changed the way we were going.  Where would
I be now? Where would I be now if we'd never met?  Would I be
singing this song to someone else instead?



Re: Procmail Setup NOT Working

2009-04-28 Thread Karsten Bräckelmann
On Tue, 2009-04-28 at 11:07 -0500, Robert Ober wrote:

> filter in Outlook.  Problem is that some users are setup to have their 
> email forwarded to their cellphone/blackberry and the spam is in that 
> inbox.  So I found some articles and decided to have the spam go to a 
> file.  The following is the new version of the /etc/procmailrc:
> 
> DROPPRIVS=yes

procmail is being run on behalf of the recipient.

> LOGFILE=/var/log/procmail.log
> VERBOSE=yes
> LOGABSTRACT=all

MAILDIR is not set, so it defaults to $HOME.

Does your "main offsite user" even have a $HOME? What user is this being
run as? Check its home...

> :0fw
> | /usr/bin/spamc
> 
> 
> # Mail that is very likely spam (>15) can be dropped on the floor.
> # Move the # down one line to drop it.
> # Note that dropping mail on the floor is a *bad*
> # idea unless you really, really believe no false positives will
> # have a score greater than 15.
> SPAMFOLDER=spam
> :0:
> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> #/dev/null
> almost-certainly-spam

This would deliver in *mbox* format into $MAILDIR/almost-certainly-spam

> :0 w :$SPAMFOLDER/.lock
> * ^X-Spam-Status: Yes
> $SPAMFOLDER/.

Here you specify *MH* format, delivering into $MAILDIR/spam/
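
How procmail picks the mailbox format from the way a delivery destination ends can be summarised in a small Python sketch. The function name is hypothetical; the rules follow the procmailrc man page (a trailing `/.` selects MH, a trailing `/` selects maildir, anything else is treated as an mbox file). One caveat the sketch ignores: a destination without a trailing slash that already exists as a directory is also delivered to as a directory.

```python
def procmail_mailbox_format(destination):
    """Classify a procmail delivery destination by its trailing characters."""
    if destination.endswith("/."):
        return "MH"        # directory, one numbered file per message
    if destination.endswith("/"):
        return "maildir"   # directory with tmp/, new/ and cur/
    return "mbox"          # plain file, messages appended


for dest in ("almost-certainly-spam", "$SPAMFOLDER/.", "$SPAMFOLDER/"):
    print(dest, "->", procmail_mailbox_format(dest))
```

By the same rule, a destination written as a bare file name would be delivered in mbox format, like the first recipe.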


> No spam is going to the spam file in /var/spool/mail although the main 
> offsite user did have a .lock . I even dropped the level from 8 to 5 . 
> The main offsite user is being flooded and sees all the spam on his 
> phone.  I even rebooted the server (Fedora Linux Core 6) last night.   
> Also, what ownership should the logfile(procmail.log) have?  I did 660 
> and tried mail.mail and it still complains in the maillog that it cannot 
> write to the logfile.

procmail is not being run as user mail. See DROPPRIVS in man procmailrc.


You should sort out *where* to deliver, and what *format* to use. Also
it seems the user procmail runs as is not allowed to write to the
delivery destinations -- and/or does not have a $HOME.

You will see the failed delivery attempts and falling through to the
next recipe / default mailbox in the procmail logs, once they are
writable...


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



sa-compile problem

2009-04-28 Thread Gary
Hi guys,

I was just doing an update and compile and ran into this problem which is
new, as I never had trouble before. Error is token exceeds limit, as
below. Any help would be appreciated. 

SA ~ # sa-update --gpgkey 6C6191E3 --channel sought.rules.yerp.org
--channel updates.spamassassin.org
SA ~ # sa-compile
[13915] info: generic: base extraction starting. this can take a while...
[13915] info: generic: extracting from rules of type body_0
100% [==] 662.83
rules/sec 00m04s DONET
100% [==]  26.10
bases/sec 02m31s DONE
[13915] info: body_0: 3450 base strings extracted in 155 seconds
[13915] info: generic: extracting from rules of type body_500
100% [==]   5.35
rules/sec 00m00s DONE
100% [==]  48.23
bases/sec 00m00s DONE
[13915] info: body_500: 3 base strings extracted in 0 seconds
cd /tmp/.spamassassin13915f5BFZ7tmp
cd Mail-SpamAssassin-CompiledRegexps-body_0
Wide character in print at /usr/bin/sa-compile line 385, <$fh> line 5635.
Wide character in print at /usr/bin/sa-compile line 385, <$fh> line 5684.
re2c -i -b -o scanner1.c scanner1.re
re2c -i -b -o scanner2.c scanner2.re
re2c -i -b -o scanner3.c scanner3.re
re2c -i -b -o scanner4.c scanner4.re
re2c -i -b -o scanner5.c scanner5.re
re2c -i -b -o scanner6.c scanner6.re
re2c -i -b -o scanner7.c scanner7.re
re2c -i -b -o scanner8.c scanner8.re
re2c -i -b -o scanner9.c scanner9.re
re2c -i -b -o scanner10.c scanner10.re
re2c -i -b -o scanner11.c scanner11.re
re2c -i -b -o scanner12.c scanner12.re
re2c -i -b -o scanner13.c scanner13.re
re2c -i -b -o scanner14.c scanner14.re
re2c -i -b -o scanner15.c scanner15.re
re2c: error: line 159, column 2: Token exceeds limit
command failed! at /usr/bin/sa-compile line 288, <$fh> line 6173.


-- 
Gary



Procmail Setup NOT Working

2009-04-28 Thread Robert Ober

Hello Folks,

I am using Spamassassin 3.2.5 with Sendmail 8.14.1 in an installation 
for office and offsite users.  The initial setup was to have 
Spamassassin to rewrite the subject so that the users could setup a 
filter in Outlook.  Problem is that some users are setup to have their 
email forwarded to their cellphone/blackberry and the spam is in that 
inbox.  So I found some articles and decided to have the spam go to a 
file.  The following is the new version of the /etc/procmailrc:


DROPPRIVS=yes


LOGFILE=/var/log/procmail.log
VERBOSE=yes
LOGABSTRACT=all

:0fw
| /usr/bin/spamc


# Mail that is very likely spam (>15) can be dropped on the floor.
# Move the # down one line to drop it.
# Note that dropping mail on the floor is a *bad*
# idea unless you really, really believe no false positives will
# have a score greater than 15.
SPAMFOLDER=spam
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
#/dev/null
almost-certainly-spam

:0 w :$SPAMFOLDER/.lock
* ^X-Spam-Status: Yes
$SPAMFOLDER/.




No spam is going to the spam file in /var/spool/mail although the main 
offsite user did have a .lock . I even dropped the level from 8 to 5 . 
The main offsite user is being flooded and sees all the spam on his 
phone.  I even rebooted the server (Fedora Linux Core 6) last night.   
Also, what ownership should the logfile(procmail.log) have?  I did 660 
and tried mail.mail and it still complains in the maillog that it cannot 
write to the logfile.


Ideas would be most welcome.

Thanks,
Robert A. Ober




Re: emailBL

2009-04-28 Thread John Hardin

On Tue, 28 Apr 2009, Mike Cardwell wrote:

Alternatively, just stick the original email address in the 
TXT record. So in rbldnsd, you'd have a record like this:


98f22901b17b13d910456597685c1963 :127.0.0.1:the.r...@email.address


I was going to suggest that. Another thing to put in the TXT record might 
be a URL to evidence - e.g. (one of) the phishing emails containing that 
address as the contact point.


There's no advantage of sticking the email address in the TXT record 
rather than having a separate file, apart from keeping the data 
together.


Ease of access?

OTOH, if you're (not you, Mike) going to host this data, you'll probably 
have a webby interface for interactive lookups, and that might be the 
proper way to publish the evidence. If the email address typed into the 
web form hits, offer a link to view the evidence supporting the listing.


I don't think there's any reason to keep the email address or the evidence 
(suitably sanitized of the targeted victim's contact information) 
confidential.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Windows Genuine Advantage (WGA) means that now you use your
  computer at the sufferance of Microsoft Corporation. They can
  kill it remotely without your consent at any time for any reason;
  it also shuts down in sympathy when the servers at Microsoft crash.
---
 10 days until the 64th anniversary of VE day


Re: emailBL

2009-04-28 Thread Mike Cardwell

Rob McEwen wrote:


If you're worried about spammers gaming the hash system


Most likely, they won't care. They'll happily pursue the "low hanging
fruit". The only exception is if/when freemail ISPs started using such a
list to start investigating individual accounts for possible
termination. But, even then, that is a good problem to have.

Personally, I think the obfuscation is overkill. Instead, I'd prefer to
change the "@" symbol to an underscore (and any other minor change that
might be needed to work with dns queries) and be done with it. This
would also make the implementation easier, and research by ISPs easier.


It would definitely require a hashing algorithm, like MD5. IIRC there is 
a maximum length for a hostname, and that is 255 characters. What if the 
hostname in your email address is 255 characters long on its own...?


Having access to the plain text email address would only make it easier 
for ISPs to do anything if they had access to the zone file. In which 
case, you could just give them access to a separate list which has the 
email addresses in plain text. Alternatively, just stick the original 
email address in the TXT record. So in rbldnsd, you'd have a record like 
this:


98f22901b17b13d910456597685c1963 :127.0.0.1:the.r...@email.address

Doing an A record lookup on 98f22901b17b13d910456597685c1963.example.com 
would return "127.0.0.1" and doing a TXT record returns 
"the.r...@email.address". There's no advantage of sticking the email 
address in the TXT record rather than having a separate file, apart from 
keeping the data together.
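
A minimal Python sketch of how a client might build the name to query, assuming the scheme described above (MD5 of the address as a hex label under the list's zone). The function name, the `example.com` zone default and the lowercase/strip normalisation are assumptions for illustration, not an agreed format.

```python
import hashlib


def emailbl_query_name(address, zone="example.com"):
    """Build the DNS name to look up for an address: md5hex(address).zone."""
    # Lowercasing and trimming is an assumed normalisation step.
    normalised = address.strip().lower()
    digest = hashlib.md5(normalised.encode("utf-8")).hexdigest()
    return digest + "." + zone


name = emailbl_query_name("someone@example.org")
label = name.split(".")[0]
print(len(label), name.endswith(".example.com"))
```

Whatever the length of the address, the first label is always 32 hex characters, which sidesteps the hostname length concern raised above.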


--
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)


Re: Code Rot?

2009-04-28 Thread John Hardin

On Tue, 28 Apr 2009, Matt wrote:


Steve Freegard wrote:

 Is it possible to get SVN access just to the sandboxes though? I'd be
 happy to submit rules for testing.


Ditto



+1

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Windows Genuine Advantage (WGA) means that now you use your
  computer at the sufferance of Microsoft Corporation. They can
  kill it remotely without your consent at any time for any reason;
  it also shuts down in sympathy when the servers at Microsoft crash.
---
 10 days until the 64th anniversary of VE day


Re: 'anti' AWL

2009-04-28 Thread Jeff Mincy
   From: LuKreme 
   Date: Tue, 28 Apr 2009 08:43:46 -0600
   
   OK, working on my first cup of coffee this morning, so maybe this has  
   potential.
   
   The way the AWL works is by keeping track of the origin of emails,  
   both the address and the server (the top line Received header?) that  
   send the email.  So, lets say that I have a lot of email from
   f...@example.com and that foo's email is sent to me via mail.example.com.
   
   Now, I get an email claiming to be from f...@example.com but sent to me  
   from suspiciousserver.tld, so the AWL is not applied.
   
Your idea will FP anytime anybody adds a new email device or the ISP
changes (etc).

You could use the sagrey plugin to add a point to email from new
email address+IP pairs.

-jeff


Re: emailBL

2009-04-28 Thread Rob McEwen
Ben Winslow wrote:
> If you're worried about spammers gaming the hash system

Most likely, they won't care. They'll happily pursue the "low hanging
fruit". The only exception is if/when freemail ISPs started using such a
list to start investigating individual accounts for possible
termination. But, even then, that is a good problem to have.

Personally, I think the obfuscation is overkill. Instead, I'd prefer to
change the "@" symbol to an underscore (and any other minor change that
might be needed to work with dns queries) and be done with it. This
would also make the implementation easier, and research by ISPs easier.

As with all DNSBLs, the really hard part is not listing legitimate
items. For example, consider the guy out there who is probably sending
financial newsletters to his very own clients and uses his ISP's MTA for
sending, but uses a gmail "from" address. His e-mail address might have
a high chance of being mistakenly blacklisted!

The last 2-3 times I saw this idea come up on either SA or Spam-L,
I recall that the idea was strongly shot down by a number of people for
this and other reasons. But I kept out of the discussion and I actually
thought this could be a great idea... if done right and if FPs are kept
to a minimum. I'd been planning on starting such a list for quite some
time, but it kept getting delayed by more urgent needs.

-- 
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032




Re: 'anti' AWL

2009-04-28 Thread Matus UHLAR - fantomas
On 28.04.09 08:43, LuKreme wrote:
> OK, working on my first cup of coffee this morning, so maybe this has  
> potential.
>
> The way the AWL works is by keeping track of the origin of emails, both 
> the address and the server (the top line Received header?) that send the 
> email.  So, lets say that I have a lot of email from f...@example.com and 
> that foo's email is sent to me via mail.example.com.
>
> Now, I get an email claiming to be from f...@example.com but sent to me  
> from suspiciousserver.tld, so the AWL is not applied.
>
> But if I've gotten 50 emails from f...@example.com and all came through  
> mail.example.com it seems that it would be beneficial to have a 'anti'  
> AWL score applied to this particular email, since it claims to be
> from one place, but doesn't match the AWL entry. This, naturally, would
> start off a new AWL entry, but with a slightly higher score than
> otherwise.

We have more servers users send mail through. Users can't choose which
server will they connect. 
It can also happen when user switched ISP, mail provider, or the mail
provider changes IP address, DNS names or what is used there.
This would require much more logic than is currently in AWL.

> This would even be useful if the original AWL entry is spammish since  
> multiple servers might be a sign of a botnet or host hopping, so  
> applying a little spammish nudge to these messages is probably going to 
> help out a lot, especially if spam...@fakedoamin.tld is sending mails 
> from, say, 10 different server then all those AWL mismatches are going to 
> feed each other into moving that AWL up very very fast.

The question is if users tend to repeatedly get spam from the same sender
through the same servers. 
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
If Barbie is so popular, why do you have to buy her friends? 


'anti' AWL

2009-04-28 Thread LuKreme
OK, working on my first cup of coffee this morning, so maybe this has  
potential.


The way the AWL works is by keeping track of the origin of emails,  
both the address and the server (the top line Received header?) that  
send the email.  So, lets say that I have a lot of email from
f...@example.com and that foo's email is sent to me via mail.example.com.


Now, I get an email claiming to be from f...@example.com but sent to me  
from suspiciousserver.tld, so the AWL is not applied.


But if I've gotten 50 emails from f...@example.com and all came through  
mail.example.com it seems that it would be beneficial to have a 'anti'  
AWL score applied to this particular email, since it claims to
be from one place, but doesn't match the AWL entry. This, naturally,
would start off a new AWL entry, but with a slightly higher score than
otherwise.


This would even be useful if the original AWL entry is spammish since  
multiple servers might be a sign of a botnet or host hopping, so  
applying a little spammish nudge to these messages is probably going  
to help out a lot, especially if spam...@fakedoamin.tld is sending  
mails from, say, 10 different server then all those AWL mismatches are  
going to feed each other into moving that AWL up very very fast.


--
The Germans wore gray, you wore blue.



Re: Stop Counting!

2009-04-28 Thread LuKreme

On 28-Apr-2009, at 08:27, John ffitch wrote:

On Tue, 28 Apr 2009, LuKreme wrote:
I was thinking that, particularly for people who trash messages  
over a certain threshold and are worried about the SA overhead, a  
stop-counting threshold might be a good idea.


So, for example, for my personal mail I could set stop_counting at  
7.0, once a message hits 7.0 (with bayes) SA simply passes it along  
with a score of 7.0+ (to indicate it stopped processing) and is done.


As long as you do not have negative scores.


Oh right.  Must stop posting before coffee.

--
 we all have our moments when we lose it
 the key is though, to conceal the evidence before the police arrive




Re: Stop Counting!

2009-04-28 Thread John ffitch



On Tue, 28 Apr 2009, LuKreme wrote:

I was thinking that, particularly for people who trash messages over a 
certain threshold and are worried about the SA overhead, a stop-counting 
threshold might be a good idea.


So, for example, for my personal mail I could set stop_counting at 7.0, once 
a message hits 7.0 (with bayes) SA simply passes it along with a score of 
7.0+ (to indicate it stopped processing) and is done.




As long as you do not have negative scores.

This has come up before, and I seem to remember that the cost of sorting 
rules into order was considered more expensive than brute force and not 
attempting this optimisation
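
The problem with negative scores can be seen in a small Python sketch. The rule scores are made up and the function names are hypothetical; this is not how SpamAssassin schedules its rules, just an illustration of why stopping at a threshold is only safe when no later rule can lower the total.

```python
def total_score(rule_scores):
    """Full evaluation: sum the score of every rule that hit."""
    return sum(rule_scores)


def score_with_stop(rule_scores, stop_at):
    """Evaluate rules in order and stop once the running total reaches stop_at."""
    running = 0.0
    for score in rule_scores:
        running += score
        if running >= stop_at:
            break
    return running


# Hypothetical hits: two spammy rules, then a strongly negative rule.
hits = [4.0, 3.5, -6.0]
print(total_score(hits), score_with_stop(hits, 7.0))
```

Here the full evaluation ends at 1.5 while the early exit reports 7.5, so the message would be trashed although its final score is well below the threshold. Running all negative-scoring rules first would avoid that, which is the ordering cost mentioned above.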


But others may have better memories

==John ff



Re: emailBL

2009-04-28 Thread Ben Winslow
On Tue, 28 Apr 2009 02:09:02 +0100
Steve Freegard  wrote:
> Well in the case of an emailBL - the worst that can happen is that one
> listed md5 collides with an innocent e-mail address.  By adding in the
> string length it reduces that possibility because both colliding
> addresses would have to be exactly the same length.  I believe you'll
> find that ClamAV uses this method for its MD5 signatures - to get a
> match it has to match the MD5 and the file size has to match.

MD5 already adds the message length (in bits, as a 64-bit integer) at
the very end of the input before the hash is finalized, so adding it
again as an ASCII representation of bytes isn't really going to improve
anything.

If you're worried about spammers gaming the hash system (e.g. using a
botnet to compute an address with a hash which collides with some
target address), you should bite the bullet and use a longer hash
(something in the SHA family, maybe?)  You could make up for the extra
hash length (in terms of DNS traffic) by using a more efficient encoding
of the hash than hex (e.g. base64 or better) with the obvious caveat
that it'd be more difficult to query.
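
To put rough numbers on the encoding trade-off, here is a Python sketch comparing label lengths for a few hashes. It uses base32 rather than the base64 mentioned above, purely as an assumption on my part: base64 is case-sensitive while DNS names are matched case-insensitively, so base32 is the simpler illustration. The function names and the sample address are hypothetical.

```python
import base64
import hashlib


def hex_label(data, algorithm):
    """Hash of data as a lowercase hex string."""
    return hashlib.new(algorithm, data).hexdigest()


def base32_label(data, algorithm):
    """Hash of data as unpadded, lowercase base32 (safe for case-insensitive DNS)."""
    digest = hashlib.new(algorithm, data).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


address = b"someone@example.org"
for algorithm in ("md5", "sha1", "sha256"):
    print(algorithm,
          len(hex_label(address, algorithm)),
          len(base32_label(address, algorithm)))
```

The lengths are 32/26 for MD5, 40/32 for SHA-1 and 64/52 for SHA-256 (hex/base32). Since a single DNS label is limited to 63 octets, a SHA-256 digest in hex would not fit in one label, while its base32 form would.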

Given that most software will need new code to support an
email-address-based BL, you should give operational concerns (e.g.
bandwidth requirements) some serious thought while you have the chance.

-- 
Ben Winslow 


Stop Counting!

2009-04-28 Thread LuKreme
I was thinking that, particularly for people who trash messages over a  
certain threshold and are worried about the SA overhead, a
stop-counting threshold might be a good idea.


So, for example, for my personal mail I could set stop_counting at  
7.0, once a message hits 7.0 (with bayes) SA simply passes it along  
with a score of 7.0+ (to indicate it stopped processing) and is done.


Or is this a silly idea?

--
Everybody hates a tourist, especially one who thinks it's all such
laugh. Yeah, and the chip stains and grease will come out in the
bath. You will never understand how it feels to live your life
with no meaning or control, and with nowhere left to go.  You
are amazed that the exist, and they burn so bright whilst you
can only wonder why.



Debugging update channels (was: sought.rules.yerp.org site down?)

2009-04-28 Thread Karsten Bräckelmann
On Sun, 2009-04-26 at 08:17 -0700, Bill Landry wrote:
> 
>dig sought.rules.yerp.org
> 
> finds no "A" record.  Although yerp.org has an "A" record, the site
> cannot be accessed via browser, at least not from here...

Yeah, there was another downtime, obviously fixed since.

However, just to clarify on the debugging technique -- as I mentioned
the other day in this thread, you're dig'ing up the wrong name.


sought.rules.yerp.org is *NOT* supposed to have either an A or a TXT
record. Only with the reversed SA version prepended does it have any
record at all -- a TXT record encoding the latest channel version.

  $ host -t TXT 5.2.3.sought.rules.yerp.org
  5.2.3.sought.rules.yerp.org descriptive text "320769313"

The actual http mirror is not necessarily in the same domain, and
usually *not* the channel name (your dig above). The mirrors are cached
in the MIRRORED.BY file, and can be checked fresh with a DNS lookup:

  $ host -t TXT mirrors.sought.rules.yerp.org
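
The two names queried above can be derived mechanically from the SA version and the channel name. A minimal Python sketch, with hypothetical function names; it only builds the names and performs no DNS lookup.

```python
def channel_version_name(sa_version, channel):
    """DNS name whose TXT record encodes the latest channel version."""
    reversed_version = ".".join(reversed(sa_version.split(".")))
    return reversed_version + "." + channel


def channel_mirrors_name(channel):
    """DNS name whose TXT record is used to look up the channel's mirrors."""
    return "mirrors." + channel


print(channel_version_name("3.2.5", "sought.rules.yerp.org"))
print(channel_mirrors_name("sought.rules.yerp.org"))
```

For SA 3.2.5 this yields 5.2.3.sought.rules.yerp.org and mirrors.sought.rules.yerp.org, the same names used with `host -t TXT` above.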

Bottom line:  Please stop digging the channel. :)

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: emailBL

2009-04-28 Thread Steve Freegard
John Hardin wrote:
> 
> I suppose I should ask, what do you mean by a spammer "reversing the list"?
> 

I guess I meant that it makes it harder for the spammer if he/she gets a
copy of the list to casually look for addresses to avoid without doing
the extra work of encoding the address in the same way and looking it
up.  But with fresh eyes this morning the benefit of this is tenuous -
it just means that they have to do a bit of extra work ;-)

My idea for creating an emailBL was in the vain hope that if I could get
it to work well enough that the actual mailbox providers hosting the
dropboxes might actually use it to terminate the mailbox provided I let
them see evidence for each address (I know - probably no chance of that;
but I can hope).

I'm also thinking of doing the same with 'full URIs' that cannot be
listed by the existing URI blacklists due to the spammers abusing
services specifically to avoid the existing lists so they don't burn up
an actual domain name e.g. http://groups.yahoo.com/groupname/message/1
would be as easy as:

s...@laptop-smf:~$ perl -MDigest::MD5 -e \
'$uri="http://groups.yahoo.com/groupname/message/1"; print
Digest::MD5::md5_hex($uri).length($uri).".bl.org\n"'
f499f872e8276a4777c3dba48481915a43.bl.org

Cheers,
Steve.


Re: X-Spam-Report: not wrapped sometimes

2009-04-28 Thread Karsten Bräckelmann
On Tue, 2009-04-28 at 12:21 +0200, Matus UHLAR wrote:
> I often see mail where the X-Spam-Report header is longer than 80
> characters. This causes mutt to re-wrap the header, which makes the header
> hard to read. Since SA already wraps other headers, can we consider
> that a bug, or is there a reason/option to tune?

No option to tune. Come on. ;)

After a quick look at the code, I guess I see what's going on. Actually
confirmed my suspicion looking at your headers. Probably worth filing a
low priority, enhancement / minor bug.


>   *  1.8 HTML_NONELEMENT_30_40 BODY: 30% to 40% of HTML elements are
>   *  non-standard
>   *  0.6 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image area

This shows it quite nicely. The first one is exactly 80 chars long (yes,
indeed), while the second one is shorter, only 76 chars. So why should
we wrap the latter?

I guess the problem is with the leading tab. The \t is a *single* char,
thus leading to the wrapping problem -- when displaying a tab 8 spaces
wide.

M::SA::PerMsgStatus::_process_header() calls M::SA::Util::wrap() with a
line width of 79. Simply using 72 instead is quite nasty with respect to
the first line, though. So we'd need to make wrap() smarter, at least
understanding about a leading tab's width...
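
The mismatch between character count and on-screen width is easy to reproduce. A minimal Python sketch, assuming 8-column tab stops; `display_width` is a hypothetical helper, not something in M::SA::Util.

```python
def display_width(line, tab_size=8):
    """Width of a line on screen once tabs are expanded to tab stops."""
    return len(line.expandtabs(tab_size))


# A leading tab plus 75 more characters: 76 characters in total,
# like the second report line discussed above.
header_line = "\t" + "x" * 75
print(len(header_line), display_width(header_line))
```

The line counts as 76 characters, under a limit of 79, yet occupies 83 columns on screen, which is why a width check that treats the tab as a single character lets it through unwrapped.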

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Code Rot?

2009-04-28 Thread Yet Another Ninja

On 4/28/2009 12:52 PM, Matt wrote:

Steve Freegard wrote:

Is it possible to get SVN access just to the sandboxes though? I'd be
happy to submit rules for testing.  My membership of the -dev list was
after the PreflightByMail announcement and I would have definitely used
it had I been aware of it.

  

Ditto on both counts.




me too!



Re: Code Rot?

2009-04-28 Thread Matt

Steve Freegard wrote:

Is it possible to get SVN access just to the sandboxes though? I'd be
happy to submit rules for testing.  My membership of the -dev list was
after the PreflightByMail announcement and I would have definitely used
it had I been aware of it.

  

Ditto on both counts.

matt


Re: Code Rot?

2009-04-28 Thread Steve Freegard
Justin Mason wrote:
> On Mon, Apr 27, 2009 at 17:38, John Hardin  wrote:
>> On Mon, 27 Apr 2009, Justin Mason wrote:
>>
>>> On Mon, Apr 27, 2009 at 17:03, Yet Another Ninja  wrote:
>>>
>>>> SARE had a nice system where you could submit a rule via email and got
>>>> the masscheck results via email. Sadly all the boxes which did this are
>>>> dead.
>>> actually, I _did_ come up with one of those, but nobody used it :(
>>>
>>> http://wiki.apache.org/spamassassin/PreflightByMail
>> Did you announce it to the users list?
> 
> nope -- on the dev list.  A couple of SARE folks responded saying
> "cool!" though.
> 
>>> btw, don't bother trying it now -- I turned it off again after it was
>>> never used.
>> Ooo. Can it be resurrected?
>>
>> But this is only part of the problem. How difficult is it for third parties
>> to submit rules for review and inclusion in the base ruleset without
>> necessarily joining the dev group? Is posting the proposed rule to bugzilla
>> sufficient?
> 
> getting the rule into the "rulesrc" area is all that's needed.  it
> gets auto-promoted
> based on linting ok, getting good performance etc
> 
> it's a hell of a lot easier to use SVN these days though.  Would it
> really be impossible
> to do it that way?  that's as simple as
> 
>   svn up
>   edit rulesrc/sandbox/jm/20_whatever.cf
>   svn commit rulesrc/sandbox/jm/20_whatever.cf
> 
> and wait ;)
> 

Is it possible to get SVN access just to the sandboxes though?  I'd be
happy to submit rules for testing.  My membership of the -dev list was
after the PreflightByMail announcement and I would have definitely used
it had I been aware of it.

Cheers,
Steve.


X-Spam-Report: not wrapped sometimes

2009-04-28 Thread Matus UHLAR - fantomas
Hello,

I often see mail where the X-Spam-Report header is longer than 80
characters. This causes mutt to re-wrap the header, which makes the header
hard to read. Since SA already wraps other headers, can we consider
that a bug, or is there a reason/option to tune?

Examples (from 2 different systems, both 3.2.5)

X-Spam-Report:
*  0.0 MISSING_MID Missing Message-Id: header
*  0.0 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  1.1 MPART_ALT_DIFF BODY: HTML and text parts are different
*  1.8 HTML_NONELEMENT_30_40 BODY: 30% to 40% of HTML elements are
*  non-standard
*  0.6 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image area
*  2.8 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)
*  0.2 URIBL_GREY Contains an URL listed in the URIBL greylist
*  [URIs: streamsend.com]


X-Spam-Report:
* -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
* -0.0 SPF_PASS SPF: sender matches SPF record
*  1.4 DATE_IN_FUTURE_96_XX Date: is 96 hours or more after Received: date
*  0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60%
*  [score: 0.4989]

The descriptions of DATE_IN_FUTURE_96_XX and HTML_IMAGE_RATIO_02 are too long,
while HTML_NONELEMENT_30_40, URIBL_GREY and BAYES_50 are wrapped correctly.
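
A minimal Python sketch of the kind of wrapping being asked for. This is not
SpamAssassin's own code: the function name `fold_report_line`, the 72-column
width, and the choice of Python are assumptions made for illustration. It folds
an over-long report entry at a word boundary and starts the continuation with
the same `*  ` prefix that the correctly wrapped entries show.

```python
import textwrap


def fold_report_line(line, width=72, continuation="*  "):
    """Fold one report entry at word boundaries, aiming for `width` columns."""
    return textwrap.wrap(
        line,
        width=width,
        subsequent_indent=continuation,  # prefix for every continuation line
        break_long_words=False,          # never split a rule name or URL
        break_on_hyphens=False,
    )


if __name__ == "__main__":
    entry = ("*  0.6 HTML_IMAGE_RATIO_02 BODY: "
             "HTML has a low ratio of text to image area")
    for part in fold_report_line(entry):
        print(part)
```

With the assumed width of 72, the trailing word "area" moves to a continuation
line that carries the `*  ` prefix instead of being left bare.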

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Eagles may soar, but weasels don't get sucked into jet engines. 


Re: emailBL

2009-04-28 Thread Henrik K
On Tue, Apr 28, 2009 at 10:51:33AM +0100, Matt wrote:
> Henrik K wrote:
>>
>> If someone wants to try it on their mail feed:
>>
>> http://sa.hege.li/pra.cf
>>
>> Don't mind the size, as optimized they only take a millisecond or two to run.
>>
>> Of course if it starts getting 10x the size, DNS will start looking
>> attractive..
>>
>>   
>
> I have been publishing a sa-update channel for this for some time
>
> the details are on Julian Field's blog (he wrote a script to do what  
> Regexp::Assemble does)
>
> http://www.jules.fm/Logbook/files/anti-spear-phishing.html

Ah nice.. though I'd rather see an actually optimized regexp and not 200
separate rules. :)

As for my previous files: since it isn't clear to some of you, my code
is an example, and I make no statement about usage or promise to update
the rules. Use at your own discretion.

Hopefully someone will come up with a DNS-based list; it would certainly
remove the need for costly spamassassin reloads.



Re: emailBL

2009-04-28 Thread Mike Cardwell

Henrik K wrote:

>>>> This might sound a bit picky, but using backticks to call the date
>>>> command in a perl script is horrible. Try using the standard gmtime
>>>> function. Eg:
>>>>
>>>> $date = gmtime().' (UTC)';
>>>>
>>>> Rather than:
>>>>
>>>> $date = `date -u`; chomp($date);
>>>
>>> /me too busy to man perlfunc
>>>
>>> Let this thread be an inspiration for all coders out there.
>>>
>>> Now back to the real world..
>>
>> Sorry, I assumed that if you were releasing source code to the public,
>> you'd want to make sure it was cross-platform compatible. I won't point
>> out the various other limitations with your script then.
>
> Are you actually serious or is this some geek humor that I don't get?

I was serious. Your code is a bit shit. I was just trying to help. Never
mind.

> If you are serious, would you be willing to audit SpamAssassin code with such
> enthusiasm? It might actually _matter_.

No, I'm too busy.

--
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)


Re: emailBL

2009-04-28 Thread Henrik K
On Tue, Apr 28, 2009 at 10:31:42AM +0100, Mike Cardwell wrote:
> Henrik K wrote:
>
>>> This might sound a bit picky, but using backticks to call the date
>>> command in a perl script is horrible. Try using the standard gmtime   
>>> function. Eg:
>>>
>>> $date = gmtime().' (UTC)';
>>>
>>> Rather than:
>>>
>>> $date = `date -u`; chomp($date);
>>
>> /me too busy to man perlfunc
>>
>> Let this thread be an inspiration for all coders out there.
>>
>> Now back to the real world..
>
> Sorry, I assumed that if you were releasing source code to the public,  
> you'd want to make sure it was cross-platform compatible. I won't point
> out the various other limitations with your script then.

Are you actually serious or is this some geek humor that I don't get? If you
are serious, would you be willing to audit SpamAssassin code with such
enthusiasm? It might actually _matter_.



Re: Pyzor ?

2009-04-28 Thread Matus UHLAR - fantomas
> > > On 22.04.09 13:39, Benny Pedersen wrote:
> > > > still running here as server and client
> > 
> > On 24.04.09 15:19, Matus UHLAR - fantomas wrote:
> > > client only here. searching for PYZOR string in SA logs didn't find
> > > anything for last two days (gotta re-check).
> > > seems I will turn pyzor off too...
> 
> On 24.04.09 15:51, Matus UHLAR - fantomas wrote:
> > no hit for a week, at least on my employer's machines. Got some on this one.
> > Does anyone get HITS from PYZOR?

On 24.04.09 23:29, Matus UHLAR - fantomas wrote:
> OK, thank you. I see the problem is apparently on our side, I'll look for
> it.

Seems it had something to do with the new pyzor servers. I added
"pyzor_options --homedir /etc/pyzor" to local.cf, ran
"pyzor --homedir /etc/pyzor discover"

and it's hitting now. I even have FPs because of pyzor :) but that should
be solved in a different place :)

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Depression is merely anger without enthusiasm. 


Re: emailBL

2009-04-28 Thread Matt

Henrik K wrote:
>
> If someone wants to try it on their mail feed:
>
> http://sa.hege.li/pra.cf
>
> Don't mind the size, as optimized they only take a millisecond or two to run.
>
> Of course if it starts getting 10x the size, DNS will start looking
> attractive..
>

I have been publishing a sa-update channel for this for some time

the details are on Julian Field's blog (he wrote a script to do what
Regexp::Assemble does)

http://www.jules.fm/Logbook/files/anti-spear-phishing.html

matt


Re: emailBL

2009-04-28 Thread Mike Cardwell

Henrik K wrote:

>> This might sound a bit picky, but using backticks to call the date
>> command in a perl script is horrible. Try using the standard gmtime
>> function. Eg:
>>
>> $date = gmtime().' (UTC)';
>>
>> Rather than:
>>
>> $date = `date -u`; chomp($date);
>
> /me too busy to man perlfunc
>
> Let this thread be an inspiration for all coders out there.
>
> Now back to the real world..

Sorry, I assumed that if you were releasing source code to the public,
you'd want to make sure it was cross-platform compatible. I won't point
out the various other limitations with your script then.


--
Mike Cardwell
(https://secure.grepular.com/) (http://perlcv.com/)


Re: emailBL

2009-04-28 Thread Henrik K
On Tue, Apr 28, 2009 at 09:46:44AM +0100, Mike Cardwell wrote:
> Henrik K wrote:
>
>>> (note, I'm guessing at the appropriate mailing list for cross-post)
>>>
>>> Dennis Davis wrote:
>>>> http://code.google.com/p/anti-phishing-email-reply/
>>>>
>>>> is also useful as it attempts to detail the compromised accounts.
>>>> Just block/quarantine email for those accounts.
>>> Interesting ... this seems like it would be best served by DNS in a
>>> manner similar to URIBLs ... does such an "emailBL" exist?
>>
>> If someone wants to try it on their mail feed:
>>
>> http://sa.hege.li/pra.cf
>>
>> Don't mind the size, as optimized they only take a millisecond or two to run.
>>
>> Of course if it starts getting 10x the size, DNS will start looking
>> attractive..
>
> This might sound a bit picky, but using backticks to call the date
> command in a perl script is horrible. Try using the standard gmtime  
> function. Eg:
>
> $date = gmtime().' (UTC)';
>
> Rather than:
>
> $date = `date -u`; chomp($date);

/me too busy to man perlfunc

Let this thread be an inspiration for all coders out there.

Now back to the real world..



Re: emailBL

2009-04-28 Thread Mike Cardwell

Henrik K wrote:

>> (note, I'm guessing at the appropriate mailing list for cross-post)
>>
>> Dennis Davis wrote:
>>> http://code.google.com/p/anti-phishing-email-reply/
>>>
>>> is also useful as it attempts to detail the compromised accounts.
>>> Just block/quarantine email for those accounts.
>>
>> Interesting ... this seems like it would be best served by DNS in a
>> manner similar to URIBLs ... does such an "emailBL" exist?
>
> If someone wants to try it on their mail feed:
>
> http://sa.hege.li/pra.cf
>
> Don't mind the size, as optimized they only take a millisecond or two to run.
>
> Of course if it starts getting 10x the size, DNS will start looking
> attractive..

This might sound a bit picky, but using backticks to call the date
command in a perl script is horrible. Try using the standard gmtime
function. Eg:


$date = gmtime().' (UTC)';

Rather than:

$date = `date -u`; chomp($date);
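
The same point carries over to other languages. As a minimal Python sketch
(the function name `utc_timestamp` is made up for this example), the standard
library can build the same kind of string in-process, so nothing depends on
an external `date` command being present or accepting `-u`:

```python
import time


def utc_timestamp(epoch_seconds=None):
    """Return the current (or given) time in UTC, formatted in-process."""
    # time.gmtime() yields a UTC struct_time; time.asctime() formats it in the
    # classic "Thu Jan  1 00:00:00 1970" layout. No child process is spawned.
    return time.asctime(time.gmtime(epoch_seconds)) + " (UTC)"


if __name__ == "__main__":
    print(utc_timestamp())
```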

--
Mike Cardwell
(https://secure.grepular.com) (http://perlcv.com/)


Re: emailBL

2009-04-28 Thread Mike Cardwell

Dave Funk wrote:

>> Nah - I really don't like it that way; it doesn't really bring you any
>> benefit and is more likely to cause collisions if you do it that way.
>> Don't see how it can cause less DNS traffic either.  At least using MD5
>> hashes your DNS query will only be 32 characters + blacklist zone name
>> regardless of the size of the input string.
>>
>> To reduce the likelihood of collisions then it's better to add the input
>> string length at the end of the md5 like ClamAV does in its MD5 sigs
>> e.g.
>>
>> s...@laptop-smf:~$ perl -MDigest::MD5 -e '$email="s...@fsg.com"; print
>> Digest::MD5::md5_hex($email).length($email).".emailbl.org\n"'
>> c18782f8d94595d5e016e3ab9ab3f8f610.emailbl.org
>>
>> This also has the benefit of making it impossible to reverse the list if
>> the spammer were to rsync the list.
>
> Silly question, given that RFC-2181 says that you can put almost anything
> you want into a DNS zone file, why go to the bother with the munging,
> why not just put the raw unadulterated e-mail address in there and do
> direct queries on it?
>
> EG: nslookup syst...@administrativos.com.marc.icaen.uiowa.edu.
>
> Assuming you're running reasonably up-to-date DNS stuff it does just work.

You can also put pretty much any character you want in an email address
local part. Eg, this is a valid email address...

"Personal em...@o'Reilly, Peter"@example.com

MD5 is cryptographically secure enough for this purpose. Just hashing
the entire address with md5 is the simplest and most workable solution.
I expect it would be simple to use such a bl in all modern MTAs without
too much hacking. Eg, in Exim, the configuration to look up such an
address against an emailbl called "example.com" would be (untested):


deny dnslists = example.com/${md5:$sender_address}
 message  = $sender_address is listed on $dnslist_domain
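
For anyone wanting to experiment outside an MTA, here is a rough Python sketch
of building the query name, covering both the plain MD5 form used in the Exim
line above and the MD5-plus-length form suggested earlier in the thread. The
zone `emailbl.example`, the function names, and the "any answer means listed"
convention are assumptions for illustration only; no such public list is
implied.

```python
import hashlib
import socket


def emailbl_query_name(address, zone, append_length=False):
    """Build the DNS name to query for `address` in the blacklist `zone`."""
    # Hash the address exactly as given (no case folding), so odd characters
    # in the local part never reach the DNS label.
    digest = hashlib.md5(address.encode("utf-8")).hexdigest()
    # Optional variant: append the input length after the 32 hex digits.
    suffix = str(len(address)) if append_length else ""
    return f"{digest}{suffix}.{zone}"


def is_listed(address, zone, resolve=socket.gethostbyname):
    """Return True if the query name resolves (assumed to mean 'listed')."""
    try:
        resolve(emailbl_query_name(address, zone))
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is an OSError subclass
        return False
    return True


if __name__ == "__main__":
    # Hypothetical zone and addresses, printed rather than looked up.
    print(emailbl_query_name('"odd, local part"@example.com', "emailbl.example"))
    print(emailbl_query_name("user@example.com", "emailbl.example",
                             append_length=True))
```

The `resolve` parameter exists only so the lookup can be exercised with a
stand-in resolver instead of real DNS.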

--
Mike Cardwell
(https://secure.grepular.com) (http://perlcv.com/)


Re: Code Rot?

2009-04-28 Thread Justin Mason
On Tue, Apr 28, 2009 at 02:33, RW  wrote:
> On Mon, 27 Apr 2009 18:04:36 +0100
> Justin Mason  wrote:
>
>> that's pretty much it.  low FPs and a useful number of hits (ie. over
>> 1% iirc).
>
> Unfortunately, that doesn't necessarily mean that the rule is useful.
> It's easy to create rules that match the above criteria, but most of
> them never make a difference as they only fire on spam that's already
> caught with a high score. It's much harder to create new rules that
> really make a difference - I've found that those that do are mostly
> specific to my own mail.
>
> I'm not really convinced that a *lot* of new rules are really needed,
> particularly when you consider that the main complaint against SA is
> the number of cpu cycles it consumes.

yes.  we have ways to measure and mitigate this -- once we have the rules
in SVN.
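
One way to make the concern above concrete, as a rough Python sketch rather
than a description of the project's actual tooling: for each spam message a
rule hits, check whether the message would still reach the threshold without
that rule's score. The data layout, the function name `marginal_catches`, the
rule names, and the sample scores are all invented for illustration; 5.0 is
SpamAssassin's default `required_score`.

```python
def marginal_catches(messages, rule, threshold=5.0):
    """Return (hits, decisive) for `rule` over spam messages.

    hits     -- spam messages on which the rule fired
    decisive -- those that only reached `threshold` because of the rule
    """
    hits = decisive = 0
    for msg in messages:
        if not msg["is_spam"] or rule not in msg["rules"]:
            continue
        hits += 1
        total = sum(msg["rules"].values())
        without_rule = total - msg["rules"][rule]
        if total >= threshold > without_rule:
            decisive += 1
    return hits, decisive


if __name__ == "__main__":
    # Hypothetical corpus: per-message map of rule name -> score contributed.
    corpus = [
        {"is_spam": True, "rules": {"OLD_RULE_A": 6.0, "NEW_RULE": 1.5}},
        {"is_spam": True, "rules": {"OLD_RULE_B": 4.0, "NEW_RULE": 1.5}},
        {"is_spam": True, "rules": {"OLD_RULE_B": 4.0}},
        {"is_spam": False, "rules": {"NEW_RULE": 1.5}},
    ]
    print(marginal_catches(corpus, "NEW_RULE"))
```

A rule with many hits but few decisive ones mostly fires on spam that was
already caught.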

--j.