SOLVED Re: Load Average Problems

2004-10-31 Thread John Fleming
> > What am I missing?  - John
>
> OK, from the "spamd --help" output:
>  -m num, --max-children num Allow maximum num children
>
> So that option is positively "a spamd thing." So how does one get that
> option into spamd? On the Mandrake test machine I have the init script
> in /etc/init.d as "spamassassin". It includes these lines:
> 

Ahhh, THAT's what's missing from my understanding!  (plus a lot of other
stuff!)

I reviewed my spamassassin init.d script and saw the options in there.  A
comment line in there directed me to /etc/default/spamassassin (specific to
Debian).  GUESS WHAT I FOUND IN THERE?

OPTIONS="-c -m 10 -a -H"

GOOD GRIEF!  -m 10 and me with 512 MB of RAM!

Hope my load average will go down soon!!  Thanks especially to Jason and
jdow and **WAIT** - I see I just got a msg from Duncan that directed me
specifically to /etc/default/spamassassin!!!

So I think the -m option could be added in the init script OR the
/etc/default/spamassassin file, but the init script is probably overwritten
during updates, so better to use the default file.

I know Fedora Core didn't use to have the /etc/default/spamassassin file,
so that is specific to Debian.
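
As a rough illustration of the change described above, here is a minimal Python sketch that rewrites an options string (such as the OPTIONS value in /etc/default/spamassassin) so that it carries a new -m limit. The helper name set_max_children is made up for this example. It assumes only the option forms seen in this thread ("-m 10", "-m5") plus the long --max-children form from the "spamd --help" output, and it works on a string only; it does not edit any file or restart spamd.

```python
import re

# Matches an existing child limit in any of the forms "-m 10", "-m5",
# "--max-children 10" or "--max-children=10", including the space before it.
_MAX_CHILDREN = re.compile(r"(?:^|\s)(?:-m\s*|--max-children(?:=|\s+))\d+")


def set_max_children(options, limit):
    """Return the options string with spamd's child limit set to `limit`.

    Hypothetical helper for illustration; it only manipulates a string.
    """
    # Remove whatever limit is already present, then append the new one.
    cleaned = _MAX_CHILDREN.sub("", options).strip()
    return f"{cleaned} -m {limit}".strip()


if __name__ == "__main__":
    print(set_max_children("-c -m 10 -a -H", 2))
    print(set_max_children("-d -c -m5 -Hi --user-config", 2))
```

Whichever file holds the options, spamd only picks up the new limit when it is started or restarted.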

Thanks everyone!  - John




Re: Load Average Problems

2004-10-31 Thread Duncan Findlay
On Sun, Oct 31, 2004 at 02:41:36PM -0800, jdow wrote:

[ In the future, please trim the message you are replying to so that
you only include the relevant bits. ]

> > OK, I'll bare my ignorance here in hopes of enlightenment.  I'm probably
> > lucky that I have SA working as well as I do.  I only have a loose
> > understanding of the different roles of "spamassassin", "spamc", and
> > "spamd".  I start things with /etc/init.d/spamassassin.  Then in procmail, I
> > pipe the msg to spamc.  In neither of these places do I see how to pass any
> > options to spamd.
> >
> > I've also tried:
> > # spamd -m 2
> > but this gets an error about the socket being in use.
> >
> > What am I missing?  - John
> 
> OK, from the "spamd --help" output:
>  -m num, --max-children num Allow maximum num children
> 
> So that option is positively "a spamd thing." So how does one get that
> option into spamd? On the Mandrake test machine I have the init script
> in /etc/init.d as "spamassassin". It includes these lines:
> 
> # Source spamd configuration.
> if [ -f /etc/sysconfig/spamassassin ] ; then
> . /etc/sysconfig/spamassassin
> else
> SPAMDOPTIONS="-d -c -m5 -Hi --user-config"
> fi

Sorry, Mandrake not Debian. Anyways, change options in
/etc/sysconfig/spamassassin, I think.

-- 
Duncan Findlay




Re: Load Average Problems

2004-10-31 Thread John Fleming
> SpamAssassin: Processing program.  It loads, processes, and unloads.
> SpamD: It is SpamAssassin, but doesn't unload, so it is always ready.  It
> listens for a communication from SpamC (on same or different computer).
> SpamC: It passes a message to be processed to SpamD (on same or different
> computer).
>
> So what you really want to do is get SpamD running with a line like:
>
> spamd -i 0.0.0.0 -A 192.168/16,127.0.0.1

Thanks Jason - I understood most of that previously, but it's nice to have
it summarized so well.  I've been starting spamd using
/etc/init.d/spamassassin start.  This certainly starts spamd, but doesn't
give me the option of sending any options like -m 3.

I've tried /etc/init.d/spamassassin stop, and I get the expected message
that the Daemon has stopped.  However, if I simply then try:
# spamd -m 3
It hangs with no return to prompt.  Is it not OK to just use the -m 3
option??

I do have 512 MB of RAM, but it gets clogged a couple of times a day, and I'd
like to try limiting the spamd children.  - John




Date problems with sa-stats.pl

2004-10-31 Thread Gavin Cato
Anyone seen this? It seems bent on choosing 4pm.

The date on the box is correct. Hope I'm not missing something incredibly
obvious :)


assassin# zcat /var/log/maillog.0.gz | ./sa-stats.pl -T 15 -l - \
    -s '2004-10-31 00:00:00' -e '2004-10-31 23:59:58'
Report Title : SpamAssassin - Spam Statistics
Report Date  : 2004-11-01
Period Beginning : Sun Oct 31 16:00:00 2004
Period Ending: Mon Nov  1 15:59:58 2004

Reporting Period : 24.00 hrs
--








Re: Load Average Problems

2004-10-31 Thread Duncan Findlay
On Sun, Oct 31, 2004 at 04:20:40PM -0500, John Fleming wrote:
> OK, I'll bare my ignorance here in hopes of enlightenment.  I'm probably
> lucky that I have SA working as well as I do.  I only have a loose
> understanding of the different roles of "spamassassin", "spamc", and
> "spamd".  I start things with /etc/init.d/spamassassin.  Then in procmail, I
> pipe the msg to spamc.  In neither of these places do I see how to pass any
> options to spamd.
> 
> I've also tried:
> # spamd -m 2
> but this gets an error about the socket being in use.
> 
> What am I missing?  - John

If you're using Debian, and from the sounds of it, you are, command
line options are set in /etc/default/spamassassin. Try
adding/subtracting options there.

-- 
Duncan Findlay




Re: Load Average Problems

2004-10-31 Thread jdow
From: "John Fleming" <[EMAIL PROTECTED]>
> From: "jdow" <[EMAIL PROTECTED]>
> > From: "John Fleming" <[EMAIL PROTECTED]>
> >
> > > jdow said:
> > > > On another paw I note that most family tools are not left running
> > > > 24x7. If this is his case then a large portion of his 250 messages
> > > > may be coming in right after he boots. If he is setup to spawn
> > > > too many spamds then he could experience a memory crisis.
> > >
> > > That's not it.  It's mostly a family/hobby server, but it functions
> > > "fairly professionally" - I just meant I'm not an ISP or big business
> > > with thousands of emails a day.  The server's on 24/7/365 running
> > > Apache, Mailman and other common server stuff - but all at a VERY low
> > > activity/use level.
> > >
> > > I've reviewed my local.cf, and there was some duplication.  I've
> > > removed the dupes and we'll see if that helps.
> > >
> > > I call spamd via spamc in procmail.  I've read man spamc/d - I see
> > > where to limit the spamd children when using the spamd option, but I
> > > don't see how to pass that option on when using spamc.  IOW, I don't see
> > > how to limit spamd children when using spamc.
> > >
> > > Also, my procmailrc uses a lock file when evaluating the results of
> > > spamd - I guess that doesn't limit starting another spamd before
> > > that file has been evaluated?  - John
> >
> > Um, you do not limit with spamc. You simply setup the limit in spamd when
> > you start or restart it. It is probably a good idea to play with several
> > values to see which gives you performance closest to your desired
> > performance. As soon as you get enough spamds up to trigger paging the
> > overall performance will take a serious dive. To a fairly real extent
> > a limit of two or three is probably best for single processor systems
> > modulo how much time is spent computing compared to waiting on IO for
> > any given spamd. If it is heavily compute bound 2 might be optimum.
>
> OK, I'll bare my ignorance here in hopes of enlightenment.  I'm probably
> lucky that I have SA working as well as I do.  I only have a loose
> understanding of the different roles of "spamassassin", "spamc", and
> "spamd".  I start things with /etc/init.d/spamassassin.  Then in procmail, I
> pipe the msg to spamc.  In neither of these places do I see how to pass any
> options to spamd.
>
> I've also tried:
> # spamd -m 2
> but this gets an error about the socket being in use.
>
> What am I missing?  - John

OK, from the "spamd --help" output:
 -m num, --max-children num Allow maximum num children

So that option is positively "a spamd thing." So how does one get that
option into spamd? On the Mandrake test machine I have the init script
in /etc/init.d as "spamassassin". It includes these lines:

# Source spamd configuration.
if [ -f /etc/sysconfig/spamassassin ] ; then
. /etc/sysconfig/spamassassin
else
SPAMDOPTIONS="-d -c -m5 -Hi --user-config"
fi

[ -f /usr/bin/spamd -o -f /usr/local/bin/spamd ] || exit 0
PATH=$PATH:/usr/bin:/usr/local/bin

# See how we were called.
case "$1" in
  start)
# Start daemon.
gprintf "Starting spamd: "
daemon spamd $SPAMDOPTIONS
RETVAL=$?
echo
[ $RETVAL = 0 ] && touch /var/lock/subsys/spamassassin
;;

The first clause sets some options either from a file in the sysconfig
directory or a default from the /etc/init.d/spamassassin file itself.
As can be seen that default includes the option "-m5". (It appears
that spamd may not be happy with the form "-m 5"?) The next clause
makes sure the /usr/bin and /usr/local/bin directories are on the
path for spamd's execution if spamd is in either of those directories.
Otherwise it leaves the PATH variable unchanged. The final clause is
the start of the case statement on the command arguments for the
/etc/init.d/spamassassin script. In the "start" argument case spamd
gets run as a daemon with the "SPAMDOPTIONS" set in the first one of
the excerpted clauses.

So basically you want to look for the location in your version of
/etc/init.d/spamassassin that holds the "spamd" starting as a daemon
instruction. Note how the parameters are assigned to it, in this case
via SPAMDOPTIONS. Look for where those parameters are assigned and
change the "-m5" (in this case) to "-m2". (Which I think I will do
with this test machine because I note it still bogs down nastily
when a collection of Linux Kernel patch files are sent to the Linux
Kernel Mailing List. That's usually a dozen to two dozen files of
varying length that hit almost all at once. With -m5 it seems the
various spamd invocations are preempting each other to death. With
-m2 there might be fewer preemptions and better overall throughput.
At least, I'm willing to try, not that the machine cracks anything
close to a serious sweat on the load I place on it, about 1000
messages a day. At that rate it's loafing for a "2GHz" Athlon with
1G of memory even with X running.)

{^_^}

false positives with IMP

2004-10-31 Thread Iain Pople
Hi,
If I send an email from IMP the headers it inserts can cause problems
with spamassassin and dialup blocking lists.
e.g.
Received: from CPE-203-45-11-59.vic.bigpond.net.au
(CPE-203-45-11-59.vic.bigpond.net.au [203.45.11.59])
by webmail.brunny.com (IMP) with HTTP
for <[EMAIL PROTECTED]>; Sun, 24 Oct 2004 10:21:07 +1000
It is inserting a header with the IP address of the computer sending the
email from IMP. The problem is that often this computer is on a dynamic
IP or dialup connection and so is listed in various blocking lists. This
triggers some of the SpamAssassin rules:
X-Spam-Status: No, hits=0.3 tagged_above=-99.0 required=5.5 tests=AWL,
BAYES_00, HELO_DYNAMIC_DHCP, HELO_DYNAMIC_IPADDR, NO_REAL_NAME,
RCVD_IN_NJABL_DUL
This has caused several legitimate messages to be tagged as spam.
I notice that if I use an SMTP client instead (e.g. Thunderbird), the
headers are slightly different:
Received: from cpe-203-45-11-59.vic.bigpond.net.au ([203.45.11.59]) by
bwmam09.bigpond.com(MAM REL_3_4_2a 138/139363271) with SMTP id
139363271; Sun, 31 Oct 2004 14:35:01 +1000
This doesn't trigger any SpamAssassin rules.
Is this a peculiarity of IMP? Is there some way I can avoid this happening?
thanks, Iain.


RE: Load Average Problems

2004-10-31 Thread Jason J. Ellingson
Being pretty much a new guy to the SA scene, I think I can help you
understand which does what...

SpamAssassin is the actual processing program.  When run directly as
"spamassassin" it has to load the Perl interpreter (Perl is the language
it's written in), run, and then unload from memory when done.  This is
fine for many applications, but when you need to check a lot of email (like
many of us that host email accounts for customers) that translates into a
very slow process as you have to wait for the whole load, execute, unload
process to run.  You also must run "spamassassin" on the machine that has
the email to be scanned.

"spamd" is a "daemon" (the "d" in spamd) or service.  It is a copy of
"spamassassin" that is loaded ahead of time (usually during the computer's
boot up), and not unloaded.  So initially, you may have 5 copies (also
called children) of spamd running (5 copies of spamassassin) which is a
quick hit on resources, but from there on it is MUCH faster as it doesn't
ever need to unload and reload again for each message it needs to process.
It is always ready and waiting... plus it has code to allow it to talk to
another server that has the email that needs processing... which brings me
to...

"spamc" is a "client" (the "c" in spamc).  It is very small, so it loads
very quickly as all it has to do is simply pass the message that needs
processing/checking to the server that is running spamd and then wait for a
response from spamd on what it found.

Now, you don't need two computers to use spamc/spamd.  Many run it on the
same computer because it is faster than running spamassassin as it is always
ready to run (no load/unload waiting).

Recap:
======
SpamAssassin: Processing program.  It loads, processes, and unloads.
SpamD: It is SpamAssassin, but doesn't unload, so it is always ready.  It
listens for a communication from SpamC (on same or different computer).
SpamC: It passes a message to be processed to SpamD (on same or different
computer).

So what you really want to do is get SpamD running with a line like:

spamd -i 0.0.0.0 -A 192.168/16,127.0.0.1

the -i tells spamd to listen on all IPs available (in case the computer has
more than 1 IP)
the -A tells spamd to accept SpamC connections from the following IP/IP
blocks - in my case 192.168.x.x (any computer on my private network - I have
3 servers using SpamC to talk to it) and 127.0.0.1 (itself)

By default spamd in spamassassin 3.x will run 5 children (5 copies of
spamassassin)... which will require a 512MB machine.  You can add a "-m 3"
to make it have 3 children if you have only 256MB.

You call spamc with a line like:

spamc -d 192.168.0.13 < message > results

the -d 192.168.0.13 tells spamc that spamd is running on the computer at
IP 192.168.0.13... the default is 127.0.0.1, so you don't need this bit if
you want it to talk to spamd running on the same PC.
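
For anyone scripting this rather than typing it at a shell, here is a minimal Python sketch of the same spamc call: the message goes in on standard input and the processed message comes back on standard output. The function names spamc_command and check_message are invented for the example, and the sketch assumes spamc is installed and on the PATH with a spamd reachable at the given address. Only the -d option discussed above is used.

```python
import subprocess


def spamc_command(spamd_host=None):
    """Build the spamc argument list; -d is only added for a remote spamd."""
    cmd = ["spamc"]
    if spamd_host is not None:
        cmd += ["-d", spamd_host]
    return cmd


def check_message(raw_message, spamd_host=None):
    """Pipe one message (bytes) through spamc and return its output (bytes).

    Assumes spamc is installed and spamd is running; raises
    subprocess.CalledProcessError if spamc exits with a non-zero status.
    """
    result = subprocess.run(
        spamc_command(spamd_host),
        input=raw_message,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Calling check_message(data) with no host talks to spamd on the same machine, which matches the default described above.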

I hope this helps you and others out.

Oh, and you "wise and knowledgeable" devs and users... feel free to correct
me if I'm wrong about anything.

Jason J Ellingson
Technical Consultant

615.301.1682 : nashville
612.605.1132 : minneapolis

www.ellingson.com
[EMAIL PROTECTED]

-----Original Message-----
From: John Fleming [mailto:[EMAIL PROTECTED] 
Sent: Sunday, October 31, 2004 3:21 PM
To: users@spamassassin.apache.org
Subject: Re: Load Average Problems

OK, I'll bare my ignorance here in hopes of enlightenment.  I'm probably
lucky that I have SA working as well as I do.  I only have a loose
understanding of the different roles of "spamassassin", "spamc", and
"spamd".  I start things with /etc/init.d/spamassassin.  Then in procmail, I
pipe the msg to spamc.  In neither of these places do I see how to pass any
options to spamd.

I've also tried:
# spamd -m 2
but this gets an error about the socket being in use.

What am I missing?  - John






Re: Load Average Problems

2004-10-31 Thread John Fleming

----- Original Message -----
From: "jdow" <[EMAIL PROTECTED]>
To: 
Sent: Saturday, October 30, 2004 10:41 PM
Subject: Re: Load Average Problems


> From: "John Fleming" <[EMAIL PROTECTED]>
>
> > jdow said:
> > > On another paw I note that most family tools are not left running
> > > 24x7. If this is his case then a large portion of his 250 messages
> > > may be coming in right after he boots. If he is setup to spawn
> > > too many spamds then he could experience a memory crisis.
> >
> > That's not it.  It's mostly a family/hobby server, but it functions
> > "fairly professionally" - I just meant I'm not an ISP or big business
> > with thousands of emails a day.  The server's on 24/7/365 running
> > Apache, Mailman and other common server stuff - but all at a VERY low
> > activity/use level.
> >
> > I've reviewed my local.cf, and there was some duplication.  I've
> > removed the dupes and we'll see if that helps.
> >
> > I call spamd via spamc in procmail.  I've read man spamc/d - I see
> > where to limit the spamd children when using the spamd option, but I
> > don't see how to pass that option on when using spamc.  IOW, I don't see
> > how to limit spamd children when using spamc.
> >
> > Also, my procmailrc uses a lock file when evaluating the results of
> > spamd - I guess that doesn't limit starting another spamd before
> > that file has been evaluated?  - John
>
> Um, you do not limit with spamc. You simply setup the limit in spamd when
> you start or restart it. It is probably a good idea to play with several
> values to see which gives you performance closest to your desired
> performance. As soon as you get enough spamds up to trigger paging the
> overall performance will take a serious dive. To a fairly real extent
> a limit of two or three is probably best for single processor systems
> modulo how much time is spent computing compared to waiting on IO for
> any given spamd. If it is heavily compute bound 2 might be optimum.

OK, I'll bare my ignorance here in hopes of enlightenment.  I'm probably
lucky that I have SA working as well as I do.  I only have a loose
understanding of the different roles of "spamassassin", "spamc", and
"spamd".  I start things with /etc/init.d/spamassassin.  Then in procmail, I
pipe the msg to spamc.  In neither of these places do I see how to pass any
options to spamd.

I've also tried:
# spamd -m 2
but this gets an error about the socket being in use.

What am I missing?  - John





unsuscribe

2004-10-31 Thread abusquets



recognize direct mail from dial-up/dsl pools received by TRUSTED RELAYS ?

2004-10-31 Thread Albert R. Timashev
Hi everybody,
I have SpamAssassin 3.0.1 and Exim 4.43 with exiscan-acl patch revision 28
working together on FreeBSD 4.8.
My problem is how to configure SpamAssassin so that it recognizes direct mail
from dial-up/DSL (and similar) pools received not only by my own server,
BUT BY THE TRUSTED RELAYS AS WELL. As far as I understand, SpamAssassin
recognizes such mail only if it is received by the host mentioned in the
first "Received:" line of the message header (I guess this relates to
"20_dnsbl_tests.cf").
Here is the detailed problem description.
My system is configured in the following way:
- there is my own dedicated server (SERVER)
- there are two trusted relays by my ISP (RELAY1,RELAY2)
- MX records for all my domains look like:
 IN MX 10 SERVER.
 IN MX 30 RELAY1.
 IN MX 50 RELAY2.
Before I installed SpamAssassin, I used my own anti-spam protection
system based on Exim "accept/deny" rules and a Perl script executed from
the Exim system filter for more elaborate checks. My experience has shown
that I can filter about 90% of all spam by simply rejecting direct SMTP
sessions from dial-up/DSL and similar pools. I am still using this method
even after installing SpamAssassin, keeping the "deny" rules in my Exim
configuration, and having collected an almost complete worldwide database of
dial-up/DSL pools on my own (over the past two years).

But the problem is that direct mail from dial-up/DSL pools is received
at RELAY1 and RELAY2 as well, so the "Received:" lines of the message
headers need to be parsed to detect it. That's what I wrote my Perl
script for (and executed it from the Exim system filter).

NOW THE QUESTION. Is there any way to make SpamAssassin consider the host
that has sent the message to RELAY1 or RELAY2 (in case the message passed
thru RELAY1 or RELAY2) as though the message was directly received by my own 
SERVER?

Or maybe someone could make another suggestion based on the above problem 
description.

The typical "Received:" lines in the header of the mail I'm talking about
are:

Received: from HOST4 ([xx.xx.xx.xx])
by SERVER with esmtp (Exim 4.43)
id 1CHTYA-0001xD-00
for ...; Wed, 13 Oct 2004 00:47:14 +0400
Received: from HOST3 (HOST3 [xx.xx.xx.xx])
by HOST4 (8.12.6/8.12.6) with ESMTP id i9CKlEH6038945
for <...>; Wed, 13 Oct 2004 00:47:14 +0400 (MSD)
Received: from HOST2 (HOST2 [xx.xx.xx.xx])
by HOST3 (8.9.1/8.9.1) with ESMTP id AAA06856; Wed, 13 Oct 2004 00:47:13
+0400 (MSD)
Received: from HOST1 (HOST1 [xx.xx.xx.xx])
by HOST2 (8.12.9/8.12.9) with ESMTP id i9CKlDxE060674
for <...>; Wed, 13 Oct 2004 00:47:13 +0400 (MSD)
Received: by HOST1 (Postfix, from userid 1000)
id 56A0522F35A; Wed, 13 Oct 2004 00:47:07 +0400 (MSD)
Received: from RELAY1 (RELAY1 [xx.xx.xx.xx])
by HOST1 (Postfix) with ESMTP id 50CA922F361
for <...>; Wed, 13 Oct 2004 00:47:06 +0400 (MSD)
Received: from i220-221-142-101.s04.a017.ap.plala.or.jp
(i220-221-142-101.s04.a017.ap.plala.or.jp [220.221.142.101])
by RELAY1 (8.12.9/8.12.9) with ESMTP id i9CKf4lF079892
for <...>; Wed, 13 Oct 2004 00:41:04 +0400 (MSD)
where:
RELAY1 is trusted relay specified by the MX-record,
SERVER is my dedicated server,
HOST1..HOST4 are intermediate mail servers of my ISP.
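
The header parsing described above might look roughly like the following Python sketch. It is not the Perl script mentioned in this message, only an illustration of the idea: walk the "Received:" lines, find the hops where a trusted relay was the receiving host, and report who handed the mail to it. The function name hosts_seen_by_relays is hypothetical, and the regular expression assumes the "from HELO (RDNS [IP]) by HOST" layout shown in the sample. Checking the returned address against a dial-up/DSL pool list is left out.

```python
import re

# "from HELO (RDNS [IP]) by RECEIVER", tolerating folded header lines.
_RECEIVED = re.compile(
    r"from\s+\S+\s+\((?P<rdns>\S+)\s+\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]\)"
    r"\s+by\s+(?P<by>\S+)"
)


def hosts_seen_by_relays(received_headers, trusted_relays):
    """Return (relay, rdns, ip) for each hop where a trusted relay received mail."""
    hits = []
    for header in received_headers:
        match = _RECEIVED.search(header)
        # Keep only hops whose receiving host is one of the trusted relays.
        if match and match.group("by") in trusted_relays:
            hits.append((match.group("by"), match.group("rdns"), match.group("ip")))
    return hits
```

With the sample headers and trusted_relays set to {"RELAY1", "RELAY2"}, the only hop reported would be the last one, where RELAY1 received the message from the plala.or.jp host.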
I need SpamAssassin to recognize receiving mail from
i220-221-142-101.s04.a017.ap.plala.or.jp by RELAY1 as "dialup sender did
non-local SMTP" or something like that, so I will specify and score it at my
"local.cf".
Regards,
Albert R. Timashev
St. Petersburg, Russia 



Problems with URI DNS tests in spamd?

2004-10-31 Thread Carl R. Friend
   G'day.

   Has anyone else noticed problems with the DNS-based URI tests in
SpamAssassin 3.0.[01]?  Specifically, running on Solaris 9 with Perl 5.8.5
and SpamAssassin 3.0.0 and 3.0.1, the "urirhssub" and "uridnsbl" tests
are not even being called from spamd.  However, they work fine from a
direct spamassassin invocation.

   To wit, here's the test results from a spam that contained a link
to a well known spamvertised "destination".  First the spamassassin
run from the command line:

bash-2.05$ cat foo93 | /usr/local/bin/spamassassin
X-Spam-Checker-Version: SpamAssassin 3.0.1-crf_3.0.1_20041031_00 (2004-10-22) 
on t1
X-Spam-Level: 
X-Spam-Status: Yes, score=9.6 required=4.5 tests=FORGED_RCVD_HELO,URIBL_CACM,
URIBL_SBL,URIBL_WS_SURBL autolearn=disabled 
version=3.0.0-rc2-crf_3.0.0_rc2_20040807_00

   That's just fine, and is as it should be -- flagged as spam.  Now
here's the run through spamc/spamd:

bash-2.05$ cat foo93 | /usr/local/bin/spamc -d spamassassin
X-Spam-Checker-Version: SpamAssassin 3.0.1-crf_3.0.1_20041031_00 (2004-10-22) 
on t1
X-Spam-Level: 
X-Spam-Status: No, score=0.1 required=4.5 tests=FORGED_RCVD_HELO 
autolearn=disabled version=3.0.1-crf_3.0.1_20041031_00

   To see if the tests were even being tried, I sniffed on port 53
on my DNS server and never saw requests for the various URIBLs.

   Ideas?

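+----------------------------------------+---------------------+
| Carl Richard Friend (UNIX Sysadmin)    | West Boylston       |
| Minicomputer Collector / Enthusiast    | Massachusetts, USA  |
| mailto:[EMAIL PROTECTED]               +---------------------+
| http://users.rcn.com/crfriend/museum   | ICBM: 42:22N 71:47W |
+----------------------------------------+---------------------+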



Re: Bayes learn (sa-learn)

2004-10-31 Thread Rick Macdougall

Greg T. wrote:
> I use Cyrus-imapd as my imap server, and it is a mail
> store, not a maildir or a mailfile system.
>
> I don't have access to the actual mail messages on the
> server, as they are kept in a database.
>
> I have several spams that keep coming in under the
> SpamAssassin radar and being delivered.  These are
> porn and are particularly objectionable.  I'd like to
> teach Bayes how to spot them as spam, but don't know
> how to use the sa-learn utility with Cyrus.  Any help
> out there?

Hi,

imap-sa-learn.pl

You can find it here.
http://tirian.magd.ox.ac.uk/~nick/code/

Regards,
Rick


Bayes learn (sa-learn)

2004-10-31 Thread Greg T.
I use Cyrus-imapd as my imap server, and it is a mail
store, not a maildir or a mailfile system.

I don't have access to the actual mail messages on the
server, as they are kept in a database.  

I have several spams that keep coming in under the
SpamAssassin radar and being delivered.  These are
porn and are particularly objectionable.  I'd like to
teach Bayes how to spot them as spam, but don't know
how to use the sa-learn utility with Cyrus.  Any help
out there?

Greg





Re: ver 3.0 opinions

2004-10-31 Thread Tuc at Beach House
> > But shouldn't it have carried my database over from my previous install? 
> > I'd been using it for atleast 6 months on different versions before this
> > upgrade.  Did it 'forget' it all?  Do I need to totally retrain it?  I'm
> > using a stock install from FreeBSD ports, no local/global overrides.
> 
> Looks like somebody didn't read the UPGRADE doc...
> 
>  Due to the database format change, you will want to do something like
>   this when upgrading:
> 
I did see that, but I also saw :

- The Bayesian storage modules have been completely re-written and now
  include Berkeley DB (DBM) storage as well as SQL based storage (see
  sql/README.bayes for more information).  In addition, a new format
  has been introduced for the bayes database that stores tokens in fixed
  length hashes (Bayes v3).

**
  All DBM databases should be automatically
  converted to this new format the first time they are opened for write.
**

  You can manually perform the upgrade by running "sa-learn --sync"
  from the command line.

So I thought the rest of the instructions were if I just wanted to
do it before it did it itself.

Thanks, Tuc


RE: Pardon the old messages

2004-10-31 Thread marti
-----Original Message-----
From: Jeff Chan [mailto:[EMAIL PROTECTED] 
Sent: 31 October 2004 13:31
To: SpamAssassin Users
Subject: Pardon the old messages

I fscked up and resent some old messages from April.  Please discard those.

Embarrassed,

Jeff C.
--

Well, I'm quite new to this mailing list, so they might make some interesting
reading for an otherwise boring Sunday ;)

Martin



Pardon the old messages

2004-10-31 Thread Jeff Chan
I fscked up and resent some old messages from April.  Please
discard those.

Embarrassed,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



RE: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Michele Neylon :: Blacknight Solutions
Raymond Dijkxhoorn wrote:
> Hi!
> 
>> Hello SpamAssassin Users,
>> I'm pleased to announce a new type of RBL for blocking messages based
>> on spam domains contained in message bodies called SURBL.
>> Unlike other RBLs, the Spam URI RBL (SURBL) is not used to block spam
>> server IP addresses, but instead to block messages based on
> 
> Ouch, seems Jeff has problems with his setup. This is really old mail.
> 
> Bye,
> Raymond.

I was wondering! 

Mr Michele Neylon
Blacknight Internet Solutions Ltd
Hosting, co-location & domains
http://www.blacknight.ie/
Tel. +353 59 9137101
Proud sponsors of MM04 {http://www.mm04.net}





Re: test.surbl.org?

2004-10-31 Thread Jeff Chan
On Monday, April 19, 2004, 5:01:01 PM, Mark Mark wrote:
> What about Matt's comment, though?

>> AFAIK it's invalid to
>> have a query to anything but '*.*.*.*.sc.surbl.org' or
>> '*.*.*.*.ws.surbl.org', where *.*.*.* is an IP address in reverse
>> order as per in-addr.arpa queries.

> Was not the whole point to do lookups on things like
> "domainundertest.com.sc.surbl.org"? If I have to find an IP address for
> "domainundertest.com" first, then do a regular RBL lookup, it is no longer a
> domain name lookup, really.

Yes, that points out a couple important differences between
SURBLs and other RBLs.

You're correct that URI domain names found in message bodies
should not get name resolution done on them before getting
checked against a SURBL.  That's already somewhat different
from other RBLs, most of which tend to have numeric data which
must be converted from names before use.  For example
mailserver.openrelay.com would need to be resolved into its IP
address before it could be checked in most number-based open
relay RBLs.

Another difference (oddity?) of SURBLs is that they have both
name and number entries in the same list.  The idea is that
if a numeric URI like http://1.2.3.4/ is found in an incoming
message body, it's checked against the *same* SURBL list but
with the octets reversed per RBL convention.  So that URI
would get checked against SURBL as 4.3.2.1.sc.surbl.org.

Note that in both cases, whatever URI is found in a message
is checked against the SURBL without any name resolution.
In that sense the SURBL (and use of it) captures the data
directly and without much processing (other than removing
host names, randomized subdomains and other non-core stuff).

I like to think it's being faithful to the data to do things
this way, but other people have argued for splitting the
names and numbers into separate lists.  My arguments against
that are that there are very few numbers in the data, so a
numbers-only list would be small, and that this keeps a single
query and single database of all the spam URIs.  I'm sure
good arguments can be made on both sides for splitting the
list, but I think I'll probably keep it together.

Jeff C.



SURBL Implemetation Guidelines

2004-10-31 Thread Jeff Chan
We've made a document describing some of the general properties
which code using SURBLs should have in order to use the data as
it was designed and intended.  We hope these comments may be
useful to developers.  Our Implementation Guidelines are brief
and copied below.


  http://www.surbl.org/implementation.html

Implementation Guidelines

Here are some very brief guidelines for folks writing software to
use SURBL lists. Your code should: 

   1. Extract URIs from message bodies. (Extraction of URIs from
message bodies should ideally include full resolution of
redirections into the final target domain name. This can be a
non-trivial problem.)

   2. Extract base (registrar) domains from those URIs. This
includes removing any and all leading host names, subdomains,
www., randomized subdomains, etc. In order to determine the base
domain it may be necessary to use a table of country code TLDs
(ccTLDs) such as the partially incomplete one SURBL uses.

   3. Not do name resolution on the domains.

   4. Look up the domain name in the SURBL by prepending it to
the name of the SURBL, e.g., domainundertest.com.sc.surbl.org,
then doing Address record DNS resolution on the resulting
combined name. A non-result indicates lack of inclusion in the
list. A result of 127.0.0.2 represents inclusion, i.e., probable
spam.

   5. Handle numeric IPs in URIs similarly, but reverse the octet
ordering before comparison against the RBL. This is standard
practice for RBLs. For example, http://1.2.3.4/ is checked as
4.3.2.1.sc.surbl.org. 

SURBL lists unusually have both names and numbers in the same
list. For example, 2.0.0.127 and test.surbl.org and similar
actual spam domains and addresses are both in all SURBL lists.
Numbered addresses in SURBLs should have occurred in spams as
numbers, e.g.: literally http://1.2.3.4/. Additional SURBL test
points are mentioned in the News & Notes section.
__
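
As an illustration only, guidelines 2 through 5 above could be sketched in Python as follows. This is not SURBL's or SpamAssassin's code. The function names are invented for the example, and TWO_LEVEL_SUFFIXES is a deliberately tiny, hypothetical stand-in for the ccTLD table that guideline 2 refers to. URI extraction (guideline 1) is not covered, and the 127.0.0.2 check simply follows the guideline text as written here.

```python
import ipaddress
import socket

# Hypothetical, deliberately tiny sample of two-level ccTLD suffixes; a real
# implementation would need a much fuller table, as guideline 2 notes.
TWO_LEVEL_SUFFIXES = {"co.uk", "com.au", "or.jp"}


def base_domain(host):
    """Strip leading host names and subdomains down to the base domain."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def surbl_query_name(host, zone="sc.surbl.org"):
    """Build the DNS name to look up for a URI host (name or dotted-quad IP)."""
    try:
        ip = ipaddress.IPv4Address(host)
    except ValueError:
        # Not a numeric address: prepend the base domain, with no resolution.
        return base_domain(host) + "." + zone
    # Numeric URI: reverse the octets, per RBL convention.
    return ".".join(reversed(str(ip).split("."))) + "." + zone


def is_listed(host, zone="sc.surbl.org"):
    """Do the A-record lookup; no result means the host is not listed."""
    try:
        return socket.gethostbyname(surbl_query_name(host, zone)) == "127.0.0.2"
    except OSError:
        return False
```

Only is_listed touches the network; the two name-building helpers are pure string handling, which makes them easy to check in isolation.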

Please send me any comments, updates, revisions, corrections,
questions, etc...

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL web site updated, Letter to Redirectors added

2004-10-31 Thread Jeff Chan
On Monday, April 19, 2004, 11:31:47 AM, Kelson Vibber wrote:
> At 09:31 AM 4/19/2004, Kai Schaetzl wrote:
>>I think we are talking of different redirectors here. That kind:
>>http://www.yahoo.com/gotourl?spammerdomain.com

> ... which needs to be set up to redirect only to those sites they're
> using.

>>or
>>http://www.shorl.com/dfsdshjlk

> ... which needs to block known spam sites.

Kai identifies a couple different types of redirection
sites, the latter of which Devin Carraway descriptively termed
"opaque" and Kelson suggests two related remedies for the
redirection sites, which I agree with.

> tinyurl.com, for instance has a 
> statement that it will disable any redirection found to be used in spam and 
> report it to ISPs, gov't, etc.  (Preventing creation of the URL would be 
> even better.)  Other redirectors of this type need to follow similar policies.

tinyurl's policy sounds great.  The open letter was an attempt to
come up with something to tell other redirectors to convince them
to also block spammers.  :-)  I hope people will contact the
redirectors regarding spammers' abuse of their services.

Jeff C.



Re: test.surbl.org?

2004-10-31 Thread Jeff Chan
On Monday, April 19, 2004, 12:31:38 PM, Ryan Moore wrote:

> You're not querying the RBL itself. You would want to look for
> "test.surbl.org" being in the RBL. To do so you would do "nslookup 
> test.surbl.org.sc.surbl.org", which does indeed return a positive 
> (127.0.0.2) result.

Thanks Ryan!  Also, once your SURBL-using system is up and
running, sending a test message with a URL like:

  http://test.surbl.org-WITHOUT-THIS-MUNGING/

should get it scored/marked as spam.
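
For anyone who would rather script the lookup than use nslookup, a
minimal sketch is below. The function name and the injectable
"resolve" parameter are assumptions made for illustration; only the
query name and the 127.0.0.2 result come from this thread.

```python
import socket


def surbl_listed(domain, zone="sc.surbl.org", resolve=socket.gethostbyname):
    """Return True if domain has an A record of 127.0.0.2 under zone."""
    try:
        return resolve(domain + "." + zone) == "127.0.0.2"
    except socket.gaierror:
        # No answer means the name is not on the list.
        return False
```

Calling surbl_listed("test.surbl.org") performs the same query as the
nslookup above. The resolver is passed in so the logic can be tried
without touching the network.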

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: FP in ws.surbl.org

2004-10-31 Thread Jeff Chan
On Sunday, April 18, 2004, 11:12:26 PM, William Stearns wrote:
> On Mon, 19 Apr 2004, Marc Kool wrote:
>> I got a FP using ws.surbl.org.
>> Where can I go to get it removed?

> As a general rule, you should contact me directly for false
> positive reports.  Please include the word "blacklist" in the subject.

> I've removed it, and the updated version should be out in a few 
> minutes.  Sorry for the error.

Thanks Bill.  For FPs in ws.surbl.org please contact Bill.
Bill's list policies may be of interest:

  http://www.stearns.org/sa-blacklist/README.policy

For FPs in sc.surbl.org please contact whitelist at surbl dot org

  http://www.surbl.org/contact.html

Jeff C.



Re: [SURBL-Discuss] RFC: SURBL software implementation guidelines

2004-10-31 Thread Jeff Chan
On Sunday, April 18, 2004, 7:53:46 PM, Eric Kolve wrote:
> Currently SpamCopURI checks both the 2nd and 3rd level domain regardless
> of the TLD.  I believe SA 3.0 does a little better job of this.

Sounds good.  That should catch everything with few false
positives, since we're filtering out most ccTLDs on the data
side and not too many get reported in the first place.

Jeff C.



SURBL web site updated, Letter to Redirectors added

2004-10-31 Thread Jeff Chan
I've updated the SURBL web site to use frames and have freshened
the content slightly.   Please let me know if you spot any broken
links, etc.

  http://www.surbl.org/

Also added "An Open Letter To Operators Of Redirection Sites"
in which we try to appeal to redirection sites to deny their
services to spam URI domains (e.g., spammers' web sites).
Redirection sites may become an increasing problem if we're
successful in blocking spams with their sites directly linked.

  http://www.surbl.org/redirect.html

Comments, revisions, questions, suggestions on any of that are
welcomed.

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Bill Stearns' sa-blacklist available as SURBL: ws.surbl.org

2004-10-31 Thread Jeff Chan
I probably should have introduced this second SURBL list
that can be used together with or in place of sc.surbl.org
before mentioning that its name was changing from sa.surbl.org
to ws.surbl.org.  :-)  Note that the two lists have different
data sources, so strictly speaking one is not a replacement for
the other.  They're two different lists.  sc uses URI domains
from SpamCop reports.  The data source for ws data is described
below.  Both lists have merits and we'd encourage you to consider
trying both. 

Here's an announcement with the additional update that
we've changed the *sample rule names* for the ws list to use
"WS" instead of "SA":
__

  http://www.surbl.org/   (with some live links)

More SURBL lists

In addition to the first SpamCop URI-derived SURBL sc.surbl.org, we
are pleased to host another RBL compatible with the SpamCopURI or
URIDNSBL SpamAssassin plugins, or any other software that can
check message body domains against a name-based RBL. Data for the
second SURBL ws.surbl.org comes from the domains in Bill Stearns'
SpamAssassin blacklist: sa-blacklist. This is a large list of
spam domains, including those found in spam message body URIs.
Both ws.surbl.org and sc.surbl.org SURBLs can be used in the same
SA installation by using two sets of rules.

An SA 2.63 rule and score using SpamCopURI (but not the SpamCop
data!) looks like this: 

uri       WS_URI_RBL  eval:check_spamcop_uri_rbl('ws.surbl.org','127.0.0.2')
describe  WS_URI_RBL  URI's domain appears in spamcop database at ws.surbl.org
tflags    WS_URI_RBL  net

score     WS_URI_RBL  3.0

An SA 3.0 rule and score using URIBL's urirhsbl looks like this:

urirhsbl  URIBL_WS_SURBL  ws.surbl.org.   A
header    URIBL_WS_SURBL  eval:check_uridnsbl('URIBL_WS_SURBL')
describe  URIBL_WS_SURBL  Contains a URL listed in the WS SURBL blocklist
tflags    URIBL_WS_SURBL  net

score     URIBL_WS_SURBL  3.0

More details about ws.surbl.org are available in the section
"Additional SURBLs for spam URI testing" (copied below).

Please note that the name of this list is being changed from
sa.surbl.org to ws.surbl.org. If you were using the old name in
your rules please update them to the new name. 

...

Additional SURBLs for spam URI testing

Additional SURBLs that list domains occurring in spam message
bodies may be used with the same routines that use the
sc.surbl.org RBL.

sa-blacklist available as RBL: ws.surbl.org

In cooperation with Bill Stearns, SURBL is making his
sa-blacklist SpamAssassin blacklist available as the RBL
ws.surbl.org. It can be used in the same way as sc.surbl.org, for
example by adding urirhsbl and SpamCopURI rules as described in
the Quick Start section at the top of this document. Like sc,
ws.surbl.org is available through DNS and, for large-volume mail
servers, as rsynced BIND and rbldns zone files. Raymond
Dijkxhoorn has graciously agreed to host the ws.surbl.org zone
files from his rsync server along with sc.surbl.org's. Please
contact him at [EMAIL PROTECTED] for rsync access. 

Both sc and ws RBLs can be used in the same installation. The
choice of using either or both or none is yours. Their data
differs somewhat, and we'll try to briefly describe and link some
of the differences here. Bill's list is rather large at about
9600 domains. It consists of domains found in spam message body
URIs and some spam sender and spam operator domains. Given that
the former are more relevant to isolate these days, most of the
recent additions to Bill's list have been URI domains. Those are
also the domains most relevant for use with the message body
checking approach which we propose throughout this site. 

The data in sa-blacklist and therefore ws.surbl.org differ from
the SpamCop URI report data described above in that the list is
about ten times larger, more stable, and may have a slightly
higher false positive rate. Bill's policy for inclusion and
cleaning of the sa-blacklist is quite sound, however, so folks
should feel comfortable giving this list a try in addition to the
sc list. ws may currently detect some spam that sc misses, and
vice versa, but it's worth mentioning that the current sc is a
working prototype and that we expect the performance of sc to
improve as we tune the sc data engine further. sc just got out of
the gate, yet it already has some worthy competition in ws.
Thanks Bill! 

Because ws is larger and more stable, the zone file for it gets
a six hour TTL compared to 10 minutes for sc. Due to the
differences between the time scales, sizes, and data sources of
ws and sc, we probably won't be offering a combined ws plus sc
list. For example it would be difficult to say what TTL a merged
list should get, and you probably would not want a megabyte plus
BIND zone file refreshing every 10 minutes. For those using
rsynced zone files that would probably not be an issue, but for
those using BIND, the DNS traffic quite well could be.
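
To put rough numbers on the two time scales, here is a small
back-of-the-envelope calculation. It assumes, purely for illustration,
that a cache re-fetches a record once per TTL interval; real traffic
depends on query patterns.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def refreshes_per_day(ttl_seconds):
    """Upper bound on how often one cache re-fetches a record per day."""
    return SECONDS_PER_DAY // ttl_seconds


sc_ttl = 10 * 60        # 10 minutes, as stated for sc
ws_ttl = 6 * 60 * 60    # six hours, as stated for ws

print(refreshes_per_day(sc_ttl))  # 144
print(refreshes_per_day(ws_ttl))  # 4
```

Under that assumption a merged list carrying the sc TTL would be
refreshed 36 times as often as ws is today, which is the concern with
a zone file of a megabyte or more.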

We encourage you to give ws.surbl.org a try.

Please note t

Re: FP from tinyfont rules

2004-10-31 Thread Jeff Chan
On Friday, April 16, 2004, 4:35:28 PM, Loren Wilton wrote:
> How clever!  They go OUT OF THEIR WAY to make their message
> look like spam, advertizing an anti-spam tool! 

> I think the cure for this one may be complaints to Hotmail.  :-(

Tell Hotmail that using tiny fonts is going to get their users'
messages massively blocked.  They may not be able to ignore that.

Jeff C.



Re: Test Received headers using CIDR addresses rather than regexps

2004-10-31 Thread Jeff Chan
On Friday, April 16, 2004, 12:49:07 PM, Matt Kettler wrote:
> DNSBLs are also inherently by far more scaleable than static SA rules. Why?
> because you don't need to store the entire database on your local machine.

I agree, but...

> Witness the overhead of SURBL vs BigEvil.

In some fairness BigEvil represents a much larger dataset than
sc.surbl.org does currently.  (sc.surbl.org has about 500 of the
most reported SpamCop URI domains while BigEvil probably contains
the equivalent of thousands of domains.)

A better comparison if someone wanted to do two performance tests
might be to compare Bill Stearns' sa-blacklist in SA rule form
versus essentially his same data turned into a name-RBL in the
SURBL ws.surbl.org.  In particular using an RBL moves the storage
of the spam domain data out of SA memory and into the local name
service cache.

Speaking of which, I meant to properly announce the availability
of Bill's list as an RBL

Jeff C.



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Raymond Dijkxhoorn
Hi!

> Hello SpamAssassin Users,
> I'm pleased to announce a new type of RBL for blocking messages
> based on spam domains contained in message bodies called SURBL.
> Unlike other RBLs, the Spam URI RBL (SURBL) is not used to block
> spam server IP addresses, but instead to block messages based on

Ouch, seems Jeff has problems with his setup. This is really old mail.

Bye,
Raymond.


Fwd: [SURBL-Discuss] Fix for the "syntax error at /etc/spamassassin/spamcop_uri.cf" errors

2004-10-31 Thread Jeff Chan
This is a forwarded message
From: Daniel Patterson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: Tuesday, April 13, 2004, 7:57:27 PM
Subject: [SURBL-Discuss]  Fix for the "syntax error at 
/etc/spamassassin/spamcop_uri.cf" errors

===8<==Original message text===
Hello all,

  This is a quick note to those who are getting the following
  errors under 2.63:

Failed to compile URI SpamAssassin tests, skipping: ^I(syntax error at 
/etc/spamassassin/spamcop_uri.cf, rule SPAMCOP_URI_RBL, line 1, near "eval:" 
syntax error at /usr/share/spamassassin/20_uri_tests.cf, rule URI_OFFERS, line 
175, near "; }" )

  The cause of this error is the perl module loader path being set
  such that your new, shiny SpamCopURI classes are not replacing
  their original counterparts in the original SA distribution.
  I'm running Debian, and I noticed, in /usr/bin/spamassassin and 
  /usr/sbin/spamd the following line:

use lib '/usr/share/perl5';

  The actual content of this line will vary depending on your system,
  but if you're experiencing the above error, this is the source
  of your troubles.

  To fix it, I add the following immediately after the previous
  "use lib" line:

use lib '/usr/share/perl5';
use lib '/usr/local/share/perl/5.8.3';

  This prepends /usr/local/share/perl/5.8.3 to the module search path,
  which will make the newly installed SpamAssassin modules be loaded
  in preference to those found elsewhere on your system.

  You could possibly also remove the "use lib" line altogether, but I
  haven't tried that.
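
  Why the order of the two "use lib" lines matters can be shown with a
  toy model (written in Python rather than Perl; the use_lib function
  and the starting path are assumptions for illustration). Each
  "use lib" puts its directory at the front of the search path, so
  the line that appears later wins:

```python
def use_lib(inc, *dirs):
    """Model Perl's `use lib`: put dirs at the front of the search path."""
    return list(dirs) + [d for d in inc if d not in dirs]


inc = ["/usr/lib/perl5"]                       # hypothetical starting @INC
inc = use_lib(inc, "/usr/share/perl5")
inc = use_lib(inc, "/usr/local/share/perl/5.8.3")
print(inc[0])  # /usr/local/share/perl/5.8.3
```

  On a real system, running perl -e 'print join("\n", @INC)' shows the
  actual search order.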

daniel
___
Discuss mailing list
[EMAIL PROTECTED]
http://lists.surbl.org/mailman/listinfo/discuss

===8<===End of original message text===



Re: Likely new high volume spam - SPERMAMAX

2004-10-31 Thread Jeff Chan
On Wednesday, April 14, 2004, 3:31:22 AM, Kai Schaetzl wrote:
> Alton Danks wrote on Tue, 13 Apr 2004 19:53:52 -0400:

>> In anticipation of this being the beginning of a stream like we've seen with
>> similar products I've added these rules. Score to taste.

> It also shows another trend which is still quite slim but I think is growing 
> since earlier this year: spam with normal words you can read.

Yeah I've noticed that too.  Maybe it means the deobfuscation
efforts are having an effect.

Jeff C.



Re: [SURBL-Discuss] What list does ws.surbl.org represent?

2004-10-31 Thread Jeff Chan
On Tuesday, April 13, 2004, 8:58:30 AM, William Stearns wrote:
> On Tue, 13 Apr 2004, Charles Solomon wrote:
>> One more question.  Since both lists contain the same domains, and this
>> list is now published at ws.surbl.org, couldn't I implement something
>> like the following in addition to the SpamCopURI to get the benefits of
>> SA-Blacklist in the FROM headers?
>> 
>> header RCVD_IN_WSSURBL  eval:check_rbl('ws.surbl.org', '127.0.0.2')
>> describe RCVD_IN_SORBS  WS-SURBL: sender is listed in ws.surbl.org
>> tflags RCVD_IN_SORBS  net

> I honestly don't know the answer to that.  Does anyone know if 
> that will successfully check the sender domain?

The code that's using SURBLs generally should only be looking at
message bodies, so it should only match on spam domains in the
message body URIs and not header info. 

That said, since Bill's data which ends up in ws has some sender
domains in it, using ws.surbl.org in conventional RBL code that
looks at message headers such as sender domain may get some matches.

However attempting to match sender domains would give far fewer
(near zero) hits with sc.surbl.org whose source data comes from
message bodies only.

In other words we're trying to use SURBLs on message body URIs
and not against message headers, which would be more like using
a regular RBL.  As I understand it Bill is also focusing on
adding URI domains to his list lately, which is a good match 
for the intended use of SURBLs.
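
To make the body-versus-header distinction concrete, here is a rough
sketch of collecting URI hosts from message bodies only. It is not the
SpamCopURI or URIDNSBL code; the regular expression and function name
are assumptions, and real messages need more careful charset and HTML
handling.

```python
import re
from email import message_from_string
from urllib.parse import urlsplit

URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def body_uri_hosts(raw_message):
    """Collect host names from URIs in the text parts of a message body."""
    msg = message_from_string(raw_message)
    hosts = set()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and attachments
        payload = part.get_payload(decode=True) or b""
        for uri in URI_RE.findall(payload.decode("latin-1")):
            try:
                host = urlsplit(uri).hostname
            except ValueError:
                continue  # malformed URI, ignore it
            if host:
                hosts.add(host.lower())
    return hosts
```

Headers such as From: are never scanned here, so a sender domain only
shows up if it also appears in a body URI.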

Hope this helps,

Jeff C.



Name of sa.surbl.org being changed to ws.surbl.org

2004-10-31 Thread Jeff Chan
Hello SURBL users,
Please note that the name of the SURBL derived from Bill Stearns'
sa-blacklist is being changed from sa.surbl.org to ws.surbl.org .
If you were using the old name in your rules or configs please
update them to the new name.

We will keep DNS queries up on the old name for a week or so but
will probably drop them after that.  This is only a name change
for that list.  Functionality should remain the same.

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.jeffchan.com/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 4:58:08 PM, David Funk wrote:
> This looks like a poor test choice on the part of the authors.
> It's failing because that address "[211.147.224.30]" is no longer in
> the sc.surbl.org list.

> They should use the specific "test" address that is guaranteed
> to be there. Try rewriting that test line to use:
> ['http://127.0.0.2/baddy']

> and see if that fixes it.

Thanks Dave,
We've added a numeric testpoint to the SURBL zone files at Eric
Kolve's request, and I have a hunch he is updating the test suite
to use it as we speak:

Name: 2.0.0.127.sc.surbl.org
Address:  127.0.0.2

We announced this addition on our announce list.  :-)

  http://lists.surbl.org/

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL Poisoning?

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 2:16:23 PM, Pete McNeil wrote:
>>However if *any* of the domains in a spam are on an SURBL list, the
>>entire message will get tagged as spam (for mail servers using
>>SURBL of course).  The more spam domains the spammers add,
>>including their spamming rivals, the better our chance of tagging
>>the message as spam.
>>
>>Score one for the good guys if they try to take out their
>>competitors this way.

> Well, actually, that's just what they hope. The point of this attack is 
> that the spam they send is meaningless and un-targeted - so only the rivals 
> get damaged. Their own targets are not present in the message.

Got it.  You're right.  I was assuming they were adding their
competitors alongside their own.

> There's no hazard in this for the white-hats except for more work. The 
> targeted spammers quickly select new domains and step up the rate at which 
> they generate and use those domains. This results in more spam and more 
> filtering work. A hint (and a helpful tactic) is that the IP targeted by 
> the domain tends to remain intact... so it is helpful to capture not only 
> the target domain/link but also the IP at the end of it... If you see the 
> IP again in a short period then the link attached is likely to also be a 
> spam indicator... This can be helpful in closing networks of rotating IPs 
> and domains.

Yep!  That's exactly what I'm implementing in my new data engine:
watch for persistent IP blocks occurring in spam URIs and make the
new incoming spam URI reports resolving into those blocks
easier to add to the bad guy domain list through a lowered threshold.
I think it's going to work very well.
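
The idea can be sketched like this. The thresholds and the network
block below are invented for illustration and are not the values the
data engine uses.

```python
import ipaddress

# Hypothetical values for illustration only.
NORMAL_THRESHOLD = 10     # reports needed before a domain is listed
LOWERED_THRESHOLD = 3     # reports needed if it resolves into a known block
KNOWN_SPAM_BLOCKS = [ipaddress.ip_network("192.0.2.0/24")]


def should_list(report_count, resolved_ip):
    """Decide whether a reported URI domain should be added to the list."""
    ip = ipaddress.ip_address(resolved_ip)
    in_known_block = any(ip in block for block in KNOWN_SPAM_BLOCKS)
    threshold = LOWERED_THRESHOLD if in_known_block else NORMAL_THRESHOLD
    return report_count >= threshold
```

A domain with only four reports would be listed if it resolves into a
persistent block, while the same count elsewhere would not be enough.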

> When they mix in randomly trolled legitimate links it becomes a bit of a 
> challenge though... I've seen messages with up to 30 links all disguised as 
> something useful - with 30+ % actually being legitimate targets & potential 
> collateral damage. These messages tend to be modeled after eZine type 
> newsletters that tack on a large list of references to "in-depth" story 
> versions.

Right.  The savior for sc.surbl.org would be if the SpamCop users
uncheck most of those.  Not sure if SpamCop also internally
whitelists.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL Poisoning?

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 8:18:39 AM, Pete McNeil wrote:
> * Spammers are definitely stuffing legitimate urls, domains, and fragments
> into spam and the trend is increasing.

> * White and partial masking rules are numerous and required to avoid false 
> positives. In  practice (as you point out) the list becomes large.

sc.surbl.org has the advantage of the many SpamCop users
reviewing their submissions for legitimate looking domains and
unchecking them before they click Submit.  So far it seems to
help a lot.

> PS: As bizarre as it seems, I have seen spammers cluster bomb rivals by
> putting out dummy spam that contains links to rival spam targets... The 
> dummy goes nowhere and the rival links seem unrelated as if the purpose of 
> the message is to increase the filtering rate on the apparent rival 
> links/domains.

However if *any* of the domains in a spam are on an SURBL list, the
entire message will get tagged as spam (for mail servers using
SURBL of course).  The more spam domains the spammers add,
including their spamming rivals, the better our chance of tagging
the message as spam.

Score one for the good guys if they try to take out their
competitors this way.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



SpamCopURI install questions

2004-10-31 Thread Jeff Chan
May I suggest directing SpamCopURI install questions to the SURBL
discussion list:

  [EMAIL PROTECTED]

  http://lists.surbl.org/mailman/listinfo/discuss

We have many people who can probably answer these questions
there.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL & BigEvil

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 8:59:21 AM, Burnie Burnie wrote:
> Paul Barbeau <[EMAIL PROTECTED]> wrote:

>> There still might be a place for BigEvil in this new world of SURBL.  I find
>> there are a number of domains in BigEvil (and my own MidEvil) that are not
>> yet in the spam cop and therefore not in this service.  Because of this
>> there might still be a place for this type of list as you could call it more
>> cutting edge.

> Another issue to SURBL vs BigEvil:

> If the spammer uses redirectors for urls, AFAIK only BigEvil
> will match those. 
> SURBL will only check the hostname of the redirector.
> I.e. http://drs.yahoo.com/covey/parr/*http://spammer.address/ 

> Perhaps this (and tinyurl, etc.) is an issue to be discussed?

Here's a finer point to add to the discussion.  SpamCop itself
does seem to disambiguate (most of) the redirection.  If
someone is using a redirector to send traffic to spamdomain.com,
SpamCop seems to detect and resolve it correctly to spamdomain.com
most of the time.  So the data that's used as input to sc.surbl.org
already has redirectors correctly handled to some extent.

The SA code using sc.surbl.org such as SpamCopURI and urirhsbl
may or may not be as capable of detecting and resolving the
redirections.  I can't really say for sure because I have
not reviewed that code.  My focus is more on the data side of
things.  Certainly it would be useful if the code handling
messages coming in from the wild were able to resolve
redirections fully, but I'm not sure that's currently the case.
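
As a starting point for anyone who wants to experiment, one simple
approach for the non-opaque kind of redirector is to look for a second
URI embedded in the first. The sketch below is an assumption about how
that could be done, not a description of what SpamCop, SpamCopURI or
urirhsbl do.

```python
import re
from urllib.parse import unquote, urlsplit

# Match a scheme plus authority wherever it occurs in the string.
EMBEDDED_URI_RE = re.compile(r"https?://[^/\s?*&#]+", re.IGNORECASE)


def uri_hosts_including_targets(uri):
    """Return the redirector's host plus hosts of any URIs embedded in it."""
    hosts = []
    for candidate in EMBEDDED_URI_RE.findall(unquote(uri)):
        try:
            host = urlsplit(candidate).hostname
        except ValueError:
            continue  # malformed authority, ignore it
        if host and host.lower() not in hosts:
            hosts.append(host.lower())
    return hosts
```

For the drs.yahoo.com example quoted above this yields both
drs.yahoo.com and spammer.address. It does nothing for a target given
without a scheme, or for opaque redirectors, which cannot be resolved
without actually following them.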

This is why we do these projects openly: so other people can
add fixes, improve install scripts, add new features, etc.

> BTW: Currently my stats show that of all recognized spam
>  during the last ~85 hours (- 12919 spam)
> - 62.3% is in "sc.surbl.org"
> - 76.3% is in "sbl-xbl.spamhaus.org" (127.0.0.2)
> - 82.3% is in either/both of those
> -  2.4% were put "over the edge" because of those rules

> The last percentage is a bit low due to running bigevil,
> sa-blacklist.uri and quite a bunch of other rules.

Thanks for sharing the stats!  I hope to be able to increase
the spam detection rates significantly for sc.surbl.org when
I get back to coding ;-)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL Poisoning?

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 7:42:06 AM, Paul Barbeau wrote:
> that is a good point however there is the ability to whitelist domains in
> the CF file however this list could get large.

Yes, there is a per-installation whitelist (and blacklist)
available in SpamCopURI.  Hopefully we prevent its need by
preventing FPs from getting on the main lists to begin with.

The new data engine will be better in both regards hopefully.

Jeff C.
__

> As for if there is a manual
> review I have no idea; however, everything that is automated can get scary
> when people know that.

> Paul

>> -Original Message-
>> From: Jeff Koch [mailto:[EMAIL PROTECTED]
>> Sent: April 12, 2004 10:21 AM
>> To: [EMAIL PROTECTED]
>> Subject: SURBL Poisoning?
>> 
>> 
>> 
>> I have not tried SURBL yet but I'm concerned about what would 
>> happen if 
>> spammers started loading their emails up with links to 
>> legitimate websites 
>> - like paypal - ebay - chase - expedia, etc. Are you doing a 
>> manual review?

-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: How to report FPs to Spamcop?

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 12:07:19 PM, Alton Danks wrote:
> I'm running into some FP's with the RCVD_IN_BL_SPAMCOP rule. We've also just
> added SURBL and I wouldn't be surprised to see more. I don't see a good way to
> report FP's on the spamcop.net site. Does anyone know how to report them so
> the SA rules are useful?

Hi Al,
I don't speak for SpamCop, but it appears SpamCop does not have a
removal procedure for their IP address blacklist bl.spamcop.net:

  http://www.spamcop.net/bl.shtml

> SpamCop Blocklist Details & Description
> 
> This blocking list is somewhat experimental. This system and
> most other spam-filtering systems should not be used in a
> production environment where legitimate email must be
> delivered. Many end-users and administrators have decided that
> risking the loss of legitimate email is worth the benefit of
> blocking most spam. As a result, this list is now used widely
> and its reputation for blocking spam while reducing the risk
> of erroneous blocking is growing. 
>
> However, it should be noted that SpamCop is aggressive and
> often errs on the side of blocking mail - users should be
> warned and given information about how their mail is filtered.
> Ideally they should have a choice of filtering options. Many
> mailservers can operate with blacklists in a "tag only" mode,
> which is preferable in many situations. 

  http://www.spamcop.net/fom-serve/cache/298.html

> How can I be de-listed
> 
> If you have stopped the spam, you will be delisted
> automatically a maximum of 48 hours from the most recent spam
> complaint. Please do not write asking to be delisted sooner
> unless you believe there is some error in SpamCop's logic.
> Systems known to send spam are listed for up to 48 hours even
> if there is a "resolution" to the "issue". Often, what happens
> once can happen again. 

They simply come off the list after a while if there are no IP
reports.

Regarding SURBL, sc.surbl.org, which is derived from the SpamCop
URI domain (not IP) data, has been tested with a very low false
positive rate. (Remember also that SURBL lists should *not be
used like other RBLs*, but applied to domains found in message
body URIs instead.)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL Poisoning?

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 7:21:23 AM, Jeff Koch wrote:
> I have not tried SURBL yet but I'm concerned about what would happen if
> spammers started loading their emails up with links to legitimate websites 
> - like paypal - ebay - chase - expedia, etc. Are you doing a manual review?

I am doing a manual review, but need to find a way to share that
load.  Perhaps we can set up another discussion list and I can
gateway the new additions to the list to it.

Marc Perkel gave me access on his phpbb but I have not had time
to work with it.   A bulletin board with voting mechanism and a
way to feed new domains into threads under a "spam or ham" board
might be ideal. Anyone who could help implement that would be
gratefully thanked.

There is an internal whitelisting mechanism in sc.surbl.org that
prevents legitimate domains like ebay, etc. from ever getting
added to the list, and it seems quite effective.  Anyone who has
whitelists of common fp popular domains to share would be greatly
appreciated also. 

However the primary defense against FPs is in the care SpamCop
users take in *unchecking* legitimate looking URIs when
they submit their reports.  That also seems pretty effective
at preventing FPs, as the whitelist is small yet useful.
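
In outline, the whitelist check works like the sketch below. The
entries are hypothetical, written out from the examples named earlier
in this thread, and the function names are made up for illustration.

```python
# Hypothetical whitelist entries based on the examples named in the thread.
WHITELIST = {"paypal.com", "ebay.com", "chase.com", "expedia.com"}


def is_whitelisted(domain, whitelist=WHITELIST):
    """True if domain is a whitelisted domain or a subdomain of one."""
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def listable(reported_domains, whitelist=WHITELIST):
    """Drop whitelisted domains from a batch of reported URI domains."""
    return [d for d in reported_domains if not is_whitelisted(d, whitelist)]
```

Anything that survives listable() would still be subject to the report
thresholds and manual review described above.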

A log of the domains as they get added to the list can be found
at:

  http://www.surbl.org/top-sites-domains.new.log

If anyone spots any FPs, please forward them to me.

Note also that version 2 of the data engine should have
better spam detection and a similar low false positive rate,
all automatically.

Hope this helps,

Jeff C.



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 5:25:55 AM, Richard Humphrey wrote:
[snip]
> This looks like a great service!  However, after downloading, unzipping
> and untarring, I got the following error doing perl Makefile.PL:
>   Warning: prerequisite LWP 0 not found
[snip]

> I am having the same problem. Can anyone shed light on what LWP 0 is and 
> where to get it?

> Richard

Hi Richard,
LWP is short for libwww-perl, a "Library for WWW access in Perl".
It's not called if the RBL method in SpamCopURI is used (which
is now the default behavior in 0.09), but apparently LWP needs
to be installed for things to be happy:

  http://search.cpan.org/~rse/lcwa-1.0.0/lib/lwp/lib/LWP.pm

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL & BigEvil

2004-10-31 Thread Jeff Chan
On Monday, April 12, 2004, 3:09:41 AM, Paul Barbeau wrote:
> There still might be a place for BigEvil in this new world of SURBL.  I find
> there are a number of domains in BigEvil (and my own MidEvil) that are not
> yet in the spam cop and therefore not in this service.  Because of this
> there might still be a place for this type of list as you could call it more
> cutting edge.

Hi Paul,
I was just about to ask if you wanted to help us get MidEvil into
an SURBL along with BigEvil.  We've asked Chris to see if his
data sources would allow the BigEvil domains to go out in an RBL
and are hoping (fingers crossed!!) to hear that he can get a
green light. 

There are SpamAssassin administrators hoping and waiting around
the world to be able to use these good lists in RBL form and
free up some memory from their SA servers. (By using them as
SURBLs instead of SA rule sets, the data is cached in DNS zone
files on their local name service.)

All we need is a feed of a list of domains and our scripts can
turn them into (SU)RBLs, and our network of DNS and rsync servers
can get the RBL zone files out.  Probably ideally we would combine
MidEvil and BigEvil if you and Chris could go for that.
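
To give an idea of how little is needed from a feed, here is a rough
sketch of turning a list of domains into A records. It is not our
actual script; the function name and record layout are assumptions,
and a real zone file also needs SOA and NS records, a TTL, and so on.

```python
def zone_records(domains, result="127.0.0.2"):
    """Yield one A record per listed domain, relative to the zone origin."""
    cleaned = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    for domain in sorted(cleaned):
        # No trailing dot, so the name is taken relative to the zone origin.
        yield "%s\tIN\tA\t%s" % (domain, result)


feed = ["SpamDomain.example", "another-spam.example", "spamdomain.example"]
print("\n".join(zone_records(feed)))
```

Duplicates and case differences in the feed are folded together, so
the three entries above produce two records.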

We could also make a separate, combined list of *Evil and Bill
Stearns' sa-blacklist which we already have in RBL form, pending
agreement on the SURBL name for Bill's list.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: spamcopuri install troubles...

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 12:50:23 PM, Martin McWhorter wrote:
> Burnie wrote:
>>Some time ago, Martin McWhorter <[EMAIL PROTECTED]> wrote:
>>>When I do a perl Makefile.PL I get --
>>>Can't locate Mail/SpamAssassin.pm in @INC (@INC contains:

>>>Well of course it can't find it there, it's at--
>>>/usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin.pm

>>I'm not subscribed to this list, so I reply to you directly
>>
>>what does 'ls -l /usr/bin/perl*' list?
>>
>>could you do '/usr/bin/perl5.6.1 Makefile.PL'?

> No, it lists --
> /usr/bin/perl   /usr/bin/perlbug  /usr/bin/perldoc
> /usr/bin/perl5.8.0  /usr/bin/perlcc   /usr/bin/perlivp

> I was able to do a --
> perl -I/usr/lib/perl5/site_perl/5.6.1/ Makefile.PL

> But when I did a 'make test', it failed every test.

What error message do you get?

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Announcing SURBL mailing lists

2004-10-31 Thread Jeff Chan
Hello SA Users and Developers,
Raymond Dijkxhoorn has very kindly set up three Mailman-powered
mailing lists for folks using or interested in SURBL:

  http://lists.surbl.org/

Announce  - SURBL Announcement list [READONLY]

  This is an announcement-only list for SURBL users. Message
  volume should be very low, at probably fewer than one a month
  when things get stable. Everyone interested in the status of
  SURBL is encouraged to join.

Discuss   - SURBL Discussion list

  This is a discussion list for all SURBL users. Typically they
  will be administrators of mail servers with message body URI
  extraction and domain matching capabilities, for example using
  SpamAssassin, in order to compare the domains against an SURBL.
  Of key concern is the sharing of any *false positive domains so
  that they can be whitelisted quickly to prevent future false
  positives*. Hopefully the message volume will be low, though
  reasoned questions and discussion about the project are
  welcomed.

Zones - SURBL Zonelist (DNS/Rsync admins) - RESTRICTED

  This is a mailing list for administrators of SURBL secondary
  DNS servers and zone file rsync server administrators. It should
  be considered mandatory for those administrators. Messages will
  be very infrequent, such as the rare addition of a new RBL zone.
  Discussion of issues of concern to all SURBL DNS/rsync
  administrators is ok if it's significant and infrequent.

I'd encourage anyone who is using SURBL to sign up for the
announcement list, and anyone with suggestions or questions
about the approach to ask them on the discussion list.

DNS and rsync server admins have already been contacted about
the zones list.


**Note, we are seeking more secondaries for the SURBL zone files.**

Anyone who would like to help us secondary or rsync serve
the SURBL zone files is encouraged to contact me for BIND-
compatible secondary DNS info at [EMAIL PROTECTED] or
Raymond for rsync access to BIND or rbldns zone files at
[EMAIL PROTECTED] .


Cheers, and thanks again to Raymond for setting up the lists!

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL support in SA 2.63 problems? (syntax error)

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 7:39:32 AM, Dan Stetser wrote:
> I did the same upgrade of perl. Spamassassin started as the stock
> Fedora rpm, then did a -Uvh with the 2.63 rpm.

> my /usr/bin/spamassassin has:

> use lib '/usr/lib/perl5/vendor_perl/5.8.1';  # substituted at 'make' time

> Is this what needs to be substituted to reflect 5.8.3?

> Looked around and I have stuff scattered all over:
> /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/PerMsgStatus.pm
> /usr/lib/perl5/vendor_perl/5.8.1/Mail/SpamAssassin/PerMsgStatus.pm
> /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Conf.pm
> /usr/lib/perl5/vendor_perl/5.8.1/Mail/SpamAssassin/Conf.pm

> I can see this being a problem especially w/3.0 around the corner.
> What's the easiest way to straighten this out

Hi Dan,
One answer is that 3.0 now has support for message body URI
domain checking via a different plugin URIDNSBL aka URIBL,
specifically through the command Justin Mason recently added for
this purpose called "urirhsbl".  That's been tested with SURBL
and it works great.

  http://spamassassin.org/full/3.0.x/dist/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm

In fact it's already used in the sample config file for uribl:

  http://spamassassin.org/full/3.0.x/dist/rules/25_uribl.cf

urirhsbl        URIBL_SC_SURBL  sc.surbl.org.   A
header          URIBL_SC_SURBL  eval:check_uridnsbl('T_URIBL_SC_SURBL')
describe        URIBL_SC_SURBL  Contains a URL listed in the SC SURBL blocklist
tflags          URIBL_SC_SURBL  net

> I do appreciate all your help,
> Dan

Likewise do I appreciate all the help offered by everyone here.

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL support in SA 2.63 problems?

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 3:57:16 PM, Walker Aumann wrote:
>> # spamassassin --lint
>> Failed to compile URI SpamAssassin tests, skipping:
>> (syntax error at /etc/mail/spamassassin/spamcop_uri.cf, rule 
>> SPAMCOP_URI_RBL, line 1, near "eval:"
>> syntax error at /etc/mail/spamassassin/bigevil.cf, rule 
>> BigEvilList_23, line 
>> 975, near ";
>> }"
>> )

> I had the exact same problem on a Debian system.  The SpamCopURI module
> was installed somewhere other than where the rest of your SpamAssassin
> perl modules were installed.  I found three .pm files under /usr/local/lib.
> Once I moved them to /usr/share/perl5/Mail/SpamAssassin, it worked fine.
> Note that these paths may be different on your machine. 

Without trying to step on any toes, perhaps some kind person
could help with improved installation scripts?

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 8:59:15 AM, Sandy S wrote:
> This looks like a great service!  However, after downloading, unzipping and
> untarring, I got the following error doing perl Makefile.PL:
> Warning: prerequisite LWP 0 not found

> I may be dense, but what's LWP 0 and where do I find it?  I browsed CPAN and
> found a bunch of files which contain LWP, but I have no idea which one it's
> talking about here!

> We're running Perl 5.8.3 on BSDI.  Thanks for any help you can give me!

Hi Sandy,
LWP is short for libwww-perl, a "Library for WWW access in Perl".
It's actually no longer used in SpamCopURI IIRC.

  http://search.cpan.org/~rse/lcwa-1.0.0/lib/lwp/lib/LWP.pm

OTOH, since it's not called by SpamCopURI when using the RBL
method, perhaps it will run regardless.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 9:53:37 AM, Adam Denenberg wrote:
> two questions for this.

> 1) Do you need to subscribe to spamcop to use this ?

No, though I would certainly encourage anyone to use SpamCop to
report spams, including the parsing and reporting it provides as
input for SURBL.  We then get the reported domains from SpamCop,
but that part is a one way street.  SURBL is a consumer of the
SpamCop data.

> 2) Has anybody started running this yet, and if so any feedback on
> results (stats would be great)?

Results so far are quite positive, modulo some of the
installation hurdles some folks have reported here.

We expect the sc.surbl.org hit rates to improve when I can get
back to working on the next version of the data engine...  :)

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL in SA 2.63 problem?

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 11:15:14 AM, Matthew Thomas wrote:
> debug: querying for onlinedeals.com.sc.surbl.org
> debug: Query failed for onlinedeals.com.sc.surbl.org

> Does this mean that the uri wasn't found in the list, or that I failed to
> contact the surbl?

Hi Matt,
In case it helps, onlinedeals.com is not currently on the list,
so it does not resolve:

> ns1: [101]% nslookup onlinedeals.com.sc.surbl.org
> Server:  localhost.freeapp.net
> Address:  127.0.0.1
> 
> *** localhost.freeapp.net can't find onlinedeals.com.sc.surbl.org: 
> Non-existent
> host/domain

> % fgrep onlinedeals.com top-sites-domains
> %

Probably it should be on the list, and perhaps it will be in
the next rev of the engine which will have a longer memory.  Or
you may find it on other lists we may soon host.  FWIW I see no
recent SpamCop reports about the domain onlinedeals.com.
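
To illustrate the lookup discussed above, here is a minimal Python sketch
that builds the query name for a domain and treats a name that does not
resolve as "not listed".  The function names and the injectable `resolve`
parameter are my own assumptions for illustration; they are not part of
SpamCopURI or SpamAssassin.  Note that this sketch cannot tell a genuine
non-existent-host answer from a failed DNS query, which a real client
would want to distinguish.

```python
import socket

SURBL_ZONE = "sc.surbl.org"  # zone name as used in this thread


def surbl_query_name(domain, zone=SURBL_ZONE):
    """Build the name to look up, e.g. onlinedeals.com.sc.surbl.org."""
    return f"{domain.strip('.').lower()}.{zone}"


def is_listed(domain, resolve=socket.gethostbyname):
    """Return True if the query name resolves to an address.

    `resolve` is injectable so the logic can be tried without network
    access; by default it performs a real DNS lookup.
    """
    try:
        resolve(surbl_query_name(domain))
    except OSError:
        # Name did not resolve: not on the list (or the lookup itself
        # failed; this sketch does not separate the two cases).
        return False
    return True
```

With the default resolver, is_listed("onlinedeals.com") would perform the
same query as the nslookup shown above.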

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 12:09:12 PM, Shaun Erickson wrote:
> [EMAIL PROTECTED] wrote:

>> I have had quite a few come in this morning that Chris' rules hit, but the
>> new SURBL did not hit..  I'll keep both running for a while, and see how
>> things shake out!

> You're right - I put BigEvil back in place, and it's flagging stuff that 
> SURBL isn't yet (but I'm sure will, later). So I'll run both for now, too.

Bear in mind that we're not done tuning SURBL's use of our
SpamCop data yet.  Right now it is pretty conservative to prevent
FPs.  We have some strategies in the next version of the engine
which should increase the hit rate without much increase in FPs:

'After watching the data for a while I think a longer general
retention of say 10 days might be a good idea to catch reports
over more than a week.  For known spam gang domains/name
servers/IPs we could make the retention a whole lot longer.
And domains that get dozens to hundreds of reports should
probably also be watched a lot longer using a longer retention.
Domains that get reported most probably deserve the most
attention through longer retention and perhaps a lower
inclusion threshold.

We would get external "known bad guys" data from (SBL) in
order to adjust thresholds and expirations, but the inclusion of
a domain in SURBL would still be triggered by SpamCop URI
reports.  But the trigger point would be lower for "bad guys".'
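
The policy quoted above can be sketched roughly as follows.  This is only
an illustration in Python: apart from the 10-day general retention
mentioned in the text, every number here is a placeholder I made up, and
the names (DomainStats, should_list) are hypothetical, not taken from the
actual data engine.

```python
from dataclasses import dataclass

# Placeholder values for illustration only.
DEFAULT_THRESHOLD = 10        # reports needed to list an ordinary domain
BAD_GUY_THRESHOLD = 2         # lower trigger point for "known bad guys"
DEFAULT_RETENTION_DAYS = 10   # general retention suggested in the text
BAD_GUY_RETENTION_DAYS = 90   # "a whole lot longer" (value is a guess)


@dataclass
class DomainStats:
    domain: str
    reports: int              # SpamCop URI reports counted for the domain
    days_since_last: int      # days since the most recent report
    known_bad: bool = False   # flagged by external data such as SBL


def should_list(stats):
    """Listing is still triggered by reports; known-bad status only
    lowers the trigger point and lengthens the retention."""
    if stats.known_bad:
        threshold, retention = BAD_GUY_THRESHOLD, BAD_GUY_RETENTION_DAYS
    else:
        threshold, retention = DEFAULT_THRESHOLD, DEFAULT_RETENTION_DAYS
    return stats.reports >= threshold and stats.days_since_last <= retention
```

The point of the design is that external "bad guy" data never lists a
domain by itself: with zero reports, should_list() is False even when
known_bad is set.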

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Need SURBL test message

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 11:52:57 AM, ITReading ITReading wrote:
> Hello all,
> I think I may have the SURBL working on ActivePerl 5.61 on Win32 w/ SA 2.63.
> I was hoping someone could forward a message to me that will
> trigger the rule in "spamcop_uri.cf" so that I may test it
> before placing it in production. 

Hi Charles,
There are a couple test points hardwired into the zone files:

> test.surbl.org  IN  A   127.0.0.2   ; permanent test point
> test.sc.surbl.org   IN  A   127.0.0.2   ; permanent test point

So any message containing a URI like:

  http://test.notdeliberatelybroken.surbl.org

(with the ".notdeliberatelybroken" removed) should get caught.
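
For anyone who wants to check the URI-extraction step separately before
sending a test message, here is a small Python sketch.  It is not how
SpamCopURI parses messages; the regular expression and the uri_hosts name
are assumptions for illustration, and it only looks at plain http/https
URIs in a text body.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern: stop at whitespace, angle brackets, quotes.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def uri_hosts(body):
    """Return the lowercased host of every http/https URI in the body."""
    hosts = []
    for match in URI_RE.finditer(body):
        host = urlsplit(match.group(0)).hostname
        if host:
            hosts.append(host)
    return hosts


# The test URI is written in a deliberately broken form in this thread so
# the post itself is not caught; undo that to get the real test URI.
broken = "http://test.notdeliberatelybroken.surbl.org"
test_body = "Testing SURBL: " + broken.replace(".notdeliberatelybroken", "")
print(uri_hosts(test_body))  # prints ['test.surbl.org']
```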

HTH,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL support in SA 2.63 problems?

2004-10-31 Thread Jeff Chan
Hello Daniel,

On Friday, April 9, 2004, 2:13:49 AM, Daniel Kleinsinger wrote:
> I'm not "of" MailScanner, that sounds bad I'm just a simple
> MailScanner user who gives thanks everyday to Julian Field, the 
> wonderful developer of MailScanner.  For what it's worth, I just 
> installed .09 and it's working for me.  I'm using MS 4.29.7 with SA 2.63 
> on RH 8 running perl 5.8.0.  It went
> perl Makefile.PL
> make
> make test
> make install
> cp rules/spamcop_uri.cf /etc/mail/spamassassin/
> spamassassin --lint
> service MailScanner reload

> No errors at any step...

> Daniel

Thanks for the feedback Daniel; pardon my goof.  :)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL support in SA 2.63 problems?

2004-10-31 Thread Jeff Chan
On Friday, April 9, 2004, 2:01:50 AM, Bob Mortimer wrote:
> Oops sorry - make test gets this:

> t/bad..............ok
> t/blacklist........ok
> t/cache............ok
> t/dnsrbl...........ok
> t/good.............ok
> t/ipaddress........ok
> t/longhostname.....ok
> t/mailto...........ok
> t/merge_urls.......ok
> t/shorthostname....ok
> t/spamcop_fetch....ok 3/6# Failed test (t/spamcop_fetch.t at line 48)
> # Failed test (t/spamcop_fetch.t at line 50)
> # Looks like you failed 2 tests of 6.
> t/spamcop_fetch....dubious
> Test returned status 2 (wstat 512, 0x200)
> Scalar found where operator expected at (eval 153) line 1, near "'int'  
> $__val"
> (Missing operator before   $__val?)
> DIED. FAILED tests 4-5
> Failed 2/6 tests, 66.67% okay
> t/spamcopuri.......ok
> t/whitelist........ok
> Failed Test       Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/spamcop_fetch.t    2   512     6    2  33.33%  4-5
> Failed 1/13 test scripts, 92.31% okay. 2/63 subtests failed, 96.83% okay.
> make: *** [test_dynamic] Error 255

Hi Bob,
Eric has fixed the test suite in SpamCopURI 0.09.

  http://sourceforge.net/projects/spamcopuri/

If the SourceForge mirrors don't all have it yet, here's a URL
that should work:

  http://umn.dl.sourceforge.net/sourceforge/spamcopuri/Mail-SpamAssassin-SpamCopURI-0.09.tar.gz

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL support in SA 2.63 problems?

2004-10-31 Thread Jeff Chan
Hmm, I thought it may be a MailScanner/SA/SpamCopURI
compatibility issue of some kind.  But then Mike Zanker
gave 0.09 a try with MailScanner and had no problems.

I'm not sure if Daniel Kleinsinger of MailScanner is on this list,
so I'll cc him for a heads up.

Daniel, can you give SpamCopURI 0.09 a try and see if you can
duplicate this error?

  http://sourceforge.net/projects/spamcopuri/

Jeff C.
__

On Thursday, April 8, 2004, 11:41:18 PM, Damian Mendoza wrote:
> I also have the exact same errors - using mailscanner as well with SA
> 2.63.

> Damian 

> -----Original Message-----
> From: Dan Stetser [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, April 08, 2004 10:05 PM
> To: [EMAIL PROTECTED]
> Cc: Jeff Chan
> Subject: Re: SURBL support in SA 2.63 problems?

> Running SA version 2.63

> Mail-SpamAssassin-SpamCopURI-0.09  did a make/test/install with no
> errors but then when I try and use spamcop_uri.cf in my
> /etc/mail/spamassassin dir and lint it I get:

> Failed to compile URI SpamAssassin tests, skipping:
>  (syntax error at /etc/mail/spamassassin/spamcop_uri.cf, rule
> SPAMCOP_URI_RBL, line 1, near "eval:"
> syntax error at /etc/mail/spamassassin/bigevil.cf, rule BigEvilList_135,
> line 1823, near "; }"
> )

> I move bigevil out of the SA dir for another lint, and midevil throws
> syntax errs now.
> I move all aux cf's out of  .../spamassassin cept for local.cf and now
> /usr/share/spamassassin/20_uri_tests.cf tosses lint errors.

> Is there something that needs doing on 2.63 besides the SpamCopURI
> install and copying the cf /rules over.

> I use MailScanner, so don't deal with spamd/c, don't know if that makes
> a difference...

> Thanks,
> Dan

> At 02:48 PM 4/8/2004, Jeff wrote:
>>Hi Steve,
>>Eric Kolve has just issued a fix for the test suite in his 0.09 version

>>of SpamCopURI:
>>
>>http://sourceforge.net/projects/spamcopuri/
>>
>>Please give it a try and let us know what kind of results you get.  :D




-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: SURBL support in SA 2.63 problems?

2004-10-31 Thread Jeff Chan
On Thursday, April 8, 2004, 10:05:25 PM, Dan Stetser wrote:
> Running SA version 2.63

> Mail-SpamAssassin-SpamCopURI-0.09  did a make/test/install with no errors
> but then when I try and use spamcop_uri.cf in my /etc/mail/spamassassin dir
> and lint it I get:

> Failed to compile URI SpamAssassin tests, skipping:
>  (syntax error at /etc/mail/spamassassin/spamcop_uri.cf, rule 
> SPAMCOP_URI_RBL, line 1, near "eval:"
> syntax error at /etc/mail/spamassassin/bigevil.cf, rule BigEvilList_135, 
> line 1823, near ";
> }"
> )

> I move bigevil out of the SA dir for another lint, and midevil throws 
> syntax errs now.
> I move all aux cf's out of  .../spamassassin cept for local.cf and
> now /usr/share/spamassassin/20_uri_tests.cf tosses lint errors.

> Is there something that needs doing on 2.63 besides the SpamCopURI install
> and copying the cf /rules over.

> I use MailScanner, so don't deal with spamd/c, don't know if that makes a 
> difference...

> Thanks,
> Dan

Hi Dan,
Not sure what's causing the error you're seeing, but I do know
that other MailScanner folks are using SpamCopURI successfully.
Below is a copy of an announcement from the MailScanner mailing
list about it.

If the problem is specific to MailScanner you may want to
check with some of the users there.

If the issue is particular to using SpamCopURI, perhaps Eric
Kolve can offer some suggestions.

Jeff C.
__

> Date: Tue, 06 Apr 2004 11:48:44 -0700
> From: Daniel Kleinsinger <[EMAIL PROTECTED]>
> To: MailScanner mailing list <[EMAIL PROTECTED]>
> Subject: New Plugin for SpamAssassin
>
> 
> For the last few days I've been using a new plugin to SpamAssassin I've 
> found, SpamCopURI (http://sourceforge.net/projects/spamcopuri/).  It 
> adds points to spam based on the list of spamvertised sites on 
> www.spamcop.net.  [...]  So the plugin
> (*not quite sure how it works, there's info on the websites about 
> dealing with randomized domain names and such) extracts the URIs from an 
> email and then checks them against the RBL. The RBL is populated as 
> follows (from the surbl.org website): "Scripts which power the database 
> and SURBL creation grab data from SpamCop's "Spamvertised Web Sites" 
> (http://www.spamcop.net/w3m?action=inprogress&type=www) web page every 
> couple minutes or so, then merge new entries and expire the data so that 
> it's never more than 4 days old."  I think of it as a BigEvil-type RBL.  
> Apparently it currently has about 400 records.
> 
> The installation is very simple.  It is a patch to SA 2.63.  Basically, 
> it copies a few files over the SA 2.63 versions of them and you add a 
> rules file to local.cf or /etc/mail/spamassassin or wherever.  The 
> default rules score using the local cache method, you should disable 
> those rules (score 0, more info in quoted email below) and score the URI 
> rule appropriately (I score mine 3, same as bigevil). [...]
> For what it's worth, glancing through my
> logs I see quite a few emails that hit SPAMCOP_URI_RBL, but don't hit 
> any other blacklists.
> 
> I've found the plugin to be very effective.  It's been my fourth most 
> effective rule and I haven't seen any false positives.  These are the 
> hitrates for the top positive SA rules on my smallish mailserver (~4000 
> email/day, 50-60% spam).  One thing I was surprised at was how effective 
> Bayes has become.  Judging from my results, if at all possible for your 
> config, everyone should be using Bayes.
> 
> rule                    spam hitrate
> BAYES_99                0.91598
> DCC_CHECK               0.61738
> RAZOR2_CHECK            0.45551
> SPAMCOP_URI_RBL         0.37685
> RCVD_IN_BL_SPAMCOP_NET  0.36902
> RCVD_IN_SORBS           0.30930
> 1 or more BIGEVIL       0.28707
> RCVD_IN_SPAMHAUS_XBL    0.25700
> RCVD_IN_DSBL            0.20964
> RCVD_IN_DYNABLOCK       0.19646
> RCVD_IN_NJABL           0.16928
> RCVD_IN_SBL             0.16310



Re: rule regex help

2004-10-31 Thread Jeff Chan
On Thursday, April 8, 2004, 10:17:46 AM, Mike Schrauder wrote:
> We seem to be getting many through despite continual bayes training.
> They have one URI in them and here is my rule that is getting many but not 
> all of them as it seems spammers have an
> unlimited supply of goofy domain names registered.

> uri MIKES_URI_EVIL_DOM1 /(we3xe|dbs54d|qwas3da|ding17|qmiakemds|8005hosting)\.(com|biz|net)/i
> describe MIKES_URI_EVIL_DOM1 contains url for known med porn or cable spammer
> score MIKES_URI_EVIL_DOM1  4.6

Hate to sound like a broken record, but:

% egrep "we3xe|dbs54d|qwas3da|ding17|qmiakemds|8005hosting" surbl.bind

8005hosting.com IN  A   127.0.0.2
dbs54d.com  IN  A   127.0.0.2
ding17.biz  IN  A   127.0.0.2
qmiakemds.com   IN  A   127.0.0.2
qwas3da.com IN  A   127.0.0.2
we3xe.com   IN  A   127.0.0.2

SURBL's got your back.  ;)
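
The same check as the egrep above can be written as a short Python
sketch, in case someone wants to compare a longer list of rule patterns
against the zone file.  The parse_zone_lines helper is hypothetical and
only understands simple "name IN A address" lines like the ones shown;
it is not a general BIND zone parser.

```python
def parse_zone_lines(text):
    """Map each record name to its A-record address.

    Expects lines shaped like 'we3xe.com IN A 127.0.0.2'; comments after
    ';' and anything else that does not fit that shape are skipped.
    """
    records = {}
    for line in text.splitlines():
        fields = line.split(";", 1)[0].split()
        if len(fields) == 4 and fields[1] == "IN" and fields[2] == "A":
            records[fields[0].lower()] = fields[3]
    return records


zone = """\
8005hosting.com IN  A   127.0.0.2
dbs54d.com  IN  A   127.0.0.2
ding17.biz  IN  A   127.0.0.2
qmiakemds.com   IN  A   127.0.0.2
qwas3da.com IN  A   127.0.0.2
we3xe.com   IN  A   127.0.0.2
"""
listed = parse_zone_lines(zone)

# Domain labels from the uri rule quoted above.
wanted = ["we3xe", "dbs54d", "qwas3da", "ding17", "qmiakemds", "8005hosting"]
missing = [w for w in wanted
           if not any(name.split(".")[0] == w for name in listed)]
print(missing)  # prints []
```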

Try it:

  http://www.surbl.org/

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
As Steve pointed out off list, the update has not propagated to
all the SourceForge mirrors yet.  Please check for the updated
version later at a mirror near you.  ;)

Jeff C.
__

On Thursday, April 8, 2004, 4:48:17 PM, Jeff Chan wrote:
> Hi Steve,
> Eric Kolve has just issued a fix for the test suite in his 0.09
> version of SpamCopURI:

>http://sourceforge.net/projects/spamcopuri/

> Please give it a try and let us know what kind of results
> you get.  :D

> Jeff C.
> __

> On Thursday, April 8, 2004, 4:29:58 PM, Steve Wakelin wrote:
>> Jeff,

>> Would dearly like to implement this however received the following

>> [EMAIL PROTECTED] Mail-SpamAssassin-SpamCopURI-0.09]# perl Makefile.PL
>> Checking if your kit is complete...
>> Looks good
>> Writing Makefile for Mail::SpamAssassin::SpamCopURI
>> [EMAIL PROTECTED] Mail-SpamAssassin-SpamCopURI-0.09]# make
>> cp lib/Mail/SpamAssassin/Conf.pm blib/lib/Mail/SpamAssassin/Conf.pm
>> cp lib/Mail/SpamAssassin/PerMsgStatus.pm
>> blib/lib/Mail/SpamAssassin/PerMsgStatus.pm
>> cp lib/Mail/SpamAssassin/SpamCopURI.pm
>> blib/lib/Mail/SpamAssassin/SpamCopURI.pm
>> cp bin/spamcopuri-update blib/script/spamcopuri-update
>> /usr/bin/perl "-MExtUtils::MY" -e "MY->fixin(shift)"
>> blib/script/spamcopuri-update
>> Manifying blib/man3/Mail::SpamAssassin::PerMsgStatus.3pm
>> Manifying blib/man3/Mail::SpamAssassin::Conf.3pm
>> Manifying blib/man3/Mail::SpamAssassin::SpamCopURI.3pm
>> [EMAIL PROTECTED] Mail-SpamAssassin-SpamCopURI-0.09]# make test
>> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
>> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>>  
>>  
>> t/bad..ok
>>  0
>> ...ok 7/10
   
>> t/blacklistok
>>   
>> t/cacheok
>> t/dnsrbl...NOK 1# Failed test (t/dnsrbl.t at line 16)
>> t/dnsrbl...NOK 2# Failed test (t/dnsrbl.t at line 18)
  
>> t/dnsrbl...NOK 5# Failed test (t/dnsrbl.t at line 25)
>> t/dnsrbl...ok 6/6# Looks like you failed 3 tests of 6.
>> t/dnsrbl...dubious
>> Test returned status 3 (wstat 768, 0x300)
>> Scalar found where operator expected at (eval 153) line 1, near "'int' 
>> $__val"
>> (Missing operator before   $__val?)
>> DIED. FAILED tests 1-2, 5
>> Failed 3/6 tests, 50.00% okay

>> t/good.ok
>> t/ipaddressok
>> t/longhostname.ok
>> t/mailto...ok
>>ok 7/7
>> t/merge_urls...ok
>> t/shorthostnameok

>> t/spamcop_fetchNOK 3# Failed test (t/spamcop_fetch.t at line 35)
>> # Failed test (t/spamcop_fetch.t at line 50)
>> t/spamcop_fetchNOK 5# Looks like you failed 2 tests of 6.
>> t/spamcop_fetchdubious
>> Test returned status 2 (wstat 512, 0x200)
>> DIED. FAILED tests 3, 5
>> Failed 2/6 tests, 66.67% okay
>> t/spamcopuri...ok
>>ok 7/9
>>  /whitelistok
>> Failed Test       Stat Wstat Total Fail  Failed  List of Failed
>> -------------------------------------------------------------------------------
>> t/dnsrbl.t           3   768     6    3  50.00%  1-2 5
>> t/spamcop_fetch.t    2   512     6    2  33.33%  3 5
>> Failed 2/13 test scripts, 84.62% okay. 5/63 subtests failed, 92.06% okay.
>> make: *** [test_dynamic] Error 255
>> [EMAIL PROTECTED] Mail-SpamAssassin-SpamCopURI-0.09]# exit
>> exit
 
>> Script done on Fri 09 Apr 2004 00:24:44 BST


-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
Hi Steve,
Eric Kolve has just issued a fix for the test suite in his 0.09
version of SpamCopURI:

   http://sourceforge.net/projects/spamcopuri/

Please give it a try and let us know what kind of results
you get.  :D

Jeff C.
__

On Thursday, April 8, 2004, 4:29:58 PM, Steve Wakelin wrote:
> Jeff,

> Would dearly like to implement this however received the following

> [EMAIL PROTECTED] Mail-SpamAssassin-SpamCopURI-0.09]# perl Makefile.PL
> Checking if your kit is complete...
> Looks good
> Writing Makefile for Mail::SpamAssassin::SpamCopURI
> [EMAIL PROTECTED] Mail-SpamAssassin-SpamCopURI-0.09]# make
> cp lib/Mail/SpamAssassin/Conf.pm blib/lib/Mail/SpamAssassin/Conf.pm
> cp lib/Mail/SpamAssassin/PerMsgStatus.pm
> blib/lib/Mail/SpamAssassin/PerMsgStatus.pm
> cp lib/Mail/SpamAssassin/SpamCopURI.pm
> blib/lib/Mail/SpamAssassin/SpamCopURI.pm
> cp bin/spamcopuri-update blib/script/spamcopuri-update
> /usr/bin/perl "-MExtUtils::MY" -e "MY->fixin(shift)"
> blib/script/spamcopuri-update
> Manifying blib/man3/Mail::SpamAssassin::PerMsgStatus.3pm
> Manifying blib/man3/Mail::SpamAssassin::Conf.3pm
> Manifying blib/man3/Mail::SpamAssassin::SpamCopURI.3pm
> [EMAIL PROTECTED] Mail-SpamAssassin-SpamCopURI-0.09]# make test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>   
> 
> t/bad..ok
>  0
> ...ok 7/10
   
> t/blacklistok 
>  
> t/cacheok
> t/dnsrbl...NOK 1# Failed test (t/dnsrbl.t at line 16)
> t/dnsrbl...NOK 2# Failed test (t/dnsrbl.t at line 18)
  
> t/dnsrbl...NOK 5# Failed test (t/dnsrbl.t at line 25)
> t/dnsrbl...ok 6/6# Looks like you failed 3 tests of 6.
> t/dnsrbl...dubious
> Test returned status 3 (wstat 768, 0x300)
> Scalar found where operator expected at (eval 153) line 1, near "'int' 
> $__val"
> (Missing operator before   $__val?)
> DIED. FAILED tests 1-2, 5
> Failed 3/6 tests, 50.00% okay

> t/good.ok
> t/ipaddressok
> t/longhostname.ok
> t/mailto...ok
>ok 7/7
> t/merge_urls...ok
> t/shorthostnameok

> t/spamcop_fetchNOK 3# Failed test (t/spamcop_fetch.t at line 35)
> # Failed test (t/spamcop_fetch.t at line 50)
> t/spamcop_fetchNOK 5# Looks like you failed 2 tests of 6.
> t/spamcop_fetchdubious
> Test returned status 2 (wstat 512, 0x200)
> DIED. FAILED tests 3, 5
> Failed 2/6 tests, 66.67% okay
> t/spamcopuri...ok
>ok 7/9
>  /whitelistok
> Failed Test       Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/dnsrbl.t           3   768     6    3  50.00%  1-2 5
> t/spamcop_fetch.t    2   512     6    2  33.33%  3 5
> Failed 2/13 test scripts, 84.62% okay. 5/63 subtests failed, 92.06% okay.
> make: *** [test_dynamic] Error 255
> [EMAIL PROTECTED] Mail-SpamAssassin-SpamCopURI-0.09]# exit
> exit
 
> Script done on Fri 09 Apr 2004 00:24:44 BST

-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Catching low content spam...

2004-10-31 Thread Jeff Chan
On Thursday, April 8, 2004, 9:59:47 AM, Matthew Thomas wrote:
> I receive a goodly number of spams in the format below.  I have fed a large
> number of them into bayes, but since the text varies so much from message to
> message I still am not catching many.  Is there a set of rules that anyone
> is using which is reliably catching these?  I'm using SA 2.61, bayes, razor,
> dcc, and sundry .cf files.

> "Discover...in the next few minutes... regardless of your age, sex, or
> current health status, how this common element can change the way you
> experience the 
> next half of your life."
> http://www.hyperfastsolution.com/anx/ Learn how to increase
> your quality of life

Since the message body has a URI, and that URI has been reported to
SpamCop often enough, the domain is in SURBL.  Therefore SURBL
can catch it, *regardless of any randomized text added to throw
off Bayes*.

This domain got added to SURBL about a day and a half ago (time
in GMT):

  top-sites-domains.new.log:2004-04-07 11:34 hyperfastsolution.com

Which would have been about an hour late to catch your copy of
the spam:

> Wed, 7 Apr 2004 05:37:53 -0700

But it would have caught subsequent ones.  And we're modifying
the data engine to catch sites like these sooner.

Here it is in the current zone file:

Name:hyperfastsolution.com.sc.surbl.org
Address:  127.0.0.2

  http://www.surbl.org/

Jeff C.
--
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Spammer trait: [Fwd: Jnichols the amusement park idea APLyQ]

2004-10-31 Thread Jeff Chan
On Thursday, April 8, 2004, 3:40:15 PM, Jonathan Nichols wrote:
> The domains -
> pimpinprices.com
> billiebobbie.com
> funkyprices.com

> These are all owned by the same person, same registrar, everything.

> Yet these are the ONLY ones that seem to slip through.

> Any more domain examples? We need a web form to add these domain names 
> to BigEvil or something. :)

FWIW I see those domains with a low hit count in the SURBL source
data from SpamCop, so I've added them to the manual blacklist for
SURBL.

If the spammer is in SBL, his future domains should also get
caught by the new SURBL data engine with lower thresholds for
known bad domains, name servers, etc.  Oh, these sites are also
hosted by spam-friendly ISPs in China, which would similarly get
them on SURBL with just a couple reports in our next (transparent
to users) version.

In other words, these kinds of spammers will get caught
automatically by SURBL RSN.  :)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
On Thursday, April 8, 2004, 5:01:48 PM, Matthew Trent wrote:
> On Thursday 08 April 2004 4:56 pm, Rick Macdougall wrote:
>> Jeff Chan wrote:
>> > Hi Steve,
>> > Eric Kolve has just issued a fix for the test suite in his 0.09
>> > version of SpamCopURI:
>> >
>> >http://sourceforge.net/projects/spamcopuri/

>> Works great here, just installed it 5 minutes ago.
>>
>> Regards,
>>
>> Rick

> Ditto.

> And it looks like it uses SURBL by default now.

Yes, Eric changed it to use the RBL by default and increased the
score to 3.0 also.

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
On Thursday, April 8, 2004, 11:59:30 AM, Chris Santerre wrote:
> I am SOO happy to see this!!!  So what does this mean to the future of
> BigEvil??? I'm not sure just yet. This is what I have wanted for so long. I
> have done BigEvil because we didn't have any other option. But now we do :) 

> I will probably continue to run BigEvil at least a few months after the
> release of SA 3.0, then I will see how it is matching up with me. I may
> change the way I do BigEvil. There are some things I could change to make
> things a LOT easier for me anyway. 

> I greatly thank everyone involved who made this URI RBL happen!! Maybe now I
> can do more in SARE than be the resident URI skank :)

> --Chris (Slave to the phrase http://)

Thanks Chris!  We're glad to have your support and recognition.
I'll look forward to seeing how your efforts change and evolve.
In other words, I want to see what you come up with next.  :)

It's been an honor to provide the glue to make SURBL happen,
in essence to coordinate the efforts of so many spam fighters
throughout the Internet community.

This is a real collaborative effort from the people who report
spam to SpamCop, to the folks who develop and use the anti-spam
systems, to the folks who write the tools that the developers
use, etc. 

I hope people will give SURBL a try and share some results.

I also hope it takes off and spam takes a dive.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
On Thursday, April 8, 2004, 11:18:14 AM, Matthew Trent wrote:
> On Wednesday 07 April 2004 06:22 pm, Jeff Chan wrote:
>> Unlike other RBLs, the Spam URI RBL (SURBL) is not used to block
>> spam server IP addresses, but instead to block messages based on
>> URI domains previously reported to SpamCop.

> For discussion... would this approach be better or worse (or just different) 
> than the BigEvil way?

> One thing I can think of off the bat is that SURBL having to query a remote 
> server, versus using a local list (BigEvil-style), may not scale as well. 
> However, spamcop is pretty big and may provide a larger pool of URIs and they 
> may be added to the list faster...

Those are some good questions.  There are some differences I
can see between Chris' BigEvil list and SURBL.  I'll list
a few off the top of my head:

1.  The source of SURBL is dynamic.  As new URIs are reported
to SpamCop, and the number of reports reaches a threshold, they get
added to SURBL.  It's largely automatic.  In essence this leverages
the spam reporting efforts of all the SpamCop users.  It lets
their actions do more to stop spam by blocking on spam URIs in
addition to blocking on sending IP addresses.

2.  Similarly as the reports trail off, the domains eventually
fall off of SURBL.  That has some advantages and disadvantages,
mainly in addressing false positives, joe jobs, etc. We are
increasing the retention time to keep spammers on the RBL longer,
and false positives are somewhat naturally limited by the care of
SpamCop users in their reporting.  (I.e. most will uncheck
legitimate looking domains before submitting.)  So the data from
SpamCop tends to be hand checked on input and pretty good as a
result.

We're also going to be lowering the inclusion threshold and
lengthening the retention time for SBL-known spammers, web sites at certain
spam-friendly ISPs, etc.  That will help more serious spam
domains get added to SURBL sooner and stay on it much longer.

3.  Implementing the list as an RBL leverages all the caching,
distribution, database, and general networking mechanisms that go
along with DNS (whether it's BIND or rsynced rbldns, since we are
supporting both).  We feel those are some big advantages of RBLs
since they do appear to scale well, have caching, updates and
networking all taken care of, etc.  That's an advantage of RBLs
in general. 

4.  Keeping the data outside of SpamAssassin and using dynamic
RBL data in a sense is *more* scalable since it uses fewer
SpamAssassin resources to simply do a DNS A record lookup on a
constructed RBL name than to try to hold the domain rules
internally.
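
As a rough illustration of the caching mentioned in point 3, here is a
Python sketch of a lookup cache with a fixed time-to-live.  It is only a
toy model of what a caching DNS resolver already does for RBL queries;
the TTLCache class and its parameters are hypothetical, and real DNS
caches take the TTL from each answer rather than using one fixed value.

```python
import time


class TTLCache:
    """Remember lookup answers for a fixed time, as a DNS cache would."""

    def __init__(self, lookup, ttl_seconds, clock=time.monotonic):
        self._lookup = lookup      # function doing the real (slow) query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # name -> (expires_at, answer)
        self.misses = 0            # how many times the real lookup ran

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no query needed
        self.misses += 1
        answer = self._lookup(name)
        self._entries[name] = (now + self._ttl, answer)
        return answer
```

The same spam domain tends to show up in many messages within a short
time, so most lookups for it are answered from the cache and only the
first one in each TTL window reaches the RBL name servers.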

I'm not at all trying to put down Chris' effort which is a
good one, but our approach does have some differences which
we think are noteworthy.

Cheers,

Jeff C.
--
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Trusted RBLs?

2004-10-31 Thread Jeff Chan
On Thursday, April 8, 2004, 1:52:24 AM, Ryan Castellucci wrote:
> Kelson Vibber wrote:

>> At 09:49 AM 4/7/2004, Shaun T. Erickson wrote:
>>>I'm not trying to start a religious war, but are there any RBLs whose data
>>>is considered trusted enough to use to block mail from coming into the
>>>server, leaving the other RBLs to be used only to add score in SA?
>> 
>> I use:
>> 
>> sbl.spamhaus.org (confirmed spam sources)
>> list.dsbl.org (confirmed open relays)
>> 
>> There aren't any others I trust enough to block outright, and even these
>> get *occasional* false positives.  (Informative reject codes are a must!)
>> 
>> These two lists alone block nearly 60% of our incoming mail.  Every time
>> I've looked at hit rates for other lists (based on SA rules) either the
>> added benefit has been negligible or the FP rate has been too high.
>> 
>> 
>> Kelson Vibber
>> SpeedGate Communications 

> I'll second those suggestions. Though I use sbl-xbl.spamhaus.org instead of 
> sbl.spamhaus.org. I see just over 60% blockage.

May I suggest giving SURBL a try?  It's getting 60% hit rates
by itself and 0% false positives.  The 60% should improve when
the new data engine rolls out within a few weeks.

  http://www.surbl.org/

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Announcing SURBL support in SA 2.63 and 3.0 plugins

2004-10-31 Thread Jeff Chan
Hello SpamAssassin Users,
I'm pleased to announce SURBL, a new type of RBL for blocking messages
based on spam domains contained in message bodies.
Unlike other RBLs, the Spam URI RBL (SURBL) is not used to block
spam server IP addresses, but instead to block messages based on
URI domains previously reported to SpamCop.  We feel this is a
very direct approach to the issue of stopping spam.  It is also
proving highly effective, with spam detection rates currently
approaching 60% together with zero false positives.  Future
improvement is expected as we continue to tune things better.

Acknowledgements go to Julian Haight, Justin Mason, Eric Kolve
and countless others for making this possible, including
SpamCop and SpamAssassin developers and users.

Here's the Quick Start from our web site:
__

  http://www.surbl.org/

SURBL -- Spam URI Realtime Blocklist

Quick Start

[...]

In order to use SURBL you need software that can parse URIs in
message bodies, extract their domains, and check them against
SURBL. 
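
A minimal Python sketch of the parse-and-extract step might look like
the following.  It is only an illustration: the function name is
hypothetical, the regular expression catches plain http/https URIs
only, and taking the last two labels of the host name as "the domain"
is an assumption that does not handle country-code domains or
numeric IP hosts.

```python
import re
from urllib.parse import urlsplit

# Rough pattern for http(s) URIs in plain text (an assumption, not a
# full URI grammar).
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_domains(body):
    """Return the sorted, de-duplicated domains of http(s) URIs in body."""
    domains = set()
    for url in URL_RE.findall(body):
        # Drop punctuation that commonly trails a URI in running text.
        url = url.rstrip(".,;:!?)")
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue
        if not host:
            continue
        labels = host.split(".")
        # Assumption: the last two labels form the registered domain.
        domains.add(".".join(labels[-2:]))
    return sorted(domains)
```

Each returned domain would then be checked against SURBL.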

[...]

For those familiar with adding plugins to SpamAssassin, these
quick start comments may be enough information to get started using
SURBL. More details about SURBL itself appear in following
sections.

 
SpamCopURI SpamAssassin 2.63 plugin

  http://sourceforge.net/projects/spamcopuri/

One such program is Eric Kolve's SpamCopURI which is a
SpamAssassin 2.63 plugin.

In order to use SURBL in SpamCopURI, please comment out the older
tests SPAMCOP_URI and SPAMCOP_URI_HOST and increase the score for
the new test up to something like 2.5 or greater:

  score SPAMCOP_URI_RBL  2.5

in the spamcop_uri.cf file. Values higher than 2.5 may be
appropriate because this test is a highly accurate indicator of
spam, for some of the reasons mentioned below. Some people are
using scores of 3.0; others are using up to 6.0.


URIBL SpamAssassin 3.0 plugin

  
  http://spamassassin.org/full/3.0.x/dist/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm

Another program is the SpamAssassin 3.0 plugin URIDNSBL, to which
Justin Mason recently added the "urirhsbl" command which can be
used to do name to name matching from message body URI to SURBL.
Here is a sample rule to use urirhsbl with SURBL from the config
file for URIBL:

  http://www.spamassassin.org/full/3.0.x/dist/rules/25_uribl.cf

  urirhsbl  URIBL_SC_SURBL  sc.surbl.org.   A
  header    URIBL_SC_SURBL  eval:check_uridnsbl('URIBL_SC_SURBL')
  describe  URIBL_SC_SURBL  Contains a URL listed in the SC SURBL blocklist
  tflags    URIBL_SC_SURBL  net

You will need to score it, presumably with some fairly high value:

  score URIBL_SC_SURBL  5.0

Some results of using urirhsbl and SpamCopURI with SURBL appear
below. Spam detection rates are running 40-60% with zero false
positives noted so far, and with some improvements expected when
we revise the code to tune the data better.

Update: Feedback so far on the effectiveness of SURBL is very
positive, with spam hit rates ranging up to 60% and near-zero
False Positives. With some more tuning we may be able to improve
that further. We could use help with some more BIND-compatible
secondary DNS servers for the zone file since SURBL seems to be
starting to take off. Also valuable would be integration of SURBL
with an MTA such as postfix. Development of a sendmail milter to
use SURBL is rumored to be in the works. Contact jeffc at surbl
dot org if you would like to help. Thanks!

Raymond Dijkxhoorn has kindly set up an rsync server for the
SURBL rbldns and BIND zone files. Administrators of high volume
mail servers, please contact Raymond for access at:
[EMAIL PROTECTED] Please see the Notes section for more
information.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: ver 3.0 opinions

2004-10-31 Thread Faisal N. Jawdat
> Looks like somebody didn't read the UPGRADE doc...
>
>  Due to the database format change, you will want to do something like
>   this when upgrading:

I read it and followed the directions and didn't see any problem for a 
couple days and then suddenly the spam level jumped substantially.  
Upon further investigation it looked like the bayes dbs had gotten 
corrupted and I was seeing low tok counts like the original poster had 
reported.  Another friend saw this happen when he first upgraded, 
although I can't speak for his direction-following on the upgrade.

If I see more of this I'll try and isolate it and file an appropriately 
detailed bug report, but I don't think it's necessarily accurate to 
immediately write this off as user error.

-faisal


Re: ver 3.0 opinions

2004-10-31 Thread Ed Kasky
On Sat, 30 Oct 2004, Tuc at Beach House wrote:

> > > himinbjorg% sa-learn --dump magic
> > > 0.000  0  3  0  non-token data: bayes db version
> > > 0.000  0  175  0  non-token data: nspam
> > > 0.000  0  73501  0  non-token data: nham
> > > 0.000  0  1027341  0  non-token data: ntokens
> > 
> > I think 175 spam messages is not nearly enough for Bayes to be
> > adequately trained.  Also, the ratio of ham to spam (~0.3%) looks a
> > bit odd.  If it reflects roughly equal time periods over which you
> > received the messages, it suggests you might be missing (or perhaps
> > misclassifying) some of your spam.
> > 
> But shouldn't it have carried my database over from my previous install? 
> I'd been using it for at least 6 months on different versions before this
> upgrade.  Did it 'forget' it all?  Do I need to totally retrain it?  I'm
> using a stock install from FreeBSD ports, no local/global overrides.

Looks like somebody didn't read the UPGRADE doc...

 Due to the database format change, you will want to do something like
  this when upgrading:

  - stop running spamassassin/spamd (ie: you don't want it to be running
    during the upgrade)
  - run "sa-learn --rebuild", this will sync your journal.  if you skip
    this step, any data from the journal will be lost when the DB is
    upgraded.
  - upgrade SA to 3.0.0
  - run "sa-learn --sync", which will cause the db format to be upgraded.
    if you want to see what is going on, you can add the "-D" option.
  - test the new database by running some sample mails through
    SpamAssassin, and/or at least running "sa-learn --dump" to make sure
    the data looks valid.
  - start running spamassassin/spamd again
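
For the "make sure the data looks valid" step, a small Python sketch
along these lines could read the counters out of the dump.  The
function names are hypothetical, and it assumes the third column of
each "non-token data" line carries the value, as in the dump quoted in
this thread.  The validity check is a crude floor, not an official
test.

```python
def parse_dump_magic(text):
    """Map each 'non-token data' name to its integer value."""
    values = {}
    marker = "non-token data:"
    for line in text.splitlines():
        if marker not in line:
            continue
        fields, name = line.split(marker, 1)
        columns = fields.split()
        # Assumption: the third column carries the value, as in the
        # dump quoted in this thread.
        if len(columns) >= 3 and columns[2].isdigit():
            values[name.strip()] = int(columns[2])
    return values


def looks_valid(values):
    """Rough sanity check: all three counters present and non-zero."""
    return all(values.get(key, 0) > 0 for key in ("nspam", "nham", "ntokens"))
```

Feed it the unquoted output of "sa-learn --dump magic".  Passing this
check only means the counters exist; a lopsided nspam/nham pair still
needs a human look.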
 
. . . . . . . . . . . . . . .
Randomly generated quote:
"My great concern is not whether you have failed, but whether you
are content with your failure." - Abraham Lincoln



Re: Load Average Problems

2004-10-31 Thread jdow
From: "John Fleming" <[EMAIL PROTECTED]>

> jdow said:
> > On another paw I note that most family tools are not left running
> > 24x7. If this is his case then a large portion of his 250 messages
> > may be coming in right after he boots. If he is set up to spawn
> > too many spamds then he could experience a memory crisis.
>
> That's not it.  It's mostly a family/hobby server, but it functions
> "fairly professionally" - I just meant I'm not an ISP or big business
> with thousands of emails a day.  The server's on 24/7/365 running
> Apache, Mailman and other common server stuff - but all at a VERY low
> activity/use level.
>
> I've reviewed my local.cf, and there was some duplication.  I've
> removed the dupes and we'll see if that helps.
>
> I call spamd via spamc in procmail.  I've read man spamc/d - I see
> where to limit the spamd children when using the spamd option, but I
> don't see how to pass that option on when using spamc.  IOW, I don't see how
> to limit spamd children when using spamc.
>
> Also, my procmailrc uses a lock file when evaluating the results of
> spamd - I guess that doesn't limit starting another spamd before
> that file has been evaluated?  - John

Um, you do not limit with spamc. You simply set up the limit in spamd when
you start or restart it. It is probably a good idea to play with several
values to see which gives you performance closest to your desired
performance. As soon as you get enough spamds up to trigger paging the
overall performance will take a serious dive. To a fairly real extent
a limit of two or three is probably best for single processor systems
modulo how much time is spent computing compared to waiting on IO for
any given spamd. If it is heavily compute bound 2 might be optimum.
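
A back-of-the-envelope way to pick a starting value before
experimenting is sketched below in Python.  The function name and all
the numbers in the example are hypothetical; measure the real memory
use of one spamd child on your own box rather than trusting them.

```python
def max_spamd_children(total_ram_mb, reserved_mb, per_child_mb):
    """Largest child count whose combined memory fits in what is left
    after reserving RAM for everything else; never less than 1."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(1, available // per_child_mb)


# Hypothetical figures: 512 MB box, 256 MB kept for Apache, Mailman
# and the OS, and roughly 60 MB per spamd child.
print(max_spamd_children(512, 256, 60))
```

The result is only an upper bound for staying out of swap and is what
would go after "-m" when spamd is started.  The CPU-bound
consideration above may argue for going lower still.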

{^_^}




Re: ver 3.0 opinions

2004-10-31 Thread Tuc at Beach House
> > himinbjorg% sa-learn --dump magic
> > 0.000  0  3  0  non-token data: bayes db version
> > 0.000  0  175  0  non-token data: nspam
> > 0.000  0  73501  0  non-token data: nham
> > 0.000  0  1027341  0  non-token data: ntokens
> 
> I think 175 spam messages is not nearly enough for Bayes to be
> adequately trained.  Also, the ratio of ham to spam (~0.3%) looks a
> bit odd.  If it reflects roughly equal time periods over which you
> received the messages, it suggests you might be missing (or perhaps
> misclassifying) some of your spam.
> 
But shouldn't it have carried my database over from my previous install? 
I'd been using it for at least 6 months on different versions before this
upgrade.  Did it 'forget' it all?  Do I need to totally retrain it?  I'm
using a stock install from FreeBSD ports, no local/global overrides.

Thanks Tuc



Re: Load Average Problems

2004-10-31 Thread John Fleming
jdow said:
> On another paw I note that most family tools are not left running
> 24x7. If this is his case then a large portion of his 250 messages
> may be coming in right after he boots. If he is set up to spawn
> too many spamds then he could experience a memory crisis.

That's not it.  It's mostly a family/hobby server, but it functions
"fairly professionally" - I just meant I'm not an ISP or big business
with thousands of emails a day.  The server's on 24/7/365 running
Apache, Mailman and other common server stuff - but all at a VERY low
activity/use level.

I've reviewed my local.cf, and there was some duplication.  I've
removed the dupes and we'll see if that helps.

I call spamd via spamc in procmail.  I've read man spamc/d - I see
where to limit the spamd children when using the spamd option, but I
don't see how to pass that option on when using spamc.  IOW, I don't see how
to limit spamd children when using spamc.

Also, my procmailrc uses a lock file when evaluating the results of
spamd - I guess that doesn't limit starting another spamd before
that file has been evaluated?  - John





tell postfix to not use amavisd/spamassasin for sasl authenticaed mails

2004-10-31 Thread mailinglists
How to tell postfix NOT to check mail from SASL-authenticated users:

Use no content_filter in your main.cf:

content_filter =

Use an amavisd filter after SASL authentication:

main.cf:

smtpd_sender_restrictions =
    permit_sasl_authenticated,
    check_sender_access regexp:/etc/postfix/amavisd.regexp

amavisd.regexp:

/^/ FILTER smtp-amavis:[127.0.0.1]:10024

master.cf:

#amavisd
smtp-amavis unix  -   -   n   -   2  smtp
   -o smtp_data_done_timeout=1200
   -o disable_dns_lookups=yes
   -o smtp_send_xforward_command=yes
# re-injection service
127.0.0.1:10025 inet  n   -   n   -   -   smtpd
   -o content_filter=
   -o smtpd_banner=Mailware_REINJECT_8.4
   -o disable_dns_lookups=yes
   -o local_recipient_maps=
   -o relay_recipient_maps=
   -o smtpd_helo_restrictions=
   -o smtpd_client_restrictions=
   -o smtpd_sender_restrictions=
   -o smtpd_recipient_restrictions=permit_mynetworks,reject
   -o mynetworks=127.0.0.0/8
   -o myhostname=localhost
- Original Message - 
From: "Glenn Sieb" <[EMAIL PROTECTED]>
To: 
Sent: Thursday, October 28, 2004 11:34 PM
Subject: Re: Spamassassin 3 and postfix


Bill Landry said the following on 10/28/04 17:54:
> Add a second IP address and SMTPD service to your Postfix server and have
> your customers auth to that IP address, or have them use a different port
> on the same IP address.  Then have Postfix bypass the content filter on
> this port or second SMTPD service.

Or he can make a whitelist that includes himself and put that before the
DSBL checking...

Best,
Glenn



Re: [RD] evilnumbers.cf updated

2004-10-31 Thread Matt Yackley
Jan Theofel said:
>
> Hello Matt,
>
> On Sat, Oct 30, 2004 at 01:00:01PM -0500, Matt Yackley wrote:
>>
>> Today marks the first full year that SARE has been contributing custom rules
>> to the SA community.
>>
>> Happy Birthday SARE!
>
> From where do you know this? There's no note about that on the SARE
> website.
>
> Jan
>
> --
> Jan Theofel  Fon:  +49 (7 11) 48 90 83 - 0
> ETES - EDV-Systemhaus GbR  Fax:  +49 (7 11) 48 90 83 - 50
> Libanonstrasse 58 A * D-70184 Stuttgart  Web: http://www.etes.de
>

Hi Jan,
As one of the original members of SARE, I've still got the email that Chris
Santerre sent out to a group of folks asking if they wanted to work together
on custom rules. :)

Yeah, I suppose we should throw something up on the site...

Cheers,
-matt


Re: Load average probs

2004-10-31 Thread jdow
On another paw I note that most family tools are not left running
24x7. If this is his case then a large portion of his 250 messages
may be coming in right after he boots. If he is set up to spawn
too many spamds then he could experience a memory crisis.

{^_-}

From: "JamesDR" <[EMAIL PROTECTED]>

> He did say what his mail average is:
> "This is a small-scale family server receiving only about 250
> emails a day."
> You may want to read the entire message next time.
> Thanks,
> JamesDR
>
> Loren Wilton wrote:
> >>When the high LA hits, available RAM is essentially nil and the swap space
> >>is about 85% used as well.  When I've seen it hit 12 or so, it seemed that
> >>the HDD activity would never stop, and I've manually killed spamassassin and
> >>any spamd's.
> >
> >
> > You don't say what your normal mail rate is, nor what rules you are running.
> >
> > 2.64 didn't have some of the memory problems that were really bad on the
> > initial 3.0 release, and are now largely corrected.  Generally all that
> > would cause the sort of thing you are seeing would be either running too
> > many spamd children for the available memory, or using some of the really
> > big addon rule sets like BigEvil.
> >
> > How many spamd children do you have running?  You should probably limit to
> > 3-5 at most with 512M, at a guess.  Are you using BigEvil?  If so, get rid
> > of it and use the surbl patch instead.
> >
> > You are clearly getting into swap, and this implies that you have overflowed
> > memory.
> >
> > Loren
> >
> >
>




Re: [RD] evilnumbers.cf updated

2004-10-31 Thread jdow
I just wonder what rule set should be catching "m o r tgage" along with
all the other drug rules level permutations on that word, perhaps
coupled with "application" similarly permutated.
{^_^}
- Original Message - 
From: "Matt Yackley" <[EMAIL PROTECTED]>


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Today marks the first full year that SARE has been contributing custom
> rules to the SA community.
>
> Happy Birthday SARE!
>
>
> Updated:
> evilnumbers.cf
>
> http://www.rulesemporium.com/rules.htm
>
>
>
>
> Cheers,
> matt