date:20060217

Dallas Engelken wrote:

-Original Message-
From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 17, 2006 01:09

To: users@spamassassin.apache.org
Subject: Re: Over-scoring of SURBL lists...

On Thu, Feb 16, 2006 at 10:42:19PM -, Dallas Engelken wrote:
So.. I have moved partypoker.com to grey for now.  I'll let you and 
Theo thumb wrestle over it :)

Warning: I have big hands. ;)

Yeah, that's what she said ;)

I'm happy to show samples of mails to certain folks, btw.  
There are several personal and spamtrap entries in my which 
refer to them:

Hmm... I don't have any spam with the domain in it for the last 6 months 
anyway.  It doesn't surprise me that they'd be spamming though.

Oh, I'm not saying anyone's wrong. I'm just tired of hearing people say we
are wrong. We actually have 2 black submissions for partypoker.com. One
actually had a sample attached.

Don't get me wrong.  I'm not saying you're wrong.  I was just under the 
(possibly wrong) impression that your 'black' list was for domains that 
only appear in spam, and 'grey' was for domains that appear in both spam 
and ham.

Daryl

Re: Over-scoring of SURBL lists...

2006-02-17 Thread Rune Kristian Viken

On Thursday 16 February 2006 21:05, jdow wrote:
 ...
 The URL-lists are made in a different manner.

 Take for example - a fully legit message from one friend to another
 that contains something like this:


 Hi $name, god I'm getting tired of all the spam we're receiving about
 http://uri-here/uri , I've noticed that they're actually located in the
 neighbourhood - maybe we should pay them a visit and tell them what we
 think about them?

 regards,
 $name
 .. that message is fully legit, and the only wrong thing about it is
 the mentioning of the uri.  It'll get blocked, while still violating
 nothing more than one rule that in practice is repeated multiple times.
 Rune, there are two canonical means of solving that petty issue. If
 there is someone likely to send you such a message, whitelist her. Or
 simply munge the name, for example http://uri-here-M/uri/.

Certainly. The only problem with your approach is that it suddenly sets us
several years back. The great thing about SpamAssassin has always been that
it's usually not one single factor that determines whether the message is
categorized as spam or not. It's a combination of multiple factors.

Furthermore, while both you and I fully understand your nice example, a lot 
of users will not.  We're back to users having to understand munging, 
manually having to edit whitelists, and so forth.  

This issue isn't very important to me, but I would think it sad if
SpamAssassin became less useful by default due to overzealousness. For
me it's easy enough just to edit the rules to suit my needs and remove the
overboard URI checks, but...

-- 
Rune Kristian Viken
Basefarm AS
Tlf: (+47) 98 28 28 41

scoring changes not taking effect

2006-02-17 Thread Tom Brown


Hi

I have been manually tweaking some rules to increase their scores and
then running spamassassin --lint on the rules; however, it seems my score
increases have not taken effect.

e.g.

No, hits=3.675 tagged_above=0 required=5 
tests=[DATE_IN_FUTURE_03_06=2.007, HTML_MESSAGE=0.001, 
MIME_HTML_ONLY=0.001, SARE_SPEC_XXGEOCITIES2=1.666]


and yet

# grep SARE_SPEC_XXGEOCITIES2 /etc/mail/spamassassin/*
70_sare_specific.cf:meta  SARE_SPEC_XXGEOCITIES2  !__SARE_SPEC_XXGEOCITIE && __SARE_SPEC_XX2GEOCIT
70_sare_specific.cf:describe  SARE_SPEC_XXGEOCITIES2  spamsign pointing to free webhost spam site
70_sare_specific.cf:score SARE_SPEC_XXGEOCITIES2   2.666

and I had _definitely_ done a --lint after changing that score.

Any ideas? This is on a Postfix system using amavisd-new.

thanks

Re: scoring changes not taking effect

2006-02-17 Thread Dirk Bonengel


Hi,

You need to restart amavisd-new, which uses the SpamAssassin classes
internally.

So there is no need to run spamd (if you do).

This applies only if you do spam checking via amavisd-new, of course.

Dirk


Re: scoring changes not taking effect

2006-02-17 Thread Tom Brown



You need to restart amavisd-new which uses the spamassassin-classes 
internally.

So no need to run spamd (if you do).

Applies only if you do spamchecking via amavisd-new, of course


Ahh, OK, thanks. So after a --lint I need to restart amavisd as well. No,
I don't run spamd.


thanks
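spamassassin --lint only checks that the configuration parses; a long-running process such as amavisd-new keeps the rules it loaded at startup until it is restarted. As a rough diagnostic, the hypothetical Python helpers below compare the per-rule scores in an amavisd-new "tests=[...]" log field with the `score` lines of a .cf file and report rules whose values differ, which would suggest the running scanner has not picked up the edit. This is a sketch under stated assumptions: log entries are single lines, and only the first value of each `score` line is compared (SpamAssassin also accepts four-value score lines for the different Bayes/network-test combinations).

```python
import re

# "RULE=score" pairs inside amavisd-new's tests=[...] field.
TEST_PAIR = re.compile(r"([A-Z0-9_]+)=(-?\d+(?:\.\d+)?)")
# "score RULE value ..." lines; only the first value is captured.
SCORE_LINE = re.compile(r"^\s*score\s+(\S+)\s+(-?\d+(?:\.\d+)?)", re.MULTILINE)


def applied_scores(log_line):
    """Scores the running scanner actually used, taken from one log entry."""
    m = re.search(r"tests=\[([^\]]*)\]", log_line)
    if m is None:
        return {}
    return {name: float(val) for name, val in TEST_PAIR.findall(m.group(1))}


def configured_scores(cf_text):
    """Scores currently written in a .cf file (a later line overrides an earlier one)."""
    return {name: float(val) for name, val in SCORE_LINE.findall(cf_text)}


def stale_rules(log_line, cf_text):
    """Rules whose logged score differs from the score in the .cf file."""
    applied = applied_scores(log_line)
    configured = configured_scores(cf_text)
    return {
        rule: (applied[rule], configured[rule])
        for rule in applied
        if rule in configured and abs(applied[rule] - configured[rule]) > 1e-6
    }


log = ("No, hits=3.675 tagged_above=0 required=5 "
       "tests=[DATE_IN_FUTURE_03_06=2.007, HTML_MESSAGE=0.001, "
       "MIME_HTML_ONLY=0.001, SARE_SPEC_XXGEOCITIES2=1.666]")
cf = "score SARE_SPEC_XXGEOCITIES2   2.666\n"
print(stale_rules(log, cf))  # {'SARE_SPEC_XXGEOCITIES2': (1.666, 2.666)}
```

With the values from Tom's message, the rule shows up as stale (logged 1.666, configured 2.666), matching Dirk's diagnosis that amavisd-new was still running with the old rule set.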

Re: Can you read user confs from /config/$USER instead of /home/$USER?

2006-02-17 Thread Rick Macdougall


Cian Davis wrote:

Rick Macdougall wrote:
I tried this and it didn't work. I edited /etc/default/spamassassin and
changed the options to OPTIONS=--create-prefs --max-children 5
--helper-home-dir --virtual-config-dir=/config/%l/.spamassassin -x

The result was SA creating a directory /config/.spamassassin owned by root.

The line in /var/log/mail.log is: Feb 17 09:27:11 hex spamd[15467]: Using
default config for davisc: /config//.spamassassin/user_prefs

I assume that even if this did work, the process can't run as the
user. Correct? And if so, then the Bayes tokens cannot be updated, or new
configs are created owned by root.


Hi,

Try without the --helper-home-dir

I used to use this method before I moved everything into MySQL and it 
did work for me.


Regards,

Rick

Re: spamd: unauthorized connection

Matt Kettler wrote:

  Marc Perkel wrote:
  
  

Theo Van Dinter wrote:


  On Thu, Feb 16, 2006 at 05:36:32PM -0800, Marc Perkel wrote:
  
  
  
Why is spamd deciding what IP addresses are unauthorized when I told it 
to listen on all ports.


  
  Just because it's listening on a port doesn't mean the client is allowed to
connect.  You want to look at -A which is the listing of allowed client IPs.
  
  

Yes - that's it. Thanks.

So - why two different settings?

  
  
Because they control two totally different things.

-i controls which interfaces of the SERVER spamd will listen on for connections.

-A controls which CLIENTS it will accept connections from.

Say I have 3 webservers, 3 mailservers, and 1 backend spamd server in a DMZ
subnet. I want the mailservers to connect to the backend spamd, but there's no
reason to allow the webservers to do so.

In fact, if the webservers are running a lot of scripts that might get
exploited, it's probably better that I not allow them to connect to spamd. If
someone found a way of exploiting spamd over the network, they could leapfrog
from the webserver to the spamd server.

Admittedly -A is a bit redundant with iptables; you could achieve the same
effect with any firewall on the spamd server. However, this way it defaults
to accepting connections from nobody but localhost, just to force you to think
about what machines you should accept connections from.


  


If I may suggest: it is a very confusing configuration, because I don't
see why you would configure these two things to different values.
However, you should at least document it better so that the -i and -A
sections refer to each other. You can surely see why, if someone used -i,
they would not be looking for another switch that does almost the
same thing.

I recommend changing it so that both switches do the same thing.

Re: Can you read user confs from /config/$USER instead of /home/$USER?

2006-02-17 Thread Cian Davis

Rick Macdougall wrote:
 Try without the --helper-home-dir

Still getting the same (I did confirm that the spamd process was running
with the correct arguments).

The problem here is that (for some reason) it's not expanding %l to the
username.

When you used this, did the spamd process su to the user involved?

Regards,
Cian
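According to the spamd documentation, --virtual-config-dir expands %u to the full username, %l to the part of the username before the '@', and %d to the part after it. If spamc passes a bare login such as "davisc" (no '@'), %l would have nothing to expand to, which would be consistent with the /config//.spamassassin path in the log above; %u may be the placeholder wanted for plain logins. The Python sketch below models that expansion under those assumptions. `expand_virtual_config_dir` is a hypothetical name, not part of spamd, and the sketch omits spamd's replacement of unusual characters in usernames.

```python
import re


def expand_virtual_config_dir(pattern, username):
    """Expand %u, %l, %d and %% as the spamd docs describe (sketch).

    Assumption: when the username contains no '@', the local part (%l)
    and the domain (%d) are both empty strings.
    """
    local, domain = "", ""
    if "@" in username:
        # Split on the last '@', so 'a@b@c' gives ('a@b', 'c').
        local, domain = username.rsplit("@", 1)
    values = {"u": username, "l": local, "d": domain, "%": "%"}
    # A single pass, so a literal '%%u' becomes '%u' instead of being expanded twice.
    return re.sub(r"%([uld%])", lambda m: values[m.group(1)], pattern)


print(expand_virtual_config_dir("/config/%l/.spamassassin", "davisc"))
# /config//.spamassassin  (empty local part: no '@' in the username)
print(expand_virtual_config_dir("/config/%u/.spamassassin", "davisc"))
# /config/davisc/.spamassassin
```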

Re: spamd: unauthorized connection

2006-02-17 Thread DAve


Marc Perkel wrote:

I recommend changing it so that both switches do the same thing.


But they don't do the same thing. For example, I have one spamd server
and three mail toasters. I use both the -i and the -A switch. My spamd
server is at 10.0.240.253, and my toasters are lumped in with everything
else at 10.0.240.50-200. (All my servers have two faces: a 100Mb public
interface and a 1Gb private interface.)


I run spamd like so,

#!/sbin/sh

PATH=/usr/bin:/usr/local/bin

exec /usr/local/bin/softlimit -a 12800 \
/usr/local/bin/spamd -i 10.0.240.253 \
-p 1783 \
-A 10.0.240.134,10.0.240.135,10.0.240.136 \
-m 25 \
--max-conn-per-child=500 \
-u vpopmail -x -q -s stderr 2>&1

-i tells spamd to listen only on the 10.0.240.253 interface and to ignore
the 10.0.241.xxx interface, which is public.


-A tells spamd to accept connections only from 10.0.240.134-136, my
toasters, and *NOT* from my Frontpage server, my webservers, my shared
hosting box, my MSSQL box, etc.


The two switches do very different things.

DAve
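DAve's point can be modelled as two independent gates. The Python sketch below is a simplification, not spamd's code: a connection only reaches spamd if it targets the address given to -i (or any local address if spamd listens on all interfaces, written here as "0.0.0.0"), and spamd then serves it only if the client's address is covered by the -A list. The addresses come from DAve's script; 10.0.240.60 and 10.0.241.253 are hypothetical stand-ins for a non-toaster host and an address on his public interface.

```python
import ipaddress


def connection_outcome(local_ip, peer_ip, listen_ip, allowed):
    """What happens to a TCP connection under -i/-A style settings (model only).

    local_ip  -- server address the client connected to
    peer_ip   -- the client's address
    listen_ip -- the -i address ("0.0.0.0" meaning all interfaces)
    allowed   -- the -A entries, as single addresses or CIDR networks
    """
    # Gate 1 (-i): spamd is not bound to this address, so it never sees the connection.
    if listen_ip != "0.0.0.0" and local_ip != listen_ip:
        return "not listening"
    # Gate 2 (-A): the connection arrived, but the client must be on the allow list.
    peer = ipaddress.ip_address(peer_ip)
    if not any(peer in ipaddress.ip_network(net, strict=False) for net in allowed):
        return "unauthorized"
    return "served"


toasters = ["10.0.240.134", "10.0.240.135", "10.0.240.136"]
print(connection_outcome("10.0.240.253", "10.0.240.135", "10.0.240.253", toasters))  # served
print(connection_outcome("10.0.240.253", "10.0.240.60", "10.0.240.253", toasters))   # unauthorized
print(connection_outcome("10.0.241.253", "10.0.240.135", "10.0.240.253", toasters))  # not listening
```

Marc's original situation corresponds to listening on all interfaces while leaving the allow list at its localhost default: `connection_outcome("10.0.240.253", "10.0.240.135", "0.0.0.0", ["127.0.0.1"])` returns "unauthorized".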

Re: SpamD won't connect to MySQL if started via init.d

2006-02-17 Thread Glen Carreras

Thanks to everyone who offered suggestions, both on and offlist.  
Unfortunately nothing seems to work so I suppose I will just resort to 
starting spamd in another way until I find the answer.  I realize this 
wasn't a SA problem per se, but I appreciate the time and efforts to help.


Cheers,
Glen


Glen Carreras wrote:

Hi,
Hopefully someone can give me some advice here.  I've been fighting 
with SpamD for the last two days trying to get it to connect to a 
MySQL database.  I think I've finally weeded out all of my own 
errors and am down to this:


I'm running SA 3.1 on a Fedora 5 setup and MySQL is running on another 
machine.  If I start SpamD via the init.d script, it fails to connect 
to the database.  It really doesn't give much more info than that but 
it does show enough info to prove to me that it is reading the config 
files and at least providing correct login information.


If I start spamd from the command line (and daemonize it), it connects
fine.


I'm using identical tests... starting as root, using the same test
mail, same email user, same database user. The only difference (that I
am aware of) is the fact that one starts from init.d and the other from
the command line.  I realize there are, of course, some extra things
in the init.d script, but to my untrained eye, I can't see anything 
that should affect it. Can anyone help shed any light on what I might 
try next?  The command I am using to start SpamD is this:


spamd -d -A 192.,127. -i -l -x -q -m 2

I'd greatly appreciate any suggestions.
Glen

Re: spamd: unauthorized connection




DAve wrote:

The two switches do very different things.

Well then the DOCS should be changed so that the docs for -i and -A at 
least refer to each other.

Re: spamd: unauthorized connection

2006-02-17 Thread DAve


Marc Perkel wrote:

Well then the DOCS should be changed so that the docs for -i and -A at
least refer to each other.

They don't have to refer to each other. One switch tells spamd where it
should listen for connections; the other tells spamd which clients to
accept connections from. You can use one, the other, both, or neither as you require.


DAve


http://spamassassin.apache.org/full/3.1.x/dist/doc/spamd.html

-i [ipaddress], --listen-ip[=ipaddress], --ip-address[=ipaddress]
Tells spamd to listen on the specified IP address (defaults to 
127.0.0.1). If you specify no IP address after the switch, spamd will 
listen on all interfaces. (This is equal to the address 0.0.0.0). You 
can also use a valid hostname which will make spamd listen on the first 
address that name resolves to.


-A host,..., --allowed-ips=host,...
Specify a list of authorized hosts or networks which can connect to 
this spamd instance. Single IP addresses can be given, ranges of IP 
addresses in address/masklength CIDR format, or ranges of IP addresses 
by listing 3 or less octets with a trailing dot. Hostnames are not 
supported, only IP addresses. This option can be specified multiple 
times, or can take a list of addresses separated by commas. Examples:


-A 10.11.12.13 -- only allow connections from 10.11.12.13.

-A 10.11.12.13,10.11.12.14 -- only allow connections from 
10.11.12.13 and 10.11.12.14.


-A 10.200.300.0/24 -- allow connections from any machine in the 
range 10.200.300.*.


-A 10. -- allow connections from any machine in the range 10.*.*.*.

By default, connections are only accepted from localhost [127.0.0.1].
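The three address forms documented for -A can be checked with Python's standard ipaddress module. This is a sketch of the documented semantics, not spamd's own parser; `parse_allowed` and `is_allowed` are hypothetical names. One caveat: 300 is not a valid IPv4 octet, so this sketch rejects the documentation's 10.200.300.0/24 example with a ValueError; the tests use 10.200.30.0/24 instead.

```python
import ipaddress


def parse_allowed(spec):
    """Parse a comma-separated -A style value into ipaddress networks.

    Handles the documented forms: a single IP ('10.11.12.13'),
    CIDR ('10.11.12.0/24') and 1-3 octets with a trailing dot ('10.').
    """
    networks = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.endswith("."):
            octets = item.rstrip(".").split(".")
            if not 1 <= len(octets) <= 3:
                raise ValueError(f"bad octet prefix: {item!r}")
            # '10.' -> 10.0.0.0/8, '192.168.' -> 192.168.0.0/16, and so on.
            padded = ".".join(octets + ["0"] * (4 - len(octets)))
            networks.append(ipaddress.ip_network(f"{padded}/{8 * len(octets)}"))
        else:
            # A bare address becomes a /32; strict=False masks off host bits in CIDR.
            networks.append(ipaddress.ip_network(item, strict=False))
    return networks


def is_allowed(peer_ip, networks):
    """True if the client address falls inside any allowed network."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in networks)


nets = parse_allowed("192.,127.")  # the -A value from Glen's command line above
print(is_allowed("127.0.0.1", nets))     # True
print(is_allowed("10.0.240.135", nets))  # False
```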

Template Tags?

I see a section on template tags, but it doesn't show which file these
tags are used in. I'm trying to add the Bayes score to the header. How
do I do that?

Re: Over-scoring of SURBL lists...

Jeff Chan wrote:
 On Thursday, February 16, 2006, 9:13:36 PM, Matt Kettler wrote:
   
 I'm only presenting evidence of accuracy problems in relation to why the
 URIBLs collectively wield a great deal of power in SpamAssassin scoring.
 I'm not really complaining about uribl.com, I'm complaining about URIBLs
 as a whole. That's both uribl.com and surbl. Whenever I use the term
 URIBL in all caps, I mean all URI dns-based blacklists. If you prefer,
 I'll retract my uribl.com example, and point out that less than an hour
 later, I got a ws.surbl.org FP.
 

 There may be some value in not lumping together URIBL.com and
 SURBL.org lists.  As you can see the performance of the lists are
 different, and the way they're created is different too.  That
 makes it harder for us to respond to comments that seem to not
 take those differences into account.
   
Did you see Theo's test data from yesterday?

 OVERALL%    SPAM%    HAM%    S/O  RANK  SCORE  NAME
   35.418  41.1930  0.0000  1.000  0.90   0.00  URIBL_JP_SURBL
   34.665  40.3177  0.0000  1.000  0.88   0.00  URIBL_SC_SURBL
   26.069  30.3204  0.0000  1.000  0.80   0.00  URIBL_AB_SURBL
   28.024  32.5464  0.2915  0.991  0.61   0.00  URIBL_OB_SURBL
   48.113  55.7492  1.2873  0.977  0.55   0.00  URIBL_BLACK
    0.293   0.3406  0.0000  1.000  0.47   0.00  URIBL_PH_SURBL
    0.000   0.0000  0.0000  0.500  0.42   0.00  URIBL_RED
    0.000   0.0000  0.0000  0.500  0.42   0.01  T_URIBL_XS_SURBL
   37.539  42.4763  7.2626  0.854  0.38   0.00  URIBL_WS_SURBL
    0.548   0.3446  1.7974  0.161  0.03   0.00  URIBL_GREY

I consider that highly similar for JP, SC, AB, OB and WS.
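The S/O column can be recomputed from the SPAM% and HAM% columns, assuming Theo's output uses SpamAssassin's usual hit-frequencies definition S/O = SPAM% / (SPAM% + HAM%), reported as 0.5 when a rule hit nothing. The Python sketch below reproduces the table's S/O values for several rows, which makes it easy to see how much the small HAM% figures (0 for JP, 0.29 for OB, 7.26 for WS) move each rule's accuracy.

```python
def so_ratio(spam_pct, ham_pct):
    """Share of a rule's (corpus-normalised) hits that were spam; 0.5 if no hits."""
    total = spam_pct + ham_pct
    return 0.5 if total == 0 else spam_pct / total


# SPAM% and HAM% columns copied from the table above.
rows = {
    "URIBL_JP_SURBL": (41.1930, 0.0),
    "URIBL_OB_SURBL": (32.5464, 0.2915),
    "URIBL_BLACK": (55.7492, 1.2873),
    "URIBL_WS_SURBL": (42.4763, 7.2626),
    "URIBL_GREY": (0.3446, 1.7974),
}
for name, (spam, ham) in rows.items():
    # Prints 1.000, 0.991, 0.977, 0.854 and 0.161, as in the S/O column.
    print(f"{name:16s} {so_ratio(spam, ham):.3f}")
```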

Also, even if there are some differences, even 10% overlap would have
the effect I'm talking about.

I personally would like to see some statistics, but at this point we
don't have any test data on this, so we're arguing your theory vs. mine.

I'd love to see some results for some meta tests:

meta SURBL_MULTI2   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 2)
meta SURBL_MULTI3   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 3)
meta SURBL_MULTI4   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 4)

In particular, I'm concerned about the ham hits of even multi 2.
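In a SpamAssassin meta rule, each named sub-rule contributes 1 to the sum when it fires (assuming ordinary rules, not ones flagged to count multiple hits), so the expressions above count how many of the five SURBL lists hit a message. The Python sketch below mimics that count; `surbl_hits` and `surbl_multi` are hypothetical helpers. One detail worth checking: if the comparison in these rules is a strict ">", SURBL_MULTI2 fires only when three or more lists hit, and "at least two lists" would need ">= 2" (or "> 1").

```python
SURBL_RULES = (
    "URIBL_JP_SURBL",
    "URIBL_SC_SURBL",
    "URIBL_AB_SURBL",
    "URIBL_OB_SURBL",
    "URIBL_WS_SURBL",
)


def surbl_hits(rules_hit):
    """Sum of the five sub-rules, each counting 1 if it fired."""
    return sum(1 for rule in SURBL_RULES if rule in rules_hit)


def surbl_multi(rules_hit, threshold):
    """Same test as ((JP + SC + AB + OB + WS) > threshold) in a meta rule."""
    return surbl_hits(rules_hit) > threshold


msg = {"URIBL_JP_SURBL", "URIBL_SC_SURBL", "URIBL_BLACK", "HTML_MESSAGE"}
print(surbl_hits(msg))      # 2 (URIBL_BLACK is not part of the sum)
print(surbl_multi(msg, 2))  # False: two lists is not "> 2"
print(surbl_multi(msg, 1))  # True
```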

Theo?
 3) I'm even more concerned about the monoculture of the URIBLs.
 

 I suppose it depends on your point of view.  From my point of
 view the various lists are different in terms of sources and
 listing logic.  As you can see from the results posted, they have
 fairly different performance in terms of spam and ham hits, but
 those measurements don't take into account the underlying
 tools and sources that go into making them, which varies between
 lists.
   

I don't see the difference from the recent results posted by Theo.
   
 uribl.com's black, surbl.org's ws, sc, jp, ab and ob are all
 more-or-less the same list. Paul argued against that statement, but in
 my mind his arguments are weak at best. There IS considerable overlap
 between these lists. Contrary to Paul's statements, you only need to be
 reported once by a spamcop spamtrap or trusted feed to be on SC.
 

 That's only partially correct.  Paul's statement is correct for
 most SpamCop reports.  It takes many reports to get on SC for
 most domains except the ones that resolve into known spammer
 networks.
   
 There are no trusted feeds for SC.  
Not on your end, but keep in mind that spamcop trusts their spamtraps
with a 5x bias.


 The data in SC comes from
 SpamCop reports.  I don't know the number of SpamCop users, but
 they're probably many.  The way I deal with the issue of trust is
 to aggregate the reports in various ways and ignore some of the
 noise that would lead to FPs.  And all SURBL lists are subject to
 whitelisting as a final arbiter.  So even if a SpamCop user
 wanted us to blacklist say google.com or yahoo.com, we won't.

   
 JP
 monitors 18,000 domains, not just two people. AB accepts feeds directly
 from spamcop and does different analysis on them. Ultimately it is
 possible for a single copy of an email to cause a listing in
 uribl_black, SC, WS, JP, and OB all at the same time.
 

 Not really.  It takes a fairly large and widespread spam run to
 get onto multiple (SURBL) lists.  
So why do some small-spread legitimate mailings with special-purpose
domains end up multi-listed? I've seen this happen a number of times in
the past 3 weeks. This *IS* real.

It's not terribly common in terms of % of email, but maybe 1 in 1000 ham
mails I get has a double-listed link in it.

Re: Over-scoring of SURBL lists...

On Friday, February 17, 2006, 7:19:50 AM, Matt Kettler wrote:
 Jeff Chan wrote:
 On Thursday, February 16, 2006, 9:13:36 PM, Matt Kettler wrote:
   
 I'm only presenting evidence of accuracy problems in relation to why the
 URIBLs collectively wield a great deal of power in SpamAssassin scoring.
 I'm not really complaining about uribl.com, I'm complaining about URIBLs
 as a whole. That's both uribl.com and surbl. Whenever I use the term
 URIBL in all caps, I mean all URI dns-based blacklists. If you prefer,
 I'll retract my uribl.com example, and point out that less than an hour
 later, I got a ws.surbl.org FP.
 

 There may be some value in not lumping together URIBL.com and
 SURBL.org lists.  As you can see the performance of the lists are
 different, and the way they're created is different too.  That
 makes it harder for us to respond to comments that seem to not
 take those differences into account.
   
 Did you see Theo's test data from yesterday?

Yes.  I was mostly referring to lumping URIBL.com with SURBL.org.

  OVERALL%    SPAM%    HAM%    S/O  RANK  SCORE  NAME
    35.418  41.1930  0.0000  1.000  0.90   0.00  URIBL_JP_SURBL
    34.665  40.3177  0.0000  1.000  0.88   0.00  URIBL_SC_SURBL
    26.069  30.3204  0.0000  1.000  0.80   0.00  URIBL_AB_SURBL
    28.024  32.5464  0.2915  0.991  0.61   0.00  URIBL_OB_SURBL
    48.113  55.7492  1.2873  0.977  0.55   0.00  URIBL_BLACK
     0.293   0.3406  0.0000  1.000  0.47   0.00  URIBL_PH_SURBL
     0.000   0.0000  0.0000  0.500  0.42   0.00  URIBL_RED
     0.000   0.0000  0.0000  0.500  0.42   0.01  T_URIBL_XS_SURBL
    37.539  42.4763  7.2626  0.854  0.38   0.00  URIBL_WS_SURBL
     0.548   0.3446  1.7974  0.161  0.03   0.00  URIBL_GREY

 I consider that highly similar for JP, SC, AB, OB and WS.

As similar as 30 and 40, and 0, .3 and 7 are, I suppose.

 Also, even if there are some differences, even 10% overlap would have
 the effect I'm talking about.

 I personally would like to see some statistics, but  at this point, we
 don't have any test data on this so we're arguing your theory vs mine.

 I'd love to see some results for some meta tests:

 meta SURBL_MULTI2   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 2)
 meta SURBL_MULTI3   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 3)
 meta SURBL_MULTI4   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 4)

 In particular, I'm concerned about the ham hits of even multi 2.

I'd be concerned about it too, but it seldom seems to happen.

 Theo?
 3) I'm even more concerned about the monoculture of the URIBLs.
 

 I suppose it depends on your point of view.  From my point of
 view the various lists are different in terms of sources and
 listing logic.  As you can see from the results posted, they have
 fairly different performance in terms of spam and ham hits, but
 those measurements don't take into account the underlying
 tools and sources that go into making them, which varies between
 lists.
   

 I don't see the difference from the recent results posted by Theo.

That's like saying two different RBLs that hit a similar
percentage of spams must therefore have the same policies, even
when they may have no data in common.  It's not a conclusion that
can be drawn from that kind of measurement.

 uribl.com's black, surbl.org's ws, sc, jp, ab and ob are all
 more-or-less the same list. Paul argued against that statement, but in
 my mind his arguments are weak at best. There IS considerable overlap
 between these lists. Contrary to Paul's statements, you only need to be
 reported once by a spamcop spamtrap or trusted feed to be on SC.
 

 That's only partially correct.  Paul's statement is correct for
 most SpamCop reports.  It takes many reports to get on SC for
 most domains except the ones that resolve into known spammer
 networks.
   
 There are no trusted feeds for SC.

 Not on your end, but keep in mind that spamcop trusts their spamtraps
 with a 5x bias.

Our feeds are SpamCop user and mole reports, not SpamCop trap data.

 The data in SC comes from
 SpamCop reports.  I don't know the number of SpamCop users, but
 they're probably many.  The way I deal with the issue of trust is
 to aggregate the reports in various ways and ignore some of the
 noise that would lead to FPs.  And all SURBL lists are subject to
 whitelisting as a final arbiter.  So even if a SpamCop user
 wanted us to blacklist say google.com or yahoo.com, we won't.

   
 JP
 monitors 18,000 domains, not just two people. AB accepts feeds directly
 from spamcop and does different analysis on them. Ultimately it is
 possible for a single copy of an email to cause a listing in
 uribl_black, SC, WS, JP, and OB all at the same time.
 

 Not really.  It takes a fairly large and widespread spam run to
 get onto multiple (SURBL) lists.

 So why do some small-spread legitimate mailings with special-purpose
 domains end up multi-listed? I've

Re: Over-scoring of SURBL lists...

Jeff Chan wrote:

 I don't see the difference from the recent results posted by Theo.
 

 That's like saying two different RBLs that hit a similar
 percentage of spams must therefore have the same policies, even
 when they may have no data in common.  It's not a conclusion that
 can be drawn from that kind of measurement.
   
Yes, I agree. You can't draw conclusions from that data. However, you
were pointing to the performance data as a clear indication of a lack
of similarity between the lists. I certainly cannot conclude from
Theo's data that they do not have a significant overlap. If anything,
Theo's data does suggest overlap, but you'd need supplemental tests
to get a feel for how much overlap there is.

That's why I posted some rules to test against. Without that kind of
measurement, both you and I are speculating, at best.

I'll even re-quote myself:
 I personally would like to see some statistics, but  at this point, we
  don't have any test data on this so we're arguing your theory vs mine.
And your quote that I was counter-pointing:
 As you can see, the performance of the lists is different, and the way
 they're created is different too.

I don't see enough of a difference to clearly rule out significant overlap.

I'll define my test of significant overlap as:
10% of total hits redundant across 3 or more lists, and 1% of nonspam
hits redundant across 2 or more lists.

I'd personally like to see some test data to see if that's really
happening. Because with SA's current scoring, numbers like that are a
problem.

FW: Spam Score Advise

2006-02-17 Thread Vahric MUHTARYAN

Hi everybody,

I started with a SpamAssassin score threshold of 6.5. Now I'm watching
the mail flow, and I see that real mail gets between 0.1 and 1.x points,
while some spam gets between 5.0 and 5.9, and because of this I couldn't
catch it. I know I can play with the scores, and I'm now watching mail
traffic carefully before reducing the limit, but I'd like a
recommendation from SpamAssassin people: what scores do you use?

Thanks 
Vahric

RE: Spam Score Advise

2006-02-17 Thread Bowie Bailey

Vahric MUHTARYAN wrote:
 
 I started with a SpamAssassin score threshold of 6.5. Now I'm watching
 the mail flow, and I see that real mail gets between 0.1 and 1.x points,
 while some spam gets between 5.0 and 5.9, and because of this I couldn't
 catch it. I know I can play with the scores, and I'm now watching mail
 traffic carefully before reducing the limit, but I'd like a
 recommendation from SpamAssassin people: what scores do you use?

The SpamAssassin default for required_hits is 5.0 and that works pretty
well for me even with lots of extra rules installed.

For my personal account, I have dropped it down to 4.0 and only see the
occasional false positive (usually from this list).

The scores for the default SpamAssassin rules are calibrated for a
required_hits setting of 5.0.  I would recommend that you set it there
and see what happens.  You can adjust the score up or down a bit as you
need.  Don't play with the individual rule scores unless you know that a
particular rule is causing a problem for you.

-- 
Bowie

RE: Spam Score Advise

2006-02-17 Thread Bret Miller

We drop spam at 5.0 and optionally file 4.0-4.99 mail in the user's Junk
E-mail folder if they have one.

  
  -Original Message-
  From: Vahric MUHTARYAN [mailto:[EMAIL PROTECTED]
  Sent: Friday, February 17, 2006 8:25 AM
  To: users@spamassassin.apache.org
  Subject: FW: Spam Score Advise

  Hi everybody,

  I started with a SpamAssassin score threshold of 6.5. Now I'm watching
  the mail flow, and I see that real mail gets between 0.1 and 1.x points,
  while some spam gets between 5.0 and 5.9, and because of this I couldn't
  catch it. I know I can play with the scores, and I'm now watching mail
  traffic carefully before reducing the limit, but I'd like a
  recommendation from SpamAssassin people: what scores do you use?

  Thanks
  Vahric

Re: FW: Spam Score Advise

Vahric MUHTARYAN wrote:

 Hi everybody,

 I started with a SpamAssassin score threshold of 6.5. Now I'm watching
 the mail flow, and I see that real mail gets between 0.1 and 1.x points,
 while some spam gets between 5.0 and 5.9, and because of this I couldn't
 catch it. I know I can play with the scores, and I'm now watching mail
 traffic carefully before reducing the limit, but I'd like a
 recommendation from SpamAssassin people: what scores do you use?

I use the default required_score of 5.0. Some use more, some use less,
but most start off at 5.0.

Shifting to 6.5 will reduce your false positives, but will also greatly
increase your miss rate.

Stealing a little data from the set3 statistics:
http://spamassassin.apache.org/full/3.1.x/dist/rules/STATISTICS-set3.txt

# SUMMARY for threshold 5.0:
# Correctly non-spam:  53068  99.96%
# Correctly spam:     122508  98.97%
# False positives:        23   0.04%
# False negatives:      1270   1.03%

# SUMMARY for threshold 6.5:
# Correctly non-spam:  53081  99.98%
# Correctly spam:     121740  98.35%
# False positives:        10   0.02%
# False negatives:      2038   1.65%


In theory a threshold of 6.5 will have 56.5% fewer false positives, but also 
60.5% more false negatives compared with 5.0.

That sounds about like what you're experiencing.
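Working the relative changes out from the FP and FN counts in the two summaries above:

$$
\frac{23 - 10}{23} = \frac{13}{23} \approx 0.5652
\qquad
\frac{2038 - 1270}{1270} = \frac{768}{1270} \approx 0.6047
$$

In absolute terms that is 13 fewer false positives out of 53,091 non-spam messages, traded for 768 more missed spams out of 123,778.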

Re: Template Tags?

Marc Perkel wrote:
 I see a section on template tags but it doesn't show what file these
 tags are used in. I'm trying to add the Bayes score to the header. How
 do I do that?

With an add_header command in local.cf:

add_header all BayesScore _BAYES_

This will add a header called X-Spam-BayesScore, containing the Bayes
score, to all messages.

If you want it in the X-Spam-Status header you can copy the declarations from
10_misc.cf into local.cf, put a clear_headers before them, and edit to your
heart's content.
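A rough sketch of what that might look like. The lines below approximate the default 3.1 declarations and are an assumption, not a verbatim copy: start from the declarations in your own 10_misc.cf and change only the Status line, here by adding the _BAYES_ tag. SpamAssassin prepends X-Spam- to these header names itself.

```
# local.cf sketch: replace the default header set.
# Copy the real lines from your 10_misc.cf; these are an approximation.
clear_headers
add_header spam Flag _YESNOCAPS_
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ bayes=_BAYES_ version=_VERSION_
add_header all Level _STARS(*)_
```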

i know i'm stupid but...

2006-02-17 Thread Zdenko Aka

hi to all,

i've installed SA 3.1.0 on Fedora (qmail) using yum install
spamassassin command.

after that, i've tried to find any detailed manual for setting it up
but no luck...
can you suggest a link or give me a few tips & tricks?

thanks in advance,

zdenko

Re: i know i'm stupid but...

2006-02-17 Thread Jim Maul


Zdenko Aka wrote:

hi to all,

i've installed SA 3.1.0 on Fedora (qmail) using yum install
spamassassin command.

after that, i've tried to find any detailed manual for setting it up
but no luck...
can you suggest a link or give me a few tips & tricks?

thanks in advance,

zdenko

like http://spamassassin.apache.org/doc.html ?

-Jim

Re: i know i'm stupid but...

Zdenko Aka wrote:
 hi to all,
 
 i've installed SA 3.1.0 on Fedora (qmail) using yum install
 spamassassin command.
 
 after that, i've tried to find any detailed manual for setting it up
 but no luck...
 can you suggest a link or give me a few tips & tricks?
 

If you want to call it directly at the qmail level:

http://wiki.apache.org/spamassassin/IntegratedInMta?highlight=%28Integrated%29


Or if you want to have qmail use procmail as an MDA, you can do:

http://wiki.apache.org/spamassassin/UsedViaProcmail?highlight=%28procmail%29

Re: i know i'm stupid but...

2006-02-17 Thread Jim Maul


Zdenko Aka wrote:

On 2/17/06, Jim Maul [EMAIL PROTECTED] wrote:

Zdenko Aka wrote:

hi to all,

i've installed SA 3.1.0 on Fedora (qmail) using yum install
spamassassin command.

after that, i've tried to find any detailed manual for setting it up
but no luck...
can you suggest a link or give me a few tips & tricks?

thanks in advance,

zdenko



like http://spamassassin.apache.org/doc.html ?


thanks, but i'm not that stupid:-)

i don't know what i have to do after i've installed SA. do i have to
create some files for filtering emails, do i have to activate SA or does it
start automatically, etc...

I didn't mean to imply that you are 'that stupid'. I was just pointing
in the general direction of the docs, since I wasn't really sure what
you were looking for specifically. It sounds like you are looking for
how to integrate SA into your mail server. We would need to know what
your setup is like and what you intend to do with it.


-Jim

Re: Template Tags?
Thanks - that's what I needed.

Matt Kettler wrote:

  Marc Perkel wrote:
  
  
I see a section on template tags but it doesn't show what file these
tags are used in. I'm trying to add the Bayes score to the header. How
do I do that?

  
  
With an "add_header" command in local.cf:

add_header all BayesScore _BAYES_

This will add a header called "X-Spam-BayesScore:", containing the Bayes score, to all messages.

If you want it in the X-Spam-Status header you can copy the declarations from
10_misc.cf into local.cf, put a "clear_headers" before them, and edit to your
heart's content.

Re: Over-scoring of SURBL lists...

2006-02-17 Thread mouss

jdow wrote:
 Rune, there are two canonical means of solving that petty issue. If
 there is someone likely to send you such a message white list her. Or
 simply munge the name, for example http://uri-here-M/uri/.
 

I would like to whitelist all legitimate senders. Unfortunately, I don't
have their addresses :)

If I could whitelist non-spammers (or blocklist spammers), I wouldn't
need heuristic/bayesian/blahblahmatic filters.

So the "whitelist it" answer, which I've seen here more than once, is
just useless here. Once again, the FPs I get are from addresses I can't
whitelist before getting their mail, which gets tagged as spam before I
can whitelist it. OK?

1- Friends/colleagues/partners/... do change their email addresses.
2- Lost friends can get my email address (from the web or from another
friend).
3- When I opt in to a newsletter, I don't know what address they'll use.


I understand that things may be easy for some people who will just
ignore email. But let me claim to be on the other side (not sure it's
a minority...).

My approach with SA so far has been: if a test generates an FP, look at
it, and if it seems silly, disable it. Just because some perceptron (or
call it whatever you prefer) takes inputs and religiously generates
scores doesn't mean I should religiously obey it. If you give the
perceptron different conditions, it will generate different results. So
let's stop the "hey, the scores are the optimal result of the perceptron"
argument.

Also, when someone's filter misses spam, the common answer is 'use
SARE'. But it seems to me that SARE is not managed (they don't seem to
care about false positive reports). Worse, all SA docs suggest using
external rules without warning about the dangers.

The default SA rules use SORBS lists. These are now known to be too
aggressive (they list large ISPs, for example; even their DUHL isn't
safe). Ahem...

Re: Several problems with SA 3.1

2006-02-17 Thread Frank Bures

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Thu, 16 Feb 2006 10:22:30 +0100 (CET), Eduardo Gimeno wrote:

Thanks for the reply. I found the sample .procmailrc file on some
documentation page... I would expect it to be case sensitive too...
Well, then I'll leave the rule as ^X-Spam-Status: Yes. Anyhow, this way
it is working. I wonder why this changed from one day to the next...

Procmail recipes are case insensitive by default. If you want to make a
recipe case sensitive, you have to add the D flag.


Frank Bures, Dept. of Chemistry, University of Toronto, M5S 3H6
[EMAIL PROTECTED]
http://www.chem.utoronto.ca
PGP public key: http://pgp.mit.edu:11371/pks/lookup?op=index&search=Frank+Bures
-BEGIN PGP SIGNATURE-
Version: PGPfreeware 5.0 OS/2 for non-commercial use
Comment: PGP 5.0 for OS/2
Charset: cp850

wj8DBQFD9gpcih0Xdz1+w+wRAuU/AJ9+hF4XAK4YRapJ/h8d2+4XCFqnaQCgvD5/
LC2d1KFA/wJZVytOPkVfiPM=
=hfQx
-END PGP SIGNATURE-

Question on long scan times

2006-02-17 Thread Kevin W. Gagel

I am running spamd/spamc and have spamc launching with -t
55. Yet I'm finding that scans are taking as long as 798
seconds to complete (not a lot of them), but the question is
why isn't it timing out?

Any suggestions on what to look for?

=
Kevin W. Gagel
Network Administrator
Information Technology Services
(250) 562-2131 local 448
My Blog:
http://mail.cnc.bc.ca/blogs/gagel

---
The College of New Caledonia, Visit us at http://www.cnc.bc.ca
Virus scanning is done on all incoming and outgoing email.
Anti-spam information for CNC can be found at http://avas.cnc.bc.ca
---

Re: Question on long scan times


Kevin W. Gagel wrote:

I am running spamd/spamc and have spamc launching with -t
55. Yet I'm finding that scans are taking as long as 798
seconds to complete (not a lot of them), but the question is
why isn't it timing out?

Any suggestions on what to look for?


spamc -t only controls the timeout of the spamc client.  spamd continues 
to process the message.


Scan times of 798 seconds are probably the result of a Bayes expiry. If 
auto expiry is enabled (the default), I'd disable it and run a manual 
expiry as a cron job.



Daryl

Re: Over-scoring of SURBL lists...

Matt Kettler wrote:

 I'll even re-quote myself:
 I personally would like to see some statistics, but  at this point, we
  don't have any test data on this so we're arguing your theory vs mine.
 And your quote that I was counter-pointing:
 As you can see, the performance of the lists is different, and the way
 they're created is different too.
 
 I don't see enough of a difference to clearly rule out significant overlap.
 
 I'll define my test of significant overlap as:
 10% of total hits redundant across 3 or more lists, and 1% of nonspam
 hits redundant across 2 or more lists.
 

Messages received today that are double-listed in two or more of SC, JP, AB, OB
and WS:
grep SURBL_MULTI2 /var/log/maillog | grep "Feb 17" | wc -l
292

All surbl.org hits in the same timeframe (includes PH, but no matter):

grep _SURBL /var/log/maillog | grep "Feb 17" | wc -l
583

So we have at least a 50% double-listing rate. That in and of itself isn't much
of a problem, but it also doesn't rule out overlap. It's still a whole lot
higher than my first criterion of 10% overlap.
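The ratio behind that figure, assuming each grep match corresponds to one scanned message:

$$
\frac{292}{583} \approx 0.501
$$

Since the 583 also counts messages that hit only PH, the share relative to the five lists alone can only be higher, which is why 50% is a floor rather than an estimate.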

However, right now I don't have more than 100 FPs so I can't really comment on
the nonspam hit rate of SURBL_MULTI2. That's the important one.

I also added multi3, multi4 and another rule to detect overlap between
uribl.com's black and surbl.org:

meta URIBL_BLACK_OVERLAP (URIBL_BLACK && (URIBL_AB_SURBL || URIBL_JP_SURBL || URIBL_OB_SURBL || URIBL_WS_SURBL || URIBL_SC_SURBL))
score URIBL_BLACK_OVERLAP -1.0

I'll see what kind of runtime data I can gather based on these rules over the
weekend.

Re: Question on long scan times

Kevin W. Gagel wrote:
 I am running spamd/spamc and have spamc launching with -t
 55. Yet I'm finding that scans are taking as long as 798
 seconds to complete (not a lot of them), but the question is
 why isn't it timing out?
 
 Any suggestions on what to look for?

If it's just a few, maybe 2 a day, then spamassassin is probably doing a bayes
or AWL database expire at the time of scan.

Re: Question on long scan times

2006-02-17 Thread Mike Jackson


I am running spamd/spamc and have spamc launching with -t
55. Yet I'm finding that scans are taking as long as 798
seconds to complete (not a lot of them), but the question is
why isn't it timing out?

Any suggestions on what to look for?


spamc -t only controls the timeout of the spamc client.  spamd continues 
to process the message.


Scan times of 798 seconds are probably the result of a Bayes expiry. If 
auto expiry is enabled (the default), I'd disable it and run a manual 
expiry as a cron job.


So that raises the question: is there a way to tell spamd to time out if it's 
taking too long to process a message?

Re: Question on long scan times

2006-02-17 Thread Kevin W. Gagel

- Original Message -
spamc -t only controls the timeout of the spamc client. 
spamd continues  to process the message.

Scan times of 798 seconds are probably the result of a Bayes
expiry. If auto expiry is enabled (the default), I'd disable
it and run a manual expiry as a cron job.


Daryl,

Thanks, that explains something. What man page do I read up
on to be able to figure out how to expire the bayes db
manually? I've had a quick look but don't see anything jumping
out at me.


RE: Over-scoring of SURBL lists...

2006-02-17 Thread Chris Santerre

-Original Message-
From: mouss [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 17, 2006 1:28 PM
To: jdow
Cc: users@spamassassin.apache.org
Subject: Re: Over-scoring of SURBL lists...

SNIP

Also, when someone's filter misses spam, the common answer is 'use
SARE'. But it seems to me that SARE is not managed (they don't seem to
care about false positive reports). Worse, all SA docs suggest using
external rules without warning about the dangers.

What?!

We have a forum. How many times have you posted FPs in it? It's more common to post them here on this list. We have always discussed SARE issues here as long as it didn't become too much traffic. It never has, and it has completely dissipated over time. That's not SARE's fault!

Not care about FPs?! Good grief! If you had one OUNCE of an idea of how much testing goes into rules!! Why do you think the new Stock rules took so long? In the old days we'd whip out a new ruleset in 2-3 days! This one took almost 2 months!! SARE is trying to work more closely with the SA devs, which means we have to eliminate every FP outright or Theo and his friends will beat us with a 2x4! You ever see JM mad? No, you haven't! Because he takes out his aggression at night, dressing like the pink ninja, and beating on the SARE people :)

SARE, URIBL, and SURBL do not have your mail! We can only act on FPs that are reported.

If you think there is overscoring, then it's simple: ZERO out all URIBL/SURBL scores, and use a meta so that if a message hits on 2, you score it.

I personally only use ws.surbl.org and black.uribl.com. (I only use the rest for testing.) I can tell you those 2 typically overlap by 85%! And we don't share sources!
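One way to implement the zero-out-plus-meta suggestion in local.cf, as an untested sketch: the rule name URIBL_2PLUS_SKETCH and its 4.0 score are hypothetical. The individual rules get a token 0.001 rather than a literal 0 because, as I understand the score documentation, a score of 0 disables a rule entirely, which could also keep the meta from ever seeing it. Only the stock 3.1 SURBL rule names are used; add URIBL_BLACK to the sum only if you have defined that rule locally.

```
# Neutralise the individual SURBL rules without disabling them.
score URIBL_JP_SURBL 0.001
score URIBL_SC_SURBL 0.001
score URIBL_AB_SURBL 0.001
score URIBL_OB_SURBL 0.001
score URIBL_WS_SURBL 0.001

# Score a message only when its URIs are listed on two or more of them.
# Hypothetical rule name and score; tune against your own mail.
meta     URIBL_2PLUS_SKETCH ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) >= 2)
describe URIBL_2PLUS_SKETCH URI listed on two or more SURBL lists
score    URIBL_2PLUS_SKETCH 4.0
```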

--Chris

Re: Question on long scan times - Bayes expire

2006-02-17 Thread Kevin W. Gagel

- Original Message -
Thanks, that explains something. What man page do I read up
on to be able to figure out how to expire the bayes db
manually? I've had a quick look but don't see anything jumping
out at me.

Found it in sa-learn; I've set up a cron job to run it once a
day. We'll see how that does.

I've also adjusted for the UTF-8 problem that has been noted
elsewhere. I'll see what kind of scan times I get next week.



Re: Question on long scan times - Bayes expire


Kevin W. Gagel wrote:

- Original Message -
Thanks, that explains something. What man page do I read up
on to be able to figure out how to expire the bayes db
manually? I've had a quick look but don't see anything jumping
out at me.


Found it in sa-learn; I've set up a cron job to run it once a
day. We'll see how that does.

I've also adjusted for the UTF-8 problem that has been noted
elsewhere. I'll see what kind of scan times I get next week.


Be sure to disable the auto-expiry in your local.cf with:

bayes_auto_expire 0

spamassassin + spamhaus postfix filtering

2006-02-17 Thread Christer Edwards

Basic info:
spamassassin 3.0.4-2 (ubuntu repo package)
Ubuntu 5.10 (Breezy)
Postfix 2.2.4-1ubuntu2 (ubuntu repo package)

Used in my postfix main/master.cf
smtpd_client_restrictions = reject_rbl_client sbl-xbl.spamhaus.org

This effectively blocks A LOT of attempted spam from known addresses,
which is great. Some spam still makes it through, however. According to
spamhaus.org (http://www.spamhaus.org/effective_filtering.html), it is
suggested to:

If using SpamAssassin, we recommend you increase the value of the
SBL-check feature, URIBL_SBL to at least 5 or 6 (by default it's set
to 1 which in most cases is too low to trigger the spam flag).

Can someone tell me where this can be done? In my searching I haven't
found where that value can be changed. Also, is this going to mark
matching mail as ***SPAM***, or (hopefully) DENY it?

Thank you for the help.

Re: spamassassin + spamhaus postfix filtering

2006-02-17 Thread Jim Maul


Christer Edwards wrote:

Basic info:
spamassassin 3.0.4-2 (ubuntu repo package)
Ubuntu 5.10 (Breezy)
Postfix 2.2.4-1ubuntu2 (ubuntu repo package)

Used in my postfix main/master.cf
smtpd_client_restrictions = reject_rbl_client sbl-xbl.spamhaus.org

This effectively blocks A LOT of attempted spam from known addresses,
which is great. Some spam still makes it through, however. According to
spamhaus.org (http://www.spamhaus.org/effective_filtering.html), it is
suggested to:

If using SpamAssassin, we recommend you increase the value of the
SBL-check feature, URIBL_SBL to at least 5 or 6 (by default it's set
to 1 which in most cases is too low to trigger the spam flag).

Can someone tell me where this can be done? In my searching I haven't
found where that value can be changed. Also, is this going to mark
matching mail as ***SPAM***, or (hopefully) DENY it?

Thank you for the help.


http://wiki.apache.org/spamassassin/AdjustRuleScore?highlight=%28score%29%7C%28change%29
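Concretely, per that wiki page the change is one line in the site-wide local.cf (on Debian and Ubuntu packages usually /etc/spamassassin/local.cf; check your own install). The 5.0 below simply follows the Spamhaus suggestion quoted above and is not a tested value. Running spamassassin --lint afterwards catches typos, and spamd needs a restart to pick up the new score.

```
# local.cf: raise the Spamhaus SBL URI rule to the level Spamhaus suggests
score URIBL_SBL 5.0
```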

SpamAssassin doesn't deny anything. It's up to whatever is calling 
spamassassin to do that.


-Jim

RE: Over-scoring of SURBL lists...

2006-02-17 Thread Dallas L. Engelken

 -Original Message-
 From: Matt Kettler [mailto:[EMAIL PROTECTED] 
 Sent: Friday, February 17, 2006 05:14
 To: Dallas L. Engelken
 Cc: users@spamassassin.apache.org
 Subject: Re: Over-scoring of SURBL lists...

 Dallas L. Engelken wrote:
  -Original Message-
  From: Matt Kettler [mailto:[EMAIL PROTECTED]
  Sent: Thursday, February 16, 2006 22:50
  To: Chris Santerre
  Cc: users@spamassassin.apache.org
  Subject: Re: Over-scoring of SURBL lists...

  Chris Santerre wrote:

  Matt Kettler wrote:

  My FPs fall into two categories:

  1) URIs that would likely never appear outside of a specialty
  newsletter. I've had lots of hits on things like:
  -Authors of programmer's tools
  -producers of electronic parts
  -producers of embedded computer systems (Note: embedded, not normal
  computers.. companies like versalogic.com that make parts that only a
  kiosk manufacturer or extreme geek would use)

  Agreed. And we have seen these be more JoeJobs. But some are not. Some
  simply hire mass emailers thinking they are legit, only to find out
  they are not. Just because they are legit for you, doesn't mean they
  haven't spammed someone else. You ask, we remove.

  Yes, the only problem is that I'm getting tired of having to track
  down sample emails for FPs so I can find which URI a URIBL FPed on.

  But really, how often a URIBL FPs isn't really the point. The point
  is they DO FP, and it's really quite common for FPs to be
  multi-listed. That multi-listing wields some hefty score biases, way
  beyond the power of any other rule in spamassassin other than
  BLACKLIST_* and GTUBE.

  I merely find it to be a big problem that URIBLs on the whole are
  rather FP prone, and prone to cascades of FPs, which unleash havoc
  from the strong scores the perceptron gave them.

  I think the reason the perceptron gave them such high scores is that
  a lot of URIBL FP problems get fixed fairly quickly, within a matter
  of hours. Ditto for a lot of FN problems.

  By the time the mass-checks are run, the URIs in the corpus emails
  are likely well sorted by the reports given to the URIBLs.

  Sounds like someone's having a bad day ;)

 First, a pre-statement:

 I'm only presenting evidence of accuracy problems in relation 
 to why the URIBLs collectively wield a great deal of power in 
 SpamAssassin scoring.
 I'm not really complaining about uribl.com, I'm complaining 
 about URIBLs as a whole. That's both uribl.com and surbl. 
 Whenever I use the term URIBL in all caps, I mean all URI 
 dns-based blacklists. If you prefer, I'll retract my 
 uribl.com example, and point out that less than an hour 
 later, I got a ws.surbl.org FP.

 And let me remind you.

 Let me remind you, 

 1) you control which uribls you run
 2) you control how they score

 1) I'm talking about the default setup of SA 3.1.0 and the
 perceptron-assigned default scores for the URIBLs it uses, not
 customization. Default, stock SA 3.1.0 setup. Note that doesn't really
 involve uribl.com, but does involve surbl and sbl.

 2) I do have serious concerns about the accuracy problems of 
 both surbl.org and uribl.com. Particularly in light of #2. 
 uribl.com presents a larger portion of this problem at my 
 site, but surbl has the same basic problems.

 3) I'm even more concerned about the monoculture of the URIBLs.
 uribl.com's black, surbl.org's ws, sc, jp, ab and ob are all
 more-or-less the same list. Paul argued against that
 statement, but in my mind his arguments are weak at best.
 There IS considerable overlap between these lists. Contrary to
 Paul's statements, you only need to be reported once by a
 spamcop spamtrap or trusted feed to be on SC. JP monitors
 18,000 domains, not just two people. AB accepts feeds 
 directly from spamcop and does different analysis on them. 
 Ultimately it is possible for a single copy of an email to 
 cause a listing in uribl_black, SC, WS, JP, and OB all at the 
 same time. It might be possible for that one email to list in 
 AB via spamcop, but I'm not sure if they have a multi-report 
 requirement or not. Sure it's unlikely, but there is enough 
 overlap to have it be possible. If that one email is 
 mis-classified you have a whopper of a FP problem to deal with.

I think that is a benefit of the single list classification in
URIBL.com.  We don't crosslist (ok, we had a small bug that's been
fixed) domains.

 Combining 1-3, you have a serious problem. Due to 2, FPs are
 relatively commonplace, and due to 3, any FPs tend to cascade
 quickly into multiple URIBLs. Due to 1, these rules wield 
 considerable power ( +12) that even BAYES_00 can't put a 
 dent in (-2.599)

 Ultimately my major problem isn't with the URIBLs themselves. 
 My problem is with the structure of the rules in SA 3.1.0 and 
 the outrageously

Re: Over-scoring of SURBL lists...

2006-02-17 Thread Chris Thielen

Matt Kettler wrote:
 Jeff Chan wrote:
   
 There may be some value in not lumping together URIBL.com and
 SURBL.org lists.  As you can see, the performance of the lists is
 different, and the way they're created is different too.  That
 makes it harder for us to respond to comments that seem to not
 take those differences into account.  
 
 Did you see Theo's test data from yesterday?

   OVERALL%   SPAM%    HAM%    S/O   RANK  SCORE  NAME
   35.418  41.1930  0.0000  1.000   0.90   0.00  URIBL_JP_SURBL
   34.665  40.3177  0.0000  1.000   0.88   0.00  URIBL_SC_SURBL
   26.069  30.3204  0.0000  1.000   0.80   0.00  URIBL_AB_SURBL
   28.024  32.5464  0.2915  0.991   0.61   0.00  URIBL_OB_SURBL
   48.113  55.7492  1.2873  0.977   0.55   0.00  URIBL_BLACK
    0.293   0.3406  0.0000  1.000   0.47   0.00  URIBL_PH_SURBL
    0.000   0.0000  0.0000  0.500   0.42   0.00  URIBL_RED
    0.000   0.0000  0.0000  0.500   0.42   0.01  T_URIBL_XS_SURBL
   37.539  42.4763  7.2626  0.854   0.38   0.00  URIBL_WS_SURBL
    0.548   0.3446  1.7974  0.161   0.03   0.00  URIBL_GREY

 I consider that highly similar for JP, SC, AB, OB and WS.

 Also, even if there are some differences, even 10% overlap would have
 the effect I'm talking about.

 I personally would like to see some statistics, but  at this point, we
 don't have any test data on this so we're arguing your theory vs mine.

 I'd love to see some results for some meta tests:

 meta SURBL_MULTI2 ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) >= 2)
 meta SURBL_MULTI3 ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) >= 3)
 meta SURBL_MULTI4 ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) >= 4)
   
I whipped up a short script to calculate these stats on my spam corpus
(realtime data). First of all, the hit rate is quite impressive: over the
last 3 months I had 67%, 74% and 72% hit rates. However, it looks like
about 45-50% of the spam hit 4 or 5 SURBL lists.

My ham corpus looked clean of URIBL hits. Sorry for the ugly formatting.

Note: the month buckets listed aren't exactly accurate, because they use
the Date header set by the spammer, not the date the message was received.
This should be good enough to get an idea, though.


Chris Thielen

Stats for SPAM 38 months old:
0: 98.5% ( 268 / 272 )
1: 0.0% ( 0 / 272 )
2: 0.0% ( 0 / 272 )
3: 0.7% ( 2 / 272 )
4: 0.0% ( 0 / 272 )
5: 0.7% ( 2 / 272 )
6: 0.0% ( 0 / 272 )
Stats for SPAM 37 months old:
0: 96.6% ( 281 / 291 )
1: 0.7% ( 2 / 291 )
2: 0.0% ( 0 / 291 )
3: 0.3% ( 1 / 291 )
4: 1.4% ( 4 / 291 )
5: 1.0% ( 3 / 291 )
6: 0.0% ( 0 / 291 )
Stats for SPAM 36 months old:
0: 96.5% ( 277 / 287 )
1: 0.7% ( 2 / 287 )
2: 0.3% ( 1 / 287 )
3: 0.3% ( 1 / 287 )
4: 1.0% ( 3 / 287 )
5: 1.0% ( 3 / 287 )
6: 0.0% ( 0 / 287 )
Stats for SPAM 35 months old:
0: 97.5% ( 234 / 240 )
1: 0.4% ( 1 / 240 )
2: 0.4% ( 1 / 240 )
3: 0.0% ( 0 / 240 )
4: 0.8% ( 2 / 240 )
5: 0.8% ( 2 / 240 )
6: 0.0% ( 0 / 240 )
Stats for SPAM 34 months old:
0: 39.5% ( 118 / 299 )
1: 11.7% ( 35 / 299 )
2: 11.7% ( 35 / 299 )
3: 11.0% ( 33 / 299 )
4: 25.8% ( 77 / 299 )
5: 0.3% ( 1 / 299 )
6: 0.0% ( 0 / 299 )
Stats for SPAM 33 months old:
0: 24.0% ( 76 / 317 )
1: 20.8% ( 66 / 317 )
2: 11.7% ( 37 / 317 )
3: 12.0% ( 38 / 317 )
4: 30.9% ( 98 / 317 )
5: 0.6% ( 2 / 317 )
6: 0.0% ( 0 / 317 )
Stats for SPAM 32 months old:
0: 23.6% ( 66 / 280 )
1: 18.2% ( 51 / 280 )
2: 13.6% ( 38 / 280 )
3: 13.2% ( 37 / 280 )
4: 30.7% ( 86 / 280 )
5: 0.7% ( 2 / 280 )
6: 0.0% ( 0 / 280 )
Stats for SPAM 31 months old:
0: 27.4% ( 80 / 292 )
1: 9.2% ( 27 / 292 )
2: 10.6% ( 31 / 292 )
3: 19.9% ( 58 / 292 )
4: 32.9% ( 96 / 292 )
5: 0.0% ( 0 / 292 )
6: 0.0% ( 0 / 292 )
Stats for SPAM 30 months old:
0: 27.4% ( 83 / 303 )
1: 14.9% ( 45 / 303 )
2: 14.9% ( 45 / 303 )
3: 10.6% ( 32 / 303 )
4: 32.3% ( 98 / 303 )
5: 0.0% ( 0 / 303 )
6: 0.0% ( 0 / 303 )
Stats for SPAM 29 months old:
0: 27.1% ( 82 / 303 )
1: 13.5% ( 41 / 303 )
2: 11.6% ( 35 / 303 )
3: 15.8% ( 48 / 303 )
4: 19.8% ( 60 / 303 )
5: 12.2% ( 37 / 303 )
6: 0.0% ( 0 / 303 )
Stats for SPAM 28 months old:
0: 14.4% ( 40 / 277 )
1: 11.9% ( 33 / 277 )
2: 17.7% ( 49 / 277 )
3: 15.2% ( 42 / 277 )
4: 16.6% ( 46 / 277 )
5: 24.2% ( 67 / 277 )
6: 0.0% ( 0 / 277 )
Stats for SPAM 27 months old:
0: 18.3% ( 56 / 306 )
1: 9.2% ( 28 / 306 )
2: 18.6% ( 57 / 306 )
3: 15.4% ( 47 / 306 )
4: 13.7% ( 42 / 306 )
5: 24.8% ( 76 / 306 )
6: 0.0% ( 0 / 306 )
Stats for SPAM 26 months old:
0: 21.8% ( 49 / 225 )
1: 10.2% ( 23 / 225 )
2: 20.0% ( 45 / 225 )
3: 14.2% ( 32 / 225 )
4: 12.0% ( 27 / 225 )
5: 21.8% ( 49 / 225 )
6: 0.0% ( 0 / 225 )
Stats for SPAM 25 months old:
0: 22.2% ( 59 / 266 )
1: 13.9% ( 37 / 266 )
2: 19.2% ( 51 / 266 )
3: 13.2% ( 35 / 266 )
4: 18.0% ( 48 / 266 )
5: 13.5% ( 36 / 266 )
6: 0.0% ( 0 / 266 )
Stats for SPAM 24 months old:
0: 20.4% ( 51 / 250 )
1: 13.2% ( 33 / 250 )
2: 17.6% ( 44 / 250 )
3: 16.8% ( 42 / 250 )
4: 14.0% ( 35 / 250 )
5: 18.0% ( 45 / 250 )
6: 0.0% ( 0 / 250 )
Stats for SPAM

Re: Over-scoring of SURBL lists...


Dallas L. Engelken wrote:

The result will be no URIBL only FPs.  OTOH, you may end up with a
shit-ton of people bitching about spam accuracy dropping in stock 3.2
installs if you make these changes.  


I'm not sure it'd be *that* bad.

A grep of my logs from this week shows that 1.1% of my spam scores under 
8, and only 13% of those spams hit *any* URIBLs.
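Multiplying the two figures out (my arithmetic, taking the 13% as a share of the under-8 spam, as written) gives the fraction of all spam that both scores under 8 and hits a URIBL:

$$
0.011 \times 0.13 = 0.00143
$$

That is roughly 1.4 spams per thousand.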


So yeah, there'd be more FNs, but I'm not sure that it'd be a shit-ton of them.


Daryl

Re: Over-scoring of SURBL lists...

2006-02-17 Thread DAve


Daryl C. W. O'Shea wrote:

Dallas L. Engelken wrote:


The result will be no URIBL only FPs.  OTOH, you may end up with a
shit-ton of people bitching about spam accuracy dropping in stock 3.2
installs if you make these changes.  



I'm not sure it'd be *that* bad.

A grep of my logs from this week shows that 1.1% of my spam scores under 
8, and only 13% of those spams hit *any* URIBLs.


So yeah, there'd be more FNs, but I'm not sure that it'd be a shit-ton of 
them.



Daryl


It would, an imperial shit-ton. I would have clients backed up in the 
queue before lunch the day I upgraded. That is assuming I didn't modify 
the scores after the upgrade to put things back, which I would.


I would have had to replace my spamd server at least twice in the last 
two years if it were not for URIBL. Prior to its introduction I ran 
approx. (it's been a while, hard to remember) 25 to 45 sets of rules. I 
checked my spam pots three times a week and spent 10 to 15 hours a week 
writing custom rules. Now I give a URIBL hit a kill score and run just a 
few rules as needed.


Yes, I kill messages with a single URIBL hit (not on grey). I rarely
check for FPs any more; my clients will let me know immediately if there
is one. When there is an FP the clients just login to webmail, whitelist 
the sender, problem solved.


Frank gets all the wildthangs.com mail he can handle due to his personal 
whitelist entry, and Mable never has to deal with 
partywithyourpantson.ru thanks to URIBL.


I find the discussion interesting from a technical standpoint. But I 
don't see it as an insurmountable issue. Juggle the scores in local.cf,
set up per-user prefs, whitelist globally. Lots of solutions exist.


Just my two cents (if it is even worth that). I'll bow out to the 
smarter types now.


DAve

Re: Over-scoring of SURBL lists...

Daryl C. W. O'Shea wrote:
 Dallas L. Engelken wrote:
 The result will be no URIBL-only FPs.  OTOH, you may end up with a
 shit-ton of people bitching about spam accuracy dropping in stock 3.2
 installs if you make these changes.  
 
 I'm not sure it'd be *that* bad.
 
 A grep of my logs from this week shows that 1.1% of my spam scores under
 a score of 8 and only 13% of those spams hit *any* URIBLs.
 
 So yeah, there'd be more FNs, but I'm not sure that it'd be a shit-ton of
 them.
 

Well, you have to realize though that the impact of a quad-listed URI is on the
order of 15 points. So you could still have plenty of spams that are only
hitting URIBLs.

RE: Over-scoring of SURBL lists...

2006-02-17 Thread Dallas Engelken

 -Original Message-
 From: Matt Kettler [mailto:[EMAIL PROTECTED] 
 Sent: Friday, February 17, 2006 18:47
 To: Matt Kettler
 Cc: Jeff Chan; users@spamassassin.apache.org
 Subject: Re: Over-scoring of SURBL lists...

 Matt Kettler wrote:

  I'll even re-quote myself:
  I personally would like to see some statistics, but at this point,
  we don't have any test data on this so we're arguing your theory vs mine.
  And your quote that I was counter-pointing:
  As you can see the performance of the lists are different,
  and the way they're created is different too.

  I don't see enough of a difference to clearly rule out significant overlap.

  I'll define my test of significant overlap as:
  10% of total hits redundant across 3 or more lists and 1% nonspam hits
  redundant across 2 or more lists.

 Messages received today that are double-listed in two or more 
 of SC, JP, AB, OB and WS:
 grep SURBL_MULTI2 /var/log/maillog | grep "Feb 17" | wc -l
 292

 All surbl.org hits in same timeframe (includes ph, but no matter):

 grep _SURBL /var/log/maillog | grep "Feb 17" | wc -l
 583

 So we at least have a 50% double-listing rate. That 
 in-and-of-itself isn't much of a problem, but it also doesn't 
 rule out overlap. It's still a whole lot higher than my first
 criterion of 10% overlap.
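The two grep pipelines above simply count log lines that mention a rule name on a given day, then divide one count by the other. A minimal Python sketch of the same calculation, assuming each scanned message is logged on one line that lists the rules it hit (the function names are illustrative, not part of SpamAssassin):

```python
from typing import Iterable, List


def count_hits(lines: Iterable[str], rule: str, day: str) -> int:
    """Count lines containing both `rule` and `day`, like
    grep RULE maillog | grep "Feb 17" | wc -l."""
    return sum(1 for line in lines if rule in line and day in line)


def double_listing_rate(lines: List[str], day: str) -> float:
    """Fraction of SURBL-hitting lines that also hit SURBL_MULTI2."""
    multi2 = count_hits(lines, "SURBL_MULTI2", day)
    any_surbl = count_hits(lines, "_SURBL", day)
    return multi2 / any_surbl if any_surbl else 0.0


# The counts reported above: 292 double-listed out of 583 SURBL hits.
print(f"{292 / 583:.1%}")  # 50.1%
```

Like the grep version, this counts lines rather than messages, so it is only as accurate as the one-line-per-message assumption.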

 However, right now I don't have more than 100 FPs so I can't 
 really comment on the nonspam hit rate of SURBL_MULTI2. 
 That's the important one.

 I also added multi3, multi4 and another rule to detect 
 overlap between uribl.com's black and surbl.org:

 meta URIBL_BLACK_OVERLAP (URIBL_BLACK && (URIBL_AB_SURBL || URIBL_JP_SURBL || URIBL_OB_SURBL || URIBL_WS_SURBL || URIBL_SC_SURBL))
 score URIBL_BLACK_OVERLAP -1.0
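SURBL_MULTI2 as described fires when a message is listed on two or more of SC, JP, AB, OB and WS, and URIBL_BLACK_OVERLAP fires when URIBL_BLACK hits together with any of them. A Python sketch of that logic, assuming multi3 and multi4 mean "three or more" and "four or more" of the same five lists (the post doesn't spell out their definitions):

```python
from typing import List, Set

# The five surbl.org lists counted by the SURBL_MULTIn tests (PH excluded).
SURBL_LISTS = (
    "URIBL_SC_SURBL",
    "URIBL_JP_SURBL",
    "URIBL_AB_SURBL",
    "URIBL_OB_SURBL",
    "URIBL_WS_SURBL",
)


def surbl_multi_rules(hits: Set[str]) -> List[str]:
    """Names of the SURBL_MULTIn tests that would fire for this set of hits."""
    n = sum(1 for rule in SURBL_LISTS if rule in hits)
    return [f"SURBL_MULTI{k}" for k in (2, 3, 4) if n >= k]


def black_overlap(hits: Set[str]) -> bool:
    """Mirror of the URIBL_BLACK_OVERLAP meta: BLACK plus any SURBL list."""
    return "URIBL_BLACK" in hits and any(rule in hits for rule in SURBL_LISTS)


print(surbl_multi_rules({"URIBL_SC_SURBL", "URIBL_JP_SURBL", "URIBL_AB_SURBL"}))
# ['SURBL_MULTI2', 'SURBL_MULTI3']
```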

if anyone is interested, here is an alternative scoring method for
25_uribl.cf - http://www.uribl.com/tools/25_uribl.cf (make sure you wipe
out the scores for uribl tests in 50_scores.cf if you replace this file).

This should make SBL/URIBL/SURBL hits range in score from 2.0 to 5.5... 

- 2.0 (SBL ONLY) 
- 2.5 (URIBL_ONLY)
- 2.5 (SURBL_ONLY)
- 3.0 (SBL + URIBL)
- 3.0 (SBL + SURBL)
- 3.0 (SURBL_ONLY x2)
- 4.0 (URIBL + SURBL)
- 5.0 (SBL + URIBL + SURBL)
- 5.5 (SBL + URIBL + SURBLx2)
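Read as a lookup table, the tiers above map which blacklists hit to one capped score. A Python sketch that encodes only the combinations listed; it is a reading of this list, not of the contents of the linked 25_uribl.cf, so combinations the list doesn't mention (for example three SURBL lists) return None:

```python
from typing import Dict, Optional, Tuple

# (SBL hit, URIBL hit, number of SURBL lists hit) -> score, per the list above
TIER_SCORES: Dict[Tuple[bool, bool, int], float] = {
    (True, False, 0): 2.0,   # SBL only
    (False, True, 0): 2.5,   # URIBL only
    (False, False, 1): 2.5,  # SURBL only
    (True, True, 0): 3.0,    # SBL + URIBL
    (True, False, 1): 3.0,   # SBL + SURBL
    (False, False, 2): 3.0,  # SURBL only, on two lists
    (False, True, 1): 4.0,   # URIBL + SURBL
    (True, True, 1): 5.0,    # SBL + URIBL + SURBL
    (True, True, 2): 5.5,    # SBL + URIBL + SURBL x2
}


def tier_score(sbl: bool, uribl: bool, surbl_count: int) -> Optional[float]:
    """Score for a combination of hits, or None if the list above omits it."""
    return TIER_SCORES.get((sbl, uribl, surbl_count))
```

The point of the design is the cap: the strongest listed combination tops out at 5.5 instead of the sum of every list's individual score.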

If you want to reduce the possibility of URIBL-only FPs, this is the way to
go.  

D

Re: Over-scoring of SURBL lists...


Matt Kettler wrote:

Daryl C. W. O'Shea wrote:

Dallas L. Engelken wrote:

The result will be no URIBL-only FPs.  OTOH, you may end up with a
shit-ton of people bitching about spam accuracy dropping in stock 3.2
installs if you make these changes.  

I'm not sure it'd be *that* bad.

A grep of my logs from this week shows that 1.1% of my spam scores under
a score of 8 and only 13% of those spams hit *any* URIBLs.

So yeah, there'd be more FNs, but I'm not sure that it'd be a shit-ton of
them.



Well, you have to realize though that the impact of a quad-listed URI is on the
order of 15 points. So you could still have plenty of spams that are only
hitting URIBLs.


Ah, true.  In that case it raises the possibility of URIBLs being 
deterministic to 1.8% of my spam.



Anyway... I think a solution may be to write meta rules that result in 
each of the current rules (effectively) keeping their current scores, 
but using only the maximum of all of them in the final total.


i.e.

If URIBL_LIST1 has a score of 2 and URIBL_LIST2 has a score of 4:

score   URIBL_LIST1 0.001
score   URIBL_LIST2 0.001


meta    URIBL_LIST_HIGHEST_LIST1    (URIBL_LIST1 && !URIBL_LIST2)
score   URIBL_LIST_HIGHEST_LIST1    2

# LIST2 carries the higher score, so it counts whenever it hits
meta    URIBL_LIST_HIGHEST_LIST2    (URIBL_LIST2)
score   URIBL_LIST_HIGHEST_LIST2    4


...a bit of a pain to update, but might be more effective.
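Keeping those meta rules in sync by hand is the painful part, so they could be generated from the per-list scores. A Python sketch, assuming those scores are known: lists are ordered from highest to lowest score, and each list's meta fires only when no higher-scored list hit, so exactly one meta fires and it contributes the maximum. The URIBL_LIST_HIGHEST_ naming follows the example above; the generator itself is hypothetical.

```python
from typing import Dict, List, Set


def max_score(scores: Dict[str, float], hits: Set[str]) -> float:
    """Score contributed when only the highest-scored hitting list counts."""
    return max((scores[r] for r in hits if r in scores), default=0.0)


def max_meta_rules(scores: Dict[str, float]) -> List[str]:
    """Emit meta/score lines so only the highest-scored hitting list counts."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    lines: List[str] = []
    for i, (rule, score) in enumerate(ordered):
        higher = [r for r, _ in ordered[:i]]  # lists that outrank this one
        expr = " && ".join([rule] + [f"!{r}" for r in higher])
        name = "URIBL_LIST_HIGHEST_" + rule.split("URIBL_", 1)[-1]
        lines.append(f"meta    {name}    ({expr})")
        lines.append(f"score   {name}    {score:g}")
    # The underlying rules keep a token score, as in the example above.
    lines.extend(f"score   {rule}    0.001" for rule in scores)
    return lines


scores = {"URIBL_LIST1": 2.0, "URIBL_LIST2": 4.0}
print("\n".join(max_meta_rules(scores)))
print(max_score(scores, {"URIBL_LIST1", "URIBL_LIST2"}))  # 4.0
```

Summing the same two hits would give 6.0; the ordering is what turns "sum of all lists" into "maximum of all lists".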


Daryl

RE: Over-scoring of SURBL lists...

2006-02-17 Thread Dallas Engelken

 -Original Message-
 From: Daryl C. W. O'Shea [mailto:[EMAIL PROTECTED] 
 Sent: Friday, February 17, 2006 21:34
 To: Dallas L. Engelken
 Cc: users@spamassassin.apache.org
 Subject: Re: Over-scoring of SURBL lists...

 Dallas L. Engelken wrote:
  The result will be no URIBL-only FPs.  OTOH, you may end up with a
  shit-ton of people bitching about spam accuracy dropping in stock 3.2
  installs if you make these changes.

 I'm not sure it'd be *that* bad.

 A grep of my logs from this week shows that 1.1% of my spam 
 scores under a score of 8 and only 13% of those spams hit 
 *any* URIBLs.

 So yeah, there'd be more FNs, but I'm not sure that it'd be a
 shit-ton of them.

All I know is I've had a few systems bitten by
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4767

And when this happens, I hear about it, because people are complaining about
getting a shit-ton of spam.  And all just run SBL/URIBL/SURBL.  No other
RBLs.   Now I understand losing URIBL tests completely versus scoring
between, say, 2 and 5.5 are completely different things... but I still believe
a considerable increase will be seen in FNs.

Which phone call would you rather have?

  Q)  My client is trying to send me email and it's being rejected because
they are listed on URIBL, what should I do?  
  A)  whitelist the sender or request delisting from the URIBL

  Q)  How do I block all this stock, pill, porn, etc spam that started
coming in since we upgraded?
  A)  Well, you can re-adjust your heuristic scoring for your URIBL tests
back to their previous values, let me walk you through that.. 

The first question can be answered in 1 minute.  

The second question OTOH could take you a considerable amount of time.
Especially if you have to do it for them.  Oh, and you have to wait for them
to reconfigure their PIX (that they don't know how to administer) before you
can get in, and they want you to wait on the line until they figure it out.
Meanwhile client X, Y, and Z are waiting for you to get off the phone so you
can do the same thing for them ;)

All I'm saying here is, I'll take the easy route :)

Dallas

Re: Over-scoring of SURBL lists...

2006-02-17 Thread jdow


From: Jeff Chan [EMAIL PROTECTED]

On Friday, February 17, 2006, 7:19:50 AM, Matt Kettler wrote:

Jeff Chan wrote:

On Thursday, February 16, 2006, 9:13:36 PM, Matt Kettler wrote:
  

I'm only presenting evidence of accuracy problems in relation to why the
URIBLs collectively wield a great deal of power in SpamAssassin scoring.
I'm not really complaining about uribl.com, I'm complaining about URIBLs
as a whole. That's both uribl.com and surbl. Whenever I use the term
URIBL in all caps, I mean all URI dns-based blacklists. If you prefer,
I'll retract my uribl.com example, and point out that less than an hour
later, I got a ws.surbl.org FP.



There may be some value in not lumping together URIBL.com and
SURBL.org lists.  As you can see the performance of the lists are
different, and the way they're created is different too.  That
makes it harder for us to respond to comments that seem to not
take those differences into account.
  

Did you see Theo's test data from yesterday?


Yes.  I was mostly referring to lumping URIBL.com with SURBL.org.


OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
  35.418  41.1930   0.0000  1.000  0.90   0.00  URIBL_JP_SURBL
  34.665  40.3177   0.0000  1.000  0.88   0.00  URIBL_SC_SURBL
  26.069  30.3204   0.0000  1.000  0.80   0.00  URIBL_AB_SURBL
  28.024  32.5464   0.2915  0.991  0.61   0.00  URIBL_OB_SURBL
  48.113  55.7492   1.2873  0.977  0.55   0.00  URIBL_BLACK
   0.293   0.3406   0.0000  1.000  0.47   0.00  URIBL_PH_SURBL
   0.000   0.0000   0.0000  0.500  0.42   0.00  URIBL_RED
   0.000   0.0000   0.0000  0.500  0.42   0.01  T_URIBL_XS_SURBL
  37.539  42.4763   7.2626  0.854  0.38   0.00  URIBL_WS_SURBL
   0.548   0.3446   1.7974  0.161  0.03   0.00  URIBL_GREY



I consider that highly similar for JP, SC, AB, OB and WS.


As similar as 30 and 40, and 0, .3 and 7 are, I suppose.


Heh, yeah...

On another paw how independent are these lists? Do any inherit from other
lists or are they all separately maintained?

It may be that the right way to use these lists for some circumstances
is to have meta rules that add points to lowered scores when more than
two and more than three BLs all hit together.
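One reading of that suggestion: give each list a lowered base score and add a bonus when more than two, or more than three, lists hit the same message. A Python sketch; the base scores and bonus sizes are free parameters, and the numbers in the usage are purely illustrative, not proposed values:

```python
from typing import Dict, Set


def stacked_score(hits: Set[str], base: Dict[str, float],
                  bonus_at: Dict[int, float]) -> float:
    """Lowered per-list scores plus a bonus for each agreement threshold met.

    bonus_at maps a minimum number of hitting lists to extra points, e.g.
    3 for "more than two" and 4 for "more than three".
    """
    hitting = [rule for rule in base if rule in hits]
    total = sum((base[rule] for rule in hitting), 0.0)
    total += sum((pts for k, pts in bonus_at.items() if len(hitting) >= k), 0.0)
    return total


# Illustrative values only.
base = {"URIBL_JP_SURBL": 1.0, "URIBL_SC_SURBL": 1.0,
        "URIBL_AB_SURBL": 1.0, "URIBL_OB_SURBL": 1.0}
bonus = {3: 1.0, 4: 1.0}
print(stacked_score({"URIBL_JP_SURBL"}, base, bonus))  # 1.0
print(stacked_score(set(base), base, bonus))            # 6.0
```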

{^_^}

Re: Over-scoring of SURBL lists...

2006-02-17 Thread Raymond Dijkxhoorn


Hi!


OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
   0.293   0.3406   0.0000  1.000  0.47   0.00  URIBL_PH_SURBL
   0.000   0.0000   0.0000  0.500  0.42   0.00  URIBL_RED
   0.000   0.0000   0.0000  0.500  0.42   0.01  T_URIBL_XS_SURBL
  37.539  42.4763   7.2626  0.854  0.38   0.00  URIBL_WS_SURBL
   0.548   0.3446   1.7974  0.161  0.03   0.00  URIBL_GREY



I consider that highly similar for JP, SC, AB, OB and WS.


As similar as 30 and 40, and 0, .3 and 7 are, I suppose.



On another paw how independent are these lists? Do any inherit from other
lists or are they all separately maintained?


They use different data sources and no cross links between them. If there
is a real nasty one we could/would talk about it on the private list but
that's really sporadic.


Bye,
Raymond.

Re: Over-scoring of SURBL lists...

2006-02-17 Thread mouss

Chris Santerre a écrit :

-Original Message-
From: mouss [mailto:[EMAIL PROTECTED]
Sent: Friday, February 17, 2006 1:28 PM
To: jdow
Cc: users@spamassassin.apache.org
Subject: Re: Over-scoring of SURBL lists...

 SNIP

Also, when someone's filter misses spam, the common answer is 'use
SARE'. But it seems to me that SARE is not managed (they don't seem to
care about false positive reports). Worse, all SA docs suggest using
external rules without warning about the dangers.

 What?!

 We have a forum.

Forums are offline. Please use the mailing list:
http://lists.maddoc.net/mailman/listinfo/sare-users

I could indeed have subscribed there. Sorry, my fault.

 How many times have you posted FPs in it? It's more common
 to post them here on this list. We have always discussed SARE issues here as
 long as it didn't become too much traffic. It never has, and has completely
 dissipated over time. That's not SARE's fault!

OK, sorry. I will use the forum / sare-users.

Re: Over-scoring of SURBL lists...

Raymond Dijkxhoorn wrote:
 Hi!


 I consider that highly similar for JP, SC, AB, OB and WS.

 As similar as 30 and 40, and 0, .3 and 7 are, I suppose.
 
 On another paw how independent are these lists? Do any inherit from
 other
 lists or are they all separately maintained?
 
 They use different data sources and no cross links between them. If there
 is a real nasty one we could/would talk about it on the private list but
 that's really sporadic.

Untrue. AB and SC use a common data source, spamcop reports. However, each has
its own processing/listing criteria and each is separately maintained.

And, realistically, since WS and uribl accept direct reports from more-or-less
anyone, their data sources could be redundant with any other URIBLs depending on
what the

It's really straightforward for an end-user to report the email to spamcop,
then report the spamvertised URI to WS and URIBL_BLACK via web forms.

Pickup on surbl's SC list appears to involve multiple reports to spamcop, but
there's still potential for common inputs.

Let's see a show of hands. How many people here have ever filed a spam report
with multiple lists, including doing spamcop + either WS or URIBL?

(raises own hand)

RE: Over-scoring of SURBL lists...

2006-02-17 Thread Dallas Engelken

 -Original Message-
 From: Matt Kettler [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, February 18, 2006 00:05
 To: Raymond Dijkxhoorn
 Cc: jdow; users@spamassassin.apache.org
 Subject: Re: Over-scoring of SURBL lists...

 Raymond Dijkxhoorn wrote:
  Hi!

  I consider that highly similar for JP, SC, AB, OB and WS.

  As similar as 30 and 40, and 0, .3 and 7 are, I suppose.

  On another paw how independent are these lists? Do any inherit from
  other lists or are they all separately maintained?

  They use different data sources and no cross links between them. If
  there is a real nasty one we could/would talk about it on the private
  list but that's really sporadic.

 Untrue. AB and SC use a common data source, spamcop reports. 
 However, each has its own processing/listing criteria and
 each is separately maintained.

 And, realistically, since WS and uribl accept direct reports 
 from more-or-less anyone, their data sources could be 
 redundant with any other URIBLs depending on what the

 It's really straightforward for an end-user to report the
 email to spamcop, then report the spamvertised URI to WS and
 URIBL_BLACK via web forms.

 Pickup on surbl's SC list appears to involve multiple reports 
 to spamcop, but there's still potential for common inputs.

 Let's see a show of hands. How many people here have ever
 filed a spam report with multiple lists, including doing
 spamcop + either WS or URIBL?

 (raises own hand)

FWIW, web submissions account for less than 1% (119 of 12652 listings) of
URIBL data for the last 7 days.  All submissions are reviewed, so I find it
hard to believe that the FPs are coming in via this mechanism, seeing that
a human reports it (I hope) and a human reviews it.   From what I see, FPs
normally come from automation and overzealous mass adds.

D

RE: Over-scoring of SURBL lists...

2006-02-17 Thread Matthew.van.Eerde

Matt Kettler wrote:
 On another paw how independent are these lists? Do any inherit
 from other lists or are they all separately maintained?
 
 They use different data sources and no cross links between them. If
 there is a real nasty one we could/would talk about it on the
 private list but that's really sporadic.
 
 Untrue. AB and SC use a common data source, spamcop reports. However,
 each has its own processing/listing criteria and each is separately
 maintained. 

It's not particularly important how many URLs the lists have in common.  What 
is important is how many *false positives* the lists have in common... or more 
to the point, whether a given good URL is more likely to be on (say) JP given 
that it's on (say) SC.

If
P(JP | SC ^ good) > P(JP | good)

then your point is accurate, and the rules should be re-scored.

Otherwise the rules are fine.
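That comparison can be estimated from a hand-verified ham corpus by recording, for each nonspam message, which lists it hit. A Python sketch, assuming each ham message is represented as the set of rule names it triggered (the function names are illustrative):

```python
import math
from typing import Iterable, Set, Tuple


def fp_rates(ham: Iterable[Set[str]], a: str = "URIBL_JP_SURBL",
             b: str = "URIBL_SC_SURBL") -> Tuple[float, float]:
    """Return (P(a | good), P(a | b ^ good)) estimated from ham hit sets."""
    msgs = list(ham)
    with_b = [hits for hits in msgs if b in hits]
    p_a = sum(a in hits for hits in msgs) / len(msgs) if msgs else math.nan
    p_a_given_b = (sum(a in hits for hits in with_b) / len(with_b)
                   if with_b else math.nan)
    return p_a, p_a_given_b


def fps_correlated(ham: Iterable[Set[str]], a: str = "URIBL_JP_SURBL",
                   b: str = "URIBL_SC_SURBL") -> bool:
    """True if an FP on list b makes an FP on list a more likely."""
    p_a, p_a_given_b = fp_rates(ham, a, b)
    return p_a_given_b > p_a  # any comparison with NaN is False
```

With only a handful of ham messages hitting SC, P(JP | SC ^ good) rests on very few samples, so the comparison only means much against a large ham corpus.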

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer

Re: Template Tags?


Odd - no _RAZOR_ tag to return the confidence level?

Re: Over-scoring of SURBL lists...

On Friday, February 17, 2006, 3:36:07 PM, jdow jdow wrote:
 On another paw how independent are these lists? Do any inherit from other
 lists or are they all separately maintained?

The different SURBL lists are all separately maintained.  Only AB
and SC share a data source, namely SpamCop user Spamvertised site
reports.  But those reports are processed differently and
separately. 

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/

Re: Over-scoring of SURBL lists...

On Friday, February 17, 2006, 4:04:42 PM, Matt Kettler wrote:
 Raymond Dijkxhoorn wrote:

 I consider that highly similar for JP, SC, AB, OB and WS.

 As similar as 30 and 40, and 0, .3 and 7 are, I suppose.
 
 On another paw how independent are these lists? Do any inherit from
 other
 lists or are they all separately maintained?
 
 They use different data sources and no cross links between them. If there
 is a real nasty one we could/would talk about it on the private list but
 that's really sporadic.

 Untrue. AB and SC use a common data source, spamcop reports. However, each has
 its own processing/listing criteria and each is separately maintained.

AB and SC are the only exception in that they're both based on
SpamCop spamvertised site reports, but as you correctly note,
they're processed differently.

 And, realistically, since WS and uribl accept direct reports from more-or-less
 anyone, their data sources could be redundant with any other URIBLs depending
 on what the

 It's really straightforward for an end-user to report the email to spamcop,
 then report the spamvertised URI to WS and URIBL_BLACK via web forms.

Dallas said less than 1% of the URIBL records come from user
reports.  Unless the SARE ninjas are still processing the public
reports into WS, which I doubt, far fewer public reports get onto
WS.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/

Re: Over-scoring of SURBL lists...