Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Thursday, December 9, 2004, 8:14:16 AM, Larry Rosenbaum wrote:
> By the way, if you have a message that's been forwarded in such a way
> that the original recipient addresses become part of the message text,
> the URI extraction code will extract these too.  Therefore, if you get
> one of those "forward this to everyone you know" messages, it could
> result in a lot of SURBL lookups.

Code using SURBLs is supposed to look for URIs, which
message headers don't usually look like.

  http://www.surbl.org/implementation.html
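
Roughly, the expected flow is: find body text shaped like a URI,
e.g.

  http://www.example.com/some/page

reduce the host to its registered domain, example.com, and query
example.com.multi.surbl.org.  A recipient address sitting in
forwarded header text doesn't fit that shape unless the extractor
is very loose about it.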

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



RE: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Rosenbaum, Larry M.

> -Original Message-
> From: Jeff Chan [mailto:[EMAIL PROTECTED]
> Posted At: Wednesday, December 08, 2004 8:45 PM
> Posted To: sa-users
> Conversation: Feature Request: Whitelist_DNSRBL
> Subject: Re: Feature Request: Whitelist_DNSRBL
> 
> On Wednesday, December 8, 2004, 8:47:18 AM, Larry Rosenbaum wrote:
> > How about a way to use wildcards with uridnsbl_skip_domain?  I'd like to
> > be able to tell the SURBL code not to look up
> 
> > *.gov
> > *.mil
> > *.edu
> > and even *.??.us
> 
> > since these are unlikely to be hosting spammer web pages.
> 
> True, but most people believe that whitelisting entire
> TLDs is too broad, and I agree.
> 
> Jeff C.

I understand, which is why I suggested a configuration option rather
than hardwiring the TLDs to skip into the code.  We exchange a lot of
mail with folks in these domains, so there is likely to be an upside in
not having to look up any .gov or .mil addresses that appear in the
message.  And if we do get spam advertising .gov or .mil web addresses,
there's something very wrong going on and we can report it.  Most other
email admins won't see the same tradeoffs.

By the way, if you have a message that's been forwarded in such a way
that the original recipient addresses become part of the message text,
the URI extraction code will extract these too.  Therefore, if you get
one of those "forward this to everyone you know" messages, it could
result in a lot of SURBL lookups.




Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 11:41:41 PM, hamann w wrote:
>>> How about a way to use wildcards with uridnsbl_skip_domain?  I'd like to
>>> be able to tell the SURBL code not to look up
>>> 
>>> *.gov
>>> *.mil
>>> *.edu
>>> and even *.??.us
>>> 
>>> since these are unlikely to be hosting spammer web pages.

> I have received obscure web traffic from a .mil site recently - it looked
> like an infected Windows box trying to inflict pain on a Windows web server
> (or would visitors from .mil sites conduct a "vulnerability scan" on remote
> sites before they view them?)

That's bad, but remember that SURBLs are usually used to
check message body URIs and not sender domains.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



RE: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread hamann . w
>> How about a way to use wildcards with uridnsbl_skip_domain?  I'd like to
>> be able to tell the SURBL code not to look up
>> 
>> *.gov
>> *.mil
>> *.edu
>> and even *.??.us
>> 
>> since these are unlikely to be hosting spammer web pages.
>> 
>> Larry
>> 
>> 

Hi,

I have received obscure web traffic from a .mil site recently - it looked like
an infected Windows box trying to inflict pain on a Windows web server
(or would visitors from .mil sites conduct a "vulnerability scan" on remote
sites before they view them?)

Wolfgang Hamann





Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Daryl C. W. O'Shea
Jeff Chan wrote:
> On Wednesday, December 8, 2004, 9:06:26 AM, Daryl O'Shea wrote:
>>It doesn't cause more lookups for anyone.  A local white list file would
>>reduce lookups at the expense of process size (and time if the white
>>list is very large).
>
>
> The SA developers chose an appropriately small exclusion list
> to hard code as the top 125 most often hit whitelist entries.
> Those top hits are largely invariant and would represent a
> large portion of the DNS queries if not excluded.  It doesn't
> make much sense to serve up a small, nearly invariant list
> with a DNS list, long TTLs or not.
>
> Jeff C.
Yes, as I noted later in the thread.
"There's got to be a reason why SpamAssassin currently only includes the 
top 100 or whatever excluded domains... either the rest of the data 
wasn't useful or it wasn't worth the performance hit having them in memory."

I only suggested another solution to what Chris was suggesting (having
Rules-du-jour style, assumedly massive, .cf file exclusion lists),
which in my opinion aren't appropriate (massive lists, that is) due to
the memory overhead.

I'm fully aware, as I think everyone is now, of the exclusion list
included with 3.0.

Daryl


Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 9:49:55 AM, Daryl O'Shea wrote:
> Additionally, assuming there isn't an extreme query frequency drop off
> after the top 100 or 200 excluded domains, it would be nice to have 
> access to the rest of the exclusion list which wouldn't be realistic to 
> be storing (and currently copying around) in memory.

> There's got to be a reason why SpamAssassin currently only includes the 
> top 100 or whatever excluded domains... either the rest of the data
> wasn't useful or it wasn't worth the performance hit having them in memory.

I believe the 125 cutoff was entirely arbitrary, but it happens to
correspond almost exactly with the 50th percentile of DNS queries
against whitelisted domains, which is a happy coincidence and
a perfectly reasonable cut off point.

> New additions to the exclusion list would immediately be available too, 
> not that that is really a huge concern.

Remember that the only reason to build this hard-coded exclusion
list into SA was to prevent unnecessary DNS queries from
happening in the first place:

  http://bugzilla.spamassassin.org/show_bug.cgi?id=3805

The much larger global whitelist is applied internally in
SURBLs to prevent those domains from ever getting listed.
It is an exclusion list there.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 9:21:37 AM, Chris Santerre wrote:
> My whole idea was skipping the lookup entirely. Why would you want to do a
> lookup for google even if it is cached?

Yep it's a good idea.  Which is why we're already doing it.  ;-)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 9:06:26 AM, Daryl O'Shea wrote:
> Bill Landry wrote:

>  >> From: "Chris Santerre" <[EMAIL PROTECTED]>
>  >>
>  >> Well we have talked about it and  didn't come up with a solid
>  >> answer. The idea would cause more lookups and time for those who
>  >> don't cache dns.

> It doesn't cause more lookups for anyone.  A local white list file would
> reduce lookups at the expense of process size (and time if the white
> list is very large).

The SA developers chose an appropriately small exclusion list
to hard code as the top 125 most often hit whitelist entries.
Those top hits are largely invariant and would represent a
large portion of the DNS queries if not excluded.  It doesn't
make much sense to serve up a small, nearly invariant list
with a DNS list, long TTLs or not.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 9:07:44 AM, Chris Santerre wrote:
> Actually I was only saying to list the top lookups from the whitelist, not
> the 66,500. That is more of a research and exclusion tool. So no more than
> 200-300 domains. Check it every month for changes and update.

This is already answered in other messages, but the top 125
most often hit SURBL whitelisted domains are currently listed
in the default 25_uribl.cf file:

  http://spamassassin.apache.org/full/3.0.x/dist/rules/25_uribl.cf

# Top 125 domains whitelisted by SURBL
uridnsbl_skip_domain yahoo.com w3.org msn.com com.com yimg.com
uridnsbl_skip_domain hotmail.com doubleclick.net flowgo.com ebaystatic.com aol.com
[...]
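
Local additions work the same way; anyone can append entries in
their own local.cf (the domains below are just placeholders):

uridnsbl_skip_domain example.org example.net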

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 8:33:11 AM, Bill Landry wrote:
> Actually, I was thinking of the whitelist that Jeff has already compiled at
> http://spamcheck.freeapp.net/whitelist-domains.sort (currently over 66,500
> whitelisted domains).  If you set a long TTL on the query responses, it
> would certainly cut down on follow-up queries for anyone that is running a
> caching dns.  It would also be a lot less resource intensive than trying to
> run a local whitelist.cf of over 66,500 whitelisted domains.

That list includes a large majority (52 thousand) of geographic
domain names, mostly .us ones which will probably never be used
in spams. We included them just for completeness and since large
sorted lists have almost no performance impact on UNIX joins.
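
For instance, dropping whitelisted domains out of a sorted
candidate list is roughly a one-liner (candidates.sort here is a
hypothetical file):

  join -v 1 candidates.sort whitelist-domains.sort

which prints only the candidates with no whitelist match.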

The actual number of non .us whitelisted domains is about
13 thousand.

We mentioned some reasons why these are not as well-suited
to DNS lists as the blacklist records are.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 8:15:28 AM, David Hooton wrote:
> The flaw in offering a DNS based whitelist is that it encourages
> people to place a negative score on it.  The problem with this is that
> spammers can poison messages with whitelisted domains, thereby
> bypassing the power of the SURBL.

Agreed, that's another possible misuse of whitelists if they
existed in RBL form.

> The concept of "Whitelist" in the SURBL world is more of an "Exclusion
> List" as in "we exclude these domains from being listed" rather than
> we consider the presence of these domains in an email to be a good
> sign of ham.

Which is how we use them throughout SURBLs.  There is no
whitening of messages due to whitelist inclusion, only
non-checking of whitelisted domains.

That was a deliberate design decision IIRC.  It seems to be
revisited every so often, along with other design decisions,
most of which I hope are mostly right.  None of these
decisions were made in a vacuum, we discussed most of them
collaboratively and openly.  Some of them were made when
the project was just Eric Kolve and me, and some were later,
but even two heads are better than one.  :-)

> An excluded domain is therefore ignored in all data and not allocated
> a score positively or negatively, so trying to poison a message with
> whitelisted domains is therefore pointless.

Yep.

> I think we either need to look at a DNS version of
> uridnsbl_skip_domain with long TTL's or we should look at releasing a
> .cf file.  I personally think the more proper implementation may be
> the DNS based version in order to avoid BigEvil type situations.

The solution we came up with for SA 3, a small, hard-coded
exclusion list, seems to fit the data well and to be close to
optimal in terms of performance.

An advantage of a separate local whitelist .cf might be
for local programs to be able to output and maintain their
own list, as Chris initially suggested.  But I get back
to the point that whitehats are pretty stable, and a
hard coded list which anyone can edit or update locally
in their existing 25_uribl.cf fits the data pretty well.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 8:15:49 AM, Chris Santerre wrote:
> The idea [of a whitelist DNS list] would cause more lookups and
> time for those who don't cache dns.

That's another excellent argument.  Barring caching, which not
all resolvers do, why do a gazillion DNS lookups on yahoo.com,
w3.org, etc. when we already know they're whitehats?

Hard coding small, local exclusion lists into uridnsbl_skip_domain
and whitelist_spamcop_uri is probably a better solution.
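
For example, in a local .cf (the domains are placeholders, and the
whitelist_spamcop_uri pattern syntax here is from memory, so check
the SpamCopURI docs):

# URIDNSBL plugin, SA 3.0
uridnsbl_skip_domain example.com example.net
# SpamCopURI, SA 2.6x
whitelist_spamcop_uri http://*.example.com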

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 8:03:35 AM, Bill Landry wrote:
> - Original Message - 
> From: "Daryl C. W. O'Shea" <[EMAIL PROTECTED]>

>>  >> Was the whitelist you were referring to really the SURBL server-side
>>  >> whitelist?
>>  >
>>  > Yes! But local SURBL whitelists are needed to reduce traffic and time.
>>
>>
>> I'd much rather see SURBL respond with 127.0.0.0 with a really large TTL
>> for white listed domains.  Any sensible setup will run a local DNS cache
>> which will take care of the load and time issue.

> I agree, and have suggested a whitelist SURBL several times on the SURBL
> discussion list, but it has always fallen on deaf ears - nary a response.
> It would be nice if someone would at least respond as to why this is not a
> reasonable suggestion.

Bill,
We did discuss it several times before.  Some of the discussion
may have been behind the scenes in the development of
uridnsbl_skip_domain:

  http://bugzilla.spamassassin.org/show_bug.cgi?id=3805

but we also discussed it on the SURBL discussion list.  As I
recall some of the arguments against it included:

1.  Possible misuse: i.e. mistakenly using it as a blacklist.

2.  Performance: A relatively small number of domains appear
most frequently in hams, like yahoo.com, w3.org, etc.  The
point of diminishing returns in publishing as a DNS list
more than a few hundred whitelisted domains is reached quickly
in terms of decreasing frequency of hits.  Some of this can
be seen in the whitelist sample hit count stats at:

  http://www.surbl.org/dns-queries.whitelist.counts.txt

A cursory statistical analysis will prove my point.
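
For example, assuming that file is one "count domain" pair per
line, something like

  sort -rn dns-queries.whitelist.counts.txt |
    awk '{ tot += $1 } NR <= 125 { top += $1 }
         END { printf "top 125: %.1f%% of hits\n", 100 * top / tot }'

shows what share of the query volume the top 125 entries cover;
the drop-off after the head of the list is steep.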

3.  Whitehat domains are pretty stable.  They tend not to
change over the course of many months or even years.

4.  Blackhat domains in contrast tend to change rapidly.
There is statistical research showing that most spam domains
are only used for a few days, then discarded.

5.  Therefore the size and rapid changes of spam domains
are more appropriately communicated in DNS lists than
whitehat domains.

There may have been other arguments, but these are probably
the key ones.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-09 Thread Jeff Chan
On Wednesday, December 8, 2004, 8:47:18 AM, Larry Rosenbaum wrote:
> How about a way to use wildcards with uridnsbl_skip_domain?  I'd like to
> be able to tell the SURBL code not to look up

> *.gov
> *.mil
> *.edu
> and even *.??.us

> since these are unlikely to be hosting spammer web pages.

True, but most people believe that whitelisting entire
TLDs is too broad, and I agree.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Matt Kettler
At 10:58 AM 12/8/2004, Michael Barnes wrote:
> > Um. They are?? AFAIK there are absolutely no whitelists to the DNSRBLs in
> > SA itself.
>
> I'm not sure if DNSRBLs are the same as URIDNSBLs, or if this was the
> intent of the original poster
It was a mistake on Chris's part, and he replied as such.
As for the difference, the two are definitely not the same..
DNSRBLs work by listing the IP addresses of various mailservers that fit
certain spam criteria, and are used for Received header parsing.

URIDNSBLs are intended to list the targets of URIs (aka web links), not
mail relays.

Some services are actually applicable to both situations, but not many. The 
SURBL spamcop URI list is clearly not very useful as a DNSRBL. The original 
spamcop spam-relay list (not the one hosted by surbl) is not very useful as 
a URIDNSBL.

Even though both are generated by the same reports to spamcop, and both are
queried using similar mechanisms, they each list data extracted from 
different parts of the spam, and are most effective when checked against 
the same parts they were extracted from.
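
(The query mechanics are similar, but the keys differ.  Roughly,
with 192.0.2.1 standing in for a relay IP and spammer-example.com
for a body-URI domain:

  dig +short 1.2.0.192.bl.spamcop.net a
  dig +short spammer-example.com.multi.surbl.org a

The first reverses the relay's octets; the second queries the
registered domain directly.  A 127.0.0.x answer means "listed" in
both cases.)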




Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Daryl C. W. O'Shea
Chris Santerre wrote:
> Assuming that this whitelist would be used to LOWER the score of an email,
> and not just exclude them from SURBL. Then we would go thru even
> more research before we whitelist a domain. There is a LOT of work that goes
> into adding a domain to our whitelist, and that is JUST for exclusion!
Nah, just exclude.  I don't know why anyone would want to use it for a 
nice score since it'd be *so easy* to end up with FNs.  I only used the 
term "white list" as that's what Jeff has called the exclusion list on 
the mailing list, and what you said ("whitelisting local domains") in 
your original message.


> My whole idea was skipping the lookup entirely. Why would you want to do a
> lookup for google even if it is cached?

I'd rather trade 2ms for a cached lookup than consume even more memory
for the extra local white list array.

Additionally, assuming there isn't an extreme query frequency drop off 
after the top 100 or 200 excluded domains, it would be nice to have 
access to the rest of the exclusion list which wouldn't be realistic to 
be storing (and currently copying around) in memory.

There's got to be a reason why SpamAssassin currently only includes the 
top 100 or whatever excluded domains... either the rest of the data 
wasn't useful or it wasn't worth the performance hit having them in memory.

New additions to the exclusion list would immediately be available too, 
not that that is really a huge concern.

Daryl


RE: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Chris Santerre

> >> We do have a whitelist that our private research tools do poll. The
> >> idea is that if it isn't in SURBL then it is white.
> >>
> >> This also puts more work to the already overworked contributors. ;)
>
> How so?  The lookup code is already compatible as is, it's just a matter
> of adding the records to each of the SURBL zones... from the already
> existing data files.  That'd take some effort, but I can't imagine it
> would require anything more than trivial changes... although I've been
> wrong before.

Assuming that this whitelist would be used to LOWER the score of an email,
and not just exclude them from SURBL, then we would go thru even
more research before we whitelist a domain. There is a LOT of work that goes
into adding a domain to our whitelist, and that is JUST for exclusion!

It takes at least twice as long to see if someone is white vs black. 

That's where the "more work" would come from. You should see some of the long
threads on a single domain up for being whitelisted. It's a good thing Jeff
and I have a sense of humor with each other ;)

My whole idea was skipping the lookup entirely. Why would you want to do a
lookup for google even if it is cached?

--Chris


Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Daryl C. W. O'Shea
Bill Landry wrote:
>> From: "Chris Santerre" <[EMAIL PROTECTED]>
>>
>> Well we have talked about it and  didn't come up with a solid
>> answer. The idea would cause more lookups and time for those who
>> don't cache dns.
It doesn't cause more lookups for anyone.  A local white list file would
reduce lookups at the expense of process size (and time if the white
list is very large).

Besides, if someone doesn't want to take the 1-5 minutes it takes to
set up a local DNS cache they're probably not too interested in saving
time anyway.

>> We do have a whitelist that our private research tools do poll. The 
>> idea is that if it isn't in SURBL then it is white.
>>
>> This also puts more work to the already overworked contributors. ;)

How so?  The lookup code is already compatible as is, it's just a matter 
of adding the records to each of the SURBL zones... from the already 
existing data files.  That'd take some effort, but I can't imagine it 
would require anything more than trivial changes... although I've been 
wrong before.

> Actually, I was thinking of the whitelist that Jeff has already
> compiled at http://spamcheck.freeapp.net/whitelist-domains.sort
> (currently over 66,500 whitelisted domains).  If you set a long TTL on
> the query responses, it would certainly cut down on follow-up queries
> for anyone that is running a caching dns.  It would also be a lot less
> resource intensive than trying to run a local whitelist.cf of over
> 66,500 whitelisted domains.

Ditto.  Even if someone isn't running a caching name server, it's highly 
unlikely that their ISP isn't.

Daryl



RE: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Chris Santerre


>-Original Message-
>From: Rosenbaum, Larry M. [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, December 08, 2004 11:47 AM
>To: users@spamassassin.apache.org
>Subject: RE: Feature Request: Whitelist_DNSRBL
>
>
>How about a way to use wildcards with uridnsbl_skip_domain?  I'd like to
>be able to tell the SURBL code not to look up
>
>*.gov
>*.mil
>*.edu
>and even *.??.us
>

LOL we've listed a few .edu so far :)

LOL @ "BigEvil situation", it's now famous!

Actually I was only saying to list the top lookups from the whitelist, not
the 66,500. That is more of a research and exclusion tool. So no more than
200-300 domains. Check it every month for changes and update.

I'll probably make up a .cf file and start testing it. 

--Chris


RE: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Rosenbaum, Larry M.
How about a way to use wildcards with uridnsbl_skip_domain?  I'd like to
be able to tell the SURBL code not to look up

*.gov
*.mil
*.edu
and even *.??.us

since these are unlikely to be hosting spammer web pages.

Larry



Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Bill Landry
- Original Message - 
From: "David Hooton" <[EMAIL PROTECTED]>

> On Wed, 8 Dec 2004 08:03:35 -0800, Bill Landry <[EMAIL PROTECTED]> wrote:
> > I agree, and have suggested a whitelist SURBL several times on the SURBL
> > discussion list, but it has always fallen on deaf ears - nary a response.
> > It would be nice if someone would at least respond as to why this is not
> > a reasonable suggestion.
>
> The flaw in offering a DNS based whitelist is that it encourages
> people to place a negative score on it.  The problem with this is that
> spammers can poison messages with whitelisted domains, thereby
> bypassing the power of the SURBL.

I agree, it should not be used as a HAM indicator, way too easy to abuse.  I
was suggesting that the whitelist be used as a way to exclude the domain
from being constantly queried against the SURBL name servers.

> The concept of "Whitelist" in the SURBL world is more of an "Exclusion
> List" as in "we exclude these domains from being listed" rather than
> we consider the presence of these domains in an email to be a good
> sign of ham.

Exactly.

> An excluded domain is therefore ignored in all data and not allocated
> a score positively or negatively, so trying to poison a message with
> whitelisted domains is therefore pointless.

Yep, agree wholeheartedly.

> I think we either need to look at a DNS version of
> uridnsbl_skip_domain with long TTL's or we should look at releasing a
> .cf file.  I personally think the more proper implementation may be
> the DNS based version in order to avoid BigEvil type situations.

Indeed, my thoughts exactly.

Bill



Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Bill Landry
- Original Message - 
From: "Chris Santerre" <[EMAIL PROTECTED]>

> >-Original Message-
> >From: Bill Landry [mailto:[EMAIL PROTECTED]
> >Sent: Wednesday, December 08, 2004 11:04 AM
> >To: users@spamassassin.apache.org; [EMAIL PROTECTED]
> >Subject: Re: Feature Request: Whitelist_DNSRBL
> >
> >
> >- Original Message - 
> >From: "Daryl C. W. O'Shea" <[EMAIL PROTECTED]>
> >
> >>  >> Was the whitelist you were referring to really the SURBL server-side
> >>  >> whitelist?
> >>  >
> >>  > Yes! But local SURBL whitelists are needed to reduce traffic and time.
> >>
> >> I'd much rather see SURBL respond with 127.0.0.0 with a really large TTL
> >> for white listed domains.  Any sensible setup will run a local DNS cache
> >> which will take care of the load and time issue.
> >
> >I agree, and have suggested a whitelist SURBL several times on the SURBL
> >discussion list, but it has always fallen on deaf ears - nary a response.
> >It would be nice if someone would at least respond as to why this is not a
> >reasonable suggestion.
>
> Well we have talked about it and didn't come up with a solid answer.
> The idea would cause more lookups and time for those who don't cache dns. We
> do have a whitelist that our private research tools do poll. The idea is
> that if it isn't in SURBL then it is white.
>
> This also puts more work to the already overworked contributors. ;)

Actually, I was thinking of the whitelist that Jeff has already compiled at
http://spamcheck.freeapp.net/whitelist-domains.sort (currently over 66,500
whitelisted domains).  If you set a long TTL on the query responses, it
would certainly cut down on follow-up queries for anyone that is running a
caching dns.  It would also be a lot less resource intensive than trying to
run a local whitelist.cf of over 66,500 whitelisted domains.

Anyway, just a thought...

Bill



Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread David Hooton
On Wed, 8 Dec 2004 08:03:35 -0800, Bill Landry <[EMAIL PROTECTED]> wrote:
> I agree, and have suggested a whitelist SURBL several times on the SURBL
> discussion list, but it has always fallen on deaf ears - nary a response.
> It would be nice if someone would at least respond as to why this is not a
> reasonable suggestion.

The flaw in offering a DNS based whitelist is that it encourages
people to place a negative score on it.  The problem with this is that
spammers can poison messages with whitelisted domains, thereby
bypassing the power of the SURBL.

The concept of "Whitelist" in the SURBL world is more of an "Exclusion
List" as in "we exclude these domains from being listed" rather than
we consider the presence of these domains in an email to be a good
sign of ham.

An excluded domain is therefore ignored in all data and not allocated
a score positively or negatively, so trying to poison a message with
whitelisted domains is therefore pointless.

I think we either need to look at a DNS version of
uridnsbl_skip_domain with long TTL's or we should look at releasing a
.cf file.  I personally think the more proper implementation may be
the DNS based version in order to avoid BigEvil type situations.

Cheers!
-- 
Regards,

David Hooton


RE: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Chris Santerre


>-Original Message-
>From: Bill Landry [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, December 08, 2004 11:04 AM
>To: users@spamassassin.apache.org; [EMAIL PROTECTED]
>Subject: Re: Feature Request: Whitelist_DNSRBL
>
>
>- Original Message - 
>From: "Daryl C. W. O'Shea" <[EMAIL PROTECTED]>
>
>>  >> Was the whitelist you were referring to really the SURBL server-side
>>  >> whitelist?
>>  >
>>  > Yes! But local SURBL whitelists are needed to reduce traffic and time.
>>
>> I'd much rather see SURBL respond with 127.0.0.0 with a really large TTL
>> for white listed domains.  Any sensible setup will run a local DNS cache
>> which will take care of the load and time issue.
>
>I agree, and have suggested a whitelist SURBL several times on the SURBL
>discussion list, but it has always fallen on deaf ears - nary a response.
>It would be nice if someone would at least respond as to why this is not a
>reasonable suggestion.

Well we have talked about it and didn't come up with a solid answer.
The idea would cause more lookups and time for those who don't cache dns. We
do have a whitelist that our private research tools do poll. The idea is
that if it isn't in SURBL then it is white.

This also puts more work to the already overworked contributors. ;)

--Chris


Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Bill Landry
- Original Message - 
From: "Daryl C. W. O'Shea" <[EMAIL PROTECTED]>

>  >> Was the whitelist you were referring to really the SURBL server-side
>  >> whitelist?
>  >
>  > Yes! But local SURBL whitelists are needed to reduce traffic and time.
>
>
> I'd much rather see SURBL respond with 127.0.0.0 with a really large TTL
> for white listed domains.  Any sensible setup will run a local DNS cache
> which will take care of the load and time issue.

I agree, and have suggested a whitelist SURBL several times on the SURBL
discussion list, but it has always fallen on deaf ears - nary a response.
It would be nice if someone would at least respond as to why this is not a
reasonable suggestion.

Bill



Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Michael Barnes
On Wed, Dec 08, 2004 at 10:26:15AM -0500, Matt Kettler wrote:
> At 10:17 AM 12/8/2004 -0500, Chris Santerre wrote:
> >OK, we know that the popular domains like yahoo.com and such are hard coded
> >into SA to be skipped on DNSRBL lookups. But it would be great to have a
> >function to add more locally.
> 
> Um. They are?? AFAIK there are absolutely no whitelists to the DNSRBLs in 
> SA itself.

I'm not sure if DNSRBLs are the same as URIDNSBLs, or if this was the
intent of the original poster, but SA 3.0.1 added the configuration
option 'uridnsbl_skip_domain', which skips the URIBL lookup for any
URL in a message whose domain is listed.  The following domains have
been added to this list by default in 25_uribl.cf:

4at1.com
5iantlavalamp.com
adobe.com
advertising.com
afa.net
akamai.net
akamaitech.net
amazon.com
aol.com
apache.org
apple.com
arcamax.com
atdmt.com
att.net
bbc.co.uk
bfi0.com
bravenet.com
bridgetrack.com
cc-dt.com
chase.com
cheaptickets.com
chtah.com
citibank.com
citizensbank.com
classmates.com
click-url.com
cnet.com
cnn.com
com.com
comcast.net
constantcontact.com
debian.org
directtrack.com
doubleclick.net
dsbl.org
dsi-enews.net
e-trend.co.jp
earthlink.net
ebay.com
ebaystatic.com
ed10.net
ed4.net
edgesuite.net
ediets.com
exacttarget.com
extm.us
flowgo.com
geocities.com
gmail.com
go.com
google.com
grisoft.com
gte.net
hitbox.com
hotbar.com
hotmail.com
hyperpc.co.jp
ibm.com
ientrymail.com
incredimail.com
investorplace.com
jexiste.fr
joingevalia.com
m0.net
mac.com
macromedia.com
mail.com
marketwatch.com
mcafee.com
mediaplex.com
messagelabs.com
microsoft.com
monster.com
moveon.org
msn.com
mycomicspage.com
myweathercheck.com
netatlantic.com
netflix.com
norman.com
nytimes.com
p0.com
pandasoftware.com
partner2profit.com
paypal.com
pcmag.com
plaxo.com
postdirect.com
prserv.net
quickinspirations.com
redhat.com
rm04.net
roving.com
rr.com
rs6.net
sbcglobal.net
sears.com
sf.net
shockwave.com
si.com
sitesolutions.it
smileycentral.com
sourceforge.net
spamcop.net
speedera.net
sportsline.com
sun.com
suntrust.com
terra.com.br
tiscali.co.uk
topica.com
ual.com
uclick.com
unitedoffers.com
ups.com
verizon.net
w3.org
washingtonpost.com
weatherbug.com
xmr3.com
yahoo.co.uk
yahoo.com
yahoogroups.com
yimg.com
yourfreedvds.com

Mike 

-- 
/-\
| Michael Barnes <[EMAIL PROTECTED]> |
| UNIX Systems Administrator  |
| College of William and Mary |
| Phone: (757) 879-3930   |
\-/


Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Daryl C. W. O'Shea
Chris Santerre wrote:
>> Was the whitelist you were referring to really the SURBL server-side
>> whitelist?
>
> Yes! But local SURBL whitelists are needed to reduce traffic and time.

I'd much rather see SURBL respond with 127.0.0.0 with a really large TTL 
for white listed domains.  Any sensible setup will run a local DNS cache 
which will take care of the load and time issue.
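
Sketching the idea as zone records (the record shape and TTL are
just illustrative):

  yahoo.com.multi.surbl.org.  604800  IN  A  127.0.0.0
  w3.org.multi.surbl.org.     604800  IN  A  127.0.0.0

With a week-long TTL, a caching resolver asks once and then
answers locally.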

Daryl


Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Alex Broens
Chris Santerre wrote:
> OK, we know that the popular domains like yahoo.com and such are hard coded
> into SA to be skipped on DNSRBL lookups. But it would be great to have a
> function to add more locally.
>
> Thinking one step bigger, it would be even better to feed this a file. This
> way maybe SURBL can create a file for the top hit legit domains. Then using
> SARE and RDJ, people could update that. This would reduce a lot of traffic
> and time.
>
> This might also help with the mysterious bug we have seen where some local
> domains are being flagged as SURBL hit, when they aren't in SURBL. Perhaps
> whitelisting local domains so they are skipped would do away with this.
>
> Thoughts, suggestions, or coffee?
>
> Chris Santerre
> System Admin and SARE Ninja
> http://www.rulesemporium.com
> http://www.surbl.org
> 'It is not the strongest of the species that survives,
> not the most intelligent, but the one most responsive to change.'
> Charles Darwin

First, where's that coffee?
then: I keep a .cf file with quite a few lines like:
uridnsbl_skip_domain ibill.com blabla.tld  local-boobie-site.dom
I assume that if you pick up Jeff's white list and transform that into a
.cf then we'll see the sa-blacklist effect, LOTS of ram needed.
For local domains and those you see most according to your client base
the above works fine (for me)
more coffee?
Alex


RE: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Chris Santerre

>
>>Thinking one step bigger, it would be even better to feed this a file. This
>>way maybe SURBL can create a file for the top hit legit domains. Then using
>>SARE and RDJ, people could update that. This would reduce a lot of traffic
>>and time.
>
>Wait, now you're bringing SURBL into this.. are you talking normal DNSRBLS,
>or URIDNSBLS? Or both?

Dang my brain!!! I meant URIDNSBLS!! Freaking too many abbreviations in my
head :)

>
>Was the whitelist you were referring to really the SURBL 
>server-side whitelist?

Yes! But local SURBL whitelists are needed to reduce traffic and time. 

>
>>This might also help with the mysterious bug we have seen where some local
>>domains are being flagged as SURBL hit, when they aren't in SURBL. Perhaps
>>whitelisting local domains so they are skipped would do away with this.
>
>Agreed.. It would provide users a short-term fix, although really the 
>problem does need to be rooted out at some point..
>
>>Thoughts, suggestions, or coffee?
>
>All of the above?

Pouring a cup right NOW!

--Chris


Re: Feature Request: Whitelist_DNSRBL

2004-12-08 Thread Matt Kettler
At 10:17 AM 12/8/2004 -0500, Chris Santerre wrote:
> OK, we know that the popular domains like yahoo.com and such are hard coded
> into SA to be skipped on DNSRBL lookups. But it would be great to have a
> function to add more locally.
Um. They are?? AFAIK there are absolutely no whitelists to the DNSRBLs in 
SA itself.

Don't confuse the "EXISTING_DOMAINS" list in DNS.pm with a whitelist.
That's actually a list of domains that are used to test if your DNS is 
working if you don't have dns_available set to yes. SA does a quick MX 
query for one of the domains in the list, and if it gets an answer, it 
knows it's working...

However, I do agree it would be nice to be able to have a DNSBL whitelist 
capability, if for no other reason than fixing any listings that might 
cause short-term FPs.

> Thinking one step bigger, it would be even better to feed this a file. This
> way maybe SURBL can create a file for the top hit legit domains. Then using
> SARE and RDJ, people could update that. This would reduce a lot of traffic
> and time.
Wait, now you're bringing SURBL into this.. are you talking normal DNSRBLS, 
or URIDNSBLS? Or both?

Was the whitelist you were referring to really the SURBL server-side whitelist?
> This might also help with the mysterious bug we have seen where some local
> domains are being flagged as SURBL hit, when they aren't in SURBL. Perhaps
> whitelisting local domains so they are skipped would do away with this.
Agreed.. It would provide users a short-term fix, although really the 
problem does need to be rooted out at some point..

> Thoughts, suggestions, or coffee?
All of the above?