Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-15 Thread Sam Clippinger
Probably not, at least not in the next version.  spamdyke's DNS system 
sends its queries simultaneously and accepts the first positive response 
it receives.  The rest are discarded.  In order to log them all, 
spamdyke would have to wait for all responses to come back, which would 
slow down the DNS system quite a bit (it would make spamdyke only as 
fast as the slowest RBL instead of the fastest).
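
To illustrate the trade-off described above, here is a minimal sketch that simulates the two strategies. It is not spamdyke's code: the zone names, delays and the `lookup()` stand-in are all made up, and real DNS queries are replaced by `time.sleep()`.

```python
import concurrent.futures as cf
import time

# Hypothetical RBL zones mapped to simulated (delay_seconds, listed) answers.
SIMULATED = {
    "fast.rbl.example.com": (0.05, True),
    "slow.rbl.example.com": (0.30, True),
    "clean.rbl.example.com": (0.10, False),
}

def lookup(zone):
    """Stand-in for a DNS query: wait, then report whether the IP is listed."""
    delay, listed = SIMULATED[zone]
    time.sleep(delay)
    return zone, listed

def first_match(zones):
    """Query all zones at once and return the first one that answers 'listed'."""
    pool = cf.ThreadPoolExecutor(max_workers=len(zones))
    try:
        futures = [pool.submit(lookup, z) for z in zones]
        for fut in cf.as_completed(futures):
            zone, listed = fut.result()
            if listed:
                return zone  # later answers are simply ignored
        return None
    finally:
        pool.shutdown(wait=False)  # do not block on the slower lookups

def all_matches(zones):
    """Query all zones at once but wait for every answer before returning."""
    with cf.ThreadPoolExecutor(max_workers=len(zones)) as pool:
        return sorted(z for z, listed in pool.map(lookup, zones) if listed)
```

With these made-up delays, `first_match()` can return as soon as the fastest listing arrives, while `all_matches()` cannot return before the slowest zone has answered.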

If you're just wanting to evaluate the RBLs you're using, I think you 
could probably do that more effectively from a daily script that 
analyzes the mail log, requeries RBLs and generates statistics.
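
A rough sketch of the requery-and-count part of such a script follows. It assumes the rejected IPv4 addresses have already been extracted from the mail log (that step depends on the local log format and is left out), and the function names are invented for illustration. The reversed-octet query name is the usual DNSBL convention.

```python
import socket
from collections import Counter

def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: IPv4 octets reversed, then the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def is_listed(ip, zone):
    """True if the zone returns an A record for the IP (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

def tally(ips, zones, check=is_listed):
    """Count, per zone, how many of the given IPs are listed."""
    hits = Counter()
    for ip in ips:
        for zone in zones:
            if check(ip, zone):
                hits[zone] += 1
    return hits
```

Because every IP is checked against every zone, the counts do not depend on the order of the RBLs or on which one happened to answer first.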

-- Sam Clippinger

Eric Shubert wrote:
> I like having specific RBLs logged. I just installed spamdyke on a few
> qmail-toasters yesterday (replacing rblsmtpd), and was going to ask about
> this. Michael beat me to it! ;)
> 
> If simultaneous queries are being done, can all RBLs that match be logged?
> Perhaps a comma-separated list within parentheses. This would make it
> possible to gather stats on the effectiveness of the RBLs being used.
> 
> Sam Clippinger wrote:
>> Yes, this is certainly possible.  Right now spamdyke identifies the RBL 
>> in its message to the remote server but not in the logs.  Good idea!
>>
>> What would be a good way to log this information (preferably without 
>> breaking existing scripts)?  I'm thinking as I type here, but spamdyke 
>> already follows the rejection reason with parentheses (when the log 
>> level is high enough) to indicate which file/line matched for file-based 
>> filters... perhaps the same could be done for RBLs/RHSBLs.  Something 
>> like this:
>>  DENIED_RBL_MATCH(rbl.example.com)
>>
>> As for reordering the RBLs to put the often-matched ones first, the next 
>> version of spamdyke will make that less necessary.  By default, it will 
>> query all RBLs simultaneously, regardless of their order.  (That 
>> behavior can be prevented with a new flag -- ordering would be important 
>> in that case.)
>>
>> -- Sam Clippinger
>>
>> Michael Colvin wrote:
>>>> To find real numbers, you would have to consider how many 
>>>> connections are accepted, how many are rejected and for what 
>>>> reasons.  Then look at the popularity of different spamdyke 
>>>> features and specifically the popularity of different DNS 
>>>> RBLs.  Use all that to find out what percentage of rejected 
>>>> connections could avoid the DNS queries due to local tests.  
>>> Along those lines, is it possible, or can it be possible, to have spamdyke's
>>> logs indicate which DNS RBL caused a message to be rejected?  I'm assuming
>>> that once a reason for rejection is found, IE, the IP is listed in a
>>> particular RBL, further tests against other RBL's in the list are not
>>> performed?  Knowing, statistically, which ones have a higher rejection rate,
>>> and queuing those first in the list of RBLS might save some time.
>>>
>>> Of course, multiple RBLs could reject the same message, and the one first in
>>> line would have the higher percentage, but this would give us a way to move
>>> them around and check the results...
>>>
>>> Just a thought from a newbie to spamdyke. 
>>>
>>> BTW, I LOVE Spamdyke!  What a difference it has made in my system's ability
>>> to filter spam and save resources!  It's a godsend!
>>>
>>> Mike
>>>
> 
> 
___
spamdyke-users mailing list
spamdyke-users@spamdyke.org
http://www.spamdyke.org/mailman/listinfo/spamdyke-users


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-14 Thread Eric Shubert
I like having specific RBLs logged. I just installed spamdyke on a few
qmail-toasters yesterday (replacing rblsmtpd), and was going to ask about
this. Michael beat me to it! ;)

If simultaneous queries are being done, can all RBLs that match be logged?
Perhaps a comma-separated list within parentheses. This would make it
possible to gather stats on the effectiveness of the RBLs being used.

Sam Clippinger wrote:
> Yes, this is certainly possible.  Right now spamdyke identifies the RBL 
> in its message to the remote server but not in the logs.  Good idea!
> 
> What would be a good way to log this information (preferably without 
> breaking existing scripts)?  I'm thinking as I type here, but spamdyke 
> already follows the rejection reason with parentheses (when the log 
> level is high enough) to indicate which file/line matched for file-based 
> filters... perhaps the same could be done for RBLs/RHSBLs.  Something 
> like this:
>   DENIED_RBL_MATCH(rbl.example.com)
> 
> As for reordering the RBLs to put the often-matched ones first, the next 
> version of spamdyke will make that less necessary.  By default, it will 
> query all RBLs simultaneously, regardless of their order.  (That 
> behavior can be prevented with a new flag -- ordering would be important 
> in that case.)
> 
> -- Sam Clippinger
> 
> Michael Colvin wrote:
>>> To find real numbers, you would have to consider how many 
>>> connections are accepted, how many are rejected and for what 
>>> reasons.  Then look at the popularity of different spamdyke 
>>> features and specifically the popularity of different DNS 
>>> RBLs.  Use all that to find out what percentage of rejected 
>>> connections could avoid the DNS queries due to local tests.  
>> Along those lines, is it possible, or can it be possible, to have spamdyke's
>> logs indicate which DNS RBL caused a message to be rejected?  I'm assuming
>> that once a reason for rejection is found, IE, the IP is listed in a
>> particular RBL, further tests against other RBL's in the list are not
>> performed?  Knowing, statistically, which ones have a higher rejection rate,
>> and queuing those first in the list of RBLS might save some time.
>>
>> Of course, multiple RBLs could reject the same message, and the one first in
>> line would have the higher percentage, but this would give us a way to move
>> them around and check the results...
>>
>> Just a thought from a newbie to spamdyke. 
>>
>> BTW, I LOVE Spamdyke!  What a difference it has made in my system's ability
>> to filter spam and save resources!  It's a godsend!
>>
>> Mike
>>


-- 
-Eric 'shubes'


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-14 Thread Sam Clippinger
The aggressive DNS queries will definitely increase the momentary load 
on the DNS servers, because they will get a burst of simultaneous 
queries each time a remote server connects.  However, the overall load 
won't go up because the long-term rate of DNS queries is determined by 
the rate of SMTP connections.  I believe most sites will only notice an 
increase in spamdyke's speed with no new load on their DNS servers.

The new DNS behavior is configurable though, including an option to 
return to the behavior used by the standard system resolver.  That way, 
if the DNS servers are having trouble, spamdyke can be made less demanding.

-- Sam Clippinger

Michael Colvin wrote:
> Great!  Of course, this "feature" could also be used to determine if a
> specific RBL is causing too many false positives too...
> 
> Running all the checks simultaneously certainly will negate the need to
> order them in any specific order and should make the overall process that
> much faster, especially if you're using multiple RBL's.
> 
> What kind of effect will that have on server load?  Many RBL lookups
> sequentially versus many RBL lookups simultaneously?  Seems like the process
> might be faster, but will take more resources on the server?  Which would
> likely mean a basic "Push" with the net result being faster handling of the
> session?
> 
> Thanks again!
>  
> 
> Mike
> 
> 
>> -----Original Message-----
>> From: [EMAIL PROTECTED] 
>> [mailto:[EMAIL PROTECTED] On Behalf Of Sam 
>> Clippinger
>> Sent: Monday, April 14, 2008 9:40 AM
>> To: spamdyke users
>> Subject: Re: [spamdyke-users] let qmail decide if it accepts 
>> a recipient before doing RHSBL?
>>
>> Yes, this is certainly possible.  Right now spamdyke 
>> identifies the RBL in its message to the remote server but 
>> not in the logs.  Good idea!
>>
>> What would be a good way to log this information (preferably 
>> without breaking existing scripts)?  I'm thinking as I type 
>> here, but spamdyke already follows the rejection reason with 
>> parentheses (when the log level is high enough) to indicate 
>> which file/line matched for file-based filters... perhaps the 
>> same could be done for RBLs/RHSBLs.  Something like this:
>>  DENIED_RBL_MATCH(rbl.example.com)
>>
>> As for reordering the RBLs to put the often-matched ones 
>> first, the next version of spamdyke will make that less 
>> necessary.  By default, it will query all RBLs 
>> simultaneously, regardless of their order.  (That behavior 
>> can be prevented with a new flag -- ordering would be 
>> important in that case.)
>>
>> -- Sam Clippinger
>>
>> Michael Colvin wrote:
>>>> To find real numbers, you would have to consider how many 
>> connections 
>>>> are accepted, how many are rejected and for what reasons.  
>> Then look 
>>>> at the popularity of different spamdyke features and 
>> specifically the 
>>>> popularity of different DNS RBLs.  Use all that to find out what 
>>>> percentage of rejected connections could avoid the DNS 
>> queries due to 
>>>> local tests.
>>> Along those lines, is it possible, or can it be possible, to have 
>>> spamdyke's logs indicate which DNS RBL caused a message to be 
>>> rejected?  I'm assuming that once a reason for rejection is 
>> found, IE, 
>>> the IP is listed in a particular RBL, further tests against other 
>>> RBL's in the list are not performed?  Knowing, statistically, which 
>>> ones have a higher rejection rate, and queuing those first 
>> in the list of RBLS might save some time.
>>> Of course, multiple RBLs could reject the same message, and the one 
>>> first in line would have the higher percentage, but this 
>> would give us 
>>> a way to move them around and check the results...
>>>
>>> Just a thought from a newbie to spamdyke. 
>>>
>>> BTW, I LOVE Spamdyke!  What a difference it has made in my system's 
>>> ability to filter spam and save resources!  It's a godsend!
>>>
>>> Mike
>>>
>>>
>>>


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-14 Thread Michael Colvin
Great!  Of course, this "feature" could also be used to determine if a
specific RBL is causing too many false positives too...

Running all the checks simultaneously certainly will negate the need to
order them in any specific order and should make the overall process that
much faster, especially if you're using multiple RBL's.

What kind of effect will that have on server load?  Many RBL lookups
sequentially versus many RBL lookups simultaneously?  Seems like the process
might be faster, but will take more resources on the server?  Which would
likely mean a basic "Push" with the net result being faster handling of the
session?

Thanks again!
 

Mike


> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Sam 
> Clippinger
> Sent: Monday, April 14, 2008 9:40 AM
> To: spamdyke users
> Subject: Re: [spamdyke-users] let qmail decide if it accepts 
> a recipient before doing RHSBL?
> 
> Yes, this is certainly possible.  Right now spamdyke 
> identifies the RBL in its message to the remote server but 
> not in the logs.  Good idea!
> 
> What would be a good way to log this information (preferably 
> without breaking existing scripts)?  I'm thinking as I type 
> here, but spamdyke already follows the rejection reason with 
> parentheses (when the log level is high enough) to indicate 
> which file/line matched for file-based filters... perhaps the 
> same could be done for RBLs/RHSBLs.  Something like this:
>   DENIED_RBL_MATCH(rbl.example.com)
> 
> As for reordering the RBLs to put the often-matched ones 
> first, the next version of spamdyke will make that less 
> necessary.  By default, it will query all RBLs 
> simultaneously, regardless of their order.  (That behavior 
> can be prevented with a new flag -- ordering would be 
> important in that case.)
> 
> -- Sam Clippinger
> 
> Michael Colvin wrote:
> > 
> >> To find real numbers, you would have to consider how many 
> connections 
> >> are accepted, how many are rejected and for what reasons.  
> Then look 
> >> at the popularity of different spamdyke features and 
> specifically the 
> >> popularity of different DNS RBLs.  Use all that to find out what 
> >> percentage of rejected connections could avoid the DNS 
> queries due to 
> >> local tests.
> > 
> > Along those lines, is it possible, or can it be possible, to have 
> > spamdyke's logs indicate which DNS RBL caused a message to be 
> > rejected?  I'm assuming that once a reason for rejection is 
> found, IE, 
> > the IP is listed in a particular RBL, further tests against other 
> > RBL's in the list are not performed?  Knowing, statistically, which 
> > ones have a higher rejection rate, and queuing those first 
> in the list of RBLS might save some time.
> > 
> > Of course, multiple RBLs could reject the same message, and the one 
> > first in line would have the higher percentage, but this 
> would give us 
> > a way to move them around and check the results...
> > 
> > Just a thought from a newbie to spamdyke. 
> > 
> > BTW, I LOVE Spamdyke!  What a difference it has made in my system's 
> > ability to filter spam and save resources!  It's a godsend!
> > 
> > Mike
> > 
> > 
> > 


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-14 Thread Andras Korn
On Mon, Apr 14, 2008 at 09:40:51AM -0500, Sam Clippinger wrote:

> Andras Korn wrote:
> > On Sun, Apr 13, 2008 at 02:55:16PM -0500, Sam Clippinger wrote:
> >> Most qmail servers run a stock version of qmail-smtpd, which will only
> >> reject recipients for relaying.
> > 
> > They shouldn't, as stock qmail is liable to cause backscatter. No
> > self-respecting admin should run qmail as distributed by DJB today. Mail to
> > bogus recipients must not be accepted and then bounced later. Some mechanism
> > must be in place to ensure that mail to bogus recipients is never accepted
> > at all.
> 
> I agree.  But regardless of what /should/ be the case, the fact is that 
> most qmail servers run the stock version of qmail-smtpd.  I can't 
> justify making a change that will make spamdyke less efficient for the 
> majority and only slightly more efficient for the minority.

You yourself said that the decreased efficiency, if any, would be marginal.

This behaviour could even be configurable: "delay-dns-blacklist-checks=1|0"
or similar, defaulting to off if you're worried about efficiency.

> To find real numbers, you would have to consider how many connections 
> are accepted, how many are rejected and for what reasons.  Then look at 
> the popularity of different spamdyke features and specifically the 
> popularity of different DNS RBLs.  Use all that to find out what 
> percentage of rejected connections could avoid the DNS queries due to 
> local tests.

I can come up with local figures, but knowing globally what features of
spamdyke are used how often is probably impossible.

> Lastly, find a way to evaluate the real cost (wall time, server load and
> network load) of spamdyke's DNS queries versus the additional load
> generated by passing the extra SMTP traffic to qmail.

I can't imagine this latter additional load as being nontrivial, but it
could be measured to some extent using strace -c.

> If all of those numbers were available, my instinct says the advantage 
> of your proposed change would be very small at best.

This is still ignoring the unnecessary load on the RBL DNS servers (most of
us are using them for free, yet someone must pay for their maintenance and
bandwidth, so let's not be wasteful).

Also, I still think that in the case of email service, saving wall time
(i.e. reducing latency) is more beneficial than saving CPU time (probably a
minuscule amount of CPU time, at that). I find a net gain of 9 seconds per
message with a single bogus recipient hard to ignore.

Andras

-- 
 Andras Korn 
  QOTD:
History will record it. I know it because I'll write it myself.


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-14 Thread Sam Clippinger
Yes, this is certainly possible.  Right now spamdyke identifies the RBL 
in its message to the remote server but not in the logs.  Good idea!

What would be a good way to log this information (preferably without 
breaking existing scripts)?  I'm thinking as I type here, but spamdyke 
already follows the rejection reason with parentheses (when the log 
level is high enough) to indicate which file/line matched for file-based 
filters... perhaps the same could be done for RBLs/RHSBLs.  Something 
like this:
DENIED_RBL_MATCH(rbl.example.com)
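
If the log entries ended up looking like the proposal above, per-RBL statistics could be gathered with a few lines of script. This is only a sketch under that assumption: the `DENIED_RBL_MATCH(...)` suffix is a proposed format, not something spamdyke is known to emit, and the function name is invented.

```python
import re
from collections import Counter

# Matches the proposed "DENIED_RBL_MATCH(zone)" form anywhere in a log line.
PATTERN = re.compile(r"DENIED_RBL_MATCH\(([^)]+)\)")

def count_rbl_matches(lines):
    """Count rejections per RBL name found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Lines that use the current form without parentheses are simply skipped, so scripts that only look for the bare rejection reason would keep working alongside this one.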

As for reordering the RBLs to put the often-matched ones first, the next 
version of spamdyke will make that less necessary.  By default, it will 
query all RBLs simultaneously, regardless of their order.  (That 
behavior can be prevented with a new flag -- ordering would be important 
in that case.)

-- Sam Clippinger

Michael Colvin wrote:
> 
>> To find real numbers, you would have to consider how many 
>> connections are accepted, how many are rejected and for what 
>> reasons.  Then look at the popularity of different spamdyke 
>> features and specifically the popularity of different DNS 
>> RBLs.  Use all that to find out what percentage of rejected 
>> connections could avoid the DNS queries due to local tests.  
> 
> Along those lines, is it possible, or can it be possible, to have spamdyke's
> logs indicate which DNS RBL caused a message to be rejected?  I'm assuming
> that once a reason for rejection is found, IE, the IP is listed in a
> particular RBL, further tests against other RBL's in the list are not
> performed?  Knowing, statistically, which ones have a higher rejection rate,
> and queuing those first in the list of RBLS might save some time.
> 
> Of course, multiple RBLs could reject the same message, and the one first in
> line would have the higher percentage, but this would give us a way to move
> them around and check the results...
> 
> Just a thought from a newbie to spamdyke. 
> 
> BTW, I LOVE Spamdyke!  What a difference it has made in my system's ability
> to filter spam and save resources!  It's a godsend!
> 
> Mike
> 
> 
> 


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-14 Thread Michael Colvin


> 
> To find real numbers, you would have to consider how many 
> connections are accepted, how many are rejected and for what 
> reasons.  Then look at the popularity of different spamdyke 
> features and specifically the popularity of different DNS 
> RBLs.  Use all that to find out what percentage of rejected 
> connections could avoid the DNS queries due to local tests.  

Along those lines, is it possible, or can it be possible, to have spamdyke's
logs indicate which DNS RBL caused a message to be rejected?  I'm assuming
that once a reason for rejection is found, IE, the IP is listed in a
particular RBL, further tests against other RBL's in the list are not
performed?  Knowing, statistically, which ones have a higher rejection rate,
and queuing those first in the list of RBLS might save some time.

Of course, multiple RBLs could reject the same message, and the one first in
line would have the higher percentage, but this would give us a way to move
them around and check the results...

Just a thought from a newbie to spamdyke. 

BTW, I LOVE Spamdyke!  What a difference it has made in my system's ability
to filter spam and save resources!  It's a godsend!

Mike





Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-14 Thread Sam Clippinger
Andras Korn wrote:
> On Sun, Apr 13, 2008 at 02:55:16PM -0500, Sam Clippinger wrote:
>> Most qmail servers run a stock version of qmail-smtpd, which will only
>> reject recipients for relaying.
> 
> They shouldn't, as stock qmail is liable to cause backscatter. No
> self-respecting admin should run qmail as distributed by DJB today. Mail to
> bogus recipients must not be accepted and then bounced later. Some mechanism
> must be in place to ensure that mail to bogus recipients is never accepted
> at all.

I agree.  But regardless of what /should/ be the case, the fact is that 
most qmail servers run the stock version of qmail-smtpd.  I can't 
justify making a change that will make spamdyke less efficient for the 
majority and only slightly more efficient for the minority.

> I think you should always wait for RCPT TO, even if it's not necessary for
> whitelist decisions, because then you can log whose mail you're rejecting.
> rblsmtpd's inability to do this is one of its major shortcomings.

spamdyke does always wait for RCPT, so that it can log the recipient. 
But it does not keep qmail running the entire time, if there is no 
chance the message will be accepted.  If a recipient whitelist is not in 
use, there's no reason to wait.

>> I suspect we're debating fractional efficiencies here anyway -- I've 
>> never benchmarked either scenario.
> 
> Well, fwiw, I just ran a quick test: querying the handful of RBLs I have
> configured in parallel takes about 10 seconds (as long as it takes for the
> slowest of them to reply). Rejecting a mail based on the local list of valid
> recipients takes a good deal less than a second.

Of course local file accesses are faster than DNS queries.  That's not 
what I meant.

To find real numbers, you would have to consider how many connections 
are accepted, how many are rejected and for what reasons.  Then look at 
the popularity of different spamdyke features and specifically the 
popularity of different DNS RBLs.  Use all that to find out what 
percentage of rejected connections could avoid the DNS queries due to 
local tests.  Lastly, find a way to evaluate the real cost (wall time, 
server load and network load) of spamdyke's DNS queries versus the 
additional load generated by passing the extra SMTP traffic to qmail.

That last step is the part I don't know how to measure.

If all of those numbers were available, my instinct says the advantage 
of your proposed change would be very small at best.

-- Sam Clippinger


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-13 Thread Andras Korn
On Sun, Apr 13, 2008 at 02:55:16PM -0500, Sam Clippinger wrote:

> Andras Korn wrote:
> > I don't agree. These days, there are RBLs that will automatically list and
> > delist IPs in the space of a few hours, well within the lifetime of a single
> > email message.
> 
> If a server is being added and removed from RBLs in the space of a few 
> hours, its behavior must be just on the border between "legitimate" and 
> "spammer".  In that case, I would think the administrator would want to 
> know about it by receiving a few complaints from users whose messages 
> were being bounced.

Again, let's agree to disagree. :)

> You started this thread with a complaint that temporary rejections were 
> needlessly consuming your server resources by causing the remote server 
> to retry deliveries multiple times.  I guarantee that making the RBL 
> filter return temporary rejection codes would waste considerably more 
> resources for everyone, as RBLs are much more common and more widely 
> used than RHSBLs.

In a way, that is true. However, if you allowed qmail to permanently reject
some of the spam (because it's addressed to a bogus recipient), the
temporary rejections wouldn't make that much of a difference because there
wouldn't be so many of them.

The added load caused by RBL-based temporary rejections I'm willing to
accept.

> > rblsmtpd also uses temporary rejects, fwiw.
> 
> Well, most of the major email providers (AOL, Yahoo!, GMail, Hotmail, 
> etc) use permanent rejections for RBL matches.

They probably use their own RBLs though, don't they? Also, they have
hundreds of thousands of users, so they aren't going to care about any
particular message not getting through.

At smaller sites, it's possible to keep a virtual eye on the qmail log (say,
using a script) that can alert you when some new type of mail is being
blocked. It's nice to be able to interfere.

> > Temporary rejects also give the administrator a chance to whitelist an IP
> > they do want to receive mail from (such as when it turns out that your new
> > business partner's ISP just got blacklisted by an RBL).
> 
> The administrator would have to be carefully watching the outbound queue 
> to notice a message was being held, then investigate the logs to find 
> out why.  I can't envision this happening unless the server is new and 
> the administrator is testing to make sure everything is working.

I didn't mean the administrator of the server sending the message, but the
admin of the server rejecting it. I've often manually whitelisted IPs
blocked by one RBL or the other. I have found this to be practically the
only way to use RBLs run by 3rd parties to block mail (instead of just
increasing their spamminess score in SpamAssassin).

> I understand now.
> 
> What you're describing would make spamdyke more efficient only for users 
> who have modified/replaced their qmail-smtpd to support blacklists or 
> other filters.

Yes.

> Most qmail servers run a stock version of qmail-smtpd, which will only
> reject recipients for relaying.

They shouldn't, as stock qmail is liable to cause backscatter. No
self-respecting admin should run qmail as distributed by DJB today. Mail to
bogus recipients must not be accepted and then bounced later. Some mechanism
must be in place to ensure that mail to bogus recipients is never accepted
at all.

> On a stock qmail installation, this change would make spamdyke _less_ 
> efficient, since it would keep qmail running for all connections, at 
> least until the DATA command is given.

Yes, if you only consider the single server spamdyke and qmail are running
on. But issuing needless DNS queries also puts superfluous load on the local
caching DNS resolver and the DNS servers of the RBLs/RHSBLs. Wouldn't it be
a courtesy to them to not query their servers if a local decision can be
made to reject a message?

Also, local decisions have lower latency. It may be possible to reject a
message based on local tests in less wall time than by waiting for the RBLs;
thus, a higher connection rate could potentially be served, because many
connections would end sooner.

> However, the current code closes qmail as soon as possible to free up
> resources.

"As soon as possible" in terms of the SMTP conversation, certainly; but not
as soon as possible in real time, I'm pretty sure.

> "As soon as possible" depends on the configured filters -- the possibility
> of SMTP AUTH and the use of sender whitelists require qmail to continue
> running until "MAIL FROM" is seen.  The use of recipient whitelists
> requires qmail to continue running until "RCPT TO" is seen.  But if
> spamdyke is configured to do graylisting, some RBLs, some rDNS tests and
> SMTP AUTH (a typical setup), qmail will be closed as soon as the "MAIL
> FROM" command is given.

I think you should always wait for RCPT TO, even if it's not necessary for
whitelist decisions, because then you can log whose mail you're rejecting.
rblsmtpd's inability to do this is one of its major shortcomings.

Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-13 Thread Sam Clippinger
Andras Korn wrote:
> I don't agree. These days, there are RBLs that will automatically list and
> delist IPs in the space of a few hours, well within the lifetime of a single
> email message.

If a server is being added and removed from RBLs in the space of a few 
hours, its behavior must be just on the border between "legitimate" and 
"spammer".  In that case, I would think the administrator would want to 
know about it by receiving a few complaints from users whose messages 
were being bounced.

You started this thread with a complaint that temporary rejections were 
needlessly consuming your server resources by causing the remote server 
to retry deliveries multiple times.  I guarantee that making the RBL 
filter return temporary rejection codes would waste considerably more 
resources for everyone, as RBLs are much more common and more widely 
used than RHSBLs.

> rblsmtpd also uses temporary rejects, fwiw.

Well, most of the major email providers (AOL, Yahoo!, GMail, Hotmail, 
etc) use permanent rejections for RBL matches.

> Temporary rejects also give the administrator a chance to whitelist an IP
> they do want to receive mail from (such as when it turns out that your new
> business partner's ISP just got blacklisted by an RBL).

The administrator would have to be carefully watching the outbound queue 
to notice a message was being held, then investigate the logs to find 
out why.  I can't envision this happening unless the server is new and 
the administrator is testing to make sure everything is working.

> Currently, what happens is this (IIRC):
> 
> 1. client 1.2.3.4 connects.
> 2. spamdyke checks rdns, RBLs, blacklists and whitelists, rejects message if
>    necessary.
> 3. client issues HELO/EHLO.
> 4. spamdyke checks DNS, rejects message if necessary.
> 5. spamdyke forwards HELO to qmail.
> 6. client issues MAIL FROM.
> 7. spamdyke checks DNS, RHSBLs, blacklists and whitelists, rejects message
>    if necessary.
> 8. spamdyke forwards MAIL FROM to qmail.
> 9. client issues RCPT TO.
> 10. spamdyke consults localdomains, blacklists, whitelists, relay access and
>     whatnot; rejects recipient if necessary.
> 11. spamdyke forwards recipient to qmail.
> 12. repeat 9-11 until client issues DATA.
> 13. spamdyke forwards DATA to qmail.
> 14. actual message is transferred.
> 
> What I suggest is to skip all DNS based tests until just before step 13. If
> qmail accepted none of the recipients (including the case where it didn't
> even get to see them because they were filtered by spamdyke), there is
> nothing to do and we saved some slow DNS queries.
> 
> If some recipients were accepted, spamdyke does the DNS lookups and if they
> indicate that the message should be rejected, it sends an appropriate 45x or
> 55x response to the DATA command of the client. Instead of DATA, it sends
> QUIT to qmail.

I understand now.

What you're describing would make spamdyke more efficient only for users 
who have modified/replaced their qmail-smtpd to support blacklists or 
other filters.  Most qmail servers run a stock version of qmail-smtpd, 
which will only reject recipients for relaying.  Since spamdyke already 
blocks relaying itself, qmail never gets to issue rejections for those 
cases.

On a stock qmail installation, this change would make spamdyke _less_ 
efficient, since it would keep qmail running for all connections, at 
least until the DATA command is given.  By contrast, the current code closes 
qmail as soon as possible to free up resources.  "As soon as possible" 
depends on the configured filters -- the possibility of SMTP AUTH and 
the use of sender whitelists require qmail to continue running until 
"MAIL FROM" is seen.  The use of recipient whitelists requires qmail to 
continue running until "RCPT TO" is seen.  But if spamdyke is configured 
to do graylisting, some RBLs, some rDNS tests and SMTP AUTH (a typical 
setup), qmail will be closed as soon as the "MAIL FROM" command is given.
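
A minimal sketch of the "as soon as possible" rule described above, assuming a connection-level filter has already decided to reject. The `FilterConfig` fields and stage names are hypothetical labels for illustration, not spamdyke options, and the "CONNECT" default for a configuration with none of these features is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FilterConfig:
    # Hypothetical flags; they only mirror the conditions named in the message.
    smtp_auth_possible: bool = False
    sender_whitelist: bool = False
    recipient_whitelist: bool = False


def earliest_close_stage(cfg: FilterConfig) -> str:
    """Return the earliest SMTP stage at which the backend could be closed."""
    if cfg.recipient_whitelist:
        # A whitelisted recipient could still rescue the message.
        return "RCPT TO"
    if cfg.smtp_auth_possible or cfg.sender_whitelist:
        # Authentication or a whitelisted sender could still rescue it.
        return "MAIL FROM"
    # Assumption: nothing later in the conversation can override the rejection.
    return "CONNECT"


typical = FilterConfig(smtp_auth_possible=True)
print(earliest_close_stage(typical))
```

With the typical setup mentioned above (SMTP AUTH available, no recipient whitelist), the sketch returns "MAIL FROM".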

I suspect we're debating fractional efficiencies here anyway -- I've 
never benchmarked either scenario.  I've also found that the most 
efficient scenario is often counterintuitive (meaning my initial 
hypotheses are often wrong).  For example, I never thought that reading 
spamdyke's configuration from a file would be faster than reading the 
command line but my testing showed that it is (apparently my file parser 
is more efficient than glibc's getopt() function).

If you can think of a way to test the efficiency/cost of the two 
approaches, I would be very interested to see the results.

> Ps. not that it matters, but by buffering the list of recipients there is
> still a way out in the situation you described (which doesn't however arise
> in the scheme I am suggesting): just kill the qmail-smtpd child and spawn
> another, but don't give it that particular recipient address. I'm only
> including this footnote because this approach could conceivably be useful in
> other situations.

I've considered doing this a

Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-13 Thread Andras Korn
On Sun, Apr 13, 2008 at 12:29:51PM -0500, Sam Clippinger wrote:

> RBL rejections use permanent codes.  This is because an RBL/RHSBL match 
> is permanent within the lifetime of an email message -- the situation 
> won't change in a few minutes, so there's no point in retrying multiple 
> times.  A human must intervene and correct the situation before any 
> email can be delivered.

I don't agree. These days, there are RBLs that will automatically list and
delist IPs in the space of a few hours, well within the lifetime of a single
email message.

rblsmtpd also uses temporary rejects, fwiw.

Temporary rejects also give the administrator a chance to whitelist an IP
they do want to receive mail from (such as when it turns out that your new
business partner's ISP just got blacklisted by an RBL).

Permanent rejects are less of a problem with RHSBLs because those are less
ephemeral and whether a domain is being used for spamming or not doesn't
change that quickly.

> Regarding the delayed checking of recipient filters, let me spell this 
> out as I understand your suggestion.  The process would look like this:
>   Remote server connects, spamdyke and qmail start.
>   QMAIL->SPAMDYKE: 220 qmail ESMTP
>   SPAMDYKE->REMOTE: 220 qmail ESMTP
>   REMOTE->SPAMDYKE: HELO myname
>   SPAMDYKE->QMAIL: HELO myname
>   QMAIL->SPAMDYKE: 250 Hello, pleased to meet you.
>   SPAMDYKE->REMOTE: 250 Hello, pleased to meet you.
>   REMOTE->SPAMDYKE: MAIL FROM:<[EMAIL PROTECTED]>
>   SPAMDYKE->QMAIL: MAIL FROM:<[EMAIL PROTECTED]>
>   QMAIL->SPAMDYKE: 250 OK
>   SPAMDYKE->REMOTE: 250 OK
>   REMOTE->SPAMDYKE: RCPT TO:<[EMAIL PROTECTED]>
> At this point, you're suggesting that spamdyke pass the recipient to 
> qmail first, without running its own filters.  That would look like this:

No. What I was suggesting was that if spamdyke can decide at this point that
the recipient is invalid, it should of course reject it.

_But_ it should hold off on all DNS queries until it receives a DATA verb
from the client.

> [...]

All of this is perfectly valid and true, but it's not what I was suggesting.
Let me try to rephrase it.

Currently, what happens is this (IIRC):

1. client 1.2.3.4 connects.
2. spamdyke checks rdns, RBLs, blacklists and whitelists, rejects message if
   necessary.
3. client issues HELO/EHLO.
4. spamdyke checks DNS, rejects message if necessary.
5. spamdyke forwards HELO to qmail.
6. client issues MAIL FROM.
7. spamdyke checks DNS, RHSBLs, blacklists and whitelists, rejects message
   if necessary.
8. spamdyke forwards MAIL FROM to qmail.
9. client issues RCPT TO.
10. spamdyke consults localdomains, blacklists, whitelists, relay access and
    whatnot; rejects recipient if necessary.
11. spamdyke forwards recipient to qmail.
12. repeat 9-11 until client issues DATA.
13. spamdyke forwards DATA to qmail.
14. actual message is transferred.

What I suggest is to skip all DNS based tests until just before step 13. If
qmail accepted none of the recipients (including the case where it didn't
even get to see them because they were filtered by spamdyke), there is
nothing to do and we saved some slow DNS queries.

If some recipients were accepted, spamdyke does the DNS lookups and if they
indicate that the message should be rejected, it sends an appropriate 45x or
55x response to the DATA command of the client. Instead of DATA, it sends
QUIT to qmail.

With "DNS lookups" I mainly mean RBL and RHSBL lookups, because there are
potentially many of those; there is normally no need to hold back on rdns,
as tcpserver will/may have attempted it anyway, so the response, if any,
should be in the DNS cache already.
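
The proposal above can be sketched roughly as follows. This is only an illustration of the ordering, not spamdyke code: `on_data`, its parameters and the reply texts are hypothetical, and the blacklist check is passed in as a callable so that it is visibly skipped when no recipient was accepted.

```python
from typing import Callable, List, Optional, Tuple


def on_data(
    accepted: List[str],
    is_blacklisted: Callable[[], bool],
    permanent: bool = True,
) -> Tuple[Optional[str], str]:
    """Decide what to do when the client sends DATA.

    Returns (reply_to_client, command_for_backend). A reply of None means
    "relay whatever the backend answers to DATA".
    """
    if not accepted:
        # No recipient survived, so no RBL/RHSBL query is issued at all.
        return "554 no valid recipients", "QUIT"
    if is_blacklisted():
        # The slow DNS lookups happen only here, just before step 13.
        code = "554" if permanent else "451"
        return f"{code} rejected by blacklist", "QUIT"
    return None, "DATA"


print(on_data([], lambda: True))
print(on_data(["user@example.com"], lambda: True, permanent=False))
```

The first call never invokes the lookup; the second performs it and answers the client's DATA with a temporary rejection while the backend gets QUIT.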

Andras

Ps. not that it matters, but by buffering the list of recipients there is
still a way out in the situation you described (which doesn't however arise
in the scheme I am suggesting): just kill the qmail-smtpd child and spawn
another, but don't give it that particular recipient address. I'm only
including this footnote because this approach could conceivably be useful in
other situations.

-- 
 Andras Korn 
  QOTD:
I'd like to live like a poor person with lots of money.
___
spamdyke-users mailing list
spamdyke-users@spamdyke.org
http://www.spamdyke.org/mailman/listinfo/spamdyke-users


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-13 Thread Sam Clippinger
RBL rejections use permanent codes.  This is because an RBL/RHSBL match 
is permanent within the lifetime of an email message -- the situation 
won't change in a few minutes, so there's no point in retrying multiple 
times.  A human must intervene and correct the situation before any 
email can be delivered.

The only DNS-related filters that send temporary rejection codes are 
those that could be triggered by no response from a DNS server.  In 
other words, reject-empty-rdns, reject-unresolvable-rdns and 
reject-missing-sender-mx.  If the DNS server is slow or overloaded, 
retrying in a few minutes could allow delivery because the DNS server 
may have recovered in that time.  Beyond those, graylisting, timeouts 
and max-recipients send temporary rejections.  All other filters send 
permanent rejections.
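
As a compact restatement of the rule above, here is a small sketch. The first three names are the filters listed in the paragraph; "graylisting", "timeouts" and "max-recipients" are used as plain labels, and any other filter name passed in (such as "rbl") is a hypothetical label, not necessarily a spamdyke identifier.

```python
# Filters that answer with a temporary (4xx) code, per the description above.
TEMPORARY_FILTERS = frozenset({
    "reject-empty-rdns",
    "reject-unresolvable-rdns",
    "reject-missing-sender-mx",
    "graylisting",
    "timeouts",
    "max-recipients",
})


def rejection_class(filter_name: str) -> str:
    """Return "temporary" for the filters listed above, "permanent" otherwise."""
    return "temporary" if filter_name in TEMPORARY_FILTERS else "permanent"


for name in ("reject-empty-rdns", "graylisting", "rbl", "rhsbl"):
    print(name, rejection_class(name))
```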

Regarding the delayed checking of recipient filters, let me spell this 
out as I understand your suggestion.  The process would look like this:
  Remote server connects, spamdyke and qmail start.
  QMAIL->SPAMDYKE: 220 qmail ESMTP
  SPAMDYKE->REMOTE: 220 qmail ESMTP
  REMOTE->SPAMDYKE: HELO myname
  SPAMDYKE->QMAIL: HELO myname
  QMAIL->SPAMDYKE: 250 Hello, pleased to meet you.
  SPAMDYKE->REMOTE: 250 Hello, pleased to meet you.
  REMOTE->SPAMDYKE: MAIL FROM:<[EMAIL PROTECTED]>
  SPAMDYKE->QMAIL: MAIL FROM:<[EMAIL PROTECTED]>
  QMAIL->SPAMDYKE: 250 OK
  SPAMDYKE->REMOTE: 250 OK
  REMOTE->SPAMDYKE: RCPT TO:<[EMAIL PROTECTED]>
At this point, you're suggesting that spamdyke pass the recipient to 
qmail first, without running its own filters.  That would look like this:
  SPAMDYKE->QMAIL: RCPT TO:<[EMAIL PROTECTED]>
  QMAIL->SPAMDYKE: 250 OK
qmail did not reject the recipient, so spamdyke runs its filters and 
decides the recipient should be blocked.  So it sends a rejection to the 
remote server.
  SPAMDYKE->REMOTE: 554 Recipient rejected.
Then a valid recipient is sent:
  REMOTE->SPAMDYKE: RCPT TO:<[EMAIL PROTECTED]>
  SPAMDYKE->QMAIL: RCPT TO:<[EMAIL PROTECTED]>
  QMAIL->SPAMDYKE: 250 OK
  SPAMDYKE->REMOTE: 250 OK
After that, the message is delivered:
  REMOTE->SPAMDYKE: DATA
  SPAMDYKE->QMAIL: DATA
  QMAIL->SPAMDYKE: 354 Proceed.
  SPAMDYKE->REMOTE: 354 Proceed.
  REMOTE->SPAMDYKE: (message data)
  SPAMDYKE->QMAIL: (message data)
  QMAIL->SPAMDYKE: 250 Accepted.
  SPAMDYKE->REMOTE: 250 Accepted.
  REMOTE->SPAMDYKE: QUIT
  SPAMDYKE->QMAIL: QUIT
  QMAIL->SPAMDYKE: 221 Bye.
  SPAMDYKE->REMOTE: 221 Bye.

From the remote server's point of view, the conversation looked like this:
  Remote server connects, spamdyke and qmail start.
  SPAMDYKE->REMOTE: 220 qmail ESMTP
  REMOTE->SPAMDYKE: HELO myname
  SPAMDYKE->REMOTE: 250 Hello, pleased to meet you.
  REMOTE->SPAMDYKE: MAIL FROM:<[EMAIL PROTECTED]>
  SPAMDYKE->REMOTE: 250 OK
  REMOTE->SPAMDYKE: RCPT TO:<[EMAIL PROTECTED]>
  SPAMDYKE->REMOTE: 554 Recipient rejected.
  REMOTE->SPAMDYKE: RCPT TO:<[EMAIL PROTECTED]>
  SPAMDYKE->REMOTE: 250 OK
  REMOTE->SPAMDYKE: DATA
  SPAMDYKE->REMOTE: 354 Proceed.
  REMOTE->SPAMDYKE: (message data)
  SPAMDYKE->REMOTE: 250 Accepted.
  REMOTE->SPAMDYKE: QUIT
  SPAMDYKE->REMOTE: 221 Bye.
One recipient was accepted and one was rejected.

But from qmail's point of view, the conversation looked like this:
  Remote server connects, spamdyke and qmail start.
  QMAIL->SPAMDYKE: 220 qmail ESMTP
  SPAMDYKE->QMAIL: HELO myname
  QMAIL->SPAMDYKE: 250 Hello, pleased to meet you.
  SPAMDYKE->QMAIL: MAIL FROM:<[EMAIL PROTECTED]>
  QMAIL->SPAMDYKE: 250 OK
  SPAMDYKE->QMAIL: RCPT TO:<[EMAIL PROTECTED]>
  QMAIL->SPAMDYKE: 250 OK
  SPAMDYKE->QMAIL: RCPT TO:<[EMAIL PROTECTED]>
  QMAIL->SPAMDYKE: 250 OK
  SPAMDYKE->QMAIL: DATA
  QMAIL->SPAMDYKE: 354 Proceed.
  SPAMDYKE->QMAIL: (message data)
  QMAIL->SPAMDYKE: 250 Accepted.
  SPAMDYKE->QMAIL: QUIT
  QMAIL->SPAMDYKE: 221 Bye.
Two recipients were given and both were accepted.  Both recipients will 
receive the message.  If that happens, it's like spamdyke isn't 
installed at all -- all mail will get through no matter what.
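
The difference between the two orderings can be reproduced with a toy model. Everything here is hypothetical (the `Backend` class stands in for a stock qmail-smtpd that accepts every recipient, and the addresses are made up); it only shows why a recipient that has reached the backend stays on the envelope.

```python
class Backend:
    """Stand-in for a backend that accepts every recipient it is given."""

    def __init__(self) -> None:
        self.recipients = []

    def rcpt(self, address: str) -> str:
        # Once recorded, a recipient cannot be removed from the envelope.
        self.recipients.append(address)
        return "250 OK"


def rcpt_backend_first(backend: Backend, address: str, blacklist: set) -> str:
    # Ordering as spelled out above: ask the backend, then filter.
    reply = backend.rcpt(address)
    if reply.startswith("250") and address in blacklist:
        return "554 Recipient rejected."
    return reply


def rcpt_filter_first(backend: Backend, address: str, blacklist: set) -> str:
    # Filter first: a blocked recipient never reaches the backend.
    if address in blacklist:
        return "554 Recipient rejected."
    return backend.rcpt(address)


blacklist = {"blocked@example.com"}
for handler in (rcpt_backend_first, rcpt_filter_first):
    backend = Backend()
    for rcpt in ("blocked@example.com", "valid@example.com"):
        handler(backend, rcpt, blacklist)
    print(handler.__name__, backend.recipients)
```

In the backend-first run the remote server is told the first recipient was rejected, yet the backend's envelope still contains both addresses.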

-- Sam Clippinger

Andras Korn wrote:
> On Sat, Apr 12, 2008 at 06:10:04PM -0500, Sam Clippinger wrote:
> 
>> The RHSBL filter checks rDNS names and sender addresses, not recipient
>> addresses.
> 
> I know.
> 
>> It also produces permanent rejection codes, not temporary ones.
> 
> OK, with RHSBL, that is probably justified. However, I hope RBL by default
> produces temporary rejects?
> 
>> If you're seeing the same sender rejected repeatedly, it's because the
>> remote server is sending repeatedly.
> 
> Strange that they didn't do it so far, but apparently this is the case.
> 
>> Also, spamdyke should be disconn

Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-12 Thread Andras Korn
On Sat, Apr 12, 2008 at 06:10:04PM -0500, Sam Clippinger wrote:

> The RHSBL filter checks rDNS names and sender addresses, not recipient
> addresses.

I know.

> It also produces permanent rejection codes, not temporary ones.

OK, with RHSBL, that is probably justified. However, I hope RBL by default
produces temporary rejects?

> If you're seeing the same sender rejected repeatedly, it's because the
> remote server is sending repeatedly.

Strange that they didn't do it so far, but apparently this is the case.

> Also, spamdyke should be disconnecting (and killing) qmail as soon as 
> the blacklisted sender is given (depending on your configuration -- if 
> you're using a recipient whitelist, qmail is disconnected after the RCPT 
> command).  After that, all SMTP traffic is answered by spamdyke (with 
> rejection codes).  So at least for that short time, spamdyke is saving 
> resources.
> 
> However, with regard to blacklisted recipients, the reason spamdyke runs 
> its filters before passing the RCPT command to qmail is because there 
> may be multiple recipients.  Once a recipient has been passed to qmail, 
> it cannot be removed.  Passing the RCPT command just to check the status 
> code would effectively defeat spamdyke.
> 
> For example, imagine an unpatched qmail server.  The remote server names 
> a blacklisted recipient, spamdyke passes it to qmail, checks the status 
> code, then sends a rejection to the remote server.  Then the remote 
> server names a second recipient that is not blacklisted.  spamdyke must 
> allow the message to pass through because the second recipient is 
> legitimate.  However, because the first recipient was already sent to 
> qmail, that recipient will also receive the message.

I'm not sure I understand what you're saying.

If a recipient is blacklisted in spamdyke, spamdyke should of course reject
it.

If it is blacklisted by qmail, qmail will reject it and spamdyke needn't
worry about it.

The SMTP conversation can continue, with each recipient specified by the
client being treated as above.

Finally, if any recipients were accepted by the backend qmail, spamdyke can
check RBL and RHSBL, and if there is a match, reject the client temporarily
(for RBL) or permanently (in the case of RHSBL), and send a QUIT to the
backend qmail.

The costly DNS lookups needn't be performed at all if qmail rejects all
recipients.

I see no situation where this scheme would result in mail being passed to
recipients who would otherwise not receive it.

I think all feasible local tests should be carried out before resorting to
remote tests, because those can be (and typically are) much slower.

Andras

-- 
 Andras Korn 
  QOTD:
   When I was your age, we had to walk ten miles to a node.


Re: [spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-12 Thread Sam Clippinger
The RHSBL filter checks rDNS names and sender addresses, not recipient 
addresses.  It also produces permanent rejection codes, not temporary 
ones.  If you're seeing the same sender rejected repeatedly, it's 
because the remote server is sending repeatedly.
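
For readers unfamiliar with the two list types, the sketch below shows how the DNS query names are conventionally formed: an RBL is keyed by the client IP with its octets reversed, an RHSBL by a domain name. The zone names are placeholders, the helper names are hypothetical, and how spamdyke derives the domain from an rDNS name or sender address is not covered here. No DNS query is actually sent.

```python
import ipaddress


def rbl_query_name(ip: str, zone: str) -> str:
    """Build the name looked up in an IP-based list (IPv4 only)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def rhsbl_query_name(domain: str, zone: str) -> str:
    """Build the name looked up in a domain-based list."""
    return domain.rstrip(".").lower() + "." + zone


# Placeholder zones; substitute the lists actually configured.
print(rbl_query_name("85.179.173.120", "rbl.example.com"))
print(rhsbl_query_name("Sender.Example.", "rhsbl.example.com"))
```

By convention, a list signals a match by returning an address record for the constructed name, and a miss by returning no such record.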

Also, spamdyke should be disconnecting (and killing) qmail as soon as 
the blacklisted sender is given (depending on your configuration -- if 
you're using a recipient whitelist, qmail is disconnected after the RCPT 
command).  After that, all SMTP traffic is answered by spamdyke (with 
rejection codes).  So at least for that short time, spamdyke is saving 
resources.

However, with regard to blacklisted recipients, the reason spamdyke runs 
its filters before passing the RCPT command to qmail is because there 
may be multiple recipients.  Once a recipient has been passed to qmail, 
it cannot be removed.  Passing the RCPT command just to check the status 
code would effectively defeat spamdyke.

For example, imagine an unpatched qmail server.  The remote server names 
a blacklisted recipient, spamdyke passes it to qmail, checks the status 
code, then sends a rejection to the remote server.  Then the remote 
server names a second recipient that is not blacklisted.  spamdyke must 
allow the message to pass through because the second recipient is 
legitimate.  However, because the first recipient was already sent to 
qmail, that recipient will also receive the message.

-- Sam Clippinger

Andras Korn wrote:
> Hi,
> 
> since I installed spamdyke my logs are inundated with messages like this
> one:
> 
> DENIED_RHSBL_MATCH from: [EMAIL PROTECTED] to: [EMAIL PROTECTED] origin_ip: 
> 85.179.173.120 origin_rdns: e179173120.adsl.alicedsl.de auth: (unknown)
> 
> The recipient address is bogus and my (patched) qmail-smtpd would reject it
> permanently. Apparently, since it matches a RHSBL, spamdyke rejects the
> message temporarily, and the same client keeps trying for a while, always
> costing me some resources.
> 
> I think this is wasteful; it would be better to only do the RHSBL lookup
> after the backend qmail-smtpd accepted the recipient address. If the
> backend qmail-smtpd throws a permanent rejection, spamdyke could just pass
> it on to the client.
> 
> Andras
> 


[spamdyke-users] let qmail decide if it accepts a recipient before doing RHSBL?

2008-04-12 Thread Andras Korn
Hi,

since I installed spamdyke my logs are inundated with messages like this
one:

DENIED_RHSBL_MATCH from: [EMAIL PROTECTED] to: [EMAIL PROTECTED] origin_ip: 
85.179.173.120 origin_rdns: e179173120.adsl.alicedsl.de auth: (unknown)

The recipient address is bogus and my (patched) qmail-smtpd would reject it
permanently. Apparently, since it matches a RHSBL, spamdyke rejects the
message temporarily, and the same client keeps trying for a while, always
costing me some resources.

I think this is wasteful; it would be better to only do the RHSBL lookup
after the backend qmail-smtpd accepted the recipient address. If the
backend qmail-smtpd throws a permanent rejection, spamdyke could just pass
it on to the client.

Andras

-- 
 Andras Korn 
  QOTD:
  Life is like a motorbike. You have to kick-start it to make it go.