Re: Odd behaviour under load.

2009-05-10 Thread mouss
Charles Gregory a écrit :
 On Fri, 8 May 2009, Mark wrote:
 Okay, working from the idea that indeed the connecting client is
 timing out waiting for the 250 OK after sending the message, I would
 think DNS lookups are the most costly, time-wise. So, I would examine
 the RBL lookups first: it only takes the presence of one or two
 defunct DNSBL lookups, or an unruly 'rbl_timeout' setting, to muck
 things up.
 
 I run the 'standard' set of RBL lookups for SA. Though it's worth
 mentioning that my front end runs a half-dozen 'high trust' RBL's
 while processing the envelope RCPT command. So by the time DATA is
 accepted, those should be cached locally, 

does your front end parse the received headers and check the IPs found
there against DNSBLs? if not, then caching will only work for the IP
that sent the message to your front end.

 and not add to the time
 that SA is running. I think maybe I should look for a convenient way to
 limit the RBL checks to the 'most popular' (most reliable, good hits)
 RBL's


you may want a two-step SA:

- during smtp time, run a lightweight SA (minimize latency and
minimize FPs, at the cost of increasing the ratio of FNs)

- after mail has been accepted, pass it through a normal SA configuration




RE: Odd behaviour under load.

2009-05-08 Thread Mark
-Original Message-
From: Charles Gregory [mailto:cgreg...@hwcn.org]
Sent: donderdag 7 mei 2009 21:58
To: SpamAssassin Users List
Subject: Odd behaviour under load.

 Do yahoo and python.org enforce a shorter time-out?

Highly doubtful. RFC 2821, Section 4.5.3.2 (Timeouts) gives you a 2
minutes window while awaiting the 354 Start Input reply to a DATA
command. But please note that timeouts in this Section are a SHOULD, not
a MUST.

 Or does some characteristic of list mail make it take longer to
 process in sa?

Headers are part of the DATA stream. Hence, at the time a connecting
server is awaiting your 354 Start Input reply to their DATA command, the
whole notion of a (possible) mailing-list is simply not within scope yet
(as the headers have not been received yet). At the start of DATA, you've
basically just concluded the envelope-stage of the SMTP dialogue (MAIL
FROM, RCPT TO, etc). So, whatever characteristic of list mail *might* make
things taking longer to process in SA, it's simply not yet relevant here.

If you're doing Milter-type SPF-lookups right after MAIL FROM, for
instance, you'd have a 5 minutes timeout window. Same after RCPT TO. But
since both those actions are completed by the time the connecting server
issues its DATA command, there's basically no valid reason for your server
to wait unusually long to respond with its 354 reply. Not unless it's very
broken, that is.

 At first I didn't associate this behaviour with SA because the
 duplicates were being sent by the originating system, not from within
 mine. I wasn't even certain that the fault was mine, given the way it
 mostly seemed to occur with certain list servers. The mail was getting
 delivered to my local recipient, but the sending system was resending
 a few minutes later (and would do so three or four times).

It's quite possible for a connecting mail server to retry after it
received, say, one of the many 4.x.x codes. However, what's definitely
broken in your setup is that, in such a scenario, apparently your LDA
already delivers (part of an) aborted mail. That seems too strange for
words; so, likely, you're not sending out proper reply codes to connecting
mail servers.

So, the solution is simple: fix your mail server; or have someone do it
for you.

- Mark



RE: Odd behaviour under load.

2009-05-08 Thread John Hardin

On Fri, 8 May 2009, Mark wrote:


From: Charles Gregory [mailto:cgreg...@hwcn.org]


Do yahoo and python.org enforce a shorter time-out?


Highly doubtful. RFC 2821, Section 4.5.3.2 (Timeouts) gives you a 2 
minutes window while awaiting the 354 Start Input reply to a DATA 
command.


...are you sure that's the timeout he's referring to? How does SA come 
into play **before** the sending MTA sends the message body?


I suspect the sender is timing out waiting for the 250 OK after sending 
the message, hence my (humorous) 100 Please hold... suggestion. (Jeeze, 
SM, lighten up!)


Per RFC2821, the timeout for the 250 OK is ten minutes, but again, those 
are SHOULDs.


And I can see a busy list server reducing those timeouts a bit - they are 
sending a _lot_ of mail.



The mail was getting delivered to my local recipient, but the sending
system was resending a few minutes later (and would do so three or
four times).


It's quite possible for a connecting mail server to retry after it
received, say, one of the many 4.x.x codes. However, what's definitely
broken in your setup is that, in such a scenario, apparently your LDA
already delivers (part of an) aborted mail. That seems too strange for
words; so, likely, you're not sending out proper reply codes to
connecting mail servers.

So, the solution is simple: fix your mail server; or have someone do
it for you.


More likely the client is disconnecting after the timeout and _assuming_
a 4xx state, and requeueing the message. If there's any problem with
Charles' server it's that it continues delivery after the sending client
has dropped the connection. I wouldn't really see that as a problem,
though, as I'd rather get multiple deliveries in that case than have
both sides make assumptions the other direction and _lose_ the message.

Charles, here's something you could try: using telnet, connect to your
MTA and pretend you're the yahoo listserv. Feed in one of the
problematic messages, and time how long it takes from sending the final
period to receiving the 250 OK. That will give you something to compare
to how long it takes to scan the message outside of your MTA
envrionment.


--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Homeland Security: Specializing in Tactical Band-aids for Strategic
  Problems.   -- Eric K. in Bruce Schneier's blog
---
 Today: the 64th anniversary of VE day


Re: Odd behaviour under load.

2009-05-08 Thread Jonas Eckerman

Mark wrote:


Do yahoo and python.org enforce a shorter time-out?



Highly doubtful. RFC 2821, Section 4.5.3.2 (Timeouts) gives you a 2
minutes window while awaiting the 354 Start Input reply to a DATA
command.


Wich is of course irrelevant since SpamAssassin must be called after the 
client sends the data, not while the client if waiting for permission to 
send it.


The relevant timeout is the after the client sends CRLF.CRLF to end 
the data and then waits for 250 or an error. RFC2821 4.5.3.2 specifies 
that this timeout SHOULD be 10 minutes.


 But please note that timeouts in this Section are a SHOULD, not

a MUST.


Exactly. But one should remember that RFC2821 has this to say about SHOULD:

2.2.2:
---8---
there may exist valid reasons in particular circumstances to ignore a 
particular item, but the full implications must be understood and 
carefully weighed before choosing a different course.

---8---

I'm not convinced all mail list operators satisfies that (see below).

While I have no idea what Yahoo or python.org has set those timeouts to, 
I have seen mail from listservers with as low as a 1 (one) minute timeout.


(The rationale for one such system was that they needed to empty their 
mail queue as fast as possible. When told that the low timeout in 
reality might make them empty their queue slower rather than faster 
(because their system had to retry mail that would otherwise get out on 
the first try), they simply didnät listen.)



Or does some characteristic of list mail make it take longer to
process in sa?



Headers are part of the DATA stream. Hence, at the time a connecting
server is awaiting your 354 Start Input reply to their DATA command, the
whole notion of a (possible) mailing-list is simply not within scope yet
(as the headers have not been received yet).


But... SpamAssassin should not be called before the DATA is sent. 
SpamAssassin is a content filter that scans the data (including the 
headers).



It's quite possible for a connecting mail server to retry after it
received, say, one of the many 4.x.x codes. However, what's definitely
broken in your setup is that, in such a scenario, apparently your LDA
already delivers (part of an) aborted mail.


The LDA delivers what it gets, and really can't know ether the mail is 
complete or not. It's the responibility of the server wether a 
non-complete mail should be discarded or send to the LDA.


Once the client has sent data end (CRLF.CRLF) and the server has 
recieved it, it's normal (and correct) for the server to deliver the 
mail and then send a relevant response to the client.


If the client disconnects (due to a timeout or other problem) after 
sending CRLF.CRLF but before getting a response from the server, 
this tend to result in the server delivering the message and the client 
retrying it.


RFS2821 spoecifically mentions this.

4.5.3.2:
---8---
When the receiver gets the final period terminating the message data, it 
typically performs processing to deliver the message to a user mailbox. 
A spurious timeout at this point would be very wasteful and would 
typically result in delivery of multiple copies of the message, since it 
has been successfully sent and the server has accepted responsibility 
for delivery.

---8---

6.1:
---8---
To avoid receiving duplicate messages as the result of timeouts, a 
receiver-SMTP MUST seek to minimize the time required to respond to the 
final CRLF.CRLF end of data indicator.

---8---

The resoning behind this behaviour is that the server doesn't know 
wether the client received the 250 response or not. If the client did 
receive it, the server MUST take responisibilty for delivering the 
message (or notifying the sender of a failure if it can't deliver it). 
So the server takes the safe way of assuming the client did receive the 
positive response.


Similarly, since the client doesn't know wether the server managed to 
deliver the message or not, the client MUST take responsibility for 
retrying the message (or notifying the sender of a failure).


The basic idea is that it's more important to not loose mail than to 
allways avoid duplicates.


 That seems too strange for

words; so, likely, you're not sending out proper reply codes to connecting
mail servers.


Quite the opposite. It seems perfectly normal. His servers are most 
likely sending the correct responses to the client, but the client 
disconnects due to a timeout before getting the response and therefore 
doesn't know that the mail was eventually accepted for delivery.



So, the solution is simple: fix your mail server; or have someone do it
for you.


There's two parts to the solution.

Part One: Fix his mail server so that it does take overly long to 
respond to data end (CRLF.CRLF). This is a MUST.


It is, of course, quiote possible that a normally perfectly good mail 
server is temporarily overloaded, has disk problems, network problems, 
or for some other temporary reason takes a long time to respond. This 

RE: Odd behaviour under load.

2009-05-08 Thread Mark
-Original Message-
From: John Hardin [mailto:jhar...@impsec.org] 
Sent: vrijdag 8 mei 2009 15:52
To: Mark
Cc: users@spamassassin.apache.org
Subject: RE: Odd behaviour under load.

 On Fri, 8 May 2009, Mark wrote:

  From: Charles Gregory [mailto:cgreg...@hwcn.org]
 
  Do yahoo and python.org enforce a shorter time-out?
 
  Highly doubtful. RFC 2821, Section 4.5.3.2 (Timeouts) gives you a 2
   minutes window while awaiting the 354 Start Input reply to a DATA 
  command.

 are you sure that's the timeout he's referring to?

 I suspect the sender is timing out waiting for the 250 OK after
 sending the message, hence my (humorous) 100 Please hold...
 suggestion. (Jeeze, SM, lighten up!)

I suspect that's indeed what he meant. I guess I took his So obviously
some systems were 'timing out' waiting for my server to  respond to the
DATA command. a bit too literal.

 How does SA come into play **before** the sending MTA sends the
 message body?

It doesn't. :) That's why it did make no sense to me.

 More likely the client is disconnecting after the timeout and
 _assuming_ a 4xx state, and requeueing the message

 If there's any problem with Charles' server it's that it continues
 delivery after the sending client has dropped the connection.
 I wouldn't really see that as a problem, though, as I'd rather get
 multiple deliveries in that case than have both sides make assumptions
 the other direction and _lose_ the message.

Well, _assuming_ a 4xx state is an assumption, too. :) And not one I like
per se, to be honest (though one could make a good case for its rationale,
saying an unexpected disconnect is precisely the kind of 'transient'
error-state the 4.x.x codes are meant to cover). But resending a message,
over and over, without the receiving server indicating such need, well,
that could here and there raise an eyebrow or two as well.

Okay, working from the idea that indeed the connecting client is timing
out waiting for the 250 OK after sending the message, I would think DNS
lookups are the most costly, time-wise. So, I would examine the RBL
lookups first: it only takes the presence of one or two defunct DNSBL
lookups, or an unruly 'rbl_timeout' setting, to muck things up.

- Mark



RE: Odd behaviour under load.

2009-05-08 Thread SM

Hi John,
At 06:50 08-05-2009, John Hardin wrote:
I suspect the sender is timing out waiting for the 250 OK after 
sending the message, hence my (humorous) 100 Please hold... 
suggestion. (Jeeze, SM, lighten up!)


There has already been such a proposal.  Someone might take your 
humorous suggestion seriously, hence my comment.


Regards,
-sm  



Re: [sa] RE: Odd behaviour under load.

2009-05-08 Thread Charles Gregory

On Fri, 8 May 2009, Mark wrote:

Headers are part of the DATA stream. Hence, at the time a connecting
server is awaiting your 354 Start Input reply to their DATA 
command


My apologies. I have misled with the phrase 'data command'.
I was referring to the response that the sending server expects to see
after it has finished sending data.

- C


RE: Odd behaviour under load.

2009-05-08 Thread Charles Gregory

On Fri, 8 May 2009, John Hardin wrote:
I suspect the sender is timing out waiting for the 250 OK after 
sending the message, hence my (humorous) 100 Please hold... 
suggestion. (Jeeze, SM, lighten up!)


(nod) I should not have said data command. Apologies again.

And I can see a busy list server reducing those timeouts a bit - they 
are sending a _lot_ of mail.


That was my guess. Just wondering if anyone *knew* for certain.
I suppose I could always test it, but as I said, I figured out how to stop 
the problem. So another time, perhaps


Charles, here's something you could try: using telnet, connect to your 
MTA and pretend you're the yahoo listserv. Feed in one of the 
problematic messages, and time how long it takes from sending the final 
period to receiving the 250 OK. That will give you something to compare 
to how long it takes to scan the message outside of your MTA 
envrionment.


Actually, my logs are sufficiently verbose that I can check the times it 
takes from DATA until the message is handed off to my LDA. I didn't make a 
note of exact timing, but it wasn't 10 minutes. More like 4 or 5 
(though don't quote me on that - this is 'resolved' so I'm too busy to 
double-check those logs). As a side note, for people wondering why my SA 
is so slow, I thought I would mention that my SMTP front end (Mail 
Avenger) has a bug that prevents me from properly using 'spamc', so all 
load issues are compounded buy having to use 'spamassassin'. Fortunately, 
without those 'full' rules, it handles things quite well


- Charles


RE: Odd behaviour under load.

2009-05-08 Thread Charles Gregory

On Fri, 8 May 2009, Mark wrote:
Okay, working from the idea that indeed the connecting client is timing 
out waiting for the 250 OK after sending the message, I would think 
DNS lookups are the most costly, time-wise. So, I would examine the RBL 
lookups first: it only takes the presence of one or two defunct DNSBL 
lookups, or an unruly 'rbl_timeout' setting, to muck things up.


I run the 'standard' set of RBL lookups for SA. Though it's worth 
mentioning that my front end runs a half-dozen 'high trust' RBL's
while processing the envelope RCPT command. So by the time DATA is 
accepted, those should be cached locally, and not add to the time
that SA is running. I think maybe I should look for a convenient 
way to limit the RBL checks to the 'most popular' (most reliable, good 
hits) RBL's


- C


RE: Odd behaviour under load.

2009-05-08 Thread John Hardin

On Fri, 8 May 2009, Charles Gregory wrote:

As a side note, for people wondering why my SA is so slow, I thought I 
would mention that my SMTP front end (Mail Avenger) has a bug that 
prevents me from properly using 'spamc', so all load issues are 
compounded buy having to use 'spamassassin'. Fortunately, without those 
'full' rules, it handles things quite well


That bites.

You can probably work around it, though, by playing some PATH games and 
getting Mail Avenger to see a shell script named spamassassin (that 
actually runs spamc) before seeing the real spamassassin program.


--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  It is not the place of government to make right every tragedy and
  woe that befalls every resident of the nation.
---
 Today: the 64th anniversary of VE day


RE: Odd behaviour under load.

2009-05-08 Thread Charles Gregory

On Fri, 8 May 2009, John Hardin wrote:

 ... my SMTP front end (Mail Avenger) has a bug that
 prevents me from properly using 'spamc'
You can probably work around it, though, by playing some PATH games and 
getting Mail Avenger to see a shell script named spamassassin (that 
actually runs spamc) before seeing the real spamassassin program.


The issue isn't getting spamc to run. The problem lays in the fact that
mail avenger uses a wrapper named 'edinplace' that keeps the message in 
its original standard input pipe, and enables spamassassin's modified 
output message to replace that standard input. edinplace has a problem 
with 'rewinding' the pipe if it has been fed to 'spamc' but not if it is 
fed to 'spamassassin'. I have written to the mail avenger author to 
address this issue.


- Charles


Odd behaviour under load.

2009-05-07 Thread Charles Gregory

Hallo!

Just wanted to throw in an observation on my system's behaviour with 
spamassassin 'overloaded' Not really a complaint, as I know what I 
did 'wrong'. But curious about one of the effects


During the recent run of image spams, I tried a couple of different
pieces of code that involved 'full' tests. I knew they would slow things 
down a bit, but they had an unexpected side-effect: Various people on my 
system, including myself, started to receive duplicates for some mail. 
Particularly *list* mails like the sa list, or yahoo groups.


At first I didn't associate this behaviour with SA because the duplicates 
were being sent by the originating system, not from within mine. I 
wasn't even certain that the fault was mine, given the way it mostly 
seemed to occur with certain list servers. The mail was getting delivered 
to my local recipient, but the sending system was resending a few minutes 
later (and would do so three or four times).


Yes, I have my spamassassin in my SMTP front end, so I figured out that 
the timing of the appearance of these duplicates coincided with my adding 
those 'full' tests, and when I removed them, the duplicates stopped.
So obviously some systems were 'timing out' waiting for my server to 
respond to the DATA command. But I am slightly curious as to why this only 
happens with mailing lists... Do yahoo and python.org enforce a shorter 
time-out? Or does some characteristic of list mail make it take longer to 
process in sa?


- Charles



Re: Odd behaviour under load.

2009-05-07 Thread John Hardin

On Thu, 7 May 2009, Charles Gregory wrote:

So obviously some systems were 'timing out' waiting for my server to respond 
to the DATA command.


Heh. Does the SMTP protocol need a 100 Please hold... reply?

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Gun Control laws cannot reduce violent crime, because gun control
  laws assume a violent criminal will obey the law.
---
 Tomorrow: the 64th anniversary of VE day


Re: Odd behaviour under load.

2009-05-07 Thread SM

At 13:15 07-05-2009, John Hardin wrote:

Heh. Does the SMTP protocol need a 100 Please hold... reply?


No.  Fix the mail server instead of the protocol.

Regards,
-sm