On Fri, Apr 20, 2001 at 02:06:02PM +1200, Jason Haar wrote:
> Hi there
>
> I'm the author of Qmail-Scanner - an Email scanning harness that can be used
> to block attachments, scan for viruses, etc. It's hooked in as a replacement
> for qmail-queue.
>
> The installation of a rather slow virus scanner on my own systems had led
> me to realise a rare error condition I hadn't expected. This virus scanner
> didn't like scanning a 90Mb zip'ped AVI file (ahem) - whereas another vendor
> scanner took 1.5minutes to scan it, this one took nearly two hours...
>
> The sending SMTP server's qmail-remote timed out the SMTP session after 20
> minutes - as being in error - as it had waited "too long" for the final "OK".
> However, STDOUT on the receiving box still received the "mail from|rcpt to"
> envelope headers, so after 2 hours Qmail-Scanner happily delivered it back
> to the real qmail-queue for real delivery.

So let me get this right. What's happening is this:

 o the remote site connects to qmail-smtpd
 o qmail-smtpd in turn invokes your replacement qmail-queue program,
   Qmail-Scanner
 o Qmail-Scanner in turn invokes the real qmail-queue

Your problem arises when Qmail-Scanner (or, more correctly, the scanner
it invokes, I guess) takes a long time to process the data: longer, in
fact, than the SMTP timeout of the remote site. Then here's what happens:

 o the remote site times out and closes the socket, thinking the email
   delivery has failed
 o meanwhile Qmail-Scanner et al are happily processing the email,
   totally oblivious to the lost connection. Eventually the scan
   completes and the mail is injected into the local queue with
   qmail-queue.

The key is that Qmail-Scanner doesn't know that the socket has been
closed and that qmail-smtpd has exited.

My suggestion is that you take a two-pronged approach.

First, introduce a timeout in Qmail-Scanner, shorter than the sender's
SMTP timeout, and exit accordingly (exit(52) according to the
qmail-queue man page).
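
To make the timeout idea concrete, here is a rough sketch in Python
(Qmail-Scanner itself is Perl, so treat this as an illustration of the
idea rather than a patch). The scanner command and the 15-minute figure
are my assumptions; pick whatever is comfortably below the shortest
sender timeout you care about.

```python
import subprocess
import sys

QQ_EXIT_TIMEOUT = 52        # exit code suggested above for a timeout
SCAN_TIMEOUT_S = 15 * 60    # assumption: safely under the sender's 20 minutes


def run_scanner(cmd, timeout_s):
    """Run the scanner; return its exit code, or None if it timed out.

    subprocess.run() kills the child itself when the timeout expires,
    so no scanner process is left behind.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


def scan_or_give_up(cmd, timeout_s=SCAN_TIMEOUT_S):
    """Return the scanner's exit code, or exit 52 so the sender sees a
    temporary failure instead of silence."""
    rc = run_scanner(cmd, timeout_s)
    if rc is None:
        sys.exit(QQ_EXIT_TIMEOUT)
    return rc
```

Usage would be something like scan_or_give_up(["slow-scanner", path]),
where "slow-scanner" is a made-up name standing in for whatever scanner
you invoke.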

Second, I'd record the process id of the parent with getppid() at
startup and, at the point at which the scan is complete but just prior
to handing the message to the real qmail-queue, I'd use kill(parent, 0)
to check that qmail-smtpd is still around.
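
A sketch of that parent check, again in Python and again only an
illustration. The function names are mine. One assumption worth
flagging: if the scanner runs under a different uid than qmail-smtpd,
kill() will fail with a permission error rather than succeed, which
still tells you the process exists, so the sketch treats that as
"alive".

```python
import os

# Capture the parent pid at startup. If it is read later, after the
# parent has died, getppid() returns the pid of whatever process
# adopted us (typically init), not the original parent.
PARENT_PID = os.getppid()


def process_alive(pid):
    """Return True if a process with this pid exists.

    Signal 0 delivers nothing; the kernel only performs the existence
    and permission checks.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # exists, but belongs to another user
    return True


def parent_still_there(original_ppid=PARENT_PID):
    """True if we still have our original parent and it is alive."""
    return os.getppid() == original_ppid and process_alive(original_ppid)
```

Calling parent_still_there() just before the hand-off to the real
qmail-queue, and bailing out with a temporary error if it returns
False, is the whole of the second prong.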

All you are really doing is reducing the window of risk to a very
small, but non-zero, size. Non-zero is OK, though, as SMTP delivery is
effectively idempotent: the worst case is that the recipient gets a
duplicate copy.

Your remaining problem is that the sender will never succeed, as the
mail is too large to scan within their SMTP timeout, so a better
strategy might be to decouple the scanner from the SMTP session. This
is pretty trivial with a two-instance qmail install, but it sure adds
complexity for your customers.

Regards.
>
> However... back on the sending host, it tried to send it again...
>
> I had a little loop going there - quite nasty. Can you say "busy system"? :-)
>
> Anyhoo, the virus scanner is the real culprit here - and that's something
> that can be fixed (i.e. get another). The problem is WHY did the recipient
> qmail-smtpd send through the envelope headers via STDOUT to
> qmail-queue/Qmail-Scanner? Upon noticing the sender going away, shouldn't it
> have recognised that as an error condition?
>
> I'm gonna have to alarm Qmail-Scanner so it also spits the dummy before 20
> minutes (I hope other MTAs don't have shorter timeouts). That way it'll
> always be telling the sender MTA it's in trouble.
>
> Another solution would be to just accept the message before scanning it, and
> scan it after the sending server has gone away - but then I'd have to write
> an entire requeuing infrastructure to handle transient errors too (not
> bl**dy likely ;-)
>
> Oh yeah - and please don't say "limit the size" - we LIKE sending large
> things here :-) [we just don't appear to like receiving them ;-)]
>
> Am I missing something here? This seems to imply that if you had
> /var/qmail/queue on a VERY slow (but otherwise reliable) disk, that you
> would see this problem too. I hope I've just been stupid and missed
> something obvious...
>
>
> --
> Cheers
>
> Jason Haar
>
> Unix/Special Projects, Trimble NZ
> Phone: +64 3 9635 377 Fax: +64 3 9635 417