This isn't how Sendmail works. The entire message is cached to the queue
before milter is told anything about the headers or body. There's no "a few
seconds ahead", it's all the way ahead. Milter has no opportunity to say
REJECT in the middle of the SMTP DATA phase because the filter doesn't even
know that's where the MTA is.
Murray is right. I've confirmed this with a test on my own Sendmail setup.
Interesting... Our milter implementation actually sends the message to the
client as it is received. Of course any REJECTs or ACCEPTs or whatever that
occur along the way are noted and further output to the milter is blocked.
The status is then considered after the message is complete.
Implementing it the way we did was something of a PITA; I can certainly
understand why Sendmail opted for the collect-then-send approach.
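
For anyone trying to picture the streaming approach described above, here is a
minimal sketch in Python. It is not our actual implementation or anything from
Sendmail; the names StreamingRelay, on_chunk, and reject_on_marker are
hypothetical, and defaulting to ACCEPT when the filter never gives a final
answer is an assumption made only for the example.

```python
from typing import Callable, Optional

CONTINUE, ACCEPT, REJECT = "continue", "accept", "reject"


class StreamingRelay:
    """Hypothetical MTA-side relay that feeds a filter as data arrives."""

    def __init__(self, filter_chunk: Callable[[bytes], str]) -> None:
        self.filter_chunk = filter_chunk
        self.verdict: Optional[str] = None  # latched ACCEPT/REJECT, if any
        self.chunks_sent = 0

    def on_chunk(self, chunk: bytes) -> None:
        # Once the filter has given a final answer, block further output to it.
        if self.verdict is not None:
            return
        self.chunks_sent += 1
        result = self.filter_chunk(chunk)
        if result in (ACCEPT, REJECT):
            self.verdict = result  # noted now, acted on later

    def on_end_of_message(self) -> str:
        # The status is only considered after the message is complete.
        # Assumption: no final answer from the filter means ACCEPT.
        return self.verdict or ACCEPT


def reject_on_marker(chunk: bytes) -> str:
    """Toy filter: reject as soon as a marker string shows up."""
    return REJECT if b"BADWORD" in chunk else CONTINUE


relay = StreamingRelay(reject_on_marker)
for chunk in [b"hello ", b"BADWORD here", b"more body", b"tail"]:
    relay.on_chunk(chunk)
final = relay.on_end_of_message()
```

Here the filter sees only the first two chunks; the REJECT is latched on the
second one and returned once the message has been fully received.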
The sendmail way has the advantage of being more efficient if you have
slow clients and/or heavy milter processes.
How does that follow? The only difference is that in one case you're feeding it
the data as it comes in while in the other you're buffering up the data and
sending it as soon as the end of the message is received. Either way the milter
is running throughout the message transfer, unless you're willing to hold off
on even starting the milter until after the message transfer, in which
case it can't do things like refuse specific recipients.
Maybe I'm wrong, but I thought sendmail starts milters at session startup.
On my system (Red Hat Linux), the milter (PyMilter) runs all the time as
a process separate from Sendmail. Routines within the milter process
are called after each SMTP command, after each header, at the end of all
headers, after each block of body data, and at the end of all data. A
separate SpamAssassin process is started (by the milter end of data
routine) for each message needing filtering. (SpamAssassin actually
consists of a stub process, which is run for each message, and the main
SpamAssassin process, which stays in memory all the time.) I could be
wrong on some details here.
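
The order of calls described above can be sketched roughly as follows. This is
a stand-alone simulation, not the PyMilter API: RecordingFilter, drive, and the
on_* method names are made up for illustration, and the list of SMTP commands
is just an example.

```python
from typing import Iterable, List, Tuple


class RecordingFilter:
    """Hypothetical filter that records which routine is called, in order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def on_command(self, verb: str) -> None:
        self.calls.append(f"command:{verb}")

    def on_header(self, name: str, value: str) -> None:
        self.calls.append(f"header:{name}")

    def on_end_of_headers(self) -> None:
        self.calls.append("end_of_headers")

    def on_body_block(self, block: bytes) -> None:
        self.calls.append("body_block")

    def on_end_of_data(self) -> None:
        # In the setup described above, this is the point where the
        # per-message SpamAssassin stub would be started.
        self.calls.append("end_of_data")


def drive(
    flt: RecordingFilter,
    commands: Iterable[str],
    headers: Iterable[Tuple[str, str]],
    body_blocks: Iterable[bytes],
) -> None:
    """Call the filter routines in the order the message described."""
    for verb in commands:
        flt.on_command(verb)
    for name, value in headers:
        flt.on_header(name, value)
    flt.on_end_of_headers()
    for block in body_blocks:
        flt.on_body_block(block)
    flt.on_end_of_data()


flt = RecordingFilter()
drive(
    flt,
    commands=["HELO", "MAIL FROM", "RCPT TO"],
    headers=[("From", "a@example.org"), ("Subject", "test")],
    body_blocks=[b"first block", b"second block"],
)
```

Whether these routines run while the data is arriving, or only after the MTA
has buffered the whole message, is exactly the point in dispute; the sketch
shows only the order of the calls, not their timing.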
So I'm as puzzled as Ned about the claims of efficiency. It might make
sense for SpamAssassin to wait for the end of data, but I can't see how
buffering all the data, and not actually running each milter routine at
the time it appears to be called, does anything but open a door for abuse.
David MacQuigg, PhD          email: macquigg at ece.arizona.edu
Research Associate           phone: USA 520-721-4583
ECE Department, University of Arizona
9320 East Mikelyn Lane
Tucson, Arizona 85710
http://purl.net/macquigg