Babu.N writes:
> http://wiki.apache.org/spamassassin/OutOfMemoryProblems
>
> This link suggests that one should skip sending large emails to
> SpamAssassin (for better performance). It states that "Tests show
> that larger messages are overwhelmingly likely to be non-spam, given
> the economics of spamming". If spammers use botnets to pump spam, is
> this statement still valid ?
Yes, pretty much; large messages still cost botnet senders, since they
greatly reduce the rate at which the bots can emit spam. (They care
a lot about that.)
The exception is Japanese-language spam targeted at recipients in Japan,
which tends to be pretty bulky -- I would guess due to the great consumer
broadband situation over there.
> In case of botnet spamming, spammers may send large emails (as it is
> the network of the botnet which is used, but not the spammer), with
> top most portion of the email containing spam message & rest of the
> email having some bulk to sizeup the email.
>
> Is it not better if SA takes any-size email & attempts scanning on
> only the top-most portion (say initial 500KB) of the email content
> (as it may not make sense for spammers to keep their advertisement in
> later portions of the email) ?
As Mark says, it would make sense to have a way for SpamAssassin to
deal more sensibly with large mails.
However, it's worth noting that your idea fails in the face of HTML
messages -- it's trivial for a spammer to generate an HTML message
along these lines:

    From: spammer
    Subject: hi
    Content-Type: text/html

    <div style="display: none">
    [2MB of innocent-looking text]
    </div>
    [spam payload here]

Which then renders as:

    From: spammer
    Subject: hi

    [spam payload here]

i.e. the 2MB of innocent-looking text is silently hidden, serving only to
mislead naive filters. There are plenty more ways to do this using
JavaScript and CSS, and probably MIME multipart/alternative tricks too.
It gets complex very fast...
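
To make the hidden-text problem concrete, here is a rough Python sketch
(not anything SpamAssassin actually does). It builds an HTML body like
the one above, then compares a "first 500KB only" scan against a crude
pass that skips display:none blocks. The names SCAN_LIMIT,
build_message, head_only and visible_text are made up for illustration,
and the CSS handling only covers this one inline-style trick:

```python
from html.parser import HTMLParser

SCAN_LIMIT = 500 * 1024  # hypothetical "scan only the first 500KB" cutoff


def build_message(filler_bytes=2 * 1024 * 1024):
    """Build an HTML body: ~2MB of hidden filler, then the payload."""
    unit = "innocent-looking text "
    filler = (unit * (filler_bytes // len(unit) + 1))[:filler_bytes]
    return ('<div style="display: none">' + filler + "</div>"
            + "<p>SPAM PAYLOAD</p>")


def head_only(body, limit=SCAN_LIMIT):
    """What a filter sees if it only looks at the first `limit` bytes."""
    return body.encode("utf-8")[:limit].decode("utf-8", "replace")


class VisibleText(HTMLParser):
    """Collect text outside any element styled display:none (inline only)."""

    VOID = {"br", "img", "hr", "meta", "input", "link"}

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # >0 while inside a hidden subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return  # void elements have no end tag, so don't count them
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(body):
    """Return roughly what a mail client would display to the reader."""
    parser = VisibleText()
    parser.feed(body)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    msg = build_message()
    print("payload in first 500KB:", "SPAM PAYLOAD" in head_only(msg))
    print("text shown to reader:", visible_text(msg))
```

The truncated scan sees only filler, while the reader sees only the
payload. Note that even this sketch is easy to defeat: it ignores
external stylesheets, scripts, and multipart/alternative parts.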
--j.