Babu.N wrote:
> In case of botnet spamming, spammers may send large emails (as it is
> the network of the botnet which is used, but not the spammer), ...

I do see some large spam messages on occasion. Not many, but perhaps
it's time to start considering them.

> Is it not better if SA takes any-size email & attempts scanning on
> only the top-most portion (say initial 500KB) of the email content
> (as it may not make sense for spammers to keep their advertisement in
> later portions of the email) ?

I have another problem on my mind, unrelated to your concerns, but one
that requires the same kind of solution.

To verify DKIM- and DomainKeys-signed messages, one needs access to the
complete message. The value of a verified signature goes beyond catching
phishing in small messages. To get that value for all mail, signatures
currently need to be checked twice: by SpamAssassin for its own purposes,
and by another program that can verify all mail, including large
messages. This is not particularly efficient, and it duplicates admin
work, since an additional product has to be handled.

Another application that would benefit from seeing a complete message is
FuzzyOCR and similar plugins, which scan pictures in a mail. Picture-type
spam seems to be the first kind to exceed the few-hundred-kB scanning
limit.

What I would like to see is for a caller of the SpamAssassin library to
be able to pass to SA an open file handle (or a file descriptor) holding
the complete original message, perhaps in addition to the current way of
passing an in-memory copy of (a fraction of) a mail. This way the
SpamAssassin plugins and tools which are fast, or which need access to a
complete message, can get at it through the file handle, and the rest can
work as usual on a (possibly truncated) in-memory copy.

See also http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5521
for my concerns about wasteful use of in-memory copies of mail in various
forms, although the PR is just the tip of the iceberg and I didn't go
into detail there.

Mark
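
P.S. To make the file-handle idea a little more concrete, here is a rough
sketch. It is written in Python rather than SA's Perl, and it is purely
illustrative: the class name Message, the limit parameter, and the two
check functions are made up for this example and are not part of any
SpamAssassin API. The 500 kB default just mirrors the figure from the
quoted suggestion. The SHA-256 digest only stands in for a check that
must read every byte; it is not a DKIM verification.

```python
import hashlib
import io
from typing import BinaryIO, Iterator

# Hypothetical default: how much of the message is kept in memory.
DEFAULT_LIMIT = 500 * 1024


class Message:
    """Hypothetical container: a truncated in-memory copy for ordinary
    rules, plus the open handle for checks that need the whole message."""

    def __init__(self, handle: BinaryIO, limit: int = DEFAULT_LIMIT) -> None:
        self.handle = handle
        # Truncated copy: only the first `limit` bytes are read into memory.
        handle.seek(0)
        self.head = handle.read(limit)
        # Full size is found by seeking, without reading the rest.
        handle.seek(0, io.SEEK_END)
        self.size = handle.tell()
        handle.seek(0)

    @property
    def truncated(self) -> bool:
        return self.size > len(self.head)

    def full_chunks(self, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
        """Stream the complete message from the handle in fixed-size chunks."""
        self.handle.seek(0)
        while True:
            chunk = self.handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def head_contains(msg: Message, needle: bytes) -> bool:
    """Stand-in for an ordinary rule: looks only at the in-memory copy."""
    return needle in msg.head


def full_message_digest(msg: Message) -> str:
    """Stand-in for a whole-message check: streams every byte from the
    handle, so memory use stays at one chunk regardless of message size."""
    digest = hashlib.sha256()
    for chunk in msg.full_chunks():
        digest.update(chunk)
    return digest.hexdigest()
```

A caller would open the spooled message once, construct the object from
the handle, and hand it to both kinds of checks. Only the checks that
stream from the handle pay for reading a large message, and nobody holds
a second full copy in memory.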
