OVERVIEW

RFC 821 (SMTP) specifies a maximum line length of 1000 characters (including
the CR LF separator). Likewise, RFC 2045 (MIME) allows at most 998 octets per
line, not counting the CR LF, for the 8bit Content-Transfer-Encoding. A good
summary can be found at Wikipedia:
<http://en.wikipedia.org/wiki/MIME#Content-Transfer-Encoding>

DSPAM does not respect these limits when creating the body parts in the 8bit
encoding. The number of characters in the line is unbounded--resulting in a
message which does not comply with the SMTP and MIME standards.
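
To make the limit concrete, here is a minimal sketch of a check for
over-long lines in a message buffer. The names longest_line,
lines_within_limit and MIME_MAX_LINE are made up for illustration and are
not part of DSPAM; the sketch also assumes that a bare LF ends a line.

```c
#include <assert.h>
#include <stddef.h>

/* RFC 2045 limit for 7bit/8bit bodies: 998 octets, not counting CR LF. */
#define MIME_MAX_LINE 998

/* Return the length in octets of the longest line in buf[0..len),
 * not counting the line terminator (CR LF or, by assumption, bare LF). */
size_t longest_line(const char *buf, size_t len)
{
    size_t longest = 0, current = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            /* Do not count a CR that immediately precedes the LF. */
            if (current > 0 && buf[i - 1] == '\r')
                current--;
            if (current > longest)
                longest = current;
            current = 0;
        } else {
            current++;
        }
    }
    /* The last line may have no terminator. */
    if (current > longest)
        longest = current;
    return longest;
}

/* Nonzero if every line fits within the MIME limit. */
int lines_within_limit(const char *buf, size_t len)
{
    return longest_line(buf, len) <= MIME_MAX_LINE;
}
```

Running a check like this over the output of DSPAM would flag any body
part whose lines exceed the limit.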

The effect of this bug is that some messages that are processed with DSPAM
may be quietly discarded at various points in the mail system. In my case,
it was happening with Cyrus' deliver command (called via the
TrustedDeliveryAgent configuration.)

IF DSPAM IS EMBEDDING SIGNATURES IN THE MESSAGE BODY, MAIL CAN BE LOST
PERMANENTLY.

Since the root of the problem is independent of the delivery mechanism, I
suspect that this could be an issue with other configurations as well (e.g.
using LMTP & reinjection.) I have not tested them: you may want to on your
own system using the data provided below.

The only workaround I can find at the present time is to disable signatures
in the message body (using the signatureLocation=headers preference.) In my
case, this is not a viable workaround since many of our mail clients do not
have a way to bounce messages.


SAMPLE DATA

I have uploaded an archive that contains sample data:

<http://files.iconfactory.net/craig/bugs/dspam.tgz>

This archive contains two files:

capture -- the original message that I used to track down this problem.

capture_dspam_message -- the message after being processed by DSPAM. It was
created with the command "dspam --user dspam_test --deliver=innocent
--stdout" and a preference of "signatureLocation=message".

The problem is in the Content-Type: text/html component: the line is 70,816
characters long.


POSSIBLE SOLUTIONS

The problem originates in find_signature, at the call to
_ds_decode_block(block). That call creates the body in a buffer with no
bound on line length.

A quick solution, which I've tested and verified, is to take the body and
add CR LF every 1000 characters. The message will then pass through the mail
system without error. But there are layout problems in the user's mail
client because the HTML is no longer valid (e.g. "<SPAN" becomes
"<SP\r\nAN".)
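
The quick fix can be sketched roughly as follows. This is not the patch
that was tested, only an illustration of the technique: hard_wrap is a
hypothetical helper, and it assumes the body is a NUL-terminated string
whose existing line breaks can be ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Insert CR LF after every `width` octets of `in`, with no regard for
 * content. Returns a malloc'd NUL-terminated string that the caller must
 * free, or NULL if allocation fails. */
char *hard_wrap(const char *in, size_t width)
{
    assert(in != NULL && width > 0);

    size_t len = strlen(in);
    size_t breaks = len / width;          /* one CR LF per full chunk */
    char *out = malloc(len + 2 * breaks + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        out[o++] = in[i];
        if ((i + 1) % width == 0) {
            out[o++] = '\r';
            out[o++] = '\n';
        }
    }
    out[o] = '\0';
    return out;
}
```

With a small width the damage is easy to see: hard_wrap("<SPAN>", 3)
splits the tag after "<SP", which is exactly the kind of break that ruins
the HTML.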

It seems like a better solution would be to encode the block body with
"quoted-printable" and bounded line lengths.
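
As a rough sketch of what such an encoder might look like, assuming the
rules of RFC 2045 section 6.7 (76-character lines, "=" soft line breaks,
"=XX" for octets that cannot be sent literally): qp_encode is a
hypothetical helper, not an existing DSPAM function, and it treats only
CR LF pairs in the input as line breaks.

```c
#include <assert.h>
#include <stdlib.h>

#define QP_MAX_LINE 76  /* RFC 2045: encoded lines of at most 76 chars */

/* Encode in[0..len) as quoted-printable. CR LF pairs are kept as hard
 * line breaks; longer lines are split with "=" soft breaks. Returns a
 * malloc'd NUL-terminated string (caller frees), or NULL on failure. */
char *qp_encode(const char *in, size_t len)
{
    static const char hex[] = "0123456789ABCDEF";

    assert(in != NULL || len == 0);
    /* Each octet grows to at most 3 chars and a 3-char soft break is
     * added at most once per 73 output chars, so 4*len + 4 is enough. */
    char *out = malloc(4 * len + 4);
    if (out == NULL)
        return NULL;

    size_t o = 0, col = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];

        /* Pass a CR LF pair through as a hard line break. */
        if (c == '\r' && i + 1 < len && in[i + 1] == '\n') {
            out[o++] = '\r';
            out[o++] = '\n';
            col = 0;
            i++;
            continue;
        }

        /* Whitespace must be encoded when it would end a line. */
        int at_eol = (i + 1 == len) ||
                     (in[i + 1] == '\r' && i + 2 < len && in[i + 2] == '\n');
        int literal = (c >= 33 && c <= 126 && c != '=') ||
                      ((c == ' ' || c == '\t') && !at_eol);
        size_t need = literal ? 1 : 3;

        /* Leave room for the "=" of a soft break in the last column. */
        if (col + need > QP_MAX_LINE - 1) {
            out[o++] = '=';
            out[o++] = '\r';
            out[o++] = '\n';
            col = 0;
        }

        if (literal) {
            out[o++] = (char)c;
            col += 1;
        } else {
            out[o++] = '=';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
            col += 3;
        }
    }
    out[o] = '\0';
    return out;
}
```

Because soft breaks are removed again by the decoder, a tag such as
"<SPAN" survives intact in the mail client, while no line on the wire
exceeds 76 characters.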

Unfortunately, I haven't figured out a good place to do this encoding--there
are a lot of different cases to handle (like signed messages.) It also looks
like I'd have to write the encoding logic myself (EN_QUOTED_PRINTABLE in
_ds_encode_block of decode.c is a TODO.)

I'm far from an expert in these areas, so I'm sure there are others on this
list who could better analyze this situation and help me solve it.

-ch


