http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5179





------- Additional Comments From [EMAIL PROTECTED]  2006-11-12 03:55 -------
More response from achowe:

WRT the spamd protocol, generally it's fine; it's the documentation that needs
more precision so as to restrict implementations to a well-defined model. I
think part of my problem here is the loose model I find in SA: when it is
combined with things like DKIM, which need very exacting definitions and
process, simple things like white space cause havoc. Please note that I've NOT
read the DKIM spec to this date, though Eric Allman did provide me an overview
last year (summer 2005), so my approach and arguments are from existing RFC
standards, primarily 2821 and 2822.

> I didn't realize that we are in effect treating the newline separator
> between headers and body as part of the body when they use different
> newlines. That may be a flat out bug, and we should at least explore the
> implications of doing it one way vs the other and make an explicit decision.

From what I have observed, and based on Ben's trouble ticket comments, I'm
deducing that SA+DKIM relies on one of:

a) The first LF or CRLF found in a file given to the CLI or daemon.

b) A sample of the first N lines ending in LF or CRLF, using the most common
ending. However, I think you have to be sure to sample BOTH header and body
separately, since cases may arise where header and body differ.

c) The newline that acts as the header/body separator. However, the header/body
separator cannot be considered part of the message body (it never displays);
it's more closely tied to the headers, so using it as an indicator of newline
style could be problematic. Consider an app that spits out RFC 2822 headers
with CRLF newlines, but reads/includes a unix file containing LF newlines for
the body. The CRLF separator would be RFC compliant while the body would not;
very similar to milter-spamc.

d) The first blank line, i.e. LF-LF or CRLF-CRLF, as an indicator of line
style. If using this method, you must not count the header/body separator,
which would be the first blank line found. My comments in c) about mixed
sources, i.e. careless file inclusion, apply.
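
To make the difference between a), b) and c) concrete, here is a minimal
sketch in Python. The function names are hypothetical and this is not how
SA+DKIM is known to work; it only shows that the three strategies can disagree
on the same input. Case d) is left out.

```python
import re
from collections import Counter

# Matches either line ending; CRLF is tried first so it is not read as a bare LF.
_NEWLINE = re.compile(rb"\r\n|\n")


def first_newline(raw: bytes):
    """Strategy a): the first LF or CRLF found anywhere in the input."""
    m = _NEWLINE.search(raw)
    return m.group(0) if m else None


def most_common_newline(raw: bytes, sample: int = 20):
    """Strategy b): the most frequent ending among the first `sample` lines."""
    endings = _NEWLINE.findall(raw)[:sample]
    if not endings:
        return None
    return Counter(endings).most_common(1)[0][0]


def separator_newline(raw: bytes):
    """Strategy c): the ending of the blank line between headers and body."""
    m = re.search(rb"\r?\n(\r?\n)", raw)
    return m.group(1) if m else None


# CRLF headers and separator, LF body: the mixed-source case described in c).
message = (
    b"From: a@example.com\r\n"
    b"Subject: hi\r\n"
    b"\r\n"
    b"line one\nline two\nline three\nline four\n"
)
```

With this message, a) and c) pick CRLF while b) picks LF, because the body
lines outnumber the header lines in the sample.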

My thinking is that case c) or d) applies to SA, because when Ben modifies the
milter and alters the header/body separator from CRLF to LF, some of the
messages he processes pass SA+DKIM.

If case a) were true, then my passing of headers with CRLF to SA would have
affected the DKIM result such that Ben's mod would have failed. Now from my
limited knowledge, DKIM is supposed to ignore (or has an option to ignore)
whitespace, which I assume also includes vertical whitespace, not just
horizontal whitespace.

Case b) might apply if SA+DKIM is sampling newlines from both header and body
and finding that the body newlines outweigh the header newlines, which is to be
expected but makes for a skewed result.

Some refresher/background on the milter API; a copy of docs can be found here:
http://www.milter.org/milter_api/api.html

The milter API is essentially linear in application, with call-backs to handlers
made in step with the SMTP protocol up until DATA. From my understanding,
sendmail saves the DATA content to a temp/queue file before proceeding with the
latter half of the milter handlers: xxfi_header, xxfi_eoh, xxfi_body, xxfi_eom.

The xxfi_header callback passes an already parsed and split header as name and
value, so the milter has no idea of the original form concerning white space and
newline termination. Consider:

header:           lots of leading white space
    continued second line

The milter will see via handler    xxfi_header(ctx, name, value):

    "header"
    "lots of leading white space continued second line"

Or the value might also be:

    "lots of leading white space    continued second line"

but I've never verified how sendmail handles unfolding long headers in my milter
work.

In either case, my milter has to rejoin the header and pass it to SpamAssassin.
As you can imagine, if DKIM is whitespace sensitive, then the potential for
failures is huge. I could send

    "header: lots of leading white space continued second line\r\n"

    "header:lots of leading white space continued second line\r\n"

    "header:    lots of leading white space continued second
     line\r\n"

since RFC 2822 allows for zero or more white space characters after the header
colon, though a single space is recommended. However, even with the above
rejoining, this does not necessarily match what sendmail saw off the wire from a
remote MTA/MUA.
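
As an illustration of the point above, here is a small Python sketch of the
rejoining step. rejoin_header and collapse_whitespace are hypothetical helpers
(milter-spamc itself is not written this way), and collapse_whitespace is only
a stand-in for "a whitespace-insensitive comparison", not DKIM's actual
canonicalization.

```python
def rejoin_header(name: str, value: str, gap: str = " ") -> bytes:
    """Rebuild a header line from the (name, value) pair a milter receives.

    `gap` is whatever white space the milter chooses to put after the colon;
    the original spacing is not available to it.
    """
    return f"{name}:{gap}{value}\r\n".encode("ascii")


def collapse_whitespace(line: bytes) -> bytes:
    """Hypothetical whitespace-insensitive form of a header line.

    Lowercases the name, unfolds the value and squeezes every run of white
    space (including CR and LF) down to a single space.
    """
    name, _, value = line.partition(b":")
    return name.strip().lower() + b":" + b" ".join(value.split())


value = "lots of leading white space continued second line"
variants = [
    rejoin_header("header", value, gap=" "),
    rejoin_header("header", value, gap=""),
    # Folded form: a continuation line starting with a single space.
    b"header:    lots of leading white space continued second\r\n line\r\n",
]
```

The three variants are different byte strings, so any byte-exact check would
treat them as three different headers, while collapse_whitespace maps all of
them to the same value.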

The xxfi_body handler simply receives unprocessed (64K) chunks of the message
body from sendmail. I'm assuming sendmail has been careful to avoid any CRLF to
LF changes that might occur when saving the message to a temp/queue file.
milter-spamc just passes the body chunks "as is" to spamd.

Now Ben's change involves just removing the CR from the CRLF header/body
separator, after which my reconstructed RFC 2822 compliant headers appear to
work just fine with DKIM. That would indicate that the DKIM implementation
ignores all white space and newlines in the headers, BUT does not treat the
header/body separator the same way, as the separator was all that was changed
by Ben's patch.

So my belief has been that there is something wrong in SA+DKIM if a simple
change in the header/body separator influences the result of the DKIM test;
hence my statement that the separator is being incorrectly treated as part of
the message body rather than as a protocol element to be excluded.
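
A minimal Python sketch of the behaviour I would expect, with the separator
treated as a protocol element that belongs to neither part. split_message is a
hypothetical name, and this is not a description of what SA currently does.

```python
import re

# Group 1: newline ending the last header line.
# Group 2: the blank line itself (the header/body separator).
# Either one may be LF or CRLF, independently of the other.
_SEPARATOR = re.compile(rb"(\r?\n)(\r?\n)")


def split_message(raw: bytes):
    """Split a raw message into (headers, body), dropping the separator."""
    m = _SEPARATOR.search(raw)
    if m is None:
        # No blank line found: treat the whole input as headers.
        return raw, b""
    headers = raw[: m.end(1)]  # keep the newline that ends the last header
    body = raw[m.end(2):]      # body starts after the separator
    return headers, body


# Same headers and body; only the separator differs (CRLF vs LF).
with_crlf_separator = b"Subject: hi\r\n\r\nbody line\n"
with_lf_separator = b"Subject: hi\r\n\nbody line\n"
```

Both inputs split into the same headers and the same body, so anything
computed over the body alone would not change when only the separator changes.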




