http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5179
------- Additional Comments From [EMAIL PROTECTED]  2006-11-12 03:55 -------
More response from achowe:

WRT the spamd protocol, generally it's fine; it's the documentation that
needs more precision so as to restrict implementations to a well-defined
model. I think part of my problem here is that the loose model I find in
SA, when combined with things like DKIM, needs very exacting definitions
and process, else simple things like white space cause havoc.

Please note that I've NOT read the DKIM spec. to this date, though Eric
Allman did provide me an overview last year (summer 2005), so my approach
and arguments are from existing RFC standards, primarily 2821 and 2822.

> I didn't realize that we are in effect treating the newline separator
> between headers and body as part of the body when they use different
> newlines. That may be a flat out bug, and we should at least explore the
> implications of doing it one way vs the other and make an explicit decision.

From what I have observed and based on Ben's trouble ticket comments, I'm
deducing that SA+DKIM relies on one of the following:

a) The first LF or CRLF found in a file given to the CLI or daemon.

b) Take a sample of the first N lines ending in LF or CRLF and use the
   most popular. However, you have to be sure to sample BOTH header and
   body separately, I think, since there may arise cases where header and
   body differ.

c) Use the newline that acts as the header/body separator as your newline.
   However, the header/body separator cannot be considered part of the
   message body (it never displays); it's more closely tied to the
   headers, so using it as an indicator of newline style could be
   problematic. Consider an app. that spits out RFC 2822 headers with CRLF
   newlines, but reads/includes a unix file containing LF newlines for the
   body. The CRLF separator would be RFC compliant while the body would
   not; very similar to milter-spamc.

d) Find the first blank line, i.e. LF-LF or CRLF-CRLF, as an indicator of
   line style.
   If using this method, you must not count the header/body separator
   that would be the first blank line found. My comments in c) about
   mixed sources, i.e. careless file inclusion, apply.

My thinking is that case c) or d) applies to SA, because when Ben modifies
the milter and alters the header/body separator from CRLF to LF, some of
the messages he processes pass SA+DKIM. If case a) were true, then my
passing of headers with CRLF to SA would have affected the DKIM result
such that Ben's mod would have failed. Now from my limited knowledge, DKIM
is supposed to ignore (or has an option to ignore) whitespace, which I
assume also includes vertical whitespace, not just horizontal whitespace.
Case b) might apply if SA+DKIM is sampling newlines from both header and
body and finding that the body newlines outweigh the header newlines,
which is to be expected but a skewed assumption.

Some refresher/background on the milter API; a copy of the docs can be
found here:

http://www.milter.org/milter_api/api.html

The milter API is essentially linear in application, with call-backs to
handlers made in step with the SMTP protocol up until DATA. From my
understanding, sendmail saves the DATA content to a temp/queue file before
proceeding with the latter half of the milter handlers: xxfi_header,
xxfi_eoh, xxfi_body, xxfi_eom.

The xxfi_header callback passes an already parsed and split header as name
and value, so the milter has no idea of the original form concerning white
space and newline termination. Consider:

  header:    lots of leading white space
      continued second line

The milter will see via handler xxfi_header(ctx, name, value):

  "header"  "lots of leading white space
      continued second line"

Or the value might also be:

  "lots of leading white space continued second line"

but I've never verified how sendmail handles unfolding long headers in my
milter work. In either case, my milter has to rejoin the header and pass
it to SpamAssassin.
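A minimal Python sketch of the rejoining step and of sampling newline styles in header and body separately, as in option b). All function names here are hypothetical; this is not code from milter-spamc or SpamAssassin, and the whitespace collapsing is only an illustrative normalization, not DKIM's canonicalization algorithm.

```python
import re


def rejoin_header(name: str, value: str, newline: str = "\r\n") -> str:
    """Rebuild one header line from the (name, value) pair a milter receives.

    Assumption: a single space after the colon. The original spacing is
    not available to the milter, so this may differ from the wire form.
    """
    return f"{name}: {value}{newline}"


def collapse_whitespace(header: str) -> str:
    """Drop whitespace around the colon and collapse every other run of
    whitespace (including folded newlines) to a single space."""
    name, _, value = header.partition(":")
    return name.strip() + ":" + " ".join(value.split())


def count_newlines(data: bytes) -> dict:
    """Count CRLF and bare-LF line endings in a byte string."""
    crlf = data.count(b"\r\n")
    return {"CRLF": crlf, "LF": data.count(b"\n") - crlf}


def newline_styles(message: bytes) -> dict:
    """Report newline styles for headers and body separately.

    The blank separator line is reported on its own, so it is counted
    neither as a header newline nor as a body newline.
    """
    # First line terminator that is immediately followed by another one;
    # the captured group is the header/body separator itself.
    match = re.search(rb"\r?\n(\r?\n)", message)
    if match is None:
        return {"headers": count_newlines(message), "separator": None,
                "body": count_newlines(b"")}
    return {
        "headers": count_newlines(message[:match.start(1)]),
        "separator": match.group(1),
        "body": count_newlines(message[match.end(1):]),
    }


if __name__ == "__main__":
    # CRLF headers and separator, LF body: the mixed-source case.
    mixed = rejoin_header("Subject", "test") + "\r\n" + "body line\n"
    print(newline_styles(mixed.encode("ascii")))
```

Keeping the separator as its own field leaves the decision about which side it belongs to explicit, rather than letting it silently fall into the body.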
As you can imagine, if DKIM is whitespace sensitive, then the potential
for failures is huge. I could send any of

  "header: lots of leading white space continued second line\r\n"
  "header:lots of leading white space continued second line\r\n"
  "header:    lots of leading white space continued second line\r\n"

since RFC 2822 allows for zero or more white space after the header colon,
though a single space is recommended. However, even with the above
rejoining, this does not match what sendmail might have seen off the wire
from a remote MTA/MUA.

The xxfi_body handler simply receives unprocessed (64K) chunks of the
message body from sendmail. I'm assuming sendmail has been careful to
avoid any CRLF to LF changes that might occur when saving the message to a
temp/queue file. milter-spamc just passes the body chunks "as is" to
spamd.

Now Ben's change involves just removing the CR from the CRLF header/body
separator, such that my reconstructed RFC 2822 compliant headers appear to
work just fine with DKIM. That would indicate that the DKIM implementation
ignores all white space and newlines in the headers, BUT does not treat
the header/body separator the same way, since the separator was all that
Ben's patch changed.

So my belief has been that there is something wrong in SA+DKIM if the
simple change in the header/body separator influences the result of the
DKIM test, hence my statement that the separator is being incorrectly
assumed to be part of the message body rather than a protocol element to
be excluded.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
