Alessandro Vesely wrote:
>> Hector wrote:
>> 4 - Lines over 998 (1000 with CRLF), this is an invalid RFC5322, but
>>     its possible some verifiers are designed to do a buffered C14N
>>     and don't check for RFC5322 line lengths between two memory points
>>     in the buffer as oppose as a line by line feed into the C14N
>>     function. Why buffer vs line? speed.
>
> I imagined the C14N function reads characters one by one. On finding
> CRLF it can go back a few bytes to remove end-of-line punctuation.
> However you code c14n(), it will be sparklingly faster than sha256().
>
> However, distinguishing begin middle of line versus begin/end is
> possibly inconsistent, since line breaks may be altered because of
> invalidly long lines or RFC3676 rewrapping.
>
>> I found 98 such buffer hash errors from various domains due to
>> having at least 1 super long line.
>
> Some MUAs consistently keeps paragraphs on a single line.

Or rather, on the WRITER side it is a setting, and when it is set
they can:

  - use QP to send it (not necessarily saving it as QP on the local
    user side).

  - use automatic word wrapping based on the width set (usually < 70
    to stay consistent with console terminal types).

But when it is not set, on the READER side, as you know, reading can
be difficult on a display device that doesn't word wrap it for you.

This is often a design problem for WEB based viewing because it may
depend on the HTML tag used to display it.

  <pre>  preformatted - viewer will display as is.
  <p>    paragraph    - viewer will word wrap

When I read this list's mail via the archive on the web, for example,
some participants' mail is not word wrapped. So I often just copy and
paste it into my editor and hit ALT-B to word wrap it, just so I can
read it.

This is an old problem where people assumed (or didn't) that everyone
is using the same reading devices. In our WEB mail viewer, I forget
the logic, but it has been adjusted over time, and "security" is part
of it. The best way (under web) to do it is to use
multipart/alternative with text/plain and text/html to allow reading
devices to be smart.

When you made your suggestion, I thought you and I were thinking the
same thing: that the C14N/HASH should be on what the user "wrote" and
what is "read," i.e. the actual context, with all the "color" and
formatting removed. I think the idea has theoretical merit, but it is
definitely extra processing on both ends (signer and verifier), so it
may not be feasible.

I just wish to make a few other points regarding the larger buffer
readers.
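
To make the buffered check concrete, here is a minimal Python sketch
of scanning a data block for over-long lines between two points in
the buffer, rather than feeding lines one by one. It is only an
illustration under stated assumptions, not how any particular server
or verifier does it: the name find_long_lines is made up for this
example, and it assumes the whole message is already in one buffer
(a real block reader would also have to handle a line that straddles
two blocks).

```python
MAX_LINE = 998  # RFC 5322 line limit in octets, not counting the CRLF


def find_long_lines(buf: bytes, limit: int = MAX_LINE):
    """Return (offset, length) for every line in buf longer than limit.

    Hypothetical helper: walks the buffer from one CRLF to the next
    instead of splitting it into separate line objects first.
    """
    too_long = []
    start = 0
    while start < len(buf):
        end = buf.find(b"\r\n", start)
        if end == -1:
            # Trailing data with no CRLF yet; measure what is there.
            end = len(buf)
        length = end - start
        if length > limit:
            too_long.append((start, length))
        start = end + 2  # skip past the CRLF
    return too_long


if __name__ == "__main__":
    sample = b"short\r\n" + b"x" * 999 + b"\r\n" + b"y" * 998 + b"\r\n"
    print(find_long_lines(sample))
```

The scan is a single pass over memory the server already holds, so it
does not require giving up block reads of the DATA stream.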

Messages with illegal (length) text lines

+------------------------------------------------+
| signer                    total  illegal  hops |
|------------------------------------------------|
| coldwatercreek.com           78       78     2 |
| livingsocial.com             41        2     1 |
| jcprewards.com               23        4     2 |
| news.redlobster.com          11        4     1 |
| xanthianoutswagger.net        2        2     1 |
| resultsmail.com               2        2     1 |
| trl3.net                      2        2     1 |
| numbersoft.info               1        1     1 |
+------------------------------------------------+

All spam.

Obviously there are a lot of spammers and eMarketers that believe that
it (illegal lines) is not checked, and for practical reasons, it
isn't, or it isn't done until a later point.

Most SMTP servers use larger buffered data blocks to read the DATA
stream. Reading it line by line is extremely inefficient and gives
poor TCP and networking performance. It can be the difference between
a 2 second upload and a 1 minute upload.

It wouldn't be hard to do a line length check between two <CRLF>
points, but for us, this (illegal lines) wasn't detected until DKIM
checking was added.

So if there is anything positive about DKIM, it is that it's helping
systems check things they probably never bothered (didn't need or
want) to check before. The multiple From: header was among them.

The main point is that there are many systems with a transport
mechanism that really doesn't care what is in the payload. But RFC5322
compliance is becoming more of an aspect to take into account, and
line lengths should be among the things checked.

--
Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com

_______________________________________________
NOTE WELL: This list operates according to
http://mipassoc.org/dkim/ietf-list-rules.html