https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5185
--- Comment #23 from Kevin A. McGrail <[email protected]> 2012-02-13 17:09:48 UTC ---
> > However, the msgid is still generated including a substantial portion of the
> > body of the email. How can they generate a message ID for an email with
> > differing content as you propose?
>
> Just start with 1k of untokenizable text, perhaps hidden in mime or html.
>
> Since it's pristine_body I guess it could just be 1024 spaces.

Perhaps generate the msg_id off of more content than 1024 bytes AND why not strip all whitespace? Is sha1_hex that slow that generating it for the entire message is computationally unwanted?

> Consider hashing in relays-external to make it harder to exploit. Perhaps you
> might need to remove some information from the first section if it's not
> stable between the temporary and permanent header.

Right now, before the change, I believed we used the most recent received header as essentially nothing more than a "unique" timestamp header because it's the only header that is trustable not to have been mutated.

Since that header isn't reliably available, I feel removing it is best but perhaps as you said, it's either rcvd header w/ 1024 bytes or no rcvd header & all the body (perhaps sans whitespace).

I think if we can get a msg_id that is more unique to the message sans the transport path, it could IMPROVE bayes use.
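
To make the two options concrete, here is a rough Python sketch (not the SpamAssassin code, which is Perl). The function names and the exact inputs to each hash are assumptions for illustration only. It compares "rcvd header + first 1024 bytes" against "no rcvd header, whole body sans whitespace", and shows why a 1024-space prefix defeats the first one.

```python
import hashlib
import re


def msg_id_first_1k(received: str, body: bytes) -> str:
    """Hypothetical scheme A: newest Received header + first 1024 body bytes."""
    data = received.encode("utf-8") + b"\0" + body[:1024]
    return hashlib.sha1(data).hexdigest()


def msg_id_whole_body(body: bytes) -> str:
    """Hypothetical scheme B: whole body with all whitespace stripped, no headers."""
    stripped = re.sub(rb"\s+", b"", body)
    return hashlib.sha1(stripped).hexdigest()


# The padding trick from the quoted comment: 1024 spaces, then the real payload.
padding = b" " * 1024
received = "from mx.example.org by mail.example.com; Mon, 13 Feb 2012 17:09:48 +0000"
body_a = padding + b"buy pills now"
body_b = padding + b"completely different text"

# Scheme A only ever sees the padding, so both messages get the same id.
same_under_a = msg_id_first_1k(received, body_a) == msg_id_first_1k(received, body_b)

# Scheme B drops the padding and hashes the real content, so the ids differ.
same_under_b = msg_id_whole_body(body_a) == msg_id_whole_body(body_b)

# Scheme B is also independent of line endings and re-wrapping in transit.
stable_under_b = msg_id_whole_body(b"hello  world\r\n") == msg_id_whole_body(b"hello world\n")
```

With these inputs, same_under_a comes out true (a collision) while same_under_b is false. Whether hashing the full body is cheap enough on large messages would still need measuring.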
