https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5185

--- Comment #23 from Kevin A. McGrail <[email protected]> 2012-02-13 17:09:48 UTC ---
> > However, the msgid is still generated including a substantial portion of the
> > body of the email.  How can they generate a message ID for an email with
> > differing content as you propose?
> 
> Just start with 1k of untokenizable text, perhaps hidden in mime or html.
> 
> Since it's pristine_body I guess it could just be 1024 spaces.

Perhaps generate the msg_id from more than the first 1024 bytes of content, and
why not strip all whitespace as well? Is sha1_hex really so slow that hashing
the entire message is computationally undesirable?
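
To make the padding concern concrete, here is a minimal sketch in Python rather
than SpamAssassin's Perl. It is a simplification, not the actual msg_id code:
the function names prefix_id and stripped_id are hypothetical, and it assumes
the ID is derived from the body alone. It compares hashing only the first 1024
bytes with hashing the whole body after stripping whitespace.

```python
import hashlib
import re

def prefix_id(body: bytes, limit: int = 1024) -> str:
    """Hypothetical scheme: SHA-1 over only the first `limit` bytes of the body."""
    return hashlib.sha1(body[:limit]).hexdigest()

def stripped_id(body: bytes) -> str:
    """Hypothetical scheme: SHA-1 over the whole body with ASCII whitespace removed."""
    return hashlib.sha1(re.sub(rb"\s+", b"", body)).hexdigest()

# Two different messages that both start with 1024 spaces of padding.
padding = b" " * 1024
msg_a = padding + b"Buy cheap pills now"
msg_b = padding + b"Meeting moved to 3pm"

# The prefix-only hash sees nothing but the padding, so the IDs are identical.
collide_prefix = prefix_id(msg_a) == prefix_id(msg_b)
# The whole-body hash ignores the padding and sees the differing text.
collide_stripped = stripped_id(msg_a) == stripped_id(msg_b)

print(collide_prefix, collide_stripped)
```

Stripping whitespace only defeats the 1024-spaces variant. Padding made of
other untokenizable text is covered here only because the sketch hashes the
entire body rather than a fixed-size prefix.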

> Consider hashing in relays-external to make it harder to exploit. Perhaps you
> might need to remove some information from the first section if it's not
> stable between the temporary and permanent header.

Before the change, I believe we used the most recent Received header as
essentially nothing more than a "unique" timestamp, because it's the only
header that can be trusted not to have been mutated.

Since that header isn't reliably available, I feel removing it is best. But
perhaps, as you said, the choice is either the Received header plus 1024 bytes
of body, or no Received header and the entire body (perhaps sans whitespace).

I think that if we can get a msg_id that identifies the message itself,
independent of the transport path, it could IMPROVE Bayes use.
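
The two options can be sketched side by side. This is an illustration in
Python under stated assumptions, not SpamAssassin's implementation: the
function names id_with_received and id_body_only, the Received header values,
and the example.org hosts are all made up for the example. It shows that an ID
which includes the Received header changes with the delivery path, while a
body-only ID does not.

```python
import hashlib
import re

def id_with_received(received: str, body: bytes, limit: int = 1024) -> str:
    """Option 1 (hypothetical): most recent Received header plus first `limit` body bytes."""
    h = hashlib.sha1()
    h.update(received.encode("utf-8"))
    h.update(b"\0")  # separator so header and body bytes cannot run together
    h.update(body[:limit])
    return h.hexdigest()

def id_body_only(body: bytes) -> str:
    """Option 2 (hypothetical): no Received header, entire body sans whitespace."""
    return hashlib.sha1(re.sub(rb"\s+", b"", body)).hexdigest()

# The same message arriving over two different (made-up) transport paths.
body = b"Hello,\r\n\r\nsame message, two delivery paths.\r\n"
via_mx1 = "from mx1.example.org by mail.example.org; Mon, 13 Feb 2012 17:09:48 +0000"
via_mx2 = "from mx2.example.org by mail.example.org; Mon, 13 Feb 2012 17:09:51 +0000"

# Option 1 yields a different ID per path.
path_dependent = id_with_received(via_mx1, body) != id_with_received(via_mx2, body)
# Option 2 is unaffected by the path and by line-ending rewrites in transit.
path_independent = id_body_only(body) == id_body_only(body.replace(b"\r\n", b"\n"))

print(path_dependent, path_independent)
```

With the body-only variant, two copies of one message would map to the same
msg_id, which is the property that might help Bayes recognize a message it has
already learned.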

