I'd have to go look at the mail archives, assuming we discussed it in email and not just irc ... but I seem to recall it had to do with mails coming in w/ the same message-id and sa-learn seeing them as the same message, thereby bypassing our ability to learn tokens. Since we already generated ids for some mails, it was easy to make it the default w/ some backward compatibility.
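To make the failure mode concrete, here is a minimal Python sketch. It is not SpamAssassin's code, which is Perl, and it does not reproduce SpamAssassin's actual hashing recipe. The names `header_id`, `hashed_id` and `learn` are hypothetical, and so is the choice of hash input (Date header plus body). The sketch models the seen db as a plain set. When the key is the Message-ID header, a spammer who reuses one ID gets the second message skipped. When the key is a content-derived hash, both messages get learned.

```python
import hashlib
from email import message_from_string
from email.message import Message
from typing import Callable, Iterable, Optional


def header_id(msg: Message) -> Optional[str]:
    # Key on the Message-ID header as sent: easily forged, reused, or absent (None).
    return msg.get("Message-ID")


def hashed_id(msg: Message) -> str:
    # Hypothetical content-derived key: SHA-1 over the Date header and the body.
    # An illustrative assumption, not SpamAssassin's actual recipe.
    h = hashlib.sha1()
    h.update((msg.get("Date") or "").encode("utf-8"))
    h.update(str(msg.get_payload()).encode("utf-8"))  # assumes a single-part text body
    return h.hexdigest()


def learn(messages: Iterable[Message], key: Callable[[Message], Optional[str]]) -> int:
    # Toy "seen db": a message whose key was already seen is skipped,
    # so its tokens are never learned.
    seen = set()
    learned = 0
    for msg in messages:
        k = key(msg)
        if k in seen:
            continue
        seen.add(k)
        learned += 1
    return learned


# Two different spams that reuse the same Message-ID.
raw = [
    "Message-ID: <dup@spam.example>\nDate: Tue, 10 Feb 2009 10:00:00 +0000\n\nBuy cheap watches\n",
    "Message-ID: <dup@spam.example>\nDate: Tue, 10 Feb 2009 10:05:00 +0000\n\nRefinance your mortgage\n",
]
msgs = [message_from_string(r) for r in raw]

print(learn(msgs, header_id))  # 1: the second spam is skipped
print(learn(msgs, hashed_id))  # 2: both spams are learned
```

The same collision occurs with `header_id` when Message-ID is missing, because every such message keys to `None`. The sketch does not model the backwards compatibility with existing seen-db entries.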
Digging through the code + svn logs a bit:

------------------------------------------------------------------------
r6733 | felicity | 2004-02-18 18:26:01 -0500 (Wed, 18 Feb 2004) | 1 line

bug 3055: spammers are using the same message id to get around bayes being able to learn different messages. make the hash message-id the default now, but be backwards compatible with the seen db.
------------------------------------------------------------------------

On Wed, Feb 11, 2009 at 10:17:44AM +0000, Justin Mason wrote:
> On Tue, Feb 10, 2009 at 19:37, Michael Parker <[email protected]> wrote:
> >
> > On Feb 10, 2009, at 1:31 PM, Mark Martinec wrote:
> >>
> >> Bug or feature?
> >
> > Feature. Theo can talk more to this but I believe we wanted to standardize
> > on a generated id instead of using the header value since headers are easily
> > forged/duplicated even though the message wasn't the same.
>
> yeah. we should probably have added a comment to this effect I guess ;)
>
> Also, Message-IDs are occasionally omitted; it's a SHOULD rather than
> a MUST in rfc 822. this is bad practice, but it happens. in that case we had to
> generate an ID anyway.
>
> --j.

-- 
Randomly Selected Tagline:
Wit, n.: The salt with which the American Humorist spoils his cookery ... by leaving it out.
-- Ambrose Bierce, "The Devil's Dictionary"
