I'd have to go look at the mail archives, assuming we discussed it
in email and not just on IRC ...  but I seem to recall it had to do with
mails coming in w/ the same Message-ID and sa-learn seeing them as the
same message, so the later ones were skipped as already learned and we
never got their tokens.  Since we already generated ids for some mails,
it was easy to make that the default w/ some backward compatibility.

Digging through the code + svn logs a bit:

------------------------------------------------------------------------
r6733 | felicity | 2004-02-18 18:26:01 -0500 (Wed, 18 Feb 2004) | 1 line

bug 3055: spammers are using the same message id to get around bayes being
able to learn different messages.  make the hash message-id the default now,
but be backwards compatible with the seen db.
------------------------------------------------------------------------
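
To make the idea concrete, here is a rough sketch in Python (not the actual
SpamAssassin code, which is Perl).  It assumes the generated id is a SHA-1
hash over the Date header and the body; the function names, the choice of
hashed fields, and the "@sa_generated" suffix are illustrative assumptions
on my part.  The backward-compatibility part is modelled by checking the
seen db for both the generated id and the old header-based id.

```python
import hashlib
from email import message_from_string


def generated_id(raw: str) -> str:
    """Derive an id from message content instead of trusting the header.

    Assumption: hash the Date header plus the body. The fields used and
    the suffix are illustrative, not taken from the real implementation.
    """
    msg = message_from_string(raw)
    date = msg.get("Date", "")
    # Everything after the first blank line is treated as the body.
    body = raw.partition("\n\n")[2]
    data = (date + "\0" + body).encode("utf-8", "replace")
    return hashlib.sha1(data).hexdigest() + "@sa_generated"


def candidate_ids(raw: str) -> list:
    """Return the generated id first, then the header Message-ID if any."""
    ids = [generated_id(raw)]
    header_id = message_from_string(raw).get("Message-ID")
    if header_id:
        ids.append(header_id.strip())
    return ids


def already_seen(raw: str, seen: set) -> bool:
    """A message counts as learned if either id is in the seen db.

    Checking the header id as well keeps entries written by older
    versions (keyed on the Message-ID header) valid.
    """
    return any(i in seen for i in candidate_ids(raw))


def learn(raw: str, seen: set) -> bool:
    """Record the message under its generated id; False if already seen."""
    if already_seen(raw, seen):
        return False
    seen.add(generated_id(raw))
    return True
```

With this, two spams that reuse one Message-ID but differ in content hash
to different ids, while a message whose header id was stored by an older
version is still recognised as already learned.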

On Wed, Feb 11, 2009 at 10:17:44AM +0000, Justin Mason wrote:
> On Tue, Feb 10, 2009 at 19:37, Michael Parker wrote:
> >
> > On Feb 10, 2009, at 1:31 PM, Mark Martinec wrote:
> >>
> >> Bug or feature?
> >
> > Feature.  Theo can talk more to this but I believe we wanted to standardize
> > on a generated id instead of using the header value since headers are easily
> > forged/duplicated even though the message wasn't the same.
> 
> yeah.  we should probably have added a comment to this effect I guess ;)
> 
> Also, Message-IDs are occasionally omitted; it's a SHOULD rather than
> a MUST in rfc 822.  this is bad practice, but it happens.  in that case
> we had to generate an ID anyway.
> 
> --j.

-- 
Randomly Selected Tagline:
Wit, n.:
        The salt with which the American Humorist spoils his cookery
        ... by leaving it out.
                -- Ambrose Bierce, "The Devil's Dictionary"
