[Bug 5185] Bayesian learning uses different message checksums during exiscan_acl and later sa_learn

bugzilla-daemon Tue, 17 Jan 2012 08:08:23 -0800

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5185


Kevin A. McGrail <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
   Target Milestone|Future                      |3.4.1

--- Comment #9 from Kevin A. McGrail <[email protected]> 2012-01-17 16:07:58 UTC ---
Comment 4 describes the issue extremely well, and I am intimately familiar
with this type of scenario.

Here's a shorter version of the issue:

- SA generates the message checksum from the Received header timestamp and
mail id.

- In a milter environment that uses a synthesized Received header to call
SpamAssassin, the same message gets two checksums because the timestamp is
different and (I think) the mail id is different from what the email will get
when the MTA really delivers it.

- This annoys Bayes users trying to unlearn a message because the checksums
don't match.  You can also end up with "duplicates" when batch processing the
same message later.
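
As a rough illustration of the mismatch described above (a simplified model,
not SpamAssassin's actual code): assume the checksum is just a SHA-1 over the
Received header text, which carries the mail id and timestamp.  The function
name and both header strings are made up for the example.

```python
import hashlib


def checksum_from_received(received_header: str) -> str:
    """Hypothetical stand-in for the checksum: SHA-1 over the Received header.

    This is an assumption for illustration only; the real SpamAssassin
    routine is not reproduced here.
    """
    return hashlib.sha1(received_header.encode("utf-8")).hexdigest()


# Header synthesized by the milter at scan time (made-up values).
synthesized = (
    "from mail.example.org by mx.example.net with ESMTP id MILTER-TMP; "
    "Tue, 17 Jan 2012 16:07:58 +0000"
)

# Header written by the MTA at real delivery (made-up values).
delivered = (
    "from mail.example.org by mx.example.net with ESMTP id q0HG7x1A012345; "
    "Tue, 17 Jan 2012 16:08:23 +0000"
)

scan_checksum = checksum_from_received(synthesized)
learn_checksum = checksum_from_received(delivered)

# Same message, two different checksums, so a later unlearn cannot find it.
checksums_match = scan_checksum == learn_checksum
```

With inputs like these, `checksums_match` is false even though both headers
belong to the same message.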

If the milter can get access to the real message ID at the time it synthesizes
the Received header, then perhaps we can create message checksums using only
the day and the message ID.

My theory is that this will produce a checksum that is distinct enough to be
unique, and will limit emails with two different checksums to those received
very close to midnight, which cross the day boundary between milter scanning
and actual MTA delivery.
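
A minimal sketch of that idea, assuming the milter can obtain the same
identifier the MTA will later use.  The function name, the use of SHA-1, the
UTC day, and the sample message ID are all assumptions for illustration, not
an existing SpamAssassin API.

```python
import hashlib
from datetime import datetime, timezone


def day_msgid_checksum(message_id: str, seen_at: datetime) -> str:
    """Hypothetical checksum built from only the calendar day and message ID."""
    # Assumption: the day is taken in UTC so both sides agree on the boundary.
    day = seen_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return hashlib.sha1(f"{day}\0{message_id}".encode("utf-8")).hexdigest()


msgid = "<example-1234@mail.example.org>"  # made-up identifier

# Milter scan and MTA delivery a few seconds apart on the same day.
milter_scan = datetime(2012, 1, 17, 16, 7, 58, tzinfo=timezone.utc)
delivery = datetime(2012, 1, 17, 16, 8, 23, tzinfo=timezone.utc)
same_day_match = (
    day_msgid_checksum(msgid, milter_scan) == day_msgid_checksum(msgid, delivery)
)

# Edge case: the scan happens just before midnight, delivery just after.
late_scan = datetime(2012, 1, 17, 23, 59, 59, tzinfo=timezone.utc)
late_delivery = datetime(2012, 1, 18, 0, 0, 2, tzinfo=timezone.utc)
midnight_match = (
    day_msgid_checksum(msgid, late_scan) == day_msgid_checksum(msgid, late_delivery)
)
```

Here `same_day_match` is true and `midnight_match` is false, which is the
residual mismatch window described above.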

If the milters really can't get the real message ID, there may not be a good
solution beyond recommending that those using milters not use sa-learn
midstream.

I'll email David Skoll of Roaring Penguin and ask for his input.  He's a
milter expert and might know whether we can get the real message ID in time
for the synthesized header for sendmail/postfix/exim/etc.

Regards,
KAM

-- 
Configure bugmail: 
https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
