https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5185
Kevin A. McGrail <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
   Target Milestone|Future                      |3.4.1

--- Comment #9 from Kevin A. McGrail <[email protected]> 2012-01-17 16:07:58 UTC ---
Comment 4 describes the issue extremely well, and I am intimately familiar with this type of scenario. Here's a shorter version of the issue:

- SA generates the message checksum from the Received header timestamp and the mail ID.
- In a milter environment that synthesizes a Received header before calling SpamAssassin, the same message ends up with two checksums: the timestamp differs and (I think) the mail ID differs from the one the message gets when the MTA actually delivers it.
- This annoys Bayes users trying to unlearn a message, because the checksums don't match. You can also end up with "duplicates" when batch-processing the same message later.

If the milter can get access to the real message ID at the time it synthesizes the Received header, then perhaps we can create message checksums using only the day and the message ID. My theory is that this will produce a checksum distinct enough to be unique, and will limit the emails that get two different checksums to those received very close to midnight, where milter scanning and actual MTA delivery fall on different days.

If the milters really can't get the real message ID, there may not be a good solution beyond recommending that milter users not run sa-learn midstream.

I'll email David Skoll of Roaring Penguin and ask for his input. He's a milter expert and might know whether we can get the real message ID in time for the synthesized header in sendmail/postfix/exim/etc.

Regards,
KAM
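A minimal sketch of the day-plus-message-ID idea, for illustration only. The function names `checksum_full` and `checksum_day`, the key format, the use of SHA-1, and computing the "day" in UTC are all assumptions, not SpamAssassin's actual checksum code. The sketch only shows why dropping the time of day lets the milter pass and the MTA pass agree, except when the two straddle midnight.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def checksum_full(received_ts: datetime, mail_id: str) -> str:
    # Hypothetical stand-in for the behaviour described in the comment:
    # the full Received timestamp and the mail ID both feed the hash.
    key = f"{received_ts.astimezone(timezone.utc).isoformat()}|{mail_id}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def checksum_day(received_ts: datetime, message_id: str) -> str:
    # Hypothetical proposed scheme: keep only the calendar day (UTC is an
    # assumption) plus the real message ID, so a few seconds between the
    # milter scan and MTA delivery no longer change the checksum.
    day = received_ts.astimezone(timezone.utc).date().isoformat()
    key = f"{day}|{message_id}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# Usage: the same message scanned by the milter, then delivered 3 seconds later.
MSG_ID = "<20120117160758.ABC123@mx.example.com>"  # hypothetical message ID
milter_ts = datetime(2012, 1, 17, 16, 7, 55, tzinfo=timezone.utc)
mta_ts = milter_ts + timedelta(seconds=3)

# Near midnight: scan and delivery fall on different UTC days.
late_ts = datetime(2012, 1, 17, 23, 59, 59, tzinfo=timezone.utc)
next_day_ts = late_ts + timedelta(seconds=3)

if __name__ == "__main__":
    print(checksum_full(milter_ts, MSG_ID) == checksum_full(mta_ts, MSG_ID))   # False
    print(checksum_day(milter_ts, MSG_ID) == checksum_day(mta_ts, MSG_ID))     # True
    print(checksum_day(late_ts, MSG_ID) == checksum_day(next_day_ts, MSG_ID))  # False
```

The trade-off is that the day-level key leans entirely on the message ID being unique within a day; a message with a missing or reused Message-ID would collide with another one under this scheme, whereas the full timestamp would have kept them apart.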
