Re: [AMaViS-user] Suggestion: Modify Amavis to optionally retain a virgin copy of each message processed...
Ken Morley wrote:
> I'm using Postfix 2.4.6, Amavisd-new 2.5.2, ClamAV 0.91.2 and
> Mail-SpamAssassin 3.2.3 in a Linux mail filter. I'm having problems
> conveniently getting enough ham and spam for Bayes training. I'm aware
> that Bayes is more closely related to SA than Amavisd, but please humor
> me before sending me off to the SA forums :)
>
> I am currently using the Postfix always_bcc function to copy each email
> coming through the system to postmaster. From postmaster's mailbox, I
> manually classify and copy each email into separate spam- or
> ham- files. The problem is that this alters the recipient and adds
> a number of X-Amavis headers that could affect Bayes accuracy.
>
> It seems to me that it would be better if Amavisd could just make an
> unaltered copy of every e-mail it processes and place them in separate
> disk files. From that point, it should be fairly easy to write a script
> that would allow postmaster to quickly review and classify the files.
> Then, the script would assign the files an appropriate spam or ham
> filename. That would take a lot of effort out of building a corpus.
>
> Any thoughts on that suggestion?
>

Besides Mark's suggestion, you can use recipient_bcc_maps instead of
always_bcc. The idea is to use a regular expression to "keep" the
original recipient. It looks like this:

  /^(.*)@(example\.com)$/ [EMAIL PROTECTED]

Here '+' is configured as the extension delimiter. This way, you can
easily retrieve the "original" recipient.

-
SF.Net email is sponsored by: Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
___
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/
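To see what the lookup key in the recipient_bcc_maps example captures, here is a small Python sketch that mimics the Postfix regexp match. It assumes Postfix's default case-insensitive matching for regexp tables; the name bcc_key_groups is hypothetical, and the right-hand side of the map (elided in the archive) is deliberately not reproduced. In a Postfix regexp table the two groups are available to the result as $1 and $2.

```python
import re
from typing import Optional, Tuple

# Python equivalent of the Postfix regexp table key /^(.*)@(example\.com)$/.
# Postfix regexp tables match case-insensitively by default, hence re.IGNORECASE.
BCC_KEY = re.compile(r"^(.*)@(example\.com)$", re.IGNORECASE)


def bcc_key_groups(recipient: str) -> Optional[Tuple[str, str]]:
    """Return ($1, $2) for a matching recipient, or None if the map would not fire."""
    m = BCC_KEY.match(recipient)
    return (m.group(1), m.group(2)) if m else None


print(bcc_key_groups("alice@example.com"))  # ('alice', 'example.com')
print(bcc_key_groups("bob@example.org"))    # None
```

Note that only recipients in the listed domain match, so mail for other domains gets no BCC from this entry.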
Re: [AMaViS-user] Suggestion: Modify Amavis to optionally retain a virgin copy of each message processed...
Ken,

> I'm using Postfix 2.4.6, Amavisd-new 2.5.2, ClamAV 0.91.2 and
> Mail-SpamAssassin 3.2.3 in a Linux mail filter. I'm having problems
> conveniently getting enough ham and spam for Bayes training. I'm aware
> that Bayes is more closely related to SA than Amavisd, but please humor
> me before sending me off to the SA forums :)
>
> I am currently using the Postfix always_bcc function to copy each email
> coming through the system to postmaster. From postmaster's mailbox, I
> manually classify and copy each email into separate spam- or
> ham- files. The problem is that this alters the recipient and adds
> a number of X-Amavis headers that could affect Bayes accuracy.
>
> It seems to me that it would be better if Amavisd could just make an
> unaltered copy of every e-mail it processes and place them in separate
> disk files. From that point, it should be fairly easy to write a script
> that would allow postmaster to quickly review and classify the files.
> Then, the script would assign the files an appropriate spam or ham
> filename. That would take a lot of effort out of building a corpus.

Use archival quarantining:

  $archive_quarantine_method = 'local:archive/%m';

or a separate archive for clean and spam:

  $clean_quarantine_method = 'local:clean/%m.gz';
  $spam_quarantine_method  = 'local:spam/%m.gz';

and place the following in a SA config file (local.cf):

  bayes_ignore_header X-Envelope-To-Blocked
  bayes_ignore_header X-Quarantine-ID
  bayes_ignore_header X-Amavis-Alert
  bayes_ignore_header X-Amavis-OS-Fingerprint
  bayes_ignore_header X-Amavis-PolicyBank
  bayes_ignore_header X-Virus-Scanned

(Other header fields, such as X-Spam-* and Delivered-To, are ignored
by the SpamAssassin learner by default.)

Apart from the few prepended header fields, the archived message is in
its pristine form, as received by amavisd.

Mark
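With the clean and spam quarantines written as gzipped files, each message can be unpacked into a plain-file corpus for review and training. Below is a minimal Python sketch; the directory layout is an assumption based on the 'local:clean/%m.gz' and 'local:spam/%m.gz' templates above (presumably resolved relative to $QUARANTINEDIR), and unpack_quarantine is a hypothetical helper, not part of amavisd-new.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def unpack_quarantine(src_dir: Path, dest_dir: Path) -> List[Path]:
    """Decompress every *.gz file in src_dir into dest_dir; return the written paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(src_dir.glob("*.gz")):
        # "abc123.gz" -> "abc123": keep the amavisd mail_id as the file name
        out_path = dest_dir / gz_path.stem
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

For example, unpack clean/ into corpus/ham and spam/ into corpus/spam, then train with "sa-learn --ham corpus/ham" and "sa-learn --spam corpus/spam". The bayes_ignore_header lines above keep the prepended amavisd headers out of the tokens.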
[AMaViS-user] Suggestion: Modify Amavis to optionally retain a virgin copy of each message processed...
I'm using Postfix 2.4.6, Amavisd-new 2.5.2, ClamAV 0.91.2 and
Mail-SpamAssassin 3.2.3 in a Linux mail filter. I'm having problems
conveniently getting enough ham and spam for Bayes training. I'm aware
that Bayes is more closely related to SA than Amavisd, but please humor
me before sending me off to the SA forums :)

I am currently using the Postfix always_bcc function to copy each email
coming through the system to postmaster. From postmaster's mailbox, I
manually classify and copy each email into separate spam- or
ham- files. The problem is that this alters the recipient and adds
a number of X-Amavis headers that could affect Bayes accuracy.

It seems to me that it would be better if Amavisd could just make an
unaltered copy of every e-mail it processes and place them in separate
disk files. From that point, it should be fairly easy to write a script
that would allow postmaster to quickly review and classify the files.
Then, the script would assign the files an appropriate spam or ham
filename. That would take a lot of effort out of building a corpus.

Any thoughts on that suggestion?

Thanks!
Ken Morley
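The review-and-classify script described above could look something like this minimal Python sketch. It assumes one message per file in a single directory (for example, an unpacked quarantine) and a hypothetical naming convention of prefixing "spam-" or "ham-" to each file name; classify_messages and ask_user are illustrative names, not part of amavisd-new or SpamAssassin.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

LABELS = ("spam", "ham")


def ask_user(path: Path) -> Optional[str]:
    """Show the start of a message and ask the postmaster for a verdict."""
    with open(path, "rb") as f:
        preview = f.read(2000).decode("utf-8", errors="replace")
    print(f"===== {path.name} =====\n{preview}")
    answer = input("[s]pam, [h]am, or Enter to skip: ").strip().lower()
    return {"s": "spam", "h": "ham"}.get(answer)


def classify_messages(msg_dir: Path,
                      decide: Callable[[Path], Optional[str]]) -> Dict[str, int]:
    """Rename each unclassified file to 'spam-<name>' or 'ham-<name>' per decide()."""
    counts = {"spam": 0, "ham": 0, "skipped": 0}
    prefixes = tuple(f"{label}-" for label in LABELS)
    # Materialize the listing first so renamed files are not visited again.
    files = sorted(p for p in msg_dir.iterdir() if p.is_file())
    for path in files:
        if path.name.startswith(prefixes):
            continue  # already classified on an earlier run
        label = decide(path)
        if label in LABELS:
            path.rename(path.with_name(f"{label}-{path.name}"))
            counts[label] += 1
        else:
            counts["skipped"] += 1
    return counts
```

Interactive use would be classify_messages(Path("corpus/unsorted"), ask_user). Passing the decision function in as a parameter keeps the renaming logic separate from the prompt, so the same loop could later be driven by a non-interactive rule.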