On 2015-9-22 17:16, Steffen Kaiser wrote:
> I had SpamAssassin rules allocating about 100MB, the forked children
> only shared the C libraries after some time. That's a problem of
> Perl's way to handle rereferences to data.

It might help not to integrate SpamAssassin directly, but to use spamc to
communicate with spamd instead. However, that only saves memory if your filter
rules do other things that take time while processing messages, like DNS
blacklist checks, virus scanners, SPF/DKIM/DMARC processing, SMTP forward
lookups, etc.

That way, the perl processes doing those other things do not have a big lump of
SpamAssassin rules sitting in memory (which is usually quite a lot of memory
due to the way SpamAssassin works). You'd generally need fewer SpamAssassin
slaves than mimedefang slaves (if you don't, then this won't save memory; it
will eat a bit more instead, because of the extra perl processes involved).
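
To make that trade-off concrete, here is a rough back-of-the-envelope sketch.
Only the ~100MB rules figure comes from the quoted message; the per-process
baseline and both slave counts are made-up numbers for illustration, and the
model assumes the worst case where forked processes share nothing.

```perl
use strict;
use warnings;

# All figures are hypothetical except the ~100 MB of rules from the quote.
my $rules_mb  = 100;  # SpamAssassin rules per process (from the quoted post)
my $perl_mb   = 30;   # ASSUMED baseline for a perl process without the rules
my $md_slaves = 20;   # ASSUMED number of mimedefang slaves
my $sa_slaves = 5;    # ASSUMED number of SpamAssassin (spamd) slaves

# Integrated: every mimedefang slave carries the rules.
sub integrated_mb {
    my($md) = @_;
    return $md * ($perl_mb + $rules_mb);
}

# Split: mimedefang slaves stay small, only the spamd slaves carry the rules.
sub split_mb {
    my($md, $sa) = @_;
    return $md * $perl_mb + $sa * ($perl_mb + $rules_mb);
}

printf "integrated:          %d MB\n", integrated_mb($md_slaves);
printf "split:               %d MB\n", split_mb($md_slaves, $sa_slaves);
printf "split, equal counts: %d MB\n", split_mb($md_slaves, $md_slaves);
```

With these assumed numbers the split setup comes out well below the integrated
one, while running as many SpamAssassin slaves as mimedefang slaves comes out
above it, which is the "eat a bit more memory" case.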

On the other hand, it does make things a bit more complex because you have to
manage another daemon: monitor it, restart it when rules change, maintain its
configs, etc. Per-recipient rules might also be somewhat harder to implement.

Oh, and stock mimedefang doesn't support it. I've attached the SpamC.pm that we 
use for spamd communication. Also make sure that you set 
$Features{"SpamAssassin"} = 0 in your filter, to prevent Mail::SpamAssassin 
from loading (otherwise your mimedefang slaves would still eat memory for 
spamassassin).

You will need to modify this SpamC.pm as it uses a modular Mimedefang.pm, but 
the changes should be trivial.
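
For illustration, a filter using it might look roughly like the fragment
below. This is only a sketch, not our actual filter: it assumes the module is
installed somewhere in @INC as MailFilter/SpamC.pm, the 100KB size cutoff and
the X-Spam-Score header name are arbitrary choices, and the function is called
by its full package name so it does not clash with the stock
spam_assassin_check().

```perl
# mimedefang-filter fragment (sketch; cutoff and header name are assumptions)
require MailFilter::SpamC;

# Keep Mail::SpamAssassin out of the mimedefang slaves.
$Features{"SpamAssassin"} = 0;

sub filter_end {
    my($entity) = @_;

    # Hypothetical size cutoff: do not send large messages to spamd.
    return if -s "./INPUTMSG" > 100 * 1024;

    my($hits, $req, $names, $report, $tags) =
        MailFilter::SpamC::spam_assassin_check();

    # An empty return means spamc failed; the error was already logged.
    return unless defined $hits;

    if ( $hits >= $req ) {
        action_change_header("X-Spam-Score", "$hits ($names)");
    }
}
```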

-- 
Jan-Pieter Cornet <joh...@xs4all.nl>
"Any sufficiently advanced incompetence is indistinguishable from malice."
    - Grey's Law

package MailFilter::SpamC;

# provide spamc interface to spamassassin, call-compatible with mimedefang
# API
# ... mostly. It actually only provides spam_assassin_check().

use Mimedefang qw(gen_msgid_header synthesize_received_header
                  :global :logging :config);
use IPC::Open2;
use base 'Exporter';

our @SpamAssassinExtraHeaders;

our @EXPORT_OK = qw(
    spam_assassin_check
    @SpamAssassinExtraHeaders
);

my $spamc = "/usr/bin/spamc";
my @spamc_opts = qw(-F /etc/spamd/spamc.conf);

sub spam_assassin_check {
    ### open communications to spamc
    my $in;
    unless ( open $in, "<", "./INPUTMSG" ) {
        md_syslog('err', "$MsgID: Spamc error: Cannot read INPUTMSG: $!");
        return;
    }
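    my($sprd, $spwr);
    ### open2 raises an exception on failure instead of returning false,
    ### so trap it with eval
    my $sp_pid = eval { open2($sprd, $spwr, $spamc, @spamc_opts) };
    unless ( $sp_pid ) {
        md_syslog('err', "$MsgID: Spamc error: Cannot fork $spamc: " .
            ($@ || $!));
        return;
    }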
    ### note: the lines below duplicate the effect in the real
    ### spam_assassin_check somewhat

    ### build complete headers
    my $hdrs = "Return-Path: $Sender\n" .
        synthesize_received_header();
    $hdrs .= gen_msgid_header() if ($MessageID eq "NOQUEUE");

    ### get message headers, remember if we had a "To:" header
    my($seen_to, $seen_eoh);
    while ( <$in> ) {
        if ( /^$/ ) {
            $seen_eoh++;
            last;
        }
        $seen_to++ if /^To:/i;
        $hdrs .= $_;
    }
    $hdrs .= "To: undisclosed-recipients:;\n" if !$seen_to;
    if ( $AddApparentlyToForSpamAssassin and @Recipients ) {
        $hdrs .= "Apparently-To: " . join(", ", @Recipients) . "\n";
    }
    $hdrs .= join("", @SpamAssassinExtraHeaders);

    ### add header-body separation line that we ate in the loop above
    $hdrs .= "\n";

    ### $hdrs now contains the complete headers as sent to spamc

    ### send headers to spamc
    print $spwr $hdrs;
    ### send rest of message (if there was any left)
    if ( $seen_eoh ) {
        print $spwr $_ while <$in>;
    }
    close $spwr;

    ### wait for result
    my $output = join("", <$sprd>);
    close $sprd;
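    waitpid($sp_pid, 0);

    ### $? is the raw wait status: exit code in the high byte, signal in
    ### the low bits
    if ( $? ) {
        md_syslog('err', "$MsgID: spamc failed: exit code " . ($? >> 8) .
            ", signal " . ($? & 127));
        return;
    }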

    my($hits, $req, $names, $report, %sa_tags);
    ### first line is hits/req
    if ( $output =~ s!\A(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\r?\n!! ) {
        ($hits, $req) = ($1, $2);
    } else {
        my $sample = $output;
        if ( length($sample) > 80 ) {
            $sample = substr($sample, 0, 80) . "...";
        }
        ### escape non-printable characters (note the capture group for $1)
        $sample =~ s{([^ -~])}{sprintf("\\x%02x", ord $1)}ge;
        md_syslog('err',
            "$MsgID: Error: spamc returned invalid output: $sample");
        return;
    }

    ### process rest of output
    ### non-greedy match so that a trailing \r does not end up in the value
    while ( $output =~ s/\A(\w+):\s+(.*?)\r?\n// ) {
        my($k,$v) = ($1,$2);
        $hits = $v, next if $k eq "Score";
        $req = $v, next if $k eq "Required";
        $names = $v, next if $k eq "Tests";
        $sa_tags{$k} = $v;
    }
    ### anything that is left now is the full report
    $output =~ s/^\s+//;
    $report = $output;

    return($hits, $req, $names, $report, \%sa_tags);
}

1;
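
The parsing half of spam_assassin_check() can be tried without a running
spamd. The standalone sketch below mirrors the same steps (score line, then
"Key: value" lines, then the report) on a made-up sample string; the sample is
not captured from a real spamc run, and parse_spamc_output() is a name
invented here for illustration.

```perl
use strict;
use warnings;

# Parse text in the shape spam_assassin_check() expects: a "hits/required"
# line, optional "Key: value" lines, then the free-form report.
sub parse_spamc_output {
    my($output) = @_;
    my($hits, $req, $names, %tags);

    # First line: score/threshold, e.g. "7.5/5.0"
    $output =~ s!\A(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\r?\n!!
        or return;
    ($hits, $req) = ($1, $2);

    # Following "Key: value" lines; unknown keys are kept as tags
    while ( $output =~ s/\A(\w+):\s+(.*?)\r?\n// ) {
        my($k, $v) = ($1, $2);
        if    ( $k eq "Score" )    { $hits  = $v }
        elsif ( $k eq "Required" ) { $req   = $v }
        elsif ( $k eq "Tests" )    { $names = $v }
        else                       { $tags{$k} = $v }
    }

    # Whatever remains is the report body
    $output =~ s/\A\s+//;
    return ($hits, $req, $names, $output, \%tags);
}

# Made-up sample input for illustration only
my $sample = "7.5/5.0\n"
           . "Tests: BAYES_99,HTML_MESSAGE\n"
           . "Language: en\n"
           . "\n"
           . "Content analysis details\n";

my($hits, $req, $names, $report, $tags) = parse_spamc_output($sample);
print "hits=$hits req=$req tests=$names lang=$tags->{Language}\n";
print "report=$report";
```

Input that does not start with a score line yields an empty list, just as the
module returns nothing after logging the error.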

Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list MIMEDefang@lists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
