Re: [Mimedefang] Embedded Perl (continued)

2015-09-23 Thread Jan-Pieter Cornet
On 2015-9-22 17:16 , Steffen Kaiser wrote:
> I had SpamAssassin rules allocating about 100MB, the forked children
> only shared the C libraries after some time. That's a problem of
> Perl's way to handle references to data.

It might help not to integrate SpamAssassin, but to use spamc to communicate 
with spamd. However, that only saves memory if you have other stuff in your 
filter rules that take time processing messages, like DNS blacklist checks, 
virus scanners, SPF/DKIM/DMARC processing, SMTP forward lookups, etc.

That way, Perl processes doing those other things do not have a big lump of 
SpamAssassin rules sitting in memory (which is usually quite a lot of memory 
due to the way SpamAssassin works). You'd generally need fewer spamd children 
than mimedefang slaves (if you don't, then this won't save memory but will 
eat a bit more memory instead, because of the extra Perl processes involved).
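The trade-off can be put into rough numbers. The following is a back-of-the-envelope sketch in Perl with hypothetical inputs. It borrows the roughly 100MB of SpamAssassin rules from the quoted message and assumes about 25MB for a Perl process without them, so a slave comes out near the 125MB reported later in this thread. It also ignores the small C spamc processes.

```perl
use strict;
use warnings;

# Rough memory model in MB; every input is a hypothetical figure.
# Embedded: every mimedefang slave carries its own copy of the rules.
sub embedded_mb {
    my ($md_slaves, $md_base, $sa_rules) = @_;
    return $md_slaves * ($md_base + $sa_rules);
}

# spamc/spamd: slaves stay small, but each spamd child is an extra
# Perl process that carries the rules (spamc itself is ignored).
sub spamc_mb {
    my ($md_slaves, $md_base, $spamd_children, $perl_base, $sa_rules) = @_;
    return $md_slaves * $md_base
         + $spamd_children * ($perl_base + $sa_rules);
}

printf "embedded, 70 slaves:   %d MB\n", embedded_mb(70, 25, 100);
printf "spamc, 20 spamd kids:  %d MB\n", spamc_mb(70, 25, 20, 25, 100);
printf "spamc, 70 spamd kids:  %d MB\n", spamc_mb(70, 25, 70, 25, 100);
```

With these made-up inputs, 70 embedded slaves need about 8750MB. Moving the rules into 20 spamd children brings that down to about 4250MB. With as many spamd children as slaves, the total grows to about 10500MB, which is the "eat a bit more memory" case.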

On the other hand, it does make things a bit more complex because you have to 
manage another daemon, monitor it, restart when rules change, maintain configs, 
etc. Per-recipient rules might be somewhat harder.

Oh, and stock mimedefang doesn't support it. I've attached the SpamC.pm that we 
use for spamd communication. Also make sure that you set 
$Features{"SpamAssassin"} = 0 in your filter, to prevent Mail::SpamAssassin 
from loading (otherwise your mimedefang slaves would still eat memory for 
spamassassin).

You will need to modify this SpamC.pm as it uses a modular Mimedefang.pm, but 
the changes should be trivial.

-- 
Jan-Pieter Cornet 
"Any sufficiently advanced incompetence is indistinguishable from malice."
- Grey's Law

package MailFilter::SpamC;

# provide spamc interface to spamassassin, call-compatible with mimedefang
# API
# ... mostly. It actually only provides spam_assassin_check().

use Mimedefang qw(gen_msgid_header synthesize_received_header
  :global :logging :config);
use IPC::Open2;
use base 'Exporter';

our @SpamAssassinExtraHeaders;

our @EXPORT_OK = qw(
spam_assassin_check
@SpamAssassinExtraHeaders
);

my $spamc = "/usr/bin/spamc";
my @spamc_opts = qw(-F /etc/spamd/spamc.conf);

sub spam_assassin_check {
    ### open communications to spamc
    my $in;
    unless ( open $in, "<", "./INPUTMSG" ) {
        md_syslog('err', "$MsgID: Spamc error: Cannot read INPUTMSG: $!");
        return;
    }
    my($sprd, $spwr);
    ### open2() raises an exception on failure instead of returning
    ### false, so trap it with eval
    my $sp_pid = eval { open2($sprd, $spwr, $spamc, @spamc_opts) };
    unless ( $sp_pid ) {
        md_syslog('err', "$MsgID: Spamc error: Cannot run $spamc: $@");
        close $in;
        return;
    }
    ### note: the lines below duplicate the effect in the real
    ### spam_assassin_check somewhat

    ### build complete headers
    my $hdrs = "Return-Path: $Sender\n" .
        synthesize_received_header();
    $hdrs .= gen_msgid_header() if ($MessageID eq "NOQUEUE");

    ### get message headers, remember if we had a "To:" header
    my($seen_to, $seen_eoh);
    while ( <$in> ) {
        if ( /^$/ ) {
            $seen_eoh++;
            last;
        }
        $seen_to++ if /^To:/i;
        $hdrs .= $_;
    }
    $hdrs .= "To: undisclosed-recipients:;\n" if !$seen_to;
    if ( $AddApparentlyToForSpamAssassin and @Recipients ) {
        $hdrs .= "Apparently-To: " . join(", ", @Recipients) . "\n";
    }
    $hdrs .= join("", @SpamAssassinExtraHeaders);

    ### add header-body separation line that we ate in the loop above
    $hdrs .= "\n";

    ### $hdrs now contains the complete headers as sent to spamc

    ### send headers to spamc
    print $spwr $hdrs;
    ### send rest of message (if there was any left)
    if ( $seen_eoh ) {
        print $spwr $_ while <$in>;
    }
    close $in;
    close $spwr;

    ### wait for result
    my $output = join("", <$sprd>);
    close $sprd;
    waitpid($sp_pid, 0);

    if ( $? ) {
        md_syslog('err', "$MsgID: spamc returned non-zero exit code: $?");
        return;
    }

    my($hits, $req, $names, $report, %sa_tags);
    ### first line is hits/req
    if ( $output =~ s!\A(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\r?\n!! ) {
        ($hits, $req) = ($1, $2);
    } else {
        my $sample = $output;
        if ( length($sample) > 80 ) {
            $sample = substr($sample, 0, 80) . "...";
        }
        ### escape non-printable characters; the capture group is
        ### needed so that $1 is set in the replacement
        $sample =~ s{([^ -~])}{sprintf("\\x%02x", ord $1)}ge;
        md_syslog('err',
            "$MsgID: Error: spamc returned invalid output: $sample");
        return;
    }

    ### process rest of output ("Key: value" lines); [ \t]* keeps an
    ### empty value from swallowing the next line, and the non-greedy
    ### capture keeps a trailing \r out of the value
    while ( $output =~ s/\A(\w+):[ \t]*(.*?)\r?\n// ) {
        my($k,$v) = ($1,$2);
        $hits = $v, next if $k eq "Score";
        $req = $v, next if $k eq "Required";
        $names = $v, next if $k eq "Tests";
        $sa_tags{$k} = $v;
    }
    ### anything that is left now is the full report
    $output =~ s/^\s+//;
    $report = $output;

    return($hits, $req, $names, $report, \%sa_tags);
}

1;
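The output handling at the end of spam_assassin_check() can be tried out without spamc or MIMEDefang. Below is a minimal sketch that pulls that parsing into a stand-alone function. parse_spamc_output() is a hypothetical name and the sample text is invented. The sketch assumes the format the module expects: a "score/threshold" first line, then "Key: value" lines, then the report. It matches the separator with [ \t]* and a non-greedy capture, so a trailing \r is not kept in a value and an empty value does not swallow the next line.

```perl
use strict;
use warnings;

# Hypothetical helper: the parsing part of spam_assassin_check() as a
# pure function. Returns ($hits, $req, $names, $report, \%tags), or an
# empty list if the first line is not "score/threshold".
sub parse_spamc_output {
    my ($output) = @_;
    my ($hits, $req, $names, %tags);

    # first line: "score/threshold", e.g. "7.3/5.0"
    return unless $output =~ s!\A(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\r?\n!!;
    ($hits, $req) = ($1, $2);

    # "Key: value" lines; the well-known keys fill the main fields
    while ($output =~ s/\A(\w+):[ \t]*(.*?)\r?\n//) {
        my ($k, $v) = ($1, $2);
        if    ($k eq 'Score')    { $hits  = $v }
        elsif ($k eq 'Required') { $req   = $v }
        elsif ($k eq 'Tests')    { $names = $v }
        else                     { $tags{$k} = $v }
    }

    # whatever is left is the free-form report
    $output =~ s/^\s+//;
    return ($hits, $req, $names, $output, \%tags);
}

# invented sample output
my $sample = "7.3/5.0\n"
           . "Tests: BAYES_99,URIBL_BLACK\n"
           . "Language: en\n"
           . "\n"
           . "Spam detection software report\n";
my ($hits, $req, $names, $report, $tags) = parse_spamc_output($sample);
print "score $hits of $req, tests $names, language $tags->{Language}\n";
```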



Re: [Mimedefang] Embedded Perl (continued)

2015-09-22 Thread Steffen Kaiser


On Tue, 22 Sep 2015, Amit Gupta wrote:

> My situation is that the number of mimedefang.pl processes jumps to
> about 70 during peak loads (we are processing a couple hundred
> messages per minute on average).  Our filter file is in need of some
> optimizations (since each mimedefang.pl is taking about 125mb of
                                                          ^^^
> resident memory), but I'm wondering if using embedded perl will help
> in this situation.  I see you mentioned using embedded perl prevents
> forking entire processes.  So does this mean each request is handled
> by a thread within the main process instead?  So would my RAM
> requirements be reduced drastically?


Read Dianne's response about the garbage collector. Unless the script touches 
very few of the values in your loaded data, or uses weak references, you will 
not notice any reduction in the long run.

I had SpamAssassin rules allocating about 100MB, the forked children only 
shared the C libraries after some time. That's a problem of Perl's way to 
handle references to data.



-- 
Steffen Kaiser

___
NOTE: If there is a disclaimer or other legal boilerplate in the above
message, it is NULL AND VOID.  You may ignore it.

Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list MIMEDefang@lists.roaringpenguin.com
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang


[Mimedefang] Embedded Perl (continued)

2015-09-22 Thread Amit Gupta
Apologies for starting a new thread. I couldn't find any messages in
my inbox to reply to.

Thanks Paul, Bill and Dianne for your replies.

My situation is that the number of mimedefang.pl processes jumps to
about 70 during peak loads (we are processing a couple hundred
messages per minute on average).  Our filter file is in need of some
optimizations (since each mimedefang.pl is taking about 125MB of
resident memory), but I'm wondering if using embedded perl will help
in this situation.  I see you mentioned using embedded perl prevents
forking entire processes.  So does this mean each request is handled
by a thread within the main process instead?  So would my RAM
requirements be reduced drastically?

In my peak case, I roughly calculate my RAM usage just for md.pl to be
about 8GB.  If embedded perl makes this go down a lot, it's a major
win for me.

Thanks again for your help


Re: [Mimedefang] Embedded Perl (continued)

2015-09-22 Thread Dianne Skoll
On Tue, 22 Sep 2015 07:57:18 -0700
Amit Gupta wrote:

> My situation is that the number of mimedefang.pl processes jumps to
> about 70 during peak loads (we are processing a couple hundred
> messages per minute on average).

How much RAM do you have?  70 parallel scanners is not outlandish on
busy machines.  Our biggest scanning machine is configured to allow
up to 400 scanners.  It's a pretty powerful machine with 48GB of RAM,
though, and our volume is 5-10x yours.

> I see you mentioned using embedded perl prevents
> forking entire processes.

No... it still forks each time, but it doesn't exec a new program.
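What "forks but doesn't exec" buys can be shown in a few lines of Perl. This is a minimal sketch, not MIMEDefang's code. %rules is a hypothetical stand-in for data loaded once in the parent, and the child only checks that the data is already there. An exec'd program would instead start from a fresh interpreter and have to load everything again. The pages stay shared only until one side writes to them. In Perl, merely using a value can update its reference count or internal flags, which is the effect Steffen describes in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for expensive pre-loaded data (e.g. compiled
# SpamAssassin rules), built once in the parent.
my %rules = map { ("rule$_" => qr/spam$_/) } 1 .. 1000;

# fork() without exec(): the child starts with the parent's memory
# image (shared copy-on-write by the kernel), so %rules is already
# present and nothing has to be loaded or compiled again.
my $pid = fork();
die "fork failed: $!\n" unless defined $pid;

if ($pid == 0) {
    # child: confirm the parent's data is present, then exit
    my $count = scalar keys %rules;
    exit($count == 1000 ? 0 : 1);
}

waitpid($pid, 0);
my $child_status = $? >> 8;
print "child inherited the rules: ", ($child_status == 0 ? "yes" : "no"), "\n";
```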

> So would my RAM requirements be reduced drastically?

Probably not.  As I said, embedded Perl helps a little bit, but not
dramatically.

Regards,

Dianne.


Re: [Mimedefang] Embedded Perl (continued)

2015-09-22 Thread Amit Gupta
We have 16GB of RAM, though there are other processes running on this
machine, such as a DB, that will be segmented later.  I'm curious how much
resident memory each of your mimedefang.pl processes uses.  I haven't
been tracking my mimedefang.pl memory usage over time, so I was a
little surprised to see it at 125MB.  Before I go down a rabbit hole
of minimizing it, I want to make sure it's actually significantly
higher than in your situation.

Also, am I right in thinking the forking issue is not such a big deal
because the processes are pre-forked, stay running for some amount
of time, and eventually get cleared down to your minimum setting?  I
have my min processes set to 10 and max to 100, and my monitoring
system shows that I have about 20 running mimedefang.pl processes on
average.
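Amit's numbers can be checked with simple arithmetic. The following is a sketch that assumes 125MB per slave and the 16GB machine described in this message. RSS counts a shared page in every process that maps it, so these totals are an upper bound rather than real usage.

```perl
use strict;
use warnings;

# Upper-bound estimate: treats every slave's RSS as private memory,
# which over-counts pages that are actually shared between slaves.
sub slaves_mb {
    my ($slaves, $rss_mb) = @_;
    return $slaves * $rss_mb;
}

my $ram_mb = 16 * 1024;    # 16GB machine
my $rss_mb = 125;          # per-slave RSS reported above

# average load, observed peak, configured maximum
for my $slaves (20, 70, 100) {
    my $mb = slaves_mb($slaves, $rss_mb);
    printf "%3d slaves: %5d MB (%.0f%% of RAM)\n",
        $slaves, $mb, 100 * $mb / $ram_mb;
}
```

At the configured maximum of 100 slaves the bound is 12500MB. That is about three quarters of the 16GB, before the DB and everything else on the machine are counted.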



Re: [Mimedefang] Embedded Perl (continued)

2015-09-22 Thread Dianne Skoll
On Tue, 22 Sep 2015 08:20:16 -0700
Amit Gupta wrote:

> We have 16GB of RAM, though there are other processes running on this
> machine, such as a DB, that will be segmented later.  I'm curious how much
> resident memory each of your mimedefang.pl processes uses.

About 110MB, but not sure how much of that is shared.
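On Linux, the shared/private split of a slave can be read from /proc/<pid>/smaps. The following is a minimal Perl sketch. It assumes a Linux kernel with smaps support and permission to read the target process (same user or root). The script is illustrative and is not part of MIMEDefang. To total memory across all slaves, summing Pss is fairer than summing Rss, because Pss divides each shared page among the processes that share it.

```perl
use strict;
use warnings;

# Linux-only sketch: total up /proc/<pid>/smaps (values are in kB).
#   Rss     - resident pages, counting shared pages in full
#   Pss     - shared pages divided by the number of processes sharing them
#   Shared  - Shared_Clean + Shared_Dirty
#   Private - Private_Clean + Private_Dirty
sub mem_breakdown {
    my ($pid) = @_;
    my %kb = (Rss => 0, Pss => 0, Shared => 0, Private => 0);
    open my $fh, '<', "/proc/$pid/smaps" or return;
    while (my $line = <$fh>) {
        if    ($line =~ /^(Rss|Pss):\s+(\d+) kB/)                { $kb{$1}      += $2 }
        elsif ($line =~ /^Shared_(?:Clean|Dirty):\s+(\d+) kB/)  { $kb{Shared}  += $1 }
        elsif ($line =~ /^Private_(?:Clean|Dirty):\s+(\d+) kB/) { $kb{Private} += $1 }
    }
    close $fh;
    return \%kb;
}

# Usage: perl smaps_summary.pl <pid of a mimedefang.pl slave>
# (defaults to this script's own process)
my $pid = @ARGV ? $ARGV[0] : $$;
my $m = mem_breakdown($pid) or die "cannot read /proc/$pid/smaps: $!\n";
printf "pid %d: RSS %d kB (shared %d kB, private %d kB), PSS %d kB\n",
    $pid, @{$m}{qw(Rss Shared Private Pss)};
```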

> Also, am I right in thinking the forking issue is not such a big deal
> because the processes are pre-forked, stay running for some amount
> of time, and eventually get cleared down to your minimum setting?

Forking is not a big deal at all.  Exec'ing may be more of a big
deal, but it's still probably not a major performance factor.

Regards,

Dianne.