dccproc/dccifd error

2011-12-22 Thread Herbert J. Skuhra
Hi,

I am using perl-5.10.1, amavisd-new 2.7.0, Mail-SpamAssassin-3.3.2 and 
dcc-dccd-1.3.140.

When I receive and scan a message with an 'X-DCC-xxx-Metrics' header, the 
following error is logged to maillog:

Dec 23 01:04:53 mx dccproc[81847]: unrecognized many usage: [-VdAQCHER]  [-h 
homedir] [-m map] [-w whiteclnt] [-T tmpdir][-a IP-address] [-f env_from] 
[-t targets] [-x exitcode][-c type,[log-thold,][spam-thold]] [-g 
[not-]type] [-S header][-i infile] [-o outfile] [-l logdir] [-B 
dnsbl-option][-L ltype,facility.level] ; fatal error

After modifying DCC.pm the error is gone:

--- DCC.pm.bak  2011-12-22 23:03:34.0 +0100
+++ DCC.pm  2011-12-22 23:22:11.0 +0100
@@ -859,7 +859,7 @@
     }
     if ($tag eq "dcc:") {
       # query instead of report if there is an X-DCC header from upstream
-      unshift(@opts, '-Q', 'many') if defined $permsgstatus->{dcc_raw_x_dcc};
+      unshift(@opts, '-Q') if defined $permsgstatus->{dcc_raw_x_dcc};
     } else {
       # learn or report spam
       unshift(@opts, '-t', 'many');

Is this the correct fix? Or is my setup broken?

Thanks.

-- 
Herbert
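
Why the stray 'many' trips dccproc can be sketched with a getopt-style parser. This is a Python illustration, not dccproc's actual code: the option string is transcribed from the usage text in the log line above, and it is an assumption that dccproc parses its arguments this way. In the usage text, -Q sits in the group of plain flags, while -t takes a value.

```python
import getopt

# Option string transcribed from the usage text in the log line:
# letters in [-VdAQCHER] are plain flags; the others take one argument.
# Assumption: dccproc behaves like a conventional getopt-style parser.
DCCPROC_OPTSTRING = "VdAQCHERh:m:w:T:a:f:t:x:c:g:S:i:o:l:B:L:"

def parse(argv):
    """Return (options, leftover operands) as a getopt-style parser would."""
    return getopt.getopt(argv, DCCPROC_OPTSTRING)

# Unpatched query mode: '-Q' consumes nothing, so 'many' is left over
# as an operand the program does not expect.
opts, leftover = parse(["-Q", "many"])
print(opts, leftover)

# Patched query mode: '-Q' alone leaves nothing behind.
opts_fixed, leftover_fixed = parse(["-Q"])

# Report mode: '-t' takes a value, so 'many' is consumed as its argument.
opts_report, leftover_report = parse(["-t", "many"])
```

Under that assumption the leftover operand in the first call corresponds to the word named in the "unrecognized many" message, and dropping it, as the diff does, leaves a clean command line.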


Re: Bayes and MySQL - does it actually work?

2011-12-22 Thread Marc Perkel



On 12/21/2011 10:58 AM, Robert Schetterer wrote:
> On 21.12.2011 19:10, Kris Deugau wrote:
>> Marc Perkel wrote:
>>> I've been trying for a long time to get bayes/mysql to actually work.
>>> Running a dedicated server with MySQL. Several servers running SA
>>> configured to talk to it.
>>>
>>> I'm running big servers with lots of ram and raid 0 flash drives for
>>> speed. Also using InnoDB. I'm beginning to wonder if it is ever going to
>>> work and if someone is going to fix it?
>>
>> I'm not sure what official testing has been done, but some testing I did
>> about a year ago when upgrading the SA cluster here showed pretty much
>> the same IO load for a global Bayes no matter what combination of
>> MyISAM, InnoDB, generic SQL, or MySQL-specific SA modules I used.
>>
>> Enabling MySQL replication also bogged things down pretty badly.
>>
>> Performance with the database on physical disks simply wasn't keeping up
>> with more than about double the average message rate (if that...), so I
>> fell back to the "good enough" setup of putting the SA database on a
>> RAMdisk, and tweaking the MySQL init script to reload the database on
>> startup.  A database dump is done once a day, about a half-hour after a
>> Bayes expiry run.
>>
>> This is handling ~250K messages/day, although with some tweaks to
>> serialize mail delivery a little more to level off the extreme peaks in
>> messages/second it should probably be able to handle a lot more volume.
>>
>> We also have several SA instances - on the inbound side, the first pass
>> has ~25 of the top-scoring only-hits-spam rules (mostly DNSBLs) to skim
>> off the junk that would usually score 15+ on a full ruleset.  Anything
>> that gets past that is then passed to a full SA instance with a long
>> list of local rules targeted at the ones reported as missed spam by
>> customers.  That first pass tags more than 80% of the junk for far less
>> processing cost than feeding it all through the full ruleset.
>>
>> Occasional mail spikes[1] sometimes cause SA to slow down due to CPU
>> contention (60+ spamd threads are simply going to
>> take a while to chew through mail if you've only got 16 logical CPU
>> cores), but otherwise a pair of dual-socket, quad-core Xeon E5630
>> machines with 12G of RAM are mostly idle.  (RAM usage is fairly steady
>> at just over 4G.)  Average scan times are just under a second.
>>
>> -kgd
>>
>> [1]  I'm looking at you, Rocket Science Group - hundreds of messages per
>> second from netblocks all over the US, all nominally operated by (AKA
>> tagged in WHOIS for) the same group - and quite a lot of it spam.
>> Unfortunately MailChimp seems to buy rack space, hosting, or managed
>> email servers from them or I'd drop all of their netblocks in the local
>> reject-at-the-border DNSBL and be done with it.
>
> Interesting info. By the way, does anyone know whether PostgreSQL
> performs better, e.g. with Bayes clusters?
> Using postscreen has helped here with stopping bots, so that mail never
> reaches spamd.
> In large mail systems a SpamAssassin setup always has to be configured
> very carefully and analysed at runtime to find performance tweaks.
> That said, 250K messages/day does not seem like much to me.
> Scanning outbound mail with spamd was slow here too; I only use
> clamav-milter with Sanesecurity for that, and also for inbound mail
> ahead of spamass-milter.
>
> No flames intended: for performance issues you always need to look at
> the whole mail setup. In most cases there is no simple right or wrong;
> only analysing the bottlenecks will help.



Maybe it's time for me to try postgresql. Can you provide a link to how 
to optimize SA for it?


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
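
The two-pass arrangement Kris describes in the quoted text (a small set of cheap, high-precision rules first, the full ruleset only for what survives) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not SpamAssassin code: the rule names, scores, and both thresholds are hypothetical.

```python
from typing import Callable, Dict, Tuple

# A rule maps message text to a score contribution (0.0 when it does not hit).
Rule = Callable[[str], float]

def score(message: str, rules: Dict[str, Rule]) -> float:
    """Sum the contributions of every rule in a ruleset."""
    return sum(rule(message) for rule in rules.values())

def two_pass_scan(
    message: str,
    cheap_rules: Dict[str, Rule],
    full_rules: Dict[str, Rule],
    first_pass_threshold: float = 5.0,  # hypothetical cut-off
    full_threshold: float = 5.0,        # hypothetical cut-off
) -> Tuple[str, str]:
    """Return (verdict, stage), where stage names the pass that decided."""
    # Pass 1: a handful of cheap, high-precision rules skims off obvious junk.
    if score(message, cheap_rules) >= first_pass_threshold:
        return ("spam", "first-pass")
    # Pass 2: only the remainder pays for the expensive full ruleset.
    verdict = "spam" if score(message, full_rules) >= full_threshold else "ham"
    return (verdict, "full")

# Toy rulesets, for illustration only.
cheap = {"listed_in_dnsbl": lambda m: 6.0 if "from-listed-host" in m else 0.0}
full = {
    "pills": lambda m: 3.0 if "cheap pills" in m else 0.0,
    "urgent": lambda m: 2.5 if "act now" in m else 0.0,
}

obvious = two_pass_scan("from-listed-host cheap pills", cheap, full)
sneaky = two_pass_scan("cheap pills, act now", cheap, full)
clean = two_pass_scan("meeting moved to 3pm", cheap, full)
```

The point of the design is that the expensive pass, including any Bayes database lookups, runs only for mail the cheap pass could not already classify.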



Re: dccproc/dccifd error

2011-12-22 Thread darxus
A new DCC.pm from the author of DCC was added to trunk on November 14th:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6698

Looks like it already handles your case:

DCC.pm:863: unshift(@opts, '-Q', 'many') if defined $permsgstatus->{dcc_raw_x_dcc};

That will be included in the next SpamAssassin release, v3.4.0 (which
doesn't have a specific planned release date).


 

-- 
Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'. - The Color of Magic
http://www.ChaosReigns.com


Re: dccproc/dccifd error

2011-12-22 Thread darxus
On 12/22, dar...@chaosreigns.com wrote:
> DCC.pm:863: unshift(@opts, '-Q', 'many') if defined
> $permsgstatus->{dcc_raw_x_dcc};

>> I am using perl-5.10.1, amavisd-new 2.7.0, Mail-SpamAssassin-3.3.2 and
>> dcc-dccd-1.3.140.

>> Dec 23 01:04:53 mx dccproc[81847]: unrecognized many usage: [-VdAQCHER]
>> [-h homedir] [-m map] [-w whiteclnt] [-T tmpdir][-a IP-address] [-f
>> env_from] [-t targets] [-x exitcode][-c type,[log-thold,][spam-thold]]
>> [-g [not-]type] [-S header][-i infile] [-o outfile] [-l logdir] [-B
>> dnsbl-option][-L ltype,facility.level] ; fatal error

>> -   unshift(@opts, '-Q', 'many') if defined
>> $permsgstatus->{dcc_raw_x_dcc};
>> +   unshift(@opts, '-Q') if defined $permsgstatus->{dcc_raw_x_dcc};

Yeah, I read that backwards.

Maybe it's handled by this?

./lib/Mail/SpamAssassin/Plugin/DCC.pm:692:  $x_dcc =~ s/many/99/ig;

http://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/DCC.pm?view=markup
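
For reference, here is what the s/many/99/ig substitution does to a string, translated to Python for illustration. The sample header value is made up, and only the 'X-DCC-xxx-Metrics' name comes from this thread. The substitution rewrites the text held in $x_dcc; whether that has any bearing on the 'many' passed after -Q on the dccproc command line is not established here.

```python
import re

def normalize_many(x_dcc: str) -> str:
    """Python equivalent of the Perl substitution s/many/99/ig."""
    return re.sub(r"many", "99", x_dcc, flags=re.IGNORECASE)

# Made-up header value, for illustration only.
sample = "X-DCC-xxx-Metrics: mx 1234; Body=1 Fuz1=many Fuz2=MANY"
normalized = normalize_many(sample)
print(normalized)
```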

Info on trunk:  http://wiki.apache.org/spamassassin/DownloadFromSvn

The author did say "I believe it is entirely upward compatible." in
November, which was well after the DCC 1.3.140 release, so it probably
works.

I'd be interested to hear how that works if you try it.  Might be worth
posting the results to that bug.

-- 
Whom God wishes to destroy, he first makes mad.
- Euripides (c.480 - 406 BC).
http://www.ChaosReigns.com


Re: Bayes and MySQL - does it actually work?

2011-12-22 Thread Robert Schetterer
On 23.12.2011 02:45, Marc Perkel wrote:
 
 

 
> Maybe it's time for me to try postgresql. Can you provide a link to how
> to optimize SA for it?
 

Sorry, no, I have no links besides the official ones,
but good DB people have told me that PostgreSQL
is handier in cluster setups.
As I said, though, try to limit the amount of mail
reaching SpamAssassin by putting other filtering techniques in front of it;
that should help anyway, whatever you do about the database.
-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria