date:20061027

On Fri, Oct 27, 2006 at 08:08:36AM -0400, Patrick Sherrill wrote:
 How can you test new plugins?

Load the plugin and include any associated configs, then see what happens.

(the question is extremely vague, so this answer is probably not very useful.)

-- 
Randomly Selected Tagline:
What the hell is this?  For crying out loud, somebody throw a pie!
 - Peter Griffin on Family Guy



RE: High CPU running SA in a VMware VM

The I/O rate is pretty low. The files going through expiration are only
about 5 MB, and it only takes one of these to drive the CPU up. I think
there are over 100,000 tokens in the file, each with a timestamp, and I
believe there must be some sorting going on, so I suspect that is where
the issue is.

Thanks,
Ian

"Gary W. Smith" [EMAIL PROTECTED] wrote:

 What does the IO usage look like on the server? We ran a couple of our
 backup SA instances on VMware, but the database is on a remote SQL
 server, so the only IO is logging. We have several VM instances for a
 variety of things. Did you pre-allocate the disk space? If not, you
 might consider doing that first and defragging the disk.

 From: Sammy Anderson [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 26, 2006 3:52 PM
 To: users@spamassassin.apache.org
 Subject: High CPU running SA in a VMware VM

 We recently migrated our SpamAssassin installation from a physical 3.6
 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4) with
 RHEL 4 as the guest OS and SA 3.1.7. Each user has their own Bayes
 files (Berkeley DB) and these were copied from the old to the new
 server. Now whenever an expiry process runs on a user's database, the
 CPU spikes, sometimes for a minute or longer. We did not notice spikes
 on the old server, but it is really hammering the VM. Has anyone else
 experienced this problem? For now I have disabled Bayes altogether
 because of the unacceptable load.

 --SA

Re: High CPU running SA in a VMware VM

The guest has more memory than it is using, so it isn't doing any paging
or swapping.

As for the ESX 2.5.4 box, it isn't swapping either. There is currently
enough physical RAM for the few VMs running.

[EMAIL PROTECTED] wrote:

 On Thu, 26 Oct 2006 15:52:17 -0700 (PDT) Sammy Anderson wrote:

  We recently migrated our SpamAssassin installation from a physical
  3.6 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4)
  with RHEL 4 as the guest OS and SA 3.1.7. Each user has their own
  Bayes files (Berkeley DB) and these were copied from the old to the
  new server. Now whenever an expiry process runs on a user's database,
  the CPU spikes, sometimes for a minute or longer. We did not notice
  spikes on the old server, but it is really hammering the VM. Has
  anyone else experienced this problem? For now I have disabled Bayes
  altogether because of the unacceptable load.

 Perhaps memory started to spill into the swap on either the VM or
 guest OS.

 I don't know what version of VMware you are using. I'm using v5.2.2
 running under Windows. In the memory preferences I have mine set so
 all the virtual machine memory has to fit into the reserved host RAM.
 I've done small tests with SA before and haven't had any problems.
 Then again, I haven't found anything I can use to put a load on a test
 install. My test bed is on a dual-core 3.2 GHz with four gig of RAM.
 The VM has a full gig of RAM allocated and is running the release
 version of FreeBSD 6.1.

Re: mcafee-spamassassin-rules

On Fri, Oct 27, 2006 at 12:25:53PM +0200, Johann Spies wrote:
 just as well try and use those rules.  However, they were written for
 version 2.6 and 3.0.3-2sarge1 is complaining about those rules.

My recollection is that they're using a pre-3.0 version of SA, with (I'd
imagine) a number of modifications.

 Is there a way to utilize their updates with the later versions of
 spamassassin?  Or do I have to use their version of spamassassin to do
 so?  Would that be advisable?

It's hard to say since they could have modified their SA in any number
of ways.  You'd want to go through the config line by line and see what
can be used directly, what could be used with modification, and what
can't be used because it requires proprietary changes.  It's also worth
keeping in mind that spam detection isn't just about rules, it's also about
the engine, so just because rules work well with their code doesn't mean it'll
work well on the standard code.

It's also worth noting that hypothetically, if I was a company releasing
updates based on an open-source product, I may have incentive to avoid
making those updates useful on said product, otherwise people would
download my updates and not pay me for the software.

-- 
Randomly Selected Tagline:
the real ttys became pseudo ttys and vice-versa. - Today's BOFH Excuse



Re: what's the matter here? Text::Wrap

On Fri, Oct 27, 2006 at 05:15:57PM +0800, Xueron Nee wrote:
 When I use CPAN to upgrade my SA from 3.1.4 to the current version, it prints
 many warnings like these:
 
 t/rcvd_parser...ok 40/53(?:(?=[\s,]))* matches null string many 
 times in regex; marked by <-- HERE in m/\G(?:(?=[\s,]))* <-- HERE \Z/ at 
 /usr/lib/perl5/5.8.5/Text/Wrap.pm line 46.
[...]
 
 Seems there is something wrong with Text::Wrap.

Yep.

 # perl -MText::Wrap -e 'print $Text::Wrap::VERSION;'
 2006.0711
 
 cpan install Text::Wrap
 Text::Wrap is up to date.

Yeah, you need to downgrade since they haven't fixed this bug yet.

See http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5056 for more info.

-- 
Randomly Selected Tagline:
I like work; it fascinates me; I can sit and look at it funny...



'spamassassin --revoke' and 'razor-revoke' are interchangeable?

2006-10-27 Thread Leon Kolchinsky

Hello all,

Could someone tell me if 'spamassassin --revoke' and 'razor-revoke' are 
interchangeable?

What exactly happens when I revoke a 'false negative' message?
Are its details reported to the razor2 DB and the Bayesian DB as ham?
Are these messages resent to the original recipients?


Can I use the following syntax on my Cyrus system?:
spamassassin --revoke /ham_folder/*
or
/usr/lib/razor-revoke /ham_folder/*
sa-learn --showdots --ham /ham_folder/*



Regards,
Leon Kolchinsky

Re: Scoring base64 blob messages

On Thu, Oct 26, 2006 at 12:19:23PM -0400, Peter H. Lemieux wrote:
 No, because there are going to be a lot of mails that would hit that.
 
 Really?  Maybe it's because I live in the US, but I can't think of a 
 legitimate message I've ever received consisting only of a base64 blob. 

You look at a lot of raw messages?  ;)

 Out of curiosity, how frequently does this appear in the SA ham corpus? 

Well, there isn't an SA corpus, so there's no answer to that question.  As
for how often it happens in my corpus, I don't know; I'd have to write a rule
and run it against the messages.

 Rather than making anyone else do the work for me, is there something I 
 can read about how to determine the frequency of different message 
 features appearing in the corpus?

You can generate some rules and use mass-check to run against your own corpus
to gather some statistics.  I'm willing to run some rules for you against my
corpus if you want.  I just don't have time to come up with the rules right
now.

-- 
Randomly Selected Tagline:
strrev(strcpy(xus yti +7,varg)-7)[0]='G'



Re: Scoring base64 blob messages

2006-10-27 Thread Stuart Johnston


Peter H. Lemieux wrote:

Theo Van Dinter wrote:

On Thu, Oct 26, 2006 at 09:46:28AM -0400, Peter H. Lemieux wrote:
Also is there an SA rule that scores messages that contain only a 
single base64 part (as opposed to a base64-encoded attachment)?  I 
doubt many legitimate messages arrive with only a single base64 part.


No, because there are going to be a lot of mails that would hit that.


Really?  Maybe it's because I live in the US, but I can't think of a 
legitimate message I've ever received consisting only of a base64 blob. 
Out of curiosity, how frequently does this appear in the SA ham corpus? 
Rather than making anyone else do the work for me, is there something I 
can read about how to determine the frequency of different message 
features appearing in the corpus?


Most messages sent from a Blackberry would hit this rule, for example.

RE: I'm thinking about suing Microsoft

2006-10-27 Thread Michael Beckmann

I think there is a problem where a version of XP downloads the security 
patches automatically, but does not install them. This does not lead to 
increased security, because most users are ignorant of security patches and 
would never install them manually.


Michael

--On Monday, October 23, 2006 16:46 -0400 Rose, Bobby 
[EMAIL PROTECTED] wrote:




But windows patches are free.  Even if you are using an illegal copy of
windows, you can still manually download and install the patches.  It's
Microsoft Update where they mostly have the genuine windows verification
code.  Even Redhat forces you to pay subscriptions for their autoupdate
management stuff.

-Original Message-
From: Marc Perkel [mailto:[EMAIL PROTECTED]
Sent: Monday, October 23, 2006 3:59 PM
To: Jo
Cc: Duane Hill; users@spamassassin.apache.org
Subject: Re: I'm thinking about suing Microsoft



Popularity is a factor. But the real vulnerability is that Windows can
be more secure if it has the patches. If Linux, for example, restricted
its security patches to only licensed users they would have the same
problem. I'm not saying either that MS should be compelled to distribute
any upgrades for free. Just security fixes.

Re: Per Domain Whitelisting

2006-10-27 Thread Daryl C. W. O'Shea


Roman Sozinov wrote:



Peter H. Lemieux wrote:

jasonegli wrote:

For example, let's say that domain xyz.com wants to allow all messages
from yahoo.com, but domain 123.com does not. Is there a way to allow FROM
[EMAIL PROTECTED] TO [EMAIL PROTECTED]?

Obtuse SMTPD (http://sd.inodes.org/) can handle this at the SMTP level.
I think it may be possible to add this to MailScanner
(http://www.mailscanner.info/) through its custom rules; its default
whitelists/blacklists, however, are global.




What about spamassassin? Does it support per-domain whitelisting?


Of course it does.  It supports per user preferences, so if you pass 
nothing but domain names it thus supports per domain preferences.
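
A rough sketch of that idea, under assumptions not spelled out in this
thread: the glue that calls spamc passes the recipient's domain as the
"user", so spamd looks up preferences per domain. The helper name
scan_for_domain is made up for illustration, and spamd would still have
to be set up to find preferences for such virtual users.

```shell
# Hypothetical helper: scan a message (on stdin) using the recipient's
# domain as the spamc "user", so preferences are looked up per domain.
scan_for_domain() {
    rcpt=$1
    domain=${rcpt##*@}        # strip everything up to the last "@"
    spamc -u "$domain"
}

# Usage (message on stdin):
#   scan_for_domain bob@xyz.com < message.eml
```

The preferences for xyz.com could then carry a line such as
"whitelist_from *@yahoo.com", while those for 123.com would not.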


Daryl

Re: How to test new plugins

2006-10-27 Thread Patrick Sherrill

I guess what I'm looking for is a way to test the plug-ins/configuration 
against a separate instance of sa that would read the new cfs without 
restarting existing daemons (we're using amavis-new).

Pat...

- Original Message - 
From: Theo Van Dinter [EMAIL PROTECTED]

To: users@spamassassin.apache.org
Sent: Friday, October 27, 2006 11:00 AM
Subject: Re: How to test new plugins

Re: Scoring base64 blob messages

2006-10-27 Thread Daryl C. W. O'Shea


Peter H. Lemieux wrote:

Theo Van Dinter wrote:

On Thu, Oct 26, 2006 at 09:46:28AM -0400, Peter H. Lemieux wrote:


Also is there an SA rule that scores messages that contain only a 
single base64 part (as opposed to a base64-encoded attachment)?  I 
doubt many legitimate messages arrive with only a single base64 part.


No, because there are going to be a lot of mails that would hit that.


Really?  Maybe it's because I live in the US, but I can't think of a 
legitimate message I've ever received consisting only of a base64 blob.


I'm not sure what to say to that. ;)


Out of curiosity, how frequently does this appear in the SA ham corpus? 


Ticketmaster sends out *a lot* of their mail this way.  I'm sure it's 
partly in an attempt to avoid having their mail FP against crappy filters.



Daryl

Re: How to test new plugins

On Fri, Oct 27, 2006 at 12:40:57PM -0400, Patrick Sherrill wrote:
 I guess what I'm looking for is a way to test the plug-ins/configuration 
 against a separate instance of sa that would read the new cfs without 
 restarting existing daemons (we're using amavis-new).

You can copy the /etc/mail/spamassassin directory to somewhere else,
then change the pre and cf files in that dir.  Then you can test
spamassassin/spamd/etc with the --siteconfigpath option to override its
default value.  :)

(for spamd, if you already have a running copy at port 783, you'd want to run
it and spamc via a different port, of course.)
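
As a minimal sketch of the steps above: the scratch directory, the
function names and port 1783 are arbitrary choices for illustration, not
anything SpamAssassin requires. The functions only wrap the stock
spamassassin and spamd commands with --siteconfigpath.

```shell
# Hypothetical scratch location for the test configuration.
SA_TEST_CONF=${SA_TEST_CONF:-/tmp/sa-test-conf}

# Copy the live site config so the running daemons are left untouched.
# (Run once; if the target already exists, cp -R copies into it instead.)
sa_test_copy_config() {
    cp -R /etc/mail/spamassassin "$SA_TEST_CONF"
}

# Syntax-check the test configuration (new .pre/.cf files, plugins).
sa_test_lint() {
    spamassassin --siteconfigpath="$SA_TEST_CONF" --lint
}

# Score one message file against the test configuration.
sa_test_scan() {
    spamassassin --siteconfigpath="$SA_TEST_CONF" -t < "$1"
}

# Start a second spamd on a spare port, away from the live one on 783.
sa_test_spamd() {
    spamd --siteconfigpath="$SA_TEST_CONF" -p "${1:-1783}"
}
```

Typical use would be sa_test_copy_config, edit the copied pre and cf
files, then sa_test_lint and sa_test_scan sample.eml. For the spamd
case, point the client at the same port: spamc -p 1783 < sample.eml.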

-- 
Randomly Selected Tagline:
linux: because a PC is a terrible thing to waste
 ([EMAIL PROTECTED] put this on Tshirts in '93)



Re: Scoring base64 blob messages

On Fri, Oct 27, 2006 at 11:44:48AM -0400, Daryl C. W. O'Shea wrote:
 Ticketmaster sends out *a lot* of their mail this way.  I'm sure it's 
 partly in an attempt to avoid having their mail FP against crappy filters.

I'd also imagine that sometimes it's just easier to do this than try to pay
attention to what is being sent and determine if encoding is necessary.
Programmers tend to be lazy after all. :)

-- 
Randomly Selected Tagline:
There are two major products to come out of Berkeley: LSD and UNIX.  We
 don't believe this to be a coincidence.  - Unknown



Re: spamd scan problem

2006-10-27 Thread Peter Teunissen



On 27-Oct-2006, at 11:40, Frank van den Diepstraten wrote:

OK, I understand that, but I want to know if this causes the problem.
So I want to try it out without that razor thing... But I can't find
the config where it's enabled.



Hi Frank,


To disable razor, add the following to your local.cf:

use_razor2  0

Peter


-Original Message-
From: John Andersen [mailto:[EMAIL PROTECTED]
Sent: Friday, October 27, 2006 11:36
To: users@spamassassin.apache.org
Subject: Re: FW: spamd scan problem


On Friday 27 October 2006 01:32, Frank van den Diepstraten wrote:

But now the question is where I can
disable this razor thing...


No no, you want to ENABLE it on the good system.

Razor is wonderful.  It just takes a little bit of time, but not
a great deal of CPU load.

Razor catches a lot of spam with an almost nonexistent
false positive rate.

--
_
John Andersen

RE: mcafee-spamassassin-rules

2006-10-27 Thread Chris Santerre







 It's also worth noting that hypothetically, if I was a 
 company releasing
 updates based on an open-source product, I may have incentive to avoid
 making those updates useful on said product, otherwise people would
 download my updates and not pay me for the software.


Wouldn't that be against the open source lic? 


I'm sure they don't use open source rules either. *giggle*


--Chris

Re: I'm thinking about suing Microsoft

2006-10-27 Thread Jay Chandler

You have to explicitly choose that option.  Are you suggesting we
shouldn't be able to choose that?  I'm not a big fan of trusting MS
patches, as they tend to break things periodically...

On Oct 27, 2006, at 8:47 AM, Michael Beckmann wrote:

 I think there is a problem where a version of XP downloads the
 security patches automatically, but does not install them. This does
 not lead to increased security, because most users are ignorant of
 security patches and would never install them manually.

 Michael

 --On Monday, October 23, 2006 16:46 -0400 "Rose, Bobby"
 [EMAIL PROTECTED] wrote:

  But windows patches are free.  Even if you are using an illegal copy
  of windows, you can still manually download and install the patches.
  It's Microsoft Update where they mostly have the genuine windows
  verification code.  Even Redhat forces you to pay subscriptions for
  their autoupdate management stuff.

  -Original Message-
  From: Marc Perkel [mailto:[EMAIL PROTECTED]]
  Sent: Monday, October 23, 2006 3:59 PM
  To: Jo
  Cc: Duane Hill; users@spamassassin.apache.org
  Subject: Re: I'm thinking about suing Microsoft

  Popularity is a factor. But the real vulnerability is that Windows
  can be more secure if it has the patches. If Linux, for example,
  restricted its security patches to only licensed users they would
  have the same problem. I'm not saying either that MS should be
  compelled to distribute any upgrades for free. Just security fixes.

-- 
Jay Chandler
Network Administrator, Chapman University
714-628-7249 / [EMAIL PROTECTED]
"Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
does quite what I want.  I wish Christopher Robin was here."
 -- Peter Da Silva in a.s.r.

RE: High CPU running SA in a VMware VM

2006-10-27 Thread Ring, John C

From: Sammy Anderson [mailto:[EMAIL PROTECTED] 

 We recently migrated our SpamAssassin installation from a physical 3.6
 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4) with
 RHEL 4 as the guest OS and SA 3.1.7.

I just did the same thing last week, except we're using RHEL 3 and ESX
2.5.2, and the physical box it used to be on was far less powerful than
yours.

 Each user has their own Bayes files (Berkeley DB) and these were copied
 from the old to the new server.  Now whenever an expiry process runs on
 a user's database, the CPU spikes, sometimes for a minute or longer.

Hmm.  We're using ours as a site-wide MTA to be able to reject incoming
mails at SMTP time, so no user DBs on the box, but we are running with
Bayes checking on (Berkeley DB), autolearning off, and manual Bayes
feeding only a few times a day.  Because of that, I don't have practice
with a heavy Bayes load, but how certain are you that it's Bayes hitting
the CPU; did you run sa-learn (or spamassassin) with network reporting
turned off to see if that makes a difference?

I ask because pyzor did keep our CPU at a constant 75% until I turned it
off; now it varies from 25% to 75% over the day, which is a lot more
acceptable :)

Another thought, albeit perhaps not directly related: are you running
spamd with --round-robin?  When I did that, it reduced the CPU load with
the trade-off of using a little more memory, which seems to be the
better trade-off, especially for a VM on ESX.

-- 
John C. Ring, Jr. 
[EMAIL PROTECTED] 
Network Engineer
Union Switch & Signal Inc.

If men were angels, no government would be necessary. If angels were to
govern men, neither external nor internal controls on government would
be necessary. -- James Madison

Re: ImageInfo vs FuzzyOCR performance?

2006-10-27 Thread Kenneth Porter

--On Friday, October 27, 2006 6:29 AM -0700 Jeff Chan [EMAIL PROTECTED] 
wrote:



Does anyone have any recent feedback about the performance of
ImageInfo versus FuzzyOCR about detecting stock image spams (or
any others)?  Does FuzzyOCR catch significantly more spams than
ImageInfo?


The last I checked, ImageInfo simply reads some header info from the image. 
It's pretty lightweight, probably more so than any Perl-based regex in SA. 
FuzzyOCR is much more compute-intensive, since it has to perform image 
processing (through gocr, as well as conversions necessary to get the input 
into the format that gocr expects).

Re: [OT] Filter Server Specs

2006-10-27 Thread Clifton Royston

On Fri, Oct 27, 2006 at 02:42:49PM +, Duane Hill wrote:
 Currently, we are looking to install a server that will be doing content 
 filtering for our main e-mail server. I thought I would toss this out to 
 everyone to get some feedback on if the server would be adequate.
 
 The server is a Dell PowerEdge 6850 with the following:
 
  - Four 2.6 GHz/800Mhz/4mb Cache Dual-Core Intel Xeon 7110M processors
  - Eight GB DDR2 400Mhz ram
  - Four 300GB, 3Gbps, SAS, 10K RPM Hard Drives running Raid-5 on a 
 PERC5/i controller
 
 Our main e-mail server services over 500 domains with an account total 
 of around 40,000.
 
 The current filter server we have can not do any content filtering 
 outside of itself (i.e. the MTA) because of CPU load (i.e. 
 SpamAssassin). Any message scanning where the message size is over 1.5K 
 will kill the CPU. The current filter server we have in place is 
 rejecting an average 2.4 million per day with just the common 
 blacklisting and some other things that are set in place.
 
  I *think* this should handle your load.  Personally from my years of
ISP experience, I'd strongly favor going the road of multiple identical
servers in parallel rather than putting all your eggs in one basket. 
E.g. use two 4 CPU servers rather than one 8 CPU (4x dualcore) server.
The difference is that if it comes up just short, or if load jumps up
again, it's easier to add a 3rd server and cut it into the mail path
than to upgrade a server which is handling all your filtering.

  You also don't need fast hard drives on a filtering server; it's
almost all gonna be pushing the CPU and RAM.

 The other thing I would like to know is what kind of an operating system 
 would one install on this new server?

  This'll get you into a religious war for sure...  I would favor
FreeBSD latest (6.x), but any version of Linux with a good package
system and a recent 2.6 kernel is a good choice - maybe better than
FreeBSD at using 8 CPUs.  Reasonable possibilities include CentOS,
Gentoo, Debian.  I'm not a big Linux head, others may have stronger
opinions on that front.

  -- Clifton

-- 
Clifton Royston  --  [EMAIL PROTECTED] / [EMAIL PROTECTED]
   President  - I and I Computing * http://www.iandicomputing.com/
 Custom programming, network design, systems and network consulting services

RE: High CPU running SA in a VMware VM

I'm pretty sure it is that, because when I turn off Bayes altogether, the
spikes go away. I also ran sa-learn --force-expire and it PEGS the VM.
With Bayes debugging enabled, I see lines like this in my syslog:

bayes: expired old bayes database entries in 236 seconds: 152268 entries kept, 9457 deleted

We have about 140 users, each with a 5 MB bayes_toks file, so there is a
need to expire somebody all throughout the day. Each user is virtual;
they don't really have an account on the box, but the directories
correspond to each user address. And we do auto-learn, with
opportunistic expiry.

Good thought about --round-robin, I am willing to use a little more
memory if it saves on CPU.

"Ring, John C" [EMAIL PROTECTED] wrote:

 From: Sammy Anderson [mailto:[EMAIL PROTECTED]

  We recently migrated our SpamAssassin installation from a physical
  3.6 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4)
  with RHEL 4 as the guest OS and SA 3.1.7.

 I just did the same thing last week, except we're using RHEL 3 and ESX
 2.5.2, and the physical box it used to be on was far less powerful
 than yours.

  Each user has their own Bayes files (Berkeley DB) and these were
  copied from the old to the new server. Now whenever an expiry process
  runs on a user's database, the CPU spikes, sometimes for a minute or
  longer.

 Hmm. We're using ours as a site-wide MTA to be able to reject incoming
 mails at SMTP time, so no user DBs on the box, but we are running with
 Bayes checking on (Berkeley DB), autolearning off, and manual Bayes
 feeding only a few times a day. Because of that, I don't have practice
 with a heavy Bayes load, but how certain are you that it's Bayes
 hitting the CPU; did you run sa-learn (or spamassassin) with network
 reporting turned off to see if that makes a difference?

 I ask because pyzor did keep our CPU at a constant 75% until I turned
 it off; now it varies from 25% to 75% over the day, which is a lot
 more acceptable :)

 Another thought, albeit perhaps not directly related: are you running
 spamd with --round-robin? When I did that, it reduced the CPU load
 with the trade-off of using a little more memory, which seems to be
 the better trade-off, especially for a VM on ESX.

 -- 
 John C. Ring, Jr.
 [EMAIL PROTECTED]
 Network Engineer
 Union Switch & Signal Inc.

 "If men were angels, no government would be necessary. If angels were
 to govern men, neither external nor internal controls on government
 would be necessary." -- James Madison
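
To see how long expiry takes across many users, the debug line quoted
in this message can be pulled apart with a small filter. This is only a
sketch: the function name is made up, and it assumes the log lines keep
exactly the wording shown above.

```shell
# Hypothetical filter: read syslog lines on stdin and print
# "seconds kept deleted" for every Bayes expiry line found.
bayes_expiry_stats() {
    sed -n 's/.*bayes: expired old bayes database entries in \([0-9][0-9]*\) seconds: \([0-9][0-9]*\) entries kept, \([0-9][0-9]*\) deleted.*/\1 \2 \3/p'
}

# Usage (the log path is an assumption; use wherever spamd logs go):
#   bayes_expiry_stats < /var/log/maillog
```

Sorting that output on the first column would show which expiry runs
are the slow ones.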

Re: High CPU running SA in a VMware VM

2006-10-27 Thread Anders Norrbring

Sorry about top-posting, but I just caught the topic, and found it a 
bit interesting...


I run my SMTP server entirely in a VMware VM, and have *never* seen a 
high CPU usage on that particular machine.  I run Postfix, amavisd-new 
2.4.3, SA 3.1.7 and quite some plug-ins.


Bayes and quarantine are all in a MySQL database stored on another VM, 
no big load there either...
At peaks, I have a 2-4% CPU usage and 20-65% memory usage on each VM, 
all reported by Virtual Center 1.4.


So, naturally I'm curious about why there would be a high CPU load from 
using SA. My guess is that it's something else causing it.


--

Anders Norrbring
Norrbring Consulting

Sammy Anderson wrote:
I'm pretty sure it is that, because when I turn off bayes altogether, the 
spikes go away.  I also ran sa-learn --force-expire and it PEGS the VM.  
With bayes debugging enabled, I see lines like this in my syslog:


bayes: expired old bayes database entries in 236 seconds: 152268 entries 
kept, 9457 deleted


We have about 140 users, each with a 5 MB bayes_toks file, so there is a 
need to expire somebody all throughout the day.  Each user is virtual, 
they don't really have an account on the box, but the directories 
correspond to each user address.  And we do auto-learn, with 
opportunistic expiry.


Good thought about --round-robin, I am willing to use a little more 
memory if it saves on CPU.


*/Ring, John C [EMAIL PROTECTED]/* wrote:

 From: Sammy Anderson [mailto:[EMAIL PROTECTED]
 
 We recently migrated our SpamAssassin installation from a physical 3.6
 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX 2.5.4) with
 RHEL 4 as the guest OS and SA 3.1.7.

I just did the same thing last week, except we're using RHEL 3 and ESX
2.5.2, and the physical box it used to be on was far less powerful than
yours.

 Each user has their own Bayes files (Berkeley DB) and these were copied
 from the old to the new server. Now whenever an expiry process runs on
 a user's database, the CPU spikes, sometimes for a minute or longer.

Hmm. We're using ours as a site-wide MTA to be able to reject incoming
mails at SMTP time, so no user DBs on the box, but we are running with
Bayes checking on (Berkeley DB), autolearning off, and manual Bayes
feeding only a few times a day. Because of that, I don't have practice
with a heavy Bayes load, but how certain are you that it's Bayes hitting
the CPU; did you run sa-learn (or spamassassin) with network reporting
turned off to see if that makes a difference?

I ask because pyzor did keep our CPU at a constant 75% until I turned it
off; now it varies from 25% to 75% over the day, which is a lot more
acceptable :)

Another thought, albeit perhaps not directly related: are you running
spamd with --round-robin? When I did that, it reduced the CPU load with
the trade-off of using a little more memory, which seems to be the
better trade-off, especially for a VM on ESX.

-- 
John C. Ring, Jr.

[EMAIL PROTECTED]
Network Engineer
Union Switch & Signal Inc.

If men were angels, no government would be necessary. If angels were to
govern men, neither external nor internal controls on government would
be necessary. -- James Madison




URIXBL?

2006-10-27 Thread Jeff Hardy

Hello all,

I've been diddling with some tests and wondered why there is a spamhaus
URIBL_SBL, but not URIBL_XBL (or better yet, combined URIBL_SBL-XBL).  I
can create this myself easy enough, but wondered if there was a reason
XBL is not included.  Thanks.

-Jeff

Re: mcafee-spamassassin-rules

On Fri, Oct 27, 2006 at 01:38:32PM -0400, Chris Santerre wrote:
  It's also worth noting that hypothetically, if I was a 
  company releasing
  updates based on an open-source product, I may have incentive to avoid
  making those updates useful on said product, otherwise people would
  download my updates and not pay me for the software.
 
 Wouldn't that be against the open source lic? 

Not that I'm aware of, why would it be?  If I produce something on my
own (like new rules) and publish it, I'm not bound by someone else's
licensing.  In this case, if I'm following the code license and make
modifications such that new rules that I produce are in a proprietary
format, then that's perfectly valid.  With SA 3, I could even make the
config parsing a plugin and not have to modify any of the base code.

-- 
Randomly Selected Tagline:
I came here to kick butt and chew gum, and I'm all out of gum.
  - They Live (movie)



Re: URIXBL?

2006-10-27 Thread Justin Mason


Jeff Hardy writes:
 Hello all,
 
 I've been diddling with some tests and wondered why there is a spamhaus
 URIBL_SBL, but not URIBL_XBL (or better yet, combined URIBL_SBL-XBL).  I
 can create this myself easy enough, but wondered if there was a reason
 XBL is not included.  Thanks.

Basically, it didn't work well ;)  Try it out -- it doesn't correlate
well with spam.

--j.

Re: URIXBL?

2006-10-27 Thread Stuart Johnston


Jeff Hardy wrote:

Hello all,

I've been diddling with some tests and wondered why there is a spamhaus
URIBL_SBL, but not URIBL_XBL (or better yet, combined URIBL_SBL-XBL).  I
can create this myself easy enough, but wondered if there was a reason
XBL is not included.  Thanks.


XBL is mostly infected PCs.  These systems are used to send spam but not 
generally to host spam domains.

Re: URIXBL?

2006-10-27 Thread Jeff Hardy

On Fri, 2006-10-27 at 20:38 +0100, Justin Mason wrote:
 Jeff Hardy writes:
  Hello all,
  
  I've been diddling with some tests and wondered why there is a spamhaus
  URIBL_SBL, but not URIBL_XBL (or better yet, combined URIBL_SBL-XBL).  I
  can create this myself easy enough, but wondered if there was a reason
  XBL is not included.  Thanks.
 
 Basically, it didn't work well ;)  Try it out -- it doesn't correlate
 well with spam.
 
 --j.

Fair enough, I'll test away.  BTW, for anyone else coming across this
post:

 warn: config: error: rule 'URIBL_SBL-XBL' has invalid characters (not
Alphanumeric + Underscore + starting with a non-digit)

Have to get rid of that hyphen.  Thank you 'spamassassin -D all ...'  :)
Thanks for the reply.
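
For anyone who wants to check names before running a lint, here is a
small sketch based only on the wording of that warning (alphanumeric
plus underscore, not starting with a digit). The function name is made
up, and SpamAssassin's own check may differ in details.

```shell
# Hypothetical pre-check mirroring the warning text above:
# letters, digits and underscores only, and no leading digit.
is_valid_rule_name() {
    case $1 in
        ''|[0-9]*|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   is_valid_rule_name URIBL_SBL-XBL || echo "bad rule name"
```

With this check, URIBL_SBL_XBL passes and URIBL_SBL-XBL is rejected
because of the hyphen.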

-Jeff

Re: MailScanner versus Amavisd-new with postfix

2006-10-27 Thread Martin Hepworth


Jeff Chan wrote:

Not to start any flamewars, but does anyone have strong opinions
on MailScanner versus Amavisd-new for use with postfix (and of
course SpamAssassin and ClamAV)?

In the old days it seemed Amavisd-new may have integrated better
with postfix, but is that no longer the case?  Some folks say
MailScanner is faster and leaner.

What gives?

Jeff C.

Jeff

I can't say I've compared the two, but I run MailScanner and it does have 
a couple of neat features recently: its own MD5 cache of recent spam, 
which speeds things up a lot, and the built-in phishing testing (yeah, OK, 
this has been in a while).


It also glues together SA, 12 anti-virus engines, and its own tests (like 
executables, which has saved me a few times before the AV people have 
updates).


horses for courses, but it's nice to have a choice of amavis-new OR 
MailScanner.


--
Martin Hepworth
Senior Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


RE: MailScanner versus Amavisd-new with postfix

2006-10-27 Thread Dan Horne

 -Original Message-
 From: Jeff Chan [mailto:[EMAIL PROTECTED] 
 Sent: Friday, October 27, 2006 9:54 AM
 To: SpamAssassin Users
 Subject: MailScanner versus Amavisd-new with postfix

 Not to start any flamewars, but does anyone have strong 
 opinions on MailScanner versus Amavisd-new for use with 
 postfix (and of course SpamAssassin and ClamAV)?

 In the old days it seemed Amavisd-new may have integrated 
 better with postfix, but is that no longer the case?  Some 
 folks say MailScanner is faster and leaner.

 What gives?

 Jeff C.
 --
 Jeff Chan
 mailto:[EMAIL PROTECTED]
 http://www.surbl.org/

Wietse Venema says that MailScanner uses unsupported methods to
manipulate the queue that could lead (and have led) to lost email.  I don't
know the full details, but it has been discussed a lot on the postfix
list.  My impression is that the condition is rare, but it does happen.

Just a heads up.

-DH


RE: MailScanner versus Amavisd-new with postfix

2006-10-27 Thread Kurt Buff

note: I don't use mailscanner, so am only relaying what I saw on the postfix
list.

My understanding (based on foggy memory - search the list archives for a
better answer) is that MailScanner dipped into postfix queues using either
undocumented postfix APIs or by bypassing postfix entirely and directly
manipulating files on disk. This led to instances of documented mail loss.
Wietse therefore said that it wasn't safe to use.

I've also recently read (I believe also on the postfix list, but am not
sure) that MailScanner has remedied this behavior, and that it is now safe
to use with postfix, but you'll need to confirm for yourself if that is
true.

Kurt


RE: High CPU running SA in a VMware VM

2006-10-27 Thread Mark

 -Original Message-
 From: Anders Norrbring [mailto:[EMAIL PROTECTED]
 Sent: Friday, 27 October 2006 20:58
 To: users@spamassassin.apache.org
 Subject: Re: High CPU running SA in a VMware VM

 I run my SMTP server entirely in a VMware VM, and have *never* seen a
 high CPU usage on that particular machine. I run Postfix, Amavis-new
 2.4.3, SA 3.1.7 and quite some plug-ins.

 Bayes and quarantine are all in a MySQL database stored on
 another VM, no big load there either...

I concur. I've been using VMware for years as a shadow/test server for
the production FreeBSD one, and never had any such issue.
VMware rocks! :)

I would run any of the db_dump or db_upgrade utils for BerkeleyDB; or
reinstall DB_File (and make darn sure it's compiled against the correct
BerkeleyDB libs). At any rate, I myself would probably be more inclined to
look into a BerkeleyDB issue than a Vmware one.

- Mark

Re: ImageInfo vs FuzzyOCR performance?

2006-10-27 Thread Jorge Valdes


Jeff Chan wrote:

Does anyone have any recent feedback about the performance of
ImageInfo versus FuzzyOCR about detecting stock image spams (or
any others)?  Does FuzzyOCR catch significantly more spams than
ImageInfo?

Cheers,

Jeff C.
  
I may be biased, as I help with FuzzyOcr development, but I do use both.
ImageInfo is fine and will get you part of the way there, but FuzzyOcr
hits more often. Scanning ~8K msgs/day, FuzzyOcr hits ~1600 times
and ImageInfo hits 150 times on average. On my system, here are the
top 10 rule hits from yesterday:


SPAM Results:
  3936 Message(s)  49.83%
19.399 Average Score

  3343 Time(s)   7.50%   84.93%   Hit Rule: BAYES_99
  3068 Time(s)   6.88%   77.95%   Hit Rule: HTML_MESSAGE
  1655 Time(s)   3.71%   42.05%   Hit Rule: FUZZY_OCR
  1527 Time(s)   3.42%   38.80%   Hit Rule: SARE_GIF_ATTACH
  1411 Time(s)   3.16%   35.85%   Hit Rule: URIBL_BLACK
  1274 Time(s)   2.86%   32.37%   Hit Rule: URIBL_BLACK_OVERLAP
  1271 Time(s)   2.85%   32.29%   Hit Rule: MIME_HTML_ONLY
  1215 Time(s)   2.72%   30.87%   Hit Rule: URIBL_JP_SURBL
  1187 Time(s)   2.66%   30.16%   Hit Rule: RCVD_IN_BL_SPAMCOP_NET
  1184 Time(s)   2.66%   30.08%   Hit Rule: SARE_GIF_STOX
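
As a quick sanity check on the table, the second percentage column appears to be each rule's hit count divided by the 3936 spam messages. The sketch below recomputes it for the first few rules; the interpretation of the column is an assumption inferred from the numbers, not something the report itself states.

```python
# Figures copied from the table above.
SPAM_MESSAGES = 3936
RULE_HITS = {
    "BAYES_99": 3343,
    "HTML_MESSAGE": 3068,
    "FUZZY_OCR": 1655,
    "SARE_GIF_ATTACH": 1527,
}


def hit_rate(hits: int, total: int) -> float:
    """Percentage of messages that hit a rule, rounded to two decimals."""
    return round(100.0 * hits / total, 2)


if __name__ == "__main__":
    for rule, hits in RULE_HITS.items():
        print(f"{hits:6d}  {hit_rate(hits, SPAM_MESSAGES):6.2f}%  {rule}")
```

The first percentage column looks like a share of all rule hits, but the total needed to verify that is not in the post.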


Jorge Valdes

Re: domainkeys unverified - solved

2006-10-27 Thread Chris Purves


Chris Purves wrote:
I just got the domainkeys plugin set up, but it's not working the way I 
expect.


In messages from Yahoo I see:

0.0 DK_SIGNED Domain Keys: message has an unverified signature

but I never see DK_VERIFIED

Is there something I need to configure?  I didn't apply the patch, 
because I'm assuming it's been incorporated into 3.1.4.




In the end, with the help of Mark Martinec, I was able to determine that 
the problem was my ISP-provided DNS nameservers not returning full 
TXT records (they were truncated).


I installed bind9 and used localhost as my primary nameserver, and now I 
can get DK_VERIFIED.



Symptoms for this problem were:

DK_VERIFIED does not fire for Yahoo! e-mails (multiple part TXT record)
DK_VERIFIED does fire for Gmail e-mail (single part TXT record)
Perl modules Mail::DomainKeys and Mail::DKIM will fail during make test



--
Chris

Re: Scoring base64 blob messages

2006-10-27 Thread Peter H. Lemieux


Theo Van Dinter wrote:

On Thu, Oct 26, 2006 at 12:19:23PM -0400, Peter H. Lemieux wrote:

No, because there are going to be a lot of mails that would hit that.
Really?  Maybe it's because I live in the US, but I can't think of a 
legitimate message I've ever received consisting only of a base64 blob. 


You look at a lot of raw messages?  ;)


Doesn't everybody?

Seriously, I do look at a lot of raw messages; for instance, I review the 
full text of nearly every spam message that doesn't get caught by my 
filters and shows up in my inbox.  Obviously I don't get much mail from 
Blackberry users or Ticketmaster!


Rather than making anyone else do the work for me, is there something I 
can read about how to determine the frequency of different message 
features appearing in the corpus?



Well, there isn't a SA corpus, so there's no answer to that question.


Ah, I hadn't read this page before:
http://wiki.apache.org/spamassassin/HandClassifiedCorpora
My recollection was that 2.x used a centrally-defined corpus rather than 
a variety of developers' corpora (see, I read the wiki).  Either things 
changed with the switch in scoring algorithms in 3.x, or my recollection 
is shoddy.  Probably the latter.



You can generate some rules and use mass-check to run against your own corpus
to gather some statistics.  I'm willing to run some rules for you against my
corpus if you want.  I just don't have time to come up with the rules right
now.


Thanks for the offer, Theo, but don't spend your valuable time on this. 
I'll give it a shot some day when I've got some spare moments.  If I do get 
some candidate rules, I'll pass them along to you for testing.



Thanks again!
Peter

Re: Scoring base64 blob messages

On Fri, Oct 27, 2006 at 05:24:58PM -0400, Peter H. Lemieux wrote:
 Well, there isn't a SA corpus, so there's no answer to that question.
 
 Ah, I hadn't read this page before:
   http://wiki.apache.org/spamassassin/HandClassifiedCorpora
 My recollection was that 2.x used a centrally-defined corpus rather than 
 a variety of developers' corpora (see, I read the wiki).  Either things 
 changed with the switch in scoring algorithms in 3.x, or my recollection 
 is shoddy.  Probably the latter.

Yeah, sorry.  We've had separate corpora since I started with SA several years
ago.  There was a public corpus of mail made available which could be
confusing your memory. :)

-- 
Randomly Selected Tagline:
I pity the shul that won't let Krusty in now. Spin me clown!
 - Mr. T, The Simpsons, Today, I Am a Klown



Re: domainkeys unverified - solved

2006-10-27 Thread Peter H. Lemieux


Chris Purves wrote:
In the end, with the help of Mark Martinec, I was able to determine that 
the problem was with my ISP-provided DNS nameservers not allowing full 
TXT records to be returned (they were truncated).


Was this something that the ISP cooked up, or was it intrinsic to the DNS 
server software they are using?  If the latter, it would be good to know 
which server they were running.  It might be a useful addition to the 
FAQ/wiki.


Peter

Re: High CPU running SA in a VMware VM

On Fri, Oct 27, 2006 at 09:10:28PM +, Mark wrote:
  I run my SMTP server entirely in a VMware VM, and have *never* seen a
  high CPU usage on that particular machine. I run Postfix, Amavis-new
  2.4.3, SA 3.1.7 and quite some plug-ins.
 
 I would run any of the db_dump or db_upgrade utils for BerkeleyDB; or
 reinstall DB_File (and make darn sure it's compiled against the correct
 BerkeleyDB libs). At any rate, I myself would probably be more inclined to
 look into a BerkeleyDB issue than a Vmware one.

Yeah, I doubt there's an issue with VMware specifically (ESX++).  My guess is
that if you're seeing different behavior between a physical host and virtual
host, there's something different in the virtual host -- different OS, libs,
perl modules, etc.

Obviously that won't be the case if you virtualized a physical machine, but I
seem to recall from the start of the thread that you migrated the data but not
the OS.

-- 
Randomly Selected Tagline:
My wife and I were happy for years.  Then we met.



Re: High CPU running SA in a VMware VM

You are correct, this was a new build, with a later version of SA and
migrated Bayes files. It could very well be the case that Berkeley DB
needs to be patched, or the data converted in some fashion.

I will say that in a VM environment, we tried to build gcc, and it took
MUCH longer than on a physical box with the same processors. VMware
analyzed our data, and they determined that we should disable NPTL and
use LinuxThreads instead (kb 1470). This did help substantially, and
though slower than the physical machine, it was acceptable. I have tried
this for SA, and it does seem to cut down the CPU required, so there is
some hope.

Theo Van Dinter [EMAIL PROTECTED] wrote:
 On Fri, Oct 27, 2006 at 09:10:28PM +, Mark wrote:
   I run my SMTP server entirely in a VMware VM, and have *never* seen a
   high CPU usage on that particular machine. I run Postfix, Amavis-new
   2.4.3, SA 3.1.7 and quite some plug-ins.

  I would run any of the "db_dump" or "db_upgrade" utils for BerkeleyDB; or
  reinstall DB_File (and make darn sure it's compiled against the correct
  BerkeleyDB libs). At any rate, I myself would probably be more inclined to
  look into a BerkeleyDB issue than a Vmware one.

 Yeah, I doubt there's an issue with VMware specifically (ESX++). My guess is
 that if you're seeing different behavior between a physical host and virtual
 host, there's something different in the virtual host -- different OS, libs,
 perl modules, etc.

 Obviously that won't be the case if you virtualized a physical machine, but I
 seem to recall from the start of the thread that you migrated the data but not
 the OS.

 -- 
 Randomly Selected Tagline:
 My wife and I were happy for years. Then we met.

Re: domainkeys unverified - solved

2006-10-27 Thread Justin Mason


Peter H. Lemieux writes:
 Chris Purves wrote:
  In the end, with the help of Mark Martinec, I was able to determine that 
  the problem was with my ISP-provided DNS nameservers not allowing full 
  TXT records to be returned (they were truncated).
 
 Was this something that the ISP cooked up, or was it intrinsic to the DNS 
 server software they are using?  If the latter, it would be good to know 
 which server they were running.  It might be a useful addition to the 
 FAQ/wiki.

yes, definitely -- this is worth knowing about...

--j.

Re: High CPU running SA in a VMware VM

I manually ran sa-learn --force-expire, and it hammered the box.  Here is
debug and timing information (for just a 5 MB file!):

[18002] dbg: bayes: tie-ing to DB file R/O /home/ian/.spamassassin/bayes_toks
[18002] dbg: bayes: tie-ing to DB file R/O /home/ian/.spamassassin/bayes_seen
[18002] dbg: bayes: found bayes db version 3
[18002] dbg: bayes: DB journal sync: last sync: 1161899721
[18002] dbg: bayes: opportunistic call found journal sync due
[18002] dbg: bayes: bayes journal sync starting
[18002] dbg: bayes: tie-ing to DB file R/W /home/ian/.spamassassin/bayes_toks
[18002] dbg: bayes: tie-ing to DB file R/W /home/ian/.spamassassin/bayes_seen
[18002] dbg: bayes: found bayes db version 3
[18002] dbg: bayes: synced databases from journal in 0 seconds: 792 unique entries (974 total entries)
[18002] dbg: bayes: bayes journal sync completed
[18002] dbg: bayes: bayes journal sync starting
[18002] dbg: bayes: bayes journal sync completed
[18002] dbg: bayes: expiry starting
[18002] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[18002] dbg: bayes: token count: 161725, final goal reduction size: 49225
[18002] dbg: bayes: first pass? current: 1161986180, Last: 1161862273, atime: 691200, count: 10015, newdelta: 140627, ratio: 4.91512730903645, period: 43200
[18002] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
[18002] dbg: bayes: expiry max exponent: 9
-- about 20 seconds elapsed
[18002] dbg: bayes: atime token reduction
[18002] dbg: bayes:  ===
[18002] dbg: bayes: 43200 144256
[18002] dbg: bayes: 86400 133029
[18002] dbg: bayes: 172800 111350
[18002] dbg: bayes: 345600 72306
[18002] dbg: bayes: 691200 9457
[18002] dbg: bayes: 1382400 0
[18002] dbg: bayes: 2764800 0
[18002] dbg: bayes: 5529600 0
[18002] dbg: bayes: 11059200 0
[18002] dbg: bayes: 22118400 0
[18002] dbg: bayes: first pass decided on 691200 for atime delta
-- about 40 seconds elapsed [a sort going on here???]
[18002] dbg: bayes: untie-ing
[18002] dbg: bayes: untie-ing db_toks
[18002] dbg: bayes: untie-ing db_seen
[18002] dbg: bayes: files locked, now unlocking lock
expired old bayes database entries in 60 seconds = YIKES
152268 entries kept, 9457 deleted
token frequency: 1-occurrence tokens: 68.79%
token frequency: less than 8 occurrences: 18.63%
[18002] dbg: bayes: expiry completed
.
real 1m6.157s
user 0m56.044s = WOW!
sys 0m2.370s

Anders Norrbring [EMAIL PROTECTED] wrote:
 Sorry about top-posting, but I just caught the topic, and found it a
 bit interesting...

 I run my SMTP server entirely in a VMware VM, and have *never* seen a
 high CPU usage on that particular machine.  I run Postfix, Amavis-new
 2.4.3, SA 3.1.7 and quite some plug-ins.

 Bayes and quarantine are all in a MySQL database stored on another VM,
 no big load there either...

 At peaks, I have a 2-4% CPU usage and 20-65% memory usage on each VM,
 all reported by Virtual Center 1.4.

 So, naturally I'm curious about why there would be a high CPU load
 from using SA. My guess is that it's something else causing it.

 -- 
 Anders Norrbring
 Norrbring Consulting

 Sammy Anderson wrote:
  I'm pretty sure it is that, because when I turn off bayes altogether,
  the spikes go away.  I also ran sa-learn --force-expire and it PEGS
  the VM.  With bayes debugging enabled, I see lines like this in my
  syslog:

  bayes: expired old bayes database entries in 236 seconds: 152268
  entries kept, 9457 deleted

  We have about 140 users, each with a 5 MB bayes_toks file, so there is
  a need to expire somebody all throughout the day.  Each user is
  virtual, they don't really have an account on the box, but the
  directories correspond to each user address.  And we do auto-learn,
  with opportunistic expiry.

  Good thought about --round-robin, I am willing to use a little more
  memory if it saves on CPU.

  "Ring, John C" wrote:
   From: Sammy Anderson [mailto:[EMAIL PROTECTED]
    We recently migrated our SpamAssassin installation from a physical
    3.6 GHz system running RHEL 4 and SA 3.0.4 to a VMware VM (ESX
    2.5.4) with RHEL 4 as the guest OS and SA 3.1.7.

   I just did the same thing last week, except we're using RHEL 3 and
   ESX 2.5.2, and the physical box it used to be on was far less
   powerful than yours.

    Each user has their own Bayes files (Berkeley DB) and these were
    copied from the old to the new server. Now whenever an expiry
    process runs on a user's database, the CPU spikes, sometimes for a
    minute or longer.

   Hmm. We're using ours as a site-wide MTA to be able to reject
   incoming mails at SMTP time, so no user DBs on the box, but we are
   running with Bayes checking on (Berkeley DB), autolearning off, and
   manual Bayes feeding only a few times a day. Because of that, I
   don't have practice with a heavy Bayes load, but how certain are you
   that it's Bayes hitting the CPU; did you

RE: domainkeys unverified - solved

2006-10-27 Thread Mark

 -Original Message-
 From: Chris Purves [mailto:[EMAIL PROTECTED]
 Sent: Friday, 27 October 2006 23:20
 To: users@spamassassin.apache.org
 Subject: Re: domainkeys unverified - solved

 In the end, with the help of Mark Martinec, I was able to
 determine that the problem was with my ISP provided DNS
 nameservers not allowing full TXT records to be returned
 (they were truncated).

 Symptoms for this problem were:

 DK_VERIFIED does not fire for Yahoo! e-mails (multiple part
 TXT record)

Interesting.

nslookup -q=txt lima._domainkey.yahoogroups.com

k=rsa;
p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCTpDT1pbK0xwkd
ZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W
LLNOf9hkMeicUH48TGkEoCAcaSjJz/b3NMrOy9l1U7gQIDAP//

I get two parts, too. Is that their correct public key, when concatenated?
Though I do not get both parts in random order, I wonder if I would not
have the same issue, then.

- Mark

Re: High CPU running SA in a VMware VM

On Fri, Oct 27, 2006 at 03:01:45PM -0700, Sammy Anderson wrote:
 I manually ran sa-learn --force-expire, and it hammered the box.   Here is a 
 debug and timing information (for just a 5 MB file!):
   
   [18002] dbg: bayes: token count: 161725, final goal reduction size: 49225

want to get rid of (max) 49225 tokens

   [18002] dbg: bayes: can't use estimation method for expiry, unexpected 
 result, calculating optimal atime delta (first pass)

have to do step 1 and can't estimate

   [18002] dbg: bayes: expiry max exponent: 9
   -- about 20 seconds elapsed

it's going through every token in your db

   [18002] dbg: bayes: atime token reduction
   [18002] dbg: bayes:  ===
   [18002] dbg: bayes: 43200 144256
   [18002] dbg: bayes: 86400 133029
   [18002] dbg: bayes: 172800 111350
   [18002] dbg: bayes: 345600 72306
   [18002] dbg: bayes: 691200 9457
   [18002] dbg: bayes: 1382400 0
[...]
   [18002] dbg: bayes: first pass decided on 691200 for atime delta

691200 wins the Price Is Right (9457 is the closest without going over)

   -- about 40 seconds elapsed [a sort going on here???]

It's creating a new DB file, going back through every token in the original
DB, and for any that are newer than 691200 seconds ago, it copies the entry to
the new DB.

   expired old bayes database entries in 60 seconds = YIKES

yep.  expiry is relatively resource intensive and slow w/ DBMs, but
there's no other good way to do it (or at least, no one has suggested
a really better way to do it...)
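
The "Price Is Right" step described above can be sketched in a few lines of Python. This is only an illustration built from the numbers in the debug log, not SpamAssassin's actual code: the function names are hypothetical, and the maximum database size of 150000 is inferred from the log line "0.75 * max: 112500".

```python
def goal_reduction(token_count, max_db_size, keep_fraction=0.75):
    """Number of tokens expiry would like to remove (never negative)."""
    keep = int(max_db_size * keep_fraction)
    return max(token_count - keep, 0)


def pick_atime_delta(reductions, goal):
    """Pick the candidate whose reduction is closest to goal without going over.

    reductions is a list of (atime_delta_seconds, tokens_that_would_expire).
    Returns the chosen pair, or None if every candidate overshoots the goal.
    """
    best = None
    for delta, count in reductions:
        if count <= goal and (best is None or count > best[1]):
            best = (delta, count)
    return best


# Candidate deltas are 43200 * 2**n for n = 0..9 ("expiry max exponent: 9");
# the counts are copied from the "atime token reduction" table in the log.
COUNTS = [144256, 133029, 111350, 72306, 9457, 0, 0, 0, 0, 0]
REDUCTIONS = [(43200 * 2 ** n, c) for n, c in enumerate(COUNTS)]

if __name__ == "__main__":
    goal = goal_reduction(161725, 150000)
    print("goal:", goal)
    print("chosen:", pick_atime_delta(REDUCTIONS, goal))
```

With the logged values the goal comes out at 49225 and the chosen delta is 691200 with 9457 tokens, which leaves 161725 - 9457 = 152268 entries kept, matching the log. The cost comes from walking the whole token database, once to build the table and once more to copy the surviving entries.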

-- 
Randomly Selected Tagline:
I believe it's not butter, I just can't believe it's $1.59!



Re: Rules to reject bounce messages for mail not sent by me

2006-10-27 Thread Jo Rhett


On Oct 27, 2006, at 3:58 AM, Justin Mason wrote:

Nick Gilbert writes:

PS. Will setting up SPF on my domain name have any effect for things
like this? Will it discourage spammers from using my domain or reduce
the number of bounce messages I/we get?


nope.  they don't bother checking, and the systems sending bounces
aren't the ones that are being kept up-to-date enough to check SPF
either.


Umm... not in my experience.  Every time we turn on SPF for a domain,  
the amount of backscatter goes to about a third of the previous  
amount.  Every time I've been involved anyway.


--
Jo Rhett
Senior Network Engineer
Network Consonance

Re: domainkeys unverified - solved

2006-10-27 Thread Chris Purves

Mark wrote:

-Original Message-
From: Chris Purves [mailto:[EMAIL PROTECTED]
Sent: Friday, 27 October 2006 23:20
To: users@spamassassin.apache.org
Subject: Re: domainkeys unverified - solved

In the end, with the help of Mark Martinec, I was able to
determine that the problem was with my ISP provided DNS
nameservers not allowing full TXT records to be returned
(they were truncated).

Symptoms for this problem were:

DK_VERIFIED does not fire for Yahoo! e-mails (multiple part
TXT record)

Interesting.

nslookup -q=txt lima._domainkey.yahoogroups.com

k=rsa;
p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCTpDT1pbK0xwkd
ZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W
LLNOf9hkMeicUH48TGkEoCAcaSjJz/b3NMrOy9l1U7gQIDAP//

I get two parts, too. Is that their correct public key, when concatenated?
Though I do not get both parts in random order, I wonder if I would not
have the same issue, then.

What you get is correct.  In my case, when it's not working I get:

[EMAIL PROTECTED]:~$ nslookup -q=txt lima._domainkey.yahoogroups.com
Server:  64.59.184.13
Address: 64.59.184.13#53

Non-authoritative answer:
lima._domainkey.yahoogroups.com text = k=rsa\; 
p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCTpDT1pbK0xwkdZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W

Authoritative answers can be found from:

[EMAIL PROTECTED]:~$

I'm missing the second part of the Answer, and Authority is empty. 
Using dig -t txt ... the Additional section is also empty.

--
Chris
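
To illustrate why a truncated answer breaks verification, here is a minimal Python sketch. It assumes that a verifier joins the parts of a multi-part TXT record with no separator and then base64-decodes the p= tag; the function names and the short key strings in the usage note are made up for the example (they are not Yahoo's key), and this is not how Mail::DomainKeys is implemented.

```python
import base64
import binascii


def join_txt_parts(parts):
    """Concatenate the strings of one TXT record, with no separator."""
    return "".join(parts)


def parse_tags(record):
    """Split 'k=rsa; p=...' into a dict, dropping whitespace inside values."""
    tags = {}
    for item in record.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            tags[key.strip()] = "".join(value.split())
    return tags


def public_key_bytes(parts):
    """Return the decoded p= value, or None if it is not valid base64."""
    tags = parse_tags(join_txt_parts(parts))
    try:
        return base64.b64decode(tags.get("p", ""), validate=True)
    except binascii.Error:
        return None
```

For example, public_key_bytes(["k=rsa; p=QUJD", "REVG"]) decodes cleanly, while a cut-off record such as ["k=rsa; p=QUJDR"] returns None. A truncation that happens to fall on a 4-character boundary would still decode, just to the wrong key, so a clean decode alone does not prove the record is complete.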

Re: domainkeys unverified - solved

2006-10-27 Thread Chris Purves


Peter H. Lemieux wrote:

Chris Purves wrote:
In the end, with the help of Mark Martinec, I was able to determine 
that the problem was with my ISP-provided DNS nameservers not 
allowing full TXT records to be returned (they were truncated).


Was this something that the ISP cooked up, or was it intrinsic to the 
DNS server software they are using?  If the latter, it would be good to 
know which server they were running.  It might be a useful addition to 
the FAQ/wiki.



I still have to contact them, but I'll post back with my results.


--
Chris

Re: High CPU running SA in a VMware VM

And there is one of these for each user; this is just for one user.
Sounds like we may have to abandon Bayes or possibly use MySQL. Not sure
we are ready to invest in setting that all up...

Theo Van Dinter [EMAIL PROTECTED] wrote:
 On Fri, Oct 27, 2006 at 03:01:45PM -0700, Sammy Anderson wrote:
  I manually ran sa-learn --force-expire, and it hammered the box. Here
  is debug and timing information (for just a 5 MB file!):

  [18002] dbg: bayes: token count: 161725, final goal reduction size: 49225

 want to get rid of (max) 49225 tokens

  [18002] dbg: bayes: can't use estimation method for expiry, unexpected
  result, calculating optimal atime delta (first pass)

 have to do step 1 and can't estimate

  [18002] dbg: bayes: expiry max exponent: 9
  -- about 20 seconds elapsed

 it's going through every token in your db

  [18002] dbg: bayes: atime token reduction
  [18002] dbg: bayes:  ===
  [18002] dbg: bayes: 43200 144256
  [18002] dbg: bayes: 86400 133029
  [18002] dbg: bayes: 172800 111350
  [18002] dbg: bayes: 345600 72306
  [18002] dbg: bayes: 691200 9457
  [18002] dbg: bayes: 1382400 0
 [...]
  [18002] dbg: bayes: first pass decided on 691200 for atime delta

 691200 wins the Price Is Right (9457 is the closest without going over)

  -- about 40 seconds elapsed [a sort going on here???]

 It's creating a new DB file, going back through every token in the original
 DB, and for any that are newer than 691200 seconds ago, it copies the entry to
 the new DB.

  expired old bayes database entries in 60 seconds = YIKES

 yep.  expiry is relatively resource intensive and slow w/ DBMs, but
 there's no other good way to do it (or at least, no one has suggested
 a really better way to do it...)

 -- 
 Randomly Selected Tagline:
 I believe it's not butter, I just can't believe it's $1.59!

RE: domainkeys unverified - solved

2006-10-27 Thread Mark

 -Original Message-
 From: Chris Purves [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, 28 October 2006 0:49
 To: users@spamassassin.apache.org
 Subject: Re: domainkeys unverified - solved

  DK_VERIFIED does not fire for Yahoo! e-mails (multiple part
  TXT record)

  Interesting.

  nslookup -q=txt lima._domainkey.yahoogroups.com

  k=rsa;

  p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCT
  pDT1pbK0xwkd
  ZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W
  LLNOf9hkMeicUH48TGkEoCAcaSjJz/b3NMrOy9l1U7gQIDAP//

  I get two parts, too. Is that their correct public key, 
  when concatenated?

 What you get is correct. In my case, when it's not working I get:

 [EMAIL PROTECTED]:~$ nslookup -q=txt lima._domainkey.yahoogroups.com
 Server:  64.59.184.13
 Address: 64.59.184.13#53

 Non-authoritative answer:
 lima._domainkey.yahoogroups.com text = k=rsa\; 
 p=MHwwDQYJKoZIhvcNAQEBBQADawAwaAJhAL10WHRWMSb9Tnl+k4Kzpc18rDCT
 pDT1pbK0xwkdZIZkaP8NB75qa/S57xccZlIwbI22Ooy/IY+8WxQtvE2z4W

 Authoritative answers can be found from:

 [EMAIL PROTECTED]:~$

 I'm missing the second part of the Answer and Authority is empty.

Thanks. :) I was getting worried. I'm not quite ready to go to BIND 9 yet
(don't all y'all shoot me now), so I'm happy to hear it's working.

- Mark

Re: High CPU running SA in a VMware VM

2006-10-27 Thread Rick Macdougall


Sammy Anderson wrote:

And there is one of these for each user, this is just for one  user.  Sounds 
like we may have to abandon Bayes or possibly use  mysql.  Not sure we are 
ready to invest in setting that all up...



Bayes in MySQL is a snap to set up and it really runs rings around the 
dbm setup in a real-world situation.


I switched over two clients this morning and neither of them had MySQL 
installed.  Installed from source (php 5 requirements etc) and still had 
both installs done before lunch.


Regards,

Rick

RE: ImageInfo vs FuzzyOCR performance?

2006-10-27 Thread Michael Scheidell

 -Original Message-
 From: Jorge Valdes [mailto:[EMAIL PROTECTED] 
 Sent: Friday, October 27, 2006 5:12 PM
 To: users@spamassassin.apache.org
 Subject: Re: ImageInfo vs FuzzyOCR performance?
 
 SPAM Results:
   3936 Message(s)  49.83%
 19.399 Average Score

   3343 Time(s)   7.50%   84.93%   Hit Rule: BAYES_99
   3068 Time(s)   6.88%   77.95%   Hit Rule: HTML_MESSAGE
   1655 Time(s)   3.71%   42.05%   Hit Rule: FUZZY_OCR
   1527 Time(s)   3.42%   38.80%   Hit Rule: SARE_GIF_ATTACH
   1411 Time(s)   3.16%   35.85%   Hit Rule: URIBL_BLACK
   1274 Time(s)   2.86%   32.37%   Hit Rule: URIBL_BLACK_OVERLAP
   1271 Time(s)   2.85%   32.29%   Hit Rule: MIME_HTML_ONLY
   1215 Time(s)   2.72%   30.87%   Hit Rule: URIBL_JP_SURBL
   1187 Time(s)   2.66%   30.16%   Hit Rule: RCVD_IN_BL_SPAMCOP_NET
   1184 Time(s)   2.66%   30.08%   Hit Rule: SARE_GIF_STOX
 
What do you use to get those stats?

RE: ImageInfo vs FuzzyOCR performance?

2006-10-27 Thread Rob McEwen

Jeff Chan wrote:
 Does anyone have any recent feedback about the performance of
 ImageInfo versus FuzzyOCR about detecting stock image spams (or
 any others)?  Does FuzzyOCR catch significantly more spams than
 ImageInfo?

One of the things that ImageInfo does to avoid FPs is assign a higher
score to image-only spam where the ratio of screen-space/amount-of-text is
high. But notice how more of this type of spam has had gibberish text at
the bottom lately? This messes that formula up and results in a VERY
small ImageInfo score. I know that the spammers might have been doing this
to get around bayes... but I suspect that they were really trying to get
around ImageInfo, because this change-up seemed to happen soon after
ImageInfo was introduced.

Nevertheless, I've found that manually readjusting those ratios has helped
to catch more spam. (And I'm reluctant to mention this in the first place
because if they are adjusted at the SARE site, then the spammers will only
readjust accordingly!)

Rob McEwen
PowerView Systems

SA TIMED OUT



I upgraded to SA 3.1.4 last night and now I have two issues that I'm 
trying to resolve:


(1)
spamassassin -D --lint is giving me an error:
[2533] warn: config: failed to parse line, skipping: dcc_timeout 18


(2)
In the logs I'm seeing a good number of the following type of entry:
Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT, 
backtrace: at 
/usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm line 
363\n\teval {...} called at 
/usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm line 
363\n\tMail::SpamAssassin::DnsResolver::poll_responses('Mail::SpamAssassin::DnsResolver=HASH(0x4005820)', 
72) called at 
/usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/Plugin/URIDNSBL.pm 
line 
710\n\tMail::SpamAssassin::Plugin::URIDNSBL::complete_lookups('Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x1ff8200)', 
'HASH(0x4cbdad0)', 72) called at 
/usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/Plugin/URIDNSBL.pm 
line 
412\n\tMail::SpamAssassin::Plugin::URIDNSBL::check_post_dnsbl('Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x1ff8200)', 
'HASH(0x6816dd0)') called at 
/usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/PluginHandler.pm line 
159\n\teval {...} called at /usr/lib/perl5/vendor_perl/5.8.8/Mail/Sp...



I've checked the archives and maybe I missed something, but I wasn't 
able to find anything that seemed relevant.


Thanks for any pointers.
Mike


[EMAIL PROTECTED] ~]# spamassassin -V
SpamAssassin version 3.1.4
  running on Perl version 5.8.8

--

 Let the machine do the dirty work.  - Elements of Programming Style
  15:35:01 up 16:21,  0 users,  load average: 0.32, 0.31, 0.28

 Linux Registered User #241685  http://counter.li.org

Re: MailScanner versus Amavisd-new with postfix

2006-10-27 Thread Mark Martinec

Jeff,

 Not to start any flamewars, but does anyone have strong opinions
 on MailScanner versus Amavisd-new for use with postfix (and of
 course SpamAssassin and ClamAV)?

Of course I'm biased, but I'd be worried about running a program with
about 400 cases of calling system routines (I/O, file system, etc.)
without checking the resulting status, or of failing to report errors.
MailScanner works while everything is in order. When something unexpected
happens (e.g. disk full, I/O or file system errors, depleted system 
resources), unpredictable things are bound to result, and may
go unnoticed for some time or prove difficult to diagnose.

  Mark

Re: spamassassin --lint fails with rules in local.cf

2006-10-27 Thread Alain Wolf


On 26.10.2006 14:35, * Dylan Bouterse wrote:
 I have added some rules in my local.cf file (for adding scores for some
 SARE rules) but when I run spamassassin --lint (or when I run
 rules_du_jour which does the same) it says the rules in my local.cf file
 are non-existent, but spamassassin ultimately runs fine. What am I doing
 wrong?
 
 Dylan
 
 

Oops, just stumbled upon the release announcement of SpamAssassin 3.1.7

http://www.nabble.com/ANNOUNCE%3A-Apache-SpamAssassin-3.1.7-available%21-tf2415849.html

3.1.7 is a quick-fix release; it contains only a fix for one bug,
introduced accidentally in 3.1.6:

- bug 5119: if admins had set rule scores in the site configuration in
  /etc, sa-update would fail.  Back out this change

Don't know if Dylan is already using 3.1.7.

We are on 3.1.6 because there is no updated FreeBSD-Port out yet.
So I wait.

Greetings
Alain



Re: SA TIMED OUT

2006-10-27 Thread Matt Kettler

M. Lewis wrote:

 I upgraded to SA 3.1.4 last night and now I have two issues that I'm
 trying to resolve:

 (1)
 spamassassin -D --lint is giving me an error:
 [2533] warn: config: failed to parse line, skipping: dcc_timeout 18

If you haven't edited /etc/mail/spamassassin/v310.pre to load the DCC
plugin, DCC is disabled by default. It's not free for everyone to use,
so it stays disabled until you decide that your use falls under DCC's
license. Most folks' use does, but check the license.

Without any DCC support loaded, the dcc_timeout option is meaningless to SA.
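
One way to confirm this before re-running --lint is to look for an uncommented loadplugin line for the DCC plugin. The sketch below assumes the v310.pre path mentioned above and the stock plugin name Mail::SpamAssassin::Plugin::DCC; the function name dcc_plugin_enabled is made up for illustration and is not part of SpamAssassin.

```shell
# Sketch: report whether a SpamAssassin .pre file loads the DCC plugin.
# dcc_plugin_enabled is a hypothetical helper name, not part of SpamAssassin.
dcc_plugin_enabled() {
    pre_file="${1:-/etc/mail/spamassassin/v310.pre}"
    # An active line starts with "loadplugin"; a disabled one starts with "#".
    grep -Eq '^[[:space:]]*loadplugin[[:space:]]+Mail::SpamAssassin::Plugin::DCC([[:space:]]|$)' \
        "$pre_file" 2>/dev/null
}

if dcc_plugin_enabled; then
    echo "DCC plugin is loaded"
else
    echo "DCC plugin is not loaded (or the file is missing)"
fi
```

If the plugin turns out not to be loaded, either uncomment that line (after checking the DCC license) or remove dcc_timeout from the config.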



 (2)
 In the logs I'm seeing a good number of the following type of entry:
 Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT,
 backtrace: at
 /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm line
 363\n\teval {...} called at


Sounds like your DNS is slow, and you've got a short $sa_timeout in
your amavis configs. But I'm no amavis expert.

RE: SA TIMED OUT

2006-10-27 Thread Gary V

I upgraded to SA 3.1.4 last night and now I have two issues that I'm trying 
to resolve:


(1)
spamassassin -D --lint is giving me an error:
[2533] warn: config: failed to parse line, skipping: dcc_timeout 18



You need to enable (uncomment) the DCC plugin in v310.pre


(2)
In the logs I'm seeing a good number of the following type of entry:
Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT, backtrace: 
at ...


I've checked the archives and maybe I missed something, but I wasn't able 
to find anything that seemed relevant.


Thanks for any pointers.
Mike


The newer version takes longer to scan (quite noticeable on a low-powered
system). Newer versions of amavisd-new allow scans to take longer without
timing out, where older versions have a default of $sa_timeout = 30; that
setting should be added to amavisd.conf and raised to something like 60
seconds. I also suggest moving Bayes to SQL; if you don't, then set
lock_method flock in local.cf if appropriate.

http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html#miscellaneous_options
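
Before raising the timeout, it may help to see what value, if any, is already set. This is a rough sketch only: show_sa_timeout is a made-up helper name, and the location of amavisd.conf varies between installs, so the path is passed in as an argument rather than assumed.

```shell
# Sketch: print the number assigned to $sa_timeout in an amavisd.conf-style file.
# show_sa_timeout is a hypothetical helper; pass the path to your own amavisd.conf.
show_sa_timeout() {
    conf_file="$1"
    # Match an uncommented assignment such as:  $sa_timeout = 30;
    sed -n 's/^[[:space:]]*\$sa_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*;.*/\1/p' \
        "$conf_file"
}

# Example with a throwaway file standing in for amavisd.conf:
sample=$(mktemp)
printf '%s\n' '# $sa_timeout = 30;' '$sa_timeout = 60;' > "$sample"
show_sa_timeout "$sample"    # prints 60 (the commented-out line is ignored)
rm -f "$sample"
```

No output means the file has no active $sa_timeout line, in which case whatever default the installed amavisd-new version uses applies.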


RE: SA TIMED OUT

2006-10-27 Thread Gary V


spamassassin -D --lint is giving me an error:
[2533] warn: config: failed to parse line, skipping: dcc_timeout 18


BTW, as Matt says, your DNS may be slow. If DCC doesn't respond within 10 
seconds, I would imagine it's unlikely it will respond - so I wouldn't waste 
time waiting around another 8 seconds. Many people find a local caching DNS 
server really helps on net tests.
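
To get a rough feel for whether lookups are slow, one option is to time a resolver call from the shell. This is only a sketch: elapsed_seconds is a made-up helper name, the resolution is whole seconds, and the host name in the commented example is purely illustrative.

```shell
# Sketch: time a command in whole seconds (coarse, but slow DNS shows up clearly).
# elapsed_seconds is a hypothetical helper name.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example (needs network access); the host name is only an illustration:
# elapsed_seconds getent hosts spamassassin.apache.org
```

With a local caching resolver in place, a repeated lookup of the same name would be expected to come back almost immediately.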


Gary V


Re: SA TIMED OUT


Matt Kettler wrote:

M. Lewis wrote:

I upgraded to SA 3.1.4 last night and now I have two issues that I'm
trying to resolve:

(1)
spamassassin -D --lint is giving me an error:
[2533] warn: config: failed to parse line, skipping: dcc_timeout 18

If you haven't edited /etc/mail/spamassassin/v310.pre to load the DCC
plugin, DCC is disabled by default. It's not free for everyone to use,
so it stays disabled until you decide that your use falls under DCC's
license. Most folks' use does, but check the license.

Without any DCC support loaded, the dcc_timeout option is meaningless to SA.



This was indeed the problem. Error gone now.





(2)
In the logs I'm seeing a good number of the following type of entry:
Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT,
backtrace: at
/usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/DnsResolver.pm line
363\n\teval {...} called at



Sounds like your DNS is slow, and you've got a short $sa_timeout in
your amavis configs. But I'm no amavis expert.


Actually, I rebuilt this machine last night and forgot to turn on the
caching name server. That made a difference!


Thanks Matt!


--

 May the bugs of many programs nest on your hard drive.
  22:45:01 up  3:13,  0 users,  load average: 0.10, 0.17, 0.17

 Linux Registered User #241685  http://counter.li.org

Re: SA TIMED OUT

Gary V wrote:
I upgraded to SA 3.1.4 last night and now I have two issues that I'm
trying to resolve:

(1)
spamassassin -D --lint is giving me an error:
[2533] warn: config: failed to parse line, skipping: dcc_timeout 18

You need to enable (uncomment) the DCC plugin in v310.pre

Done and the error is gone now.

(2)
In the logs I'm seeing a good number of the following type of entry:
Oct 27 15:40:21 moe amavis[2548]: (02548-01-2) (!)SA TIMED OUT,
backtrace: at ...

I've checked the archives and maybe I missed something, but I wasn't
able to find anything that seemed relevant.

Thanks for any pointers.
Mike

The newer version takes longer to scan (quite noticeable on a low-powered
system). Newer versions of amavisd-new allow scans to take longer
without timing out, where older versions have a default of $sa_timeout =
30; that setting should be added to amavisd.conf and raised to something
like 60 seconds. I also suggest moving Bayes to SQL; if you don't, then
set lock_method flock in local.cf if appropriate.
http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html#miscellaneous_options

Thanks Gary for the explanation. I will check into all of these.

Thanks,
Mike


May the bugs of many programs nest on your hard drive.
22:45:01 up 3:13, 0 users, load average: 0.10, 0.17, 0.17

Linux Registered User #241685 http://counter.li.org

Re: SA TIMED OUT