Re: HarrisPoll

2007-02-17 Thread Jeff Chan
On Saturday 17 February 2007 23:29, Jeff Chan wrote:
> I should have added, we are removing the Harris Poll domain
> hpolsurveys.com from the blacklist.

Actually checking more closely, this domain is not on any SURBL
blacklists.  If you got this result recently, then you may be suffering
from the DNS bug:

  http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3997

If you're using a SpamAssassin version before 3.1, you should upgrade to
3.1+ since it's fixed in these versions.

Or are you using a DNS proxy?

What happens if you:

  dig hpolsurveys.com.multi.surbl.org

or ping it?
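
For reference, the same lookup can be sketched in Python. This is only an illustration, not SURBL tooling: the helper names are made up here, and the bit-to-list table reflects the multi.surbl.org answer encoding as commonly documented at the time (an assumption, so check the SURBL site before relying on it). A listed domain resolves to 127.0.0.N, where N is a bitmask; a name that does not resolve is not listed.

```python
import socket

# Assumed bit meanings for the last octet of a multi.surbl.org answer
# (historical encoding; verify against the SURBL documentation).
SURBL_BITS = {2: "sc", 4: "ws", 8: "ph", 16: "ob", 32: "ab", 64: "jp"}


def surbl_query_name(domain, zone="multi.surbl.org"):
    """Build the DNS name to look up, e.g. for use with dig."""
    return f"{domain.strip('.').lower()}.{zone}"


def decode_surbl_answer(address):
    """Turn an answer such as '127.0.0.4' into a list of list names."""
    octets = address.split(".")
    if len(octets) != 4 or octets[:3] != ["127", "0", "0"]:
        raise ValueError(f"unexpected SURBL answer: {address}")
    mask = int(octets[3])
    return [name for bit, name in sorted(SURBL_BITS.items()) if mask & bit]


def check_domain(domain):
    """Return the lists a domain is on; an empty list means not listed."""
    try:
        answer = socket.gethostbyname(surbl_query_name(domain))
    except socket.gaierror:
        return []  # NXDOMAIN (or lookup failure): treated as not listed
    return decode_surbl_answer(answer)
```

An answer outside 127.0.0.x makes decode_surbl_answer raise ValueError, which is the kind of result a DNS proxy or wildcarding resolver might produce for a name that should not resolve at all.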



Re: HarrisPoll

2007-02-17 Thread Jeff Chan
I should have added, we are removing the Harris Poll domain
hpolsurveys.com from the blacklist.


Re: HarrisPoll

2007-02-17 Thread Jeff Chan
On Saturday 17 February 2007 10:44, LuKreme wrote:
> On 17-Feb-2007, at 06:39, Michael Scheidell wrote:
> >> -----Original Message-----
> >> From: LuKreme [mailto:[EMAIL PROTECTED]
> >> Sent: Friday, February 16, 2007 1:26 PM
> >> To: users@spamassassin.apache.org
> >> Subject: HarrisPoll
> >>
> >>
> >> Where does the WS-SURBL info come from?  I ask because the Harris
> >> Poll email is getting tagged with it.  As far as I know, I've
> >> never received spam from them, so I'd like to check out the actual
> >> rbl.
> >
> > If harrispoll emailed you then you DID get spam from them.
>
> I get mail from them all the time.
>
> > We get it all the time, and I don't know ANY user that signed up
> > for it.
>
> I did.
>
> > Did you sign up for it on purpose? ;-)
>
> Yep.  Although I recently (Friday) unsubbed.


It's a false positive (an error).  Please report all false positives on
SURBL lists to:

  [EMAIL PROTECTED]

Ideally have the owner of the domain contact us with the information at:
  
  http://www.surbl.org/lists.html#removal
  


Re: Google Summer of Code 2007 ...

2007-02-17 Thread hamann . w


>> Not quite. Those show how many times *others* have seen it, not how
>> many times *I* have seen it. Also, these have hysteresis so if you are
>> unfortunate enough to be at the start of the spam run and receive multiple
>> mails all with the same body then Razor, DCC and Pyzor might not
>> help. Though if this were implemented then there would have to be a
>> whitelist for mailing lists to which multiple users have subscribed.
>> 

Hi,

ixhash, which also works that way, definitely started its life as an in-house
mail counter.
You could probably use ixhash or razor along with your own server rather than
the public one.

Wolfgang




Re: Google Summer of Code 2007 ...

2007-02-17 Thread Graham Murray
Theo Van Dinter <[EMAIL PROTECTED]> writes:
> Doesn't SA have at least 3 of those already?  Razor, DCC, and Pyzor.

Not quite. Those show how many times *others* have seen it, not how
many times *I* have seen it. Also, these have hysteresis so if you are
unfortunate enough to be at the start of the spam run and receive multiple
mails all with the same body then Razor, DCC and Pyzor might not
help. Though if this were implemented then there would have to be a
whitelist for mailing lists to which multiple users have subscribed.
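
A minimal sketch of such a local counter, in Python rather than as a real SA plugin. Everything here is hypothetical: the class name, the normalisation, and the whitelist keyed on a list address are illustrative choices, and an exact hash only catches identical bodies, not "similar enough" ones (that would need a fuzzy digest along the lines of ixhash).

```python
import hashlib
import re
from collections import Counter


class BodyCounter:
    """Count how often this site has seen the same (normalised) body."""

    def __init__(self, whitelist=()):
        self.counts = Counter()
        # Hypothetical whitelist of mailing-list addresses to skip.
        self.whitelist = {addr.lower() for addr in whitelist}

    @staticmethod
    def fingerprint(body):
        # Collapse whitespace and case so trivial variations hash the same.
        normalised = re.sub(r"\s+", " ", body).strip().lower()
        return hashlib.sha1(normalised.encode("utf-8")).hexdigest()

    def seen(self, body, list_address=None):
        """Record one sighting and return the running count (0 if whitelisted)."""
        if list_address and list_address.lower() in self.whitelist:
            return 0
        key = self.fingerprint(body)
        self.counts[key] += 1
        return self.counts[key]
```

A scoring rule could then add points once the returned count passes some threshold, while list traffic delivered to many local subscribers is skipped by the whitelist.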


Re: FuzzyOCR

2007-02-17 Thread John Thompson
On 2007-02-12, Sujit Choudhury <[EMAIL PROTECTED]> wrote:

> Is there an easy way to get everything needed for FuzzyOCR?  Has
> somebody built a complete install, so that we don't have to go to
> various sites to build various bits of FuzzyOCR?

On FreeBSD when you install from the ports collection (p5-FuzzyOCR), it 
fetches, builds and installs all the pieces it needs automatically.

-- 

John ([EMAIL PROTECTED])



Re: [2] How can I configure spamassassin to filter spam jpgs?

2007-02-17 Thread John Thompson
On 2007-02-15, NIbbLLe <[EMAIL PROTECTED]> wrote:

> The problem is that we are running spamassassin through plesk 7 and we are
> running it on a Windows machine. I went to the FuzzyOCR site, I see the only
> files that they have is .tar (for linux) .  Do you maybe have any
> suggestions on how I can install the plugin on the Windows machine?  If not
> do you maybe know of another product I can use?

You could use VMWare to run it in a linux virtual machine.

-- 

John ([EMAIL PROTECTED])



Re: Google Summer of Code 2007 ...

2007-02-17 Thread Mark Martinec
On Saturday February 17 2007 03:01, Quinn Comendant wrote:
> How about an extensive statistics reporting tool, ..., that
> can show how well a current spamassassin installation is performing
> and where it needs improvements.

Well, not exactly in your words, but in the same spirit,
this time belonging to SA itself:

Instrument SA with a couple of performance measuring probes,
providing some easier way to spot where bottlenecks lie.
Just something simple enough to tell, look, currently waiting
for Razor server response (or some RBL) is taking 80% of
elapsed time. Or, Bayes db is very sluggish, it is taking
5 seconds to provide a result.

A timing breakdown by subtasks is not that much work to provide,
but provides great insight into troubleshooting and performance
improvements.

Here is an example of a timing breakdown as currently provided
in the log (at log level 2) by amavisd-new, without getting into
specific details, except to say the numbers are elapsed time
for each subtask in milliseconds (and in percents, just for the
section, and then a cumulative percent of all sections so far):

TIMING [total 1840 ms] - SMTP pre-DATA-flush: 4 (0%)0, SMTP DATA: 95 (5%)5, 
check_init: 1 (0%)5, sql-enter: 69 (4%)9, mime_decode: 16 (1%)10,
get-file-type2: 26 (1%)11, parts_decode: 1 (0%)12, check_header: 3 (0%)12, 
AV-scan-1: 14 (1%)12, AV-scan-2: 20 (1%)14, spam-wb-list: 5 (0%)14,
SA call: 1517 (82%)96, update_cache: 3 (0%)97, decide_mail_destiny: 6 (0%)97,
^
write-header: 15 (1%)98, save-to-local-mailbox: 1 (0%)98,
prepare-dsn: 3 (0%)98, main_log_entry: 12 (1%)99, sql-update: 20 (1%)100,
update_snmp: 2 (0%)100, SMTP pre-response: 1 (0%)100, SMTP response: 1 (0%)100,
unlink-2-files: 1 (0%)100, rundown: 0 (0%)100

It tells at a glance that message checking and I/O for this particular
message took 1840 ms in total, that receiving the message over SMTP
took 5% of this, that the virus scanners were very quick (14 and 20 ms),
that the SA call took 1517 ms, which is 82% of all elapsed time, and that
all sections up to and including SA (cumulative) took 96% of total elapsed time.

Now, imagine something like this relatively simple timing breakdown, but
drilled down into an SA call, telling the administrator where it is
worth spending effort, or why all of a sudden SA takes 10 seconds
instead of the usual 2.
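
To make the idea concrete, here is a small Python sketch of a section timer that prints a line in the same shape as the amavisd-new log entry above. It is not code from amavisd-new or SA; the class and function names are invented for illustration, and the cumulative percentage is taken to include the section itself, which matches the numbers in the sample log.

```python
import time


class SectionTimer:
    """Collect elapsed wall-clock time per named section, in milliseconds."""

    def __init__(self):
        self.sections = []  # list of (name, elapsed_ms)
        self._last = time.perf_counter()

    def mark(self, name):
        """Close the current section under the given name."""
        now = time.perf_counter()
        self.sections.append((name, (now - self._last) * 1000.0))
        self._last = now


def format_timing(sections):
    """Render (name, ms) pairs as 'name: ms (pct%)cumulative_pct'."""
    total = sum(ms for _, ms in sections)
    parts, running = [], 0.0
    for name, ms in sections:
        running += ms
        pct = round(100 * ms / total) if total else 0
        cum = round(100 * running / total) if total else 0
        parts.append(f"{name}: {ms:.0f} ({pct}%){cum}")
    return f"TIMING [total {total:.0f} ms] - " + ", ".join(parts)
```

Inside a scan one would call timer.mark("dns"), timer.mark("bayes") and so on after each subtask (the section names are placeholders), then log format_timing(timer.sections) once per message.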

  Mark


Re: Google Summer of Code 2007 ...

2007-02-17 Thread Theo Van Dinter
On Sat, Feb 17, 2007 at 06:56:28PM -0500, Tim B. wrote:
> How about a "How many times have I seen this message body" plugin...
> 
> So each time SA sees the same or similar enough message body, it 
> increases the score. 

Doesn't SA have at least 3 of those already?  Razor, DCC, and Pyzor.

-- 
Randomly Selected Tagline:
"I love deadlines.  I like the whooshing sound they make as they fly by."
   - Douglas Adams




Re: Google Summer of Code 2007 ...

2007-02-17 Thread Tim B.

Justin Mason wrote:
> Theo Van Dinter writes:
>> I'm assuming that there will be a Google Summer of Code 2007 going on, and
>> that the ASF will be involved again.  So it's a good time to start thinking
>> about things we'd like to put up as possible projects.
>>
>> We still have a number of items from last year that we could use again.
>> Anything else that we'd like people to code up?
>
> Also, any suggestions from outside the dev team?  Anyone got good ideas
> for new SpamAssassin features that would be good to pay someone to work on
> for 3 months?
>
> --j.

How about a "How many times have I seen this message body" plugin...

So each time SA sees the same or similar enough message body, it 
increases the score. 



Re: Google Summer of Code 2007 ...

2007-02-17 Thread Chris St. Pierre

On Fri, 16 Feb 2007, Quinn Comendant wrote:
> How about an extensive statistics reporting tool, possibly
> web-based, that can show how well a current spamassassin
> installation is performing and where it needs improvements. It could
> provide trends in different classes of spam and how each is
> marked. Also show info on whether expensive (as in cpu time) rules
> and plugins are actually doing any good.


I don't know that this belongs in SA itself.  It'd be a nice add-on,
but SA already does logging that should be quite sufficient to write
something like this.

Not to mention, the best measure of the success of a spam filtering
plan is user satisfaction.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
--
Never send mail to [EMAIL PROTECTED]



Re: HarrisPoll

2007-02-17 Thread LuKreme

On 17-Feb-2007, at 06:39, Michael Scheidell wrote:
>> -----Original Message-----
>> From: LuKreme [mailto:[EMAIL PROTECTED]
>> Sent: Friday, February 16, 2007 1:26 PM
>> To: users@spamassassin.apache.org
>> Subject: HarrisPoll
>>
>> Where does the WS-SURBL info come from?  I ask because the Harris
>> Poll email is getting tagged with it.  As far as I know, I've never
>> received spam from them, so I'd like to check out the actual rbl.
>
> If harrispoll emailed you then you DID get spam from them.

I get mail from them all the time.

> We get it all the time, and I don't know ANY user that signed up
> for it.

I did.

> Did you sign up for it on purpose? ;-)

Yep.  Although I recently (Friday) unsubbed.

--
"I used to hate the sun, because it'd shone on everything I'd done.   
Made me feel that all that I had done was overfill the ashtray of my  
life."





Re: SA not working?

2007-02-17 Thread David Obando
Matt Kettler wrote on 17.02.2007 15:08:
> David Obando wrote:
>   
>> Dear all,
>>
>> I installed SA on a Debian Etch system together with Postfix and Amavis.
>> Strangely SA doesn't score mails at all, but I don't see why.
>>
>> See the output of a spam mail I checked manually, When I run a check on
>> the same mail on a different machine, it is scored:
>>
>> [EMAIL PROTECTED] tmp]# spamassassin -D < spam
>> 
>
> .
>
> Ok, we know SA works when invoked as the "spamassassin" script.. What's
> your amavis configuration for SpamAssassin like?
>
> Is your @local_domains_acl set correctly?
>
> What is your tag_level set to? (note: this doesn't mean tagged as spam,
> it means has any SA type headers added at all, set to -999 if you want
> sane behavior)
>
> See also the amavis faq:
>
> http://www.ijs.si/software/amavisd/#faq-spam
>
>
>   
Hi,

I don't think SA is working, because no tests are run at all! When I
check a GTUBE mail with SA it is not scored, although it should score
at least 1000 points!
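
For anyone wanting to reproduce this check, a GTUBE test message can be generated with a few lines of Python. This is a sketch only: the addresses and subject are placeholders, and the function name is made up. The GTUBE string itself is the standard 68-character test pattern and has to appear unbroken on one line of the body.

```python
from email.message import EmailMessage

# The standard GTUBE test string (68 characters, must appear on one line).
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(sender="sender@example.com", rcpt="rcpt@example.com"):
    """Return a minimal plain-text message whose body contains GTUBE."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = rcpt
    msg["Subject"] = "GTUBE test"
    msg.set_content("This is a GTUBE test message.\n\n" + GTUBE + "\n")
    return msg


if __name__ == "__main__":
    # Write the message to stdout so it can be piped into a scanner.
    print(build_gtube_message().as_string())
```

Saved as, say, gtube.py, the output can be piped into `spamassassin -t`; if the rules are being loaded and run, the report should show the GTUBE rule firing.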

The problem has nothing to do with amavis, but here are my configs:

my @local_domains_acl is:

05-domain_id:@local_domains_acl = ( ".$mydomain" );

My tag levels:
20-debian_defaults:$sa_tag_level_deflt  = 2.0;  # add spam info headers if at, or above that level
20-debian_defaults:$sa_tag2_level_deflt = 8.31; # add 'spam detected' headers at that level
50-user:$sa_tag_level_deflt  = -10; # always show spam info in the mail header

Regards,
David

-- 
The day microsoft makes something that doesn't suck is the day they start 
making vacuum cleaners.
gpg --keyserver pgp.mit.edu --recv-keys 1920BD87
Key fingerprint = 3326 32CE 888B DFF1 DED3  B8D2 105F 29CB 1920 BD87



Re: Export and append Bayes DB

2007-02-17 Thread Sam Przyswa

Michael Parker wrote:
> Sam Przyswa wrote:
>> Hi,
>>
>> Is it possible to export a Bayes DB from a server and then append (not
>> restore) it to other servers?
>
> No, you generally can't combine two bayes databases that way.  Best bet
> is to pick the most complete one and use it.
>
> For more details see a really long post on the users mailing list from
> me awhile back.


Ok, thanks.

Sam.



--
This message has been checked by MailScanner
for viruses and spam, and nothing
suspicious was found.



Re: Google Summer of Code 2007 ...

2007-02-17 Thread Matthew Wilson
Raul Dias writes:
**snip
> If I remember correctly spamd was using somewhere between 2 and 5% of
> memory as reported by top (45 processes max).
>
> If it were really shared, it would not have collapsed.
>
> My bet is that the model used on Linux is copy-on-write.  So after a
> fork, when the child spamd changes a value, the kernel makes its own
> copy of that memory (please correct me if I am wrong).  To make it worse,
> a perl script (AFAIK) is data and not code, which makes it harder to reuse
> (especially with evals around).
>
> Even if sharing does happen it is not enough.
>
> OTOH, with an I/O model, the total memory used would be:
>  - the perl interpreter and libraries (this is truly shared in a fork
>    model).
>  - the compiled perl code and perl libraries.
>  - one copy of the parsed rules and compiled regular expressions and non
>    message/scanner related data.

Yeah.  It's the lists and rules and regexes that do it for me.

>  - one M::SA::PerMsgStatus object for each simultaneously scanned message
>    (this is a place to put a limit on).
>
>> Still, if someone tries it and can demo increased efficiency...
>> go for it ;)
>
> This might require some internal changes to SA. Every Sync call would
> have to be changed to Async (NON BLOCKING). This might include SQL
> calls, DNS calls, exec ing external apps and even file I/O.


An async version of Net::DNS is
http://search.cpan.org/~msergeant/ParaDNS-1.1/



Re: Bayes db size....

2007-02-17 Thread Dave Koontz
Is there a consensus on this need?  I deal with the seen db issue by
scheduled deletion of that file.  That said,  with SA becoming more and
more prominent all the time, I suspect the Average Joe will miss this
oddity until they wind up with a sluggish system, out of drive space or
other related issues.

I was mostly curious about the logic of NOT doing maintenance on the Seen
and AWL db files.  If there is a consensus that this needs to occur, then
perhaps I can take the time to create a proper patch.  I just want to
make sure I am not missing something fundamental here.
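
The scheduled deletion mentioned above could look roughly like the following Python sketch, run from cron. It is an illustration under assumptions: the function name is invented, the path to pass in (for example a bayes_seen file under the Bayes user's .spamassassin directory) depends on the local setup, and the size threshold is arbitrary. Removing the file means previously learned messages can no longer be recognised as already seen.

```python
import os


def prune_seen_db(path, max_bytes):
    """Delete the file at `path` if it is larger than `max_bytes`.

    Returns True if the file was removed, False otherwise.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return False  # nothing to prune
    if size <= max_bytes:
        return False
    os.remove(path)
    return True
```

It is probably safest to run this only while no learning or scanning process has the file open.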

Michael Parker wrote:
> Dave Koontz wrote:
>   
>> I am sure this has been asked numerous times before, but what is the logic
>> in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
>> have been removed from the DB there is little to no use for 'unlearning' any
>> associated messages.  Besides on a busy system, this seen file gets large
>> very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.
>>
>> 
>
> Patches welcome.
>
> Michael
>
>   
>



Re: SA not working?

2007-02-17 Thread Matt Kettler
David Obando wrote:
> Dear all,
>
> I installed SA on a Debian Etch system together with Postfix and Amavis.
> Strangely SA doesn't score mails at all, but I don't see why.
>
> See the output of a spam mail I checked manually, When I run a check on
> the same mail on a different machine, it is scored:
>
> [EMAIL PROTECTED] tmp]# spamassassin -D < spam

.

Ok, we know SA works when invoked as the "spamassassin" script.. What's
your amavis configuration for SpamAssassin like?

Is your @local_domains_acl set correctly?

What is your tag_level set to? (note: this doesn't mean tagged as spam,
it means has any SA type headers added at all, set to -999 if you want
sane behavior)

See also the amavis faq:

http://www.ijs.si/software/amavisd/#faq-spam




Re: Bayes db size....

2007-02-17 Thread Michael Parker
Dave Koontz wrote:
> I am sure this has been asked numerous times before, but what is the logic
> in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
> have been removed from the DB there is little to no use for 'unlearning' any
> associated messages.  Besides on a busy system, this seen file gets large
> very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.
> 

Patches welcome.

Michael


> 
> -----Original Message-----
> From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 16, 2007 7:19 PM
> To: spam mailling list
> Subject: Re: Bayes db size
> 
> On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
>> So you're saying that right now seen isn't capped like tokens right?
> 
> seen has no max size nor expiry features.
> 
> --
> Randomly Selected Tagline:
> "Like any French restaurant in America, it was overpriced, noisy, moody,
> and would put you in mortal danger if you had an accident with anything
> larger than a croissant." - Unknown about the Renault LeCar
> 
> 



RE: HarrisPoll

2007-02-17 Thread Michael Scheidell


> -----Original Message-----
> From: LuKreme [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 16, 2007 1:26 PM
> To: users@spamassassin.apache.org
> Subject: HarrisPoll
> 
> 
> Where does the WS-SURBL info come from?  I ask because the Harris  
> Poll email is getting tagged with it.  As far as I know, I've never  
> received spam from them, so I'd like to check out the actual rbl.

If harrispoll emailed you then you DID get spam from them.

We get it all the time, and I don't know ANY user that signed up for it.

Did you sign up for it on purpose? ;-)


Re: Google Summer of Code 2007 ...

2007-02-17 Thread Raul Dias
On Sat, 2007-02-17 at 11:21 +, Justin Mason wrote:
> Raul Dias writes:
> > On Sat, 2007-02-17 at 02:07 +0100, Mark Martinec wrote:
> > > On Saturday February 17 2007 01:49, Matthew Wilson wrote:
> > > > I was/am primarily concerned with RAM usage for high-concurrency
> > > > situations.
> > > 
> > > Ok. Still, in my experience about 30 (maybe 50) SA processes can
> > > fully utilize today's CPU & I/O, and it's probably no big deal
> > > to provide about 2 GB of memory to cater for such system.
> > > Also, and unfortunately, multithreading in Perl is rather
> > > cumbersome and not significantly less expensive than fully
> > > individual processes.
> > 
> > When I experimented with sa-blacklist.cf some time ago, 45
> > processes brought my system to its knees with 3.5GB (out of memory).
> > 
> > I agree about the thread model.
> > 
> > But sticking to a async I/O model is a valid point.  If implemented
> > correctly it will save a lot of memory and even improve performance a
> > little.
> > 
> > Having separate processes avoids the need to check for leftover garbage
> > after filtering a message, which would require the code to be
> > rechecked.
> > 
> > However, on uniprocessor systems, running multiple processes is
> > actually more expensive than an async I/O model.  On multiprocessor
> > systems, just keep one process per CPU or fewer.
> > 
> > In the past I have played a lot with perl-loop (any loopers around?)
> > which was the only way to go.  It is too low level for most people, but
> > perhaps POE is the way to go today (which can use perl-loop as its
> > base).
> 
> I'm dubious about the benefits for SpamAssassin...
> 
> An async model works very well for network-bound and I/O-bound servers;
> however, SpamAssassin is mainly CPU-bound, since the network and I/O parts
> are already mostly run async during the scan operation.
> 
> Also, the multiple spamd processes share quite a lot of RAM with each
> other -- there's a bug in how linux reports "shared" memory which makes it
> appear much worse than it is. read the FAQ for more details.

yep, but ...


01:01:37 kernel: Out of Memory: Killed process 10024 (spamd).
01:01:51 kernel: Out of Memory: Killed process 10044 (spamd).
01:02:05 kernel: Out of Memory: Killed process 10612 (spamd).
01:02:19 kernel: Out of Memory: Killed process 10038 (spamd).
01:02:32 kernel: Out of Memory: Killed process 10602 (spamd).
01:02:45 kernel: Out of Memory: Killed process 10398 (spamd).
01:03:04 kernel: Out of Memory: Killed process 10020 (spamd).
01:03:29 kernel: Out of Memory: Killed process 10015 (spamd).
01:03:42 kernel: Out of Memory: Killed process 10237 (spamd).
01:04:00 kernel: Out of Memory: Killed process 11037 (spamd).
01:04:18 kernel: Out of Memory: Killed process 10478 (spamd).
01:04:34 kernel: Out of Memory: Killed process 11065 (spamd).
01:04:40 kernel: Out of Memory: Killed process 10405 (spamd).
...and it goes...

If I remember correctly spamd was using somewhere between 2 and 5% of
memory as reported by top (45 processes max).

If it were really shared, it would not have collapsed.

My bet is that the model used on Linux is copy-on-write.  So after a
fork, when the child spamd changes a value, the kernel makes its own
copy of that memory (please correct me if I am wrong).  To make it worse,
a perl script (AFAIK) is data and not code, which makes it harder to reuse
(especially with evals around).

Even if sharing does happen it is not enough.

OTOH, with an I/O model, the total memory used would be:
 - the perl interpreter and libraries (this is truly shared in a fork
   model).
 - the compiled perl code and perl libraries.
 - one copy of the parsed rules and compiled regular expressions and non
   message/scanner related data.
 - one M::SA::PerMsgStatus object for each simultaneously scanned message
   (this is a place to put a limit on).

> Still, if someone tries it and can demo increased efficiency...
> go for it ;)

This might require some internal changes to SA. Every sync call would
have to be changed to async (non-blocking). This might include SQL
calls, DNS calls, exec'ing external apps and even file I/O.
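
The shape of that model can be sketched with Python's asyncio (SA itself is Perl, so this is an analogy, not a proposal for actual SA code). The names are hypothetical: `rules` stands in for the single shared copy of parsed rules, each call to `scan` plays the role of one per-message status object, and the semaphore is the "place to put a limit on". The lookup is a stand-in for a non-blocking DNS, SQL or file call.

```python
import asyncio


class Scanner:
    """One process, one copy of the rules, a bounded number of scans."""

    def __init__(self, rules, max_concurrent):
        self.rules = rules                  # parsed once, shared by all scans
        self._limit = asyncio.Semaphore(max_concurrent)
        self.active = 0
        self.peak = 0

    async def _lookup(self, message):
        # Stand-in for a non-blocking DNS/SQL/file call.
        await asyncio.sleep(0)
        return 0.0

    async def scan(self, message):
        async with self._limit:             # cap per-message state in memory
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                score = await self._lookup(message)
                score += sum(pts for word, pts in self.rules.items()
                             if word in message)
                return score
            finally:
                self.active -= 1


async def scan_all(messages, rules, max_concurrent=2):
    scanner = Scanner(rules, max_concurrent)
    scores = await asyncio.gather(*(scanner.scan(m) for m in messages))
    return scores, scanner.peak
```

Only the waiting overlaps here; the rule matching still runs on one CPU at a time, so any gain depends on how much of a scan is spent waiting on I/O.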

-Raul Dias


> --j.



SA not working?

2007-02-17 Thread David Obando
Dear all,

I installed SA on a Debian Etch system together with Postfix and Amavis.
Strangely SA doesn't score mails at all, but I don't see why.

See the output of a spam mail I checked manually. When I run a check on
the same mail on a different machine, it is scored:

[EMAIL PROTECTED] tmp]# spamassassin -D < spam
[20449] dbg: logger: adding facilities: all
[20449] dbg: logger: logging level is DBG
[20449] dbg: generic: SpamAssassin version 3.1.7-deb
[20449] dbg: config: score set 0 chosen.
[20449] dbg: util: running in taint mode? yes
[20449] dbg: util: taint mode: deleting unsafe environment variables,
resetting PATH
[20449] dbg: util: PATH included '/usr/local/sbin', keeping
[20449] dbg: util: PATH included '/usr/local/bin', keeping
[20449] dbg: util: PATH included '/usr/sbin', keeping
[20449] dbg: util: PATH included '/usr/bin', keeping
[20449] dbg: util: PATH included '/sbin', keeping
[20449] dbg: util: PATH included '/bin', keeping
[20449] dbg: util: PATH included '/usr/bin/X11', keeping
[20449] dbg: util: final PATH set to:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11
[20449] dbg: message:  MIME PARSER START 
[20449] dbg: message: main message type: text/plain
[20449] dbg: message: parsing normal part
[20449] dbg: message: added part, type: text/plain
[20449] dbg: message:  MIME PARSER END 
[20449] dbg: dns: is Net::DNS::Resolver available? yes
[20449] dbg: dns: Net::DNS version: 0.59
[20449] dbg: config: using "/etc/spamassassin" for site rules pre files
[20449] dbg: config: using "/var/lib/spamassassin/3.001007" for sys
rules pre files
[20449] dbg: config: read file
/var/lib/spamassassin/3.001007/saupdates_openprotect_com.pre
[20449] dbg: config: using "/var/lib/spamassassin/3.001007" for default
rules dir
[20449] dbg: config: read file
/var/lib/spamassassin/3.001007/saupdates_openprotect_com.cf
[20449] dbg: config: using "/etc/spamassassin" for site rules dir
[20449] dbg: config: using "/root/.spamassassin" for user state dir
[20449] dbg: config: using "/root/.spamassassin/user_prefs" for user
prefs file
[20449] dbg: config: read file /root/.spamassassin/user_prefs
[20449] dbg: plugin: fixed relative path:
/var/lib/spamassassin/3.001007/saupdates_openprotect_com/loadplugins.pre
[20449] dbg: config: using
"/var/lib/spamassassin/3.001007/saupdates_openprotect_com/loadplugins.pre"
for included file
[20449] dbg: config: read file
/var/lib/spamassassin/3.001007/saupdates_openprotect_com/loadplugins.pre
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::SPF from @INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::SPF=HASH(0x99dd224)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::Hashcash from @INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::Hashcash=HASH(0x99e74c0)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::RelayCountry
from @INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::RelayCountry=HASH(0x99c20b8)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::Razor2 from @INC
[20449] dbg: razor2: razor2 is available, version 2.81
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::Razor2=HASH(0x9a14b84)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::SpamCop from @INC
[20449] dbg: reporter: network tests on, attempting SpamCop
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::SpamCop=HASH(0x9d03cdc)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::URIDNSBL from @INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x9d3caa0)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::Pyzor from @INC
[20449] dbg: pyzor: network tests on, attempting Pyzor
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::Pyzor=HASH(0x9d5953c)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::AWL from @INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::AWL=HASH(0x9d73998)
[20449] dbg: plugin: loading
Mail::SpamAssassin::Plugin::AutoLearnThreshold from @INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::AutoLearnThreshold=HASH(0x9d810c8)
[20449] dbg: plugin: loading
Mail::SpamAssassin::Plugin::WhiteListSubject from @INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::WhiteListSubject=HASH(0x9d8dc5c)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::MIMEHeader from
@INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::MIMEHeader=HASH(0x9d98eec)
[20449] dbg: plugin: loading Mail::SpamAssassin::Plugin::ReplaceTags
from @INC
[20449] dbg: plugin: registered
Mail::SpamAssassin::Plugin::ReplaceTags=HASH(0x9da5e20)
[20449] dbg: plugin: fixed relative path:
/var/lib/spamassassin/3.001007/saupdates_openprotect_com/70_sare_adult.cf
[20449] dbg: config: using
"/var/lib/spamassassin/3.001007/saupdates_openprotect_com/70_sare_adult.cf"
for included file
[20449] dbg: config: read file
/var/lib/spamassassin/3.001007/saupdates_openprotect_com/70_sare_adult.cf
[20449] dbg: plugin: fixed relativ

Re: Google Summer of Code 2007 ...

2007-02-17 Thread Justin Mason

Raul Dias writes:
> On Sat, 2007-02-17 at 02:07 +0100, Mark Martinec wrote:
> > On Saturday February 17 2007 01:49, Matthew Wilson wrote:
> > > I was/am primarily concerned with RAM usage for high-concurrency
> > > situations.
> > 
> > Ok. Still, in my experience about 30 (maybe 50) SA processes can
> > fully utilize today's CPU & I/O, and it's probably no big deal
> > to provide about 2 GB of memory to cater for such system.
> > Also, and unfortunately, multithreading in Perl is rather
> > cumbersome and not significantly less expensive than fully
> > individual processes.
> 
> When I experimented with sa-blacklist.cf some time ago, 45
> processes brought my system to its knees with 3.5GB (out of memory).
> 
> I agree about the thread model.
> 
> But sticking to a async I/O model is a valid point.  If implemented
> correctly it will save a lot of memory and even improve performance a
> little.
> 
> Having separate processes avoids the need to check for leftover garbage
> after filtering a message, which would require the code to be
> rechecked.
> 
> However, on uniprocessor systems, running multiple processes is
> actually more expensive than an async I/O model.  On multiprocessor
> systems, just keep one process per CPU or fewer.
> 
> In the past I have played a lot with perl-loop (any loopers around?)
> which was the only way to go.  It is too low level for most people, but
> perhaps POE is the way to go today (which can use perl-loop as its
> base).

I'm dubious about the benefits for SpamAssassin...

An async model works very well for network-bound and I/O-bound servers;
however, SpamAssassin is mainly CPU-bound, since the network and I/O parts
are already mostly run async during the scan operation.

Also, the multiple spamd processes share quite a lot of RAM with each
other -- there's a bug in how linux reports "shared" memory which makes it
appear much worse than it is. read the FAQ for more details.

Still, if someone tries it and can demo increased efficiency...
go for it ;)
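
On the shared-memory point, one way to see how much of a spamd child is shared versus private on Linux is to total the per-mapping figures in /proc/<pid>/smaps instead of trusting a single column in top. The Python sketch below does that; the function names are made up, it assumes a kernel that provides smaps with the usual Rss, Shared_* and Private_* fields in kB, and it is not the method described in the FAQ.

```python
import re

_FIELD = re.compile(
    r"^(Rss|Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):\s+(\d+) kB$"
)


def summarise_smaps(text):
    """Sum selected fields (in kB) over all mappings in smaps-format text."""
    totals = {"Rss": 0, "Shared_Clean": 0, "Shared_Dirty": 0,
              "Private_Clean": 0, "Private_Dirty": 0}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    totals["Shared"] = totals["Shared_Clean"] + totals["Shared_Dirty"]
    totals["Private"] = totals["Private_Clean"] + totals["Private_Dirty"]
    return totals


def summarise_pid(pid):
    """Read /proc/<pid>/smaps (Linux only; needs permission to read it)."""
    with open(f"/proc/{pid}/smaps") as handle:
        return summarise_smaps(handle.read())
```

Summing the "Private" totals across all spamd children, plus the shared part counted once, gives a rough estimate of what the pool really costs in RAM.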

--j.


RE: Bayes db size....

2007-02-17 Thread Dave Koontz
I am sure this has been asked numerous times before, but what is the logic
in having auto expiry on the bayes DB, and not seen?  Seems that once tokens
have been removed from the DB there is little to no use for 'unlearning' any
associated messages.  Besides on a busy system, this seen file gets large
very fast.  I'd vote for auto expiry and maintenance on seen as well as AWL.


-----Original Message-----
From: Theo Van Dinter [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 16, 2007 7:19 PM
To: spam mailling list
Subject: Re: Bayes db size

On Fri, Feb 16, 2007 at 06:17:36PM -0600, Robert Nicholson wrote:
> So you're saying that right now seen isn't capped like tokens right?

seen has no max size nor expiry features.

--
Randomly Selected Tagline:
"Like any French restaurant in America, it was overpriced, noisy, moody,
and would put you in mortal danger if you had an accident with anything
larger than a croissant." - Unknown about the Renault LeCar