Re: simultaneous sa-learn processes

2005-07-11 Thread JamesDR

Chavdar Videff wrote:

Hi List,

Our mail server serves about 100 users. Our config: 
Sendmail+Procmail+SpamAssassin.

The question is:
If I got it right, we should run sa-learn for each user in order to benefit 
from bayes. We intend to run a cron job for each user and do it at night by 
supplying a daily snapshot of our spam and ham collections to sa-learn.

Can our mail server handle it (256 MB RAM, Celeron 400 MHz)?
A weekly collection run for 1 user usually eats 100% of the CPU. My concern 
is whether the system is going to crash or just do the job slower, and whether 
you can point out how many sa-learn tasks we could run simultaneously with our 
setup.
All hints will be appreciated, for we scheduled an initial load for 16 users 
of the big collection of spam received so far.


Thanks guys

Chavdar Videff


What kind of Bayes db are you using? We use MySQL here and haven't seen 
SA-Learn use up that much cpu... I've run it manually up to 10 processes 
at once without any noticeable slowing of the machine. (p2 450mhz, 256mb)


--
Thanks,
James



RE: simultaneous sa-learn processes

2005-07-11 Thread Sander Holthaus - Orange XL
JamesDR wrote:
 Chavdar Videff wrote:
 Hi List,
 
 Our mailserver server serves about 100 users. Our config:
 Sendmail+Procmail+SpamAssassin.
 The question is:
 If I got it right, we should run sa-learn for each user in order to
 benefit from bayes. We intend to run a cron job for each user and do
 it at night by supplying a daily snapshot of our spam and ham
 collections to sa-learn. Can our mailserver handle it (256 MB RAM,
 Celeron 400 Mhz)?

Why would you want to set up Bayes on a per-user basis if you are going to
feed it system-wide hams and spams? Feeding it system-wide hams is
especially odd.
 
 A weekly collection run for 1 user usually eats 100% of CPU load. My
 concern is whether the system is going to crash or just do the job
 slower and if you can point out how many sa-learn tasks could we run
 simultaneously with our setup.

Systems shouldn't crash under high load, so that's not a real concern. If it
does happen, you have a more serious problem elsewhere. What would be more
of a concern is how it is going to affect other processes running on your
system. Slower is not a problem, but if you really put the load on your box
from a lot of processes, you might start seeing time-outs.

 All hints will be appreciated, for we scheduled an initial load for
 16 users of the big collection of spam received so far.

If you are going to simultaneously learn spam and ham for 16 users, and
want to keep running your mailserver/spamassassin too (I take it you also
have a virus scanner running somewhere), I would consider at least running
the sa-learn processes under nice to keep them from stalling more essential
services. But, depending on your system setup (OS, DB, etc.), you might want
to cut down a little on the number of processes run simultaneously.
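
A minimal sketch of that advice, assuming Python is available on the box:
it wraps each sa-learn call in nice and caps how many run at the same time.
The helper names (learn_command, run_limited), the niceness of 19 and the
limit of 2 parallel jobs are illustrative assumptions, not values from this
thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def learn_command(kind, path, niceness=19):
    """Build a low-priority sa-learn command; kind is 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # 'nice -n 19' gives sa-learn the lowest CPU scheduling priority.
    return ["nice", "-n", str(niceness), "sa-learn", "--" + kind, path]


def run_limited(commands, max_parallel=2, runner=subprocess.call):
    """Run commands with at most max_parallel at once.

    Returns the exit codes in the same order as the commands.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(runner, commands))
```

Called with a list of learn_command(...) results, run_limited keeps the rest
queued until a slot is free, so mail delivery is less likely to be starved.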

 
 Thanks guys
 
 Chavdar Videff
 
 
 What kind of Bayes db are you using? We use MySQL here and
 haven't seen SA-Learn use up that much cpu... I've run it
 manually up to 10 processes at once without any noticeable
 slowing of the machine. (p2 450mhz, 256mb)




Re: simultaneous sa-learn processes

2005-07-11 Thread Chavdar Videff
On Monday 11 July 2005 14:50, JamesDR wrote:
 Chavdar Videff wrote:
  Hi List,
 
  Our mailserver server serves about 100 users. Our config:
  Sendmail+Procmail+SpamAssassin.
  The question is:
  If I got it right, we should run sa-learn for each user in order to
  benefit from bayes. We intend to run a cron job for each user and do it
  at night by supplying a daily snapshot of our spam and ham collections to
  sa-learn. Can our mailserver handle it (256 MB RAM, Celeron 400 Mhz)?
  A weekly collection run for 1 user usually eats 100% of CPU load. My
  concern is whether the system is going to crash or just do the job slower
  and if you can point out how many sa-learn tasks could we run
  simultaneously with our setup.
  All hints will be appreciated, for we scheduled an initial load for 16
  users of the big collection of spam received so far.
 
  Thanks guys
 
  Chavdar Videff

 What kind of Bayes db are you using? We use MySQL here and haven't seen
 SA-Learn use up that much cpu... I've run it manually up to 10 processes
 at once without any noticeable slowing of the machine. (p2 450mhz, 256mb)

I guess it is BerkeleyDB, the default installation on Debian. The interesting 
part is that while testing cron on one user the CPU impact was not noticeable. 

Chavdar Videff


Re: simultaneous sa-learn processes

2005-07-11 Thread Kai Schaetzl
Chavdar Videff wrote on Mon, 11 Jul 2005 13:40:14 +0300:

 If I got it right, we should run sa-learn for each user in order to benefit 
 from bayes. We intend to run a cron job for each user and do it at night by 
 supplying a daily snapshot of our spam and ham collections to sa-learn.

Do I understand you correctly? You use Bayes for each user, but you want to 
sa-learn each of them the same daily corpus? This means the only difference in 
the user's Bayes db's will be auto-learned mail or mail learned by those users 
(if anything of that is possible/allowed with your setup). Doesn't look too 
useful to me. If most of the db content is the same then you could just use a 
site-wide db. Also, Bayes gets better the more mail it gets. If your users 
don't get much mail their individual Bayes db's won't be very effective. I'm 
all for using site-wide Bayes unless your users really get a lot of mail (I'd 
say at least 100 mails per user per day).

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de  http://msie.winware.org





Re: simultaneous sa-learn processes

2005-07-11 Thread Chavdar Videff
On Monday 11 July 2005 15:31, Kai Schaetzl wrote:
 Chavdar Videff wrote on Mon, 11 Jul 2005 13:40:14 +0300:
  If I got it right, we should run sa-learn for each user in order to
  benefit from bayes. We intend to run a cron job for each user and do it
  at night by supplying a daily snapshot of our spam and ham collections to
  sa-learn.

 Do I understand you correctly? You use Bayes for each user, but you want to
 sa-learn each of them the same daily corpus? This means the only difference
 in the user's Bayes db's will be auto-learned mail or mail learned by those
 users (if anything of that is possible/allowed with your setup). Doesn't
 look too useful to me. If most of the db content is the same then you could
 just use a site-wide db. Also, Bayes gets better the more mail it gets. If
 your users don't get many mail their individual Bayes db's won't be very
 effective. I'm all for using site-wide Bayes unless you users get really a
 lot of mail (I'd say at least 100 mails per user per day).

 Kai

I thought it was installed site-wide; however, the only Bayes db's I find on 
the system are in each user's ~/.spamassassin folder. And indeed, the only 
way I can make Bayes learn is by teaching it on a per-user basis. For quite a 
few months I collected spam and fed it to sa-learn, and finally, reading this 
list, realized that all I did was teach root's database. Nobody else 
benefited from Bayes, which was screwed up because it had auto-learned a lot 
of spam as ham. 
If there is a way to set up a single Bayes database I would prefer that, for 
the scenario I am posting about does not make me happy (running 100 sa-learns 
at night).
Thanks
Chavdar



Re: simultaneous sa-learn processes

2005-07-11 Thread Kai Schaetzl
Chavdar Videff wrote on Mon, 11 Jul 2005 16:13:44 +0300:

 If there is a way to set up a single bayes database I would prefer that

There is one, just look in the SA documentation. (documentation for 
local.cf should do.)
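
For illustration, a small sketch that prints the local.cf lines usually
involved in a shared database. bayes_path and bayes_file_mode are options
from the SpamAssassin configuration documentation; the directory
/var/spamassassin/bayes, the permissive 0777 mode and the helper name
sitewide_bayes_conf are assumptions to adapt to your own setup.

```python
def sitewide_bayes_conf(db_dir="/var/spamassassin/bayes", file_mode="0777"):
    """Return a local.cf fragment pointing every user at one Bayes database.

    bayes_path is a path prefix: SpamAssassin appends suffixes such as
    _toks and _seen to it. db_dir and file_mode are assumed example values.
    """
    lines = [
        "use_bayes 1",
        "bayes_path " + db_dir + "/bayes",
        "bayes_file_mode " + file_mode,
    ]
    return "\n".join(lines) + "\n"


print(sitewide_bayes_conf(), end="")
```

The directory has to exist and be writable by every account that runs
SpamAssassin. With that in place a single sa-learn run teaches the shared
database.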

Kai






Re: simultaneous sa-learn processes

2005-07-11 Thread jdow
From: Chavdar Videff

 On Monday 11 July 2005 14:50, JamesDR wrote:
  Chavdar Videff wrote:
   Hi List,
  
   Our mailserver server serves about 100 users. Our config:
   Sendmail+Procmail+SpamAssassin.
   The question is:
   If I got it right, we should run sa-learn for each user in order to
   benefit from bayes. We intend to run a cron job for each user and do it
   at night by supplying a daily snapshot of our spam and ham collections to
   sa-learn. Can our mailserver handle it (256 MB RAM, Celeron 400 Mhz)?
   A weekly collection run for 1 user usually eats 100% of CPU load. My
   concern is whether the system is going to crash or just do the job slower
   and if you can point out how many sa-learn tasks could we run
   simultaneously with our setup.
   All hints will be appreciated, for we scheduled an initial load for 16
   users of the big collection of spam received so far.
  
   Thanks guys
  
   Chavdar Videff
 
  What kind of Bayes db are you using? We use MySQL here and haven't seen
  SA-Learn use up that much cpu... I've run it manually up to 10 processes
  at once without any noticeable slowing of the machine. (p2 450mhz, 256mb)

 I guess it is BerkeleyDB, the default installation on Debian. The ineteresting
 part is that while testing cron on one user the cpu fall was not noticeable.

If you are feeding individual user Bayes databases, feed each one with ham
samples and spam samples submitted by the particular user for HER Bayes. If
you have them all working off the same Bayes corpus then there is little or
no gain to using per user Bayes.

{^_^}




Re: simultaneous sa-learn processes

2005-07-11 Thread Robert Menschel
Hello Chavdar,

Monday, July 11, 2005, 3:40:14 AM, you wrote:

CV Hi List,

CV Our mailserver server serves about 100 users. Our config: 
CV Sendmail+Procmail+SpamAssassin.
CV The question is:
CV If I got it right, we should run sa-learn for each user in order to benefit
CV from bayes. We intend to run a cron job for each user and do it at night by
CV supplying a daily snapshot of our spam and ham collections to sa-learn.
CV Can our mailserver handle it (256 MB RAM, Celeron 400 Mhz)?
CV A weekly collection run for 1 user usually eats 100% of CPU load. My concern
CV is whether the system is going to crash or just do the job slower and if you
CV can point out how many sa-learn tasks could we run simultaneously with our
CV setup.
CV All hints will be appreciated, for we scheduled an initial load for 16 users
CV of the big collection of spam received so far.

As indicated in another email, doing a user-level learn of system-wide
collected ham/spam doesn't make much sense.  And if you take your
current system-wide collection and sa-learn it 100 times, you'll use
100 times more resources than learning it once.

On the other hand, if you meant that you'd sa-learn each individual
user's ham/spam for that user only, then move to the next, then
provided you do these one after the other sequentially (not all 100 at
once), you should not increase your system load at all.  (You will
increase your disk storage, since each user's database will take up
some disk space.)
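
A rough sketch of the sequential variant: one user at a time, each user's
own spam and ham only. Running sa-learn as the user through "sudo -H -u" is
an assumption, chosen so that the per-user database under ~/.spamassassin is
the one being taught. The helper names and the shape of the corpora mapping
are hypothetical too.

```python
import subprocess


def user_learn_commands(user, spam_dir, ham_dir):
    """Build the two niced sa-learn commands for one user's own mail."""
    # sudo -H sets HOME to the target user's home directory.
    base = ["sudo", "-H", "-u", user, "nice", "-n", "19", "sa-learn"]
    return [base + ["--spam", spam_dir], base + ["--ham", ham_dir]]


def learn_all_sequentially(corpora, runner=subprocess.call):
    """Learn for each user in turn, never more than one sa-learn at a time.

    corpora maps a user name to a (spam_dir, ham_dir) pair.
    Returns a dict of user name -> list of exit codes.
    """
    results = {}
    for user, (spam_dir, ham_dir) in corpora.items():
        commands = user_learn_commands(user, spam_dir, ham_dir)
        results[user] = [runner(cmd) for cmd in commands]
    return results
```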

As discussed in a couple of Bugzilla entries, you should probably
limit the size of your sa-learn runs -- limit them to a few hundred
emails at a time, or maybe a few meg combined size. A massive sa-learn
run of thousands of emails, dozens of meg in one run, can bring a
resource-limited system to its knees.
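
One way to keep each run small, sketched under the assumption that the
messages are individual files: split the list into batches capped by message
count and by combined size, then hand each batch to a separate sa-learn run.
The limits of 300 messages and 4 MB are example values in the spirit of
"a few hundred emails" and "a few meg", not tested recommendations.

```python
def batch_messages(messages, max_count=300, max_bytes=4 * 1024 * 1024):
    """Group (path, size_in_bytes) pairs into batches of paths.

    A batch is closed when adding the next message would exceed max_count
    or max_bytes. A single message larger than max_bytes gets a batch of
    its own.
    """
    batch, batch_bytes = [], 0
    for path, size in messages:
        too_many = len(batch) >= max_count
        too_big = batch_bytes + size > max_bytes
        if batch and (too_many or too_big):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(path)
        batch_bytes += size
    if batch:
        yield batch
```

Each batch can then be passed as file arguments to one sa-learn --spam or
sa-learn --ham invocation, one batch after the other.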

Bob Menschel