Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread Robert Blayzor
For the past several months I have been trying to find a way to make 
maintaining the SpamAssassin bayes database more effective on our SA  servers.  
We have several SA servers, all running bayes globally on the server, not per 
user.

Bayes generally does a good job, but on a fairly busy server it can be less 
effective depending on how you set the database up to learn, expire, etc.  So 
far I've followed just about every suggestion for maintaining bayes 
effectively.  While bayes still works, it's not without some major problems, 
the main one being that when large syncs happen, the bayes token database can 
be locked out from other SA children for up to 10 minutes per sync.

Basically we have set up our servers to learn to the journal, and we sync the 
journal to the main bayes database about once an hour.  We've found that this 
process can take 8 to 10 minutes, give or take.

We recently moved the bayes database into a RAM disk to see if that would help, 
and while reads/seeks have sped up considerably, sync has not.  Expire does not 
seem to be a problem.

Correct me if I'm wrong, but when you have bayes_learn_to_journal enabled and 
then run a sync, sa-learn basically moves bayes_journal to 
bayes_journal.old and then starts merging/adding tokens into bayes_toks.  While 
this happens, bayes_toks is locked until the sync completes.  For us, that 
means the bayes database is locked for about 10 minutes an hour.  Expires do 
not seem to run that long; in fact, they finish in about a minute, which is 
acceptable.

Would it make more sense, when you use learn_to_journal and run a sync, to make 
a copy of the bayes_toks database, say bayes_toks.new, and merge/add tokens 
from the journal into that?  Then, once the sync is complete, you can lock, 
copy the .new over the current database, and continue.  This should lock the 
database out from updates for only seconds (if that), rather than for the 
entire learn/add process.  I assume an expire could use the same logic for 
those of us who run expire/sync manually from cron at regular intervals rather 
than via the auto methods.

Thoughts?  I guess my thought is to keep a read-only version of bayes_toks 
available almost the whole time, avoiding any lock contention from the database 
being synced/expired.
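
To make the idea concrete, here is a minimal Python sketch of the copy-then-swap pattern.  It is not how sa-learn works today, and it does not touch the real DBM format: the token file is a plain-text stand-in, and sync_via_copy and the merge callback are hypothetical names.  It assumes the working copy and the live file sit on the same filesystem, so that the final rename is atomic on POSIX systems.

```python
import os
import shutil
import tempfile


def sync_via_copy(live_path, merge):
    """Apply `merge` to a private copy of live_path, then swap it in.

    `merge` is a hypothetical callback that receives the path of the
    working copy and folds the journal entries into it.
    """
    new_path = live_path + ".new"
    shutil.copy2(live_path, new_path)   # readers keep using live_path
    merge(new_path)                     # slow part; live_path is not locked
    os.replace(new_path, live_path)     # atomic rename on the same filesystem


def demo():
    """Exercise the pattern on a throwaway plain-text 'token database'."""
    workdir = tempfile.mkdtemp()
    live = os.path.join(workdir, "bayes_toks")
    with open(live, "w") as fh:
        fh.write("viagra 3\n")

    def merge(path):
        # Stand-in for the journal merge: append one learned token.
        with open(path, "a") as fh:
            fh.write("mortgage 1\n")

    sync_via_copy(live, merge)
    with open(live) as fh:
        return fh.read()
```

Processes that already have the old file open keep reading the old data until they reopen it.  Anything written directly to the live file during the merge would be lost at the swap, so this seems to fit only setups where learning goes to the journal.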


Our current bayes config:

use_bayes                1
bayes_auto_learn         1
bayes_auto_expire        0
bayes_learn_to_journal   1
bayes_journal_max_size   0
bayes_expiry_max_db_size 100
lock_method              flock


SA 3.3.1 on FreeBSD 6.4
Perl 5.10

-- 
Robert Blayzor
INOC, LLC
rblay...@inoc.net
http://www.inoc.net/~rblayzor/






Re: Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread Robert Blayzor
On Nov 1, 2010, at 1:54 PM, Michael Scheidell wrote:
 then you will probably always have delays.


Hence my suggestion to make copies of the database to be worked on during 
the sync/expire process.  Then there should be virtually no delay other than 
the lock/copy, which should take seconds instead of several minutes.

-- 
Robert Blayzor
INOC, LLC
rblay...@inoc.net
http://www.inoc.net/~rblayzor/






Re: Using Pyzor with high volume

2008-05-02 Thread Robert Blayzor

On May 1, 2008, at 10:02 PM, Michael Hutchinson wrote:
Anyway, just thought you ought to know about the high volume thing.  You
might get your end running sweet and fast, but it may cause rejected
lookups when you're scanning mail.




I'm pretty much putting Pyzor on the back burner for now.  Even with  
the ReadyExec method, I don't want to call an exec over NFS  
constantly... it's expensive on a large scale.  I could do something  
like create a memory disk and exec out of that, but it's just too much  
cobbling together.  I really hoped that something could be memory resident,  
i.e. just loaded at start time, and then just work.


DCC and Razor2 both seem to be doing a good job now.

--
Robert Blayzor, BOFH
INOC, LLC
[EMAIL PROTECTED]
http://www.inoc.net/~rblayzor/

Mac OS X. Because making Unix user-friendly is easier than debugging  
Windows.









Using Pyzor with high volume

2008-04-30 Thread Robert Blayzor
Regarding Pyzor: I'm wondering if anyone out there is using it  
at any large scale.  The razor-agent appears to be a Perl  
module that gets loaded at startup; with Pyzor, I'm concerned about SA  
having to exec the Python interpreter and pay that setup/teardown time  
for each and every message.


Rubbing salt in the wound, our SA servers are diskless, so  
having to run it over NFS makes for a double whammy.


Is there a better way to implement Pyzor or is it not even worth the  
trouble?


TIA

--
Robert Blayzor, BOFH
INOC, LLC
[EMAIL PROTECTED]
http://www.inoc.net/~rblayzor/

Mac OS X. Because making Unix user-friendly is easier than debugging  
Windows.









Re: Using Pyzor with high volume

2008-04-30 Thread Robert Blayzor

On Apr 30, 2008, at 11:59 AM, Ben Poliakoff wrote:

Seems to be just the sort of thing to address your concern (short of
a perl implementation of the pyzor client).  I should note that *I*
haven't used the ReadyExec stuff in my environment [1] (where executing
the pyzor client hasn't been much of a resource drain), but I've thought
about it.


Yeah, I did run across this, but haven't had much experience  
installing/maintaining it.  That's why I'm trying to weigh the value  
of Pyzor against complicating the installation any further.  A Perl  
implementation of the Pyzor agent would be ideal.




[1] My environment supports about 2000 users scanning roughly 45000 -
7/day currently spread across two older linux boxes.



My setup is over 10X that, which is why this is a concern! ;-)

--
Robert Blayzor, BOFH
INOC, LLC
[EMAIL PROTECTED]
http://www.inoc.net/~rblayzor/

Mac OS X. Because making Unix user-friendly is easier than debugging  
Windows.









Re: Using Pyzor with high volume

2008-04-30 Thread Robert Blayzor

On Apr 30, 2008, at 2:04 PM, Jason J. Ellingson wrote:

Yup... I got the server portion running... The trick now is to get
SpamAssassin to use readyexec /tmp/pyzor instead of just pyzor...
Any suggestions?  I was looking at modifying Pyzor.pm in the
SpamAssassin perl directory.



My guess..

   pyzor_path STRING
       This option tells SpamAssassin specifically where to find the
       pyzor client instead of relying on SpamAssassin to find it in the
       current PATH.  Note that if taint mode is enabled in the Perl
       interpreter, you should use this, as the current PATH will have
       been cleared.



So...

pyzor_path readyexec --stop /tmp/pyzor


May work...  Even though ReadyExec is more lightweight than actually  
calling python each time, I'm still hoping that a non-exec-based  
plugin can appear someday (again, if it's worth the trouble to do so).



--
Robert Blayzor, BOFH
INOC, LLC
[EMAIL PROTECTED]
http://www.inoc.net/~rblayzor/

Mac OS X. Because making Unix user-friendly is easier than debugging  
Windows.









Re: Global Bayes

2008-03-24 Thread Robert Blayzor


On Mar 24, 2008, at 11:08 AM, Mike Fahey wrote:

Just upgraded to 3.2.4.

I am running spamassassin as a normal user, not root.

I keep seeing this in the log files.

bayes: cannot open bayes databases /var/sabayes/.spamassassin/bayes_* R/W: lock failed: File exists


There are about 20 lock files in the directory.

Is spamassassin not cleaning up the lock files properly or is it even
working?


Looks like there have been changes here since version 3.2.0




I don't know of any specific changes between versions, but... whenever  
I noticed this happen it was almost always due to disk space,  
permissions, or having auto-expire turned on.  Double-check  
the permissions on your folders and make sure the user you run  
SpamAssassin under has the required privileges.  Stop SA, clean up  
the lock files, and try restarting.


A good idea (if you're running global bayes) is to turn off auto-expire  
and run sa-learn --force-expire at a regular interval.  We've  
been running this way for years and it seems to perform just fine  
under 3.2.4.
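
As a rough illustration of the scheduled expiry, here is a small Python wrapper that a cron job could call.  It is only a sketch: force_expire is a hypothetical helper name, it assumes sa-learn is on the PATH of the user that owns the bayes database, and it assumes bayes_auto_expire is set to 0 in the SpamAssassin configuration.

```python
import subprocess


def force_expire(run=subprocess.run):
    """Run a manual Bayes expiry, as a cron job might.

    `run` is injectable so the sketch can be exercised without
    SpamAssassin installed; by default it really invokes sa-learn.
    """
    cmd = ["sa-learn", "--force-expire"]
    result = run(cmd, check=False)
    # Hand back the exit status so the caller (or cron mail) can report it.
    return cmd, result.returncode
```

How often to run it depends on mail volume; the return code is passed back so a failed expiry does not go unnoticed.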


--
Robert Blayzor
INOC
[EMAIL PROTECTED]
http://www.inoc.net/~rblayzor/

Mac OS X. Because making Unix user-friendly is easier than debugging  
Windows.







Re: SPF is hopelessly broken and must die!

2006-12-13 Thread Robert Blayzor
Marc Perkel wrote:
 SPF catches no spam - but does create false positives. It's less than
 useless. It's dangerous.


SPF's job is not to catch spam, period!  No matter how many times you
claim it's supposed to catch spam, you could never be more wrong.
Its sole purpose is to allow domain owners to publish valid mail
sources for their domains.  That's its *only* purpose.  What to do
with that published information from the TXT records is
totally up to the receiver.
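
To show what that published information looks like, here is a minimal Python sketch that splits an SPF TXT record into its terms.  It is not a full SPF evaluator: parse_spf is a hypothetical helper, the example record is made up, and modifiers and CIDR suffixes on bare mechanisms are not handled.  What a receiver does with the result is left entirely open, which is the point.

```python
def parse_spf(txt):
    """Split an SPF TXT record into (qualifier, mechanism, value) terms.

    Returns None when the record does not start with the v=spf1 tag.
    Modifiers such as redirect= are skipped to keep the sketch short.
    """
    terms = txt.split()
    if not terms or terms[0].lower() != "v=spf1":
        return None
    parsed = []
    for term in terms[1:]:
        if "=" in term.split(":", 1)[0]:
            continue  # modifier (name=value), not a mechanism
        qualifier = "+"  # the default qualifier means "pass"
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        parsed.append((qualifier, name.lower(), value))
    return parsed


# Made-up record using documentation address space and example domains.
EXAMPLE = "v=spf1 ip4:192.0.2.0/24 mx include:_spf.example.net ~all"
```

Calling parse_spf(EXAMPLE) yields one tuple per mechanism, each listing a source the domain owner considers legitimate, plus the trailing "all" with its qualifier.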

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: 0x66F90BFC @ http://pgp.mit.edu
Key fingerprint = 6296 F715 038B 44C1 2720  292A 8580 500E 66F9 0BFC

State-of-the-art: What we could do with enough money.


Re: SPF is hopelessly broken and must die!

2006-12-13 Thread Robert Blayzor
Marc Perkel wrote:
 From openspf.org
 
 http://old.openspf.org/aspen.html


Also from the SPF FAQ:

Sender Policy Framework (SPF) is an attempt to control forged e-mail.
SPF is not directly about stopping spam – junk email. It is about giving
domain owners a way to say which mail sources are legitimate for their
domain and which ones aren't. While not all spam is forged, virtually
all forgeries are spam. SPF is not anti-spam in the same way that flour
is not food: it is part of the solution.


-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: 0x66F90BFC @ http://pgp.mit.edu
Key fingerprint = 6296 F715 038B 44C1 2720  292A 8580 500E 66F9 0BFC

You are in a dark room with a compiler, vi, an internet connection, and
a thermos of coffee.
 :Your Move ?


Re: SPF is hopelessly broken and must die!

2006-12-13 Thread Robert Blayzor
Marc Perkel wrote:
 SPF is not anti-spam in the same way that flour is not food: it is part of 
 the solution.
 
 The solution - to what? SPAM!


Part of the solution, not the solution.  Big difference.
Controlling forgeries is just one step: it takes one of the tools out of
the tool bag.  And those tool bags are pretty full; you have to start
somewhere.

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: 0x66F90BFC @ http://pgp.mit.edu
Key fingerprint = 6296 F715 038B 44C1 2720  292A 8580 500E 66F9 0BFC

Windows NT: Insert wallet into Drive A: and press any key to empty


Re: DCC worth it?

2006-10-19 Thread Robert Blayzor
Jeff Moss wrote:
 pain in the butt.  In particular dealing with its log files.  By default
 it creates thousands of them a day.  There is a way to cut that down to
 hundreds a day by editing the configuration file.  But you still have
 to run a cron job to keep them from eating your hard drive.


Not true.  You can disable logging completely in your conf file.
Something to the effect of leaving the following options empty...

DCCM_LOGDIR=
DCCM_LOG_AT=


-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: 0x66F90BFC @ http://pgp.mit.edu
Key fingerprint = 6296 F715 038B 44C1 2720  292A 8580 500E 66F9 0BFC

Any sufficiently advanced bug is indistinguishable from a feature.  -
Kulawiec


Re: Any comments of the SpamHaus lawsuit?

2006-10-11 Thread Robert Blayzor
Fabien GARZIANO wrote:
 Ok. I don't want to dive into a useless political debate... But personally I 
 don't trust the Bush team either, as it appears to me to be a new kind of 
 dictatorship. But the question is not 'should the US be trusted' but 'should 
 something like the internet be under the control of 1 country'. My answer is 
 no.


What the hell does the Bush team have to do with the Internet?  The US
government has mostly, to date, been hands-off on any Internet policy,
while still reserving the authority to intervene at any time.  Let's
not forget where the Internet started... by American innovation.  Giving
control of Internet policy to an international body would be a waste
of time.  Just look at the record of the UN...  Yeah sure, let's give
control of the Internet to Russia, China and Korea... and you complain
about spam now???  LOL!

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: 0x66F90BFC @ http://pgp.mit.edu
Key fingerprint = 6296 F715 038B 44C1 2720  292A 8580 500E 66F9 0BFC

If I had it all to do over again, I'd spell creat with an e.  -
Kernighan


Re: Any comments of the SpamHaus lawsuit?

2006-10-11 Thread Robert Blayzor
Chris Santerre wrote:
 US products? What is that? I think the last US product I purchased was
 an american flag. Come to think of it... it might have been made
 somewhere else!


Yeah, America R&D's everything; everyone else in the world just clones it.

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: 0x66F90BFC @ http://pgp.mit.edu
Key fingerprint = 6296 F715 038B 44C1 2720  292A 8580 500E 66F9 0BFC

Please send all spam to my main address, [EMAIL PROTECTED]


Re: X-Spam Status

2006-01-22 Thread Robert Blayzor
Spam Ass wrote:
 The only time I have run into an email not being tagged is when the
 email was over a certain size.  I believe the default max size is
 256kb.  This can be changed on a per user or global basis though.


Another time this will happen is when you're using spamd/spamc and a
timeout occurs between the client and the server.  If that happens, spamc
returns the original message unscanned.  If you have a high-volume
server environment, you have to do a lot of timeout tweaking to ensure
most of your emails are scanned relatively quickly without deadlocking
the mail server or running the spamd box out of resources. ;-)
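
The fall-back behaviour can be sketched in a few lines of Python.  This is not spamc's code, just an illustration of the logic: scan_or_passthrough and the two stand-in scanners are hypothetical, and the timeout values are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def scan_or_passthrough(message, scan, timeout):
    """Return (message, scanned?) and give up on the scan after `timeout` s.

    `scan` stands in for the round trip to the scanning server.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(scan, message)
    try:
        return future.result(timeout=timeout), True
    except TimeoutError:
        return message, False  # deliver the original, untagged
    finally:
        pool.shutdown(wait=False)  # do not block on the slow scan


def fast_scan(message):
    """Stand-in scanner that answers immediately."""
    return "X-Spam-Status: No\n" + message


def slow_scan(message):
    """Stand-in scanner that is too busy to answer in time."""
    time.sleep(0.3)
    return "X-Spam-Status: No\n" + message
```

With a short timeout the slow scanner's message comes back without any X-Spam-Status header, which is exactly the untagged mail the original poster was seeing.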

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: 0x66F90BFC @ http://pgp.mit.edu
Key fingerprint = 6296 F715 038B 44C1 2720  292A 8580 500E 66F9 0BFC

Any sufficiently advanced bug is indistinguishable from a feature.  -
Kulawiec


Re: spamd --max-spare ignored

2005-10-25 Thread Robert Blayzor
[EMAIL PROTECTED] wrote:
 I'm running spamd with --max-spare, but as soon as I start it, it spawns 
 --max-children children and keeps it there.
 
 I'm running 3.10 with these options:
 
 /usr/bin/spamd \
   --daemonize \
   --username=spamd \
   --round-robin \
   --max-children=20 \
   --max-spare=5 \
   --socketpath=/var/run/spam/spamd.sock \
   --pidfile=/var/run/spam/spamd.pid
 
 Are any of my settings incorrect?  Or could this be a bug?


That's because you have specified --round-robin, which tells spamd to use
the old way of forking processes.

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: http://www.inoc.net/~dev/
Key fingerprint = 1E02 DABE F989 BC03 3DF5  0E93 8D02 9D0B CB1A A7B0

A list is only as strong as its weakest link.  - Don Knuth


Re: server reached --max-clients setting

2005-10-04 Thread Robert Blayzor
JamesDR wrote:
   I got a lot of this messages in my maillog.

 Oct  4 08:27:13 server spamd[482]: prefork: server reached
 --max-clients setting, consider raising it
 Oct  4 08:27:13 server spamd[482]: prefork: server reached
 --max-clients setting, consider raising it
 Oct  4 08:28:53 server spamd[482]: prefork: server reached
 --max-clients setting, consider raising it



 Since you say your man page doesn't have it, I'll throw you a bone:
 
 -m number , --max-children=number


--max-children != --max-clients.  It's either a typo in the code or some
stealth option somewhere. ;-)

-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: http://www.inoc.net/~dev/
Key fingerprint = 1E02 DABE F989 BC03 3DF5  0E93 8D02 9D0B CB1A A7B0

Years of development: We finally got one to work.


Re: spamd children run as root (again)

2005-04-27 Thread Robert Blayzor
Brandon Kuczenski wrote:
 I've seen this question posted a couple times in the mailing list
 archives (from October 2004) but no resolution.  The question again:
 
 I'm running SpamAssassin 3.0.2 on FreeBSD 4.10 in spamc/spamd format
 with the '-u spamd' flag.  Problem is, all the child processes are
 running as root:


This has been a problem since 3.0.0 and I even submitted a patch in the
PR...  Dunno why this PR is being ignored by the devs...

http://bugzilla.spamassassin.org/show_bug.cgi?id=3897


-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor\@(inoc.net|gmail.com)
PGP: http://www.inoc.net/~dev/
Key fingerprint = 1E02 DABE F989 BC03 3DF5  0E93 8D02 9D0B CB1A A7B0

Pinky, you've left the lens cap of your mind on again.
 - The Brain