Re: Bayes dbm sync/expire speedup suggestion

2010-11-02 Thread RW
On Tue, 02 Nov 2010 01:02:48 +0000
Martin Gregorie mar...@gregorie.org wrote:

 On Mon, 2010-11-01 at 23:05 +0000, RW wrote:
  On Mon, 01 Nov 2010 22:09:03 +0000
  Martin Gregorie mar...@gregorie.org wrote:
  
   On Mon, 2010-11-01 at 21:15 +0000, RW wrote:
I don't think it's a matter of locking-out updates - presumably
token updates that occur after the start of a sync should go
into the new journal file. 

   It may easily be a locking problem: To quote from the MySQL
   manual:
  
  
  SQL backends don't use journalling.
 
 But you're still using a batch operation to play the journal file into
 the database and, if the OP isn't using InnoDB, it's going to hold a
 table lock while it runs. That's going to block SA queries unless
 (ugh!) MySQL defaults to allowing dirty reads.

The OP isn't using SQL, and AFAIK SA doesn't use a journal file with
SQL.


Re: Bayes dbm sync/expire speedup suggestion

2010-11-02 Thread Matus UHLAR - fantomas
   On Mon, 2010-11-01 at 21:15 +0000, RW wrote:
I don't think it's a matter of locking-out updates - presumably
token updates that occur after the start of a sync should go into
the new journal file. 

  On Mon, 01 Nov 2010 22:09:03 +0000
  Martin Gregorie mar...@gregorie.org wrote:
   It may easily be a locking problem: To quote from the MySQL manual:

 On Mon, 2010-11-01 at 23:05 +0000, RW wrote:
  SQL backends don't use journalling.

On 02.11.10 01:02, Martin Gregorie wrote:
 But you're still using a batch operation to play the journal file into
  the database and, if the OP isn't using InnoDB, it's going to hold a table
 lock while it runs. That's going to block SA queries unless (ugh!) MySQL
 defaults to allowing dirty reads.

IIUC, that means there should be no journal file when using SQL for Bayes.
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.


Re: Bayes dbm sync/expire speedup suggestion

2010-11-02 Thread Martin Gregorie
On Tue, 2010-11-02 at 08:40 +0000, RW wrote:
 On Tue, 02 Nov 2010 01:02:48 +0000
 Martin Gregorie mar...@gregorie.org wrote:
 
  On Mon, 2010-11-01 at 23:05 +0000, RW wrote:
   On Mon, 01 Nov 2010 22:09:03 +0000
   Martin Gregorie mar...@gregorie.org wrote:
   
On Mon, 2010-11-01 at 21:15 +0000, RW wrote:
 I don't think it's a matter of locking-out updates - presumably
 token updates that occur after the start of a sync should go
 into the new journal file. 
 
It may easily be a locking problem: To quote from the MySQL
manual:
   
   
   SQL backends don't use journalling.
  
  But you're still using a batch operation to play the journal file into
   the database and, if the OP isn't using InnoDB, it's going to hold a
  table lock while it runs. That's going to block SA queries unless
  (ugh!) MySQL defaults to allowing dirty reads.
 
 The OP isn't using SQL, and AFAIK SA doesn't use a journal file with
 SQL.

I thought he said he was using a MySQL Bayes database. OK, I'll shut up
now.

Martin




Re: Bayes dbm sync/expire speedup suggestion

2010-11-02 Thread Martin Gregorie
 On 02.11.10 01:02, Martin Gregorie wrote:
  But you're still using a batch operation to play the journal file into
  the database and, if the OP isn't using InnoDB, it's going to hold a table
  lock while it runs. That's going to block SA queries unless (ugh!) MySQL
  defaults to allowing dirty reads.
 
 iiuc, that meant there should be no journal file when using sql for bayes. 

Partly correct. If your DB is using row locking, I think there's no point
in using a periodic batch update.

However, performance will be dire if you do that with table locking.
As the only MySQL storage engine that does row locking is InnoDB, you
must create your table(s) using InnoDB and avoid MyISAM.

If you're already using MyISAM, you should dump the table(s), recreate
them as InnoDB and then reload them.


Martin




Re: Bayes dbm sync/expire speedup suggestion

2010-11-02 Thread Henrik K
On Tue, Nov 02, 2010 at 02:03:40PM +0000, Martin Gregorie wrote:
 
 If you're already using MyISAM you should dump the table(s), recreate
 them as InnoDB and then reload them. 

Um, alter table foo engine=InnoDB;



Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread Robert Blayzor
For the past several months I have been trying to find a way to make 
maintaining the SpamAssassin bayes database more effective on our SA  servers.  
We have several SA servers, all running bayes globally on the server, not per 
user.

Bayes generally does a good job, but on a fairly busy server it can be less 
effective depending on how you set the database up to learn/expire, etc.  So far 
I've followed just about every suggestion on maintaining bayes effectively; 
while bayes still works, it's not without some major problems, the main one 
being that when large syncs happen, the bayes token database can be locked out 
from other SA children for up to 10 minutes per sync.

Basically we have set up our servers to learn to journal, and we sync the 
journal to the main bayes database about once an hour.  We've found that this 
process can take 8 to 10 minutes, give or take.

We recently moved the bayes database into a RAM disk to see if that would help, 
and while reads/seeks have sped up considerably, sync has not.  Expire does not 
seem to be a problem.

Correct me if I'm wrong, but when you have bayes_learn_to_journal enabled and 
then you run a sync, sa-learn basically moves bayes_journal to 
bayes_journal.old and then starts merging/adding tokens into bayes_toks.  When 
this happens, bayes_toks is locked for the entire time until the sync 
completes.  So for us that means the bayes database is locked for about 10 
minutes an hour.  Expires do not seem to run that long; in fact, expires 
finish in about a minute, which is acceptable.

Would it make more sense, when you have learn_to_journal enabled and run a 
sync, to make a copy of the bayes_toks database, say to bayes_toks.new, and 
merge/add tokens from the journal into that?  Then, once the sync is complete, 
you can lock and copy the .new over the current one and continue.  This should 
lock the database out from updates for only seconds (if that) rather than 
locking it out during the entire learn/add process.  I assume an expire could 
use the same logic for those of us running expire/sync manually and 
periodically from cron rather than via auto methods.

Thoughts?  I guess my thought is to keep a read-only version of bayes_toks 
available almost the whole time, avoiding any lock contention from the database 
being synced/expired.


Our current bayes config:

use_bayes                1
bayes_auto_learn         1
bayes_auto_expire        0
bayes_learn_to_journal   1
bayes_journal_max_size   0
bayes_expiry_max_db_size 100
lock_method              flock


SA 3.3.1 on FreeBSD 6.4
Perl 5.10

-- 
Robert Blayzor
INOC, LLC
rblay...@inoc.net
http://www.inoc.net/~rblayzor/






Re: Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread Michael Scheidell

On 11/1/10 10:28 AM, Robert Blayzor wrote:

lock_method  flock

Switch to the special MySQL bayes.  It will also allow you to expire 
based on time (with an added table).

Sync is dynamic, but don't forget the cron job to expire bayes daily.
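Such a cron job might look like this (an illustrative crontab entry, not taken from the thread; the schedule is an assumption, though sa-learn's --force-expire option is the documented way to trigger a Bayes expiry run):

```
# Expire old bayes tokens nightly at 03:30 (run as the user that owns
# the bayes database; adjust time and path for your site)
30 3 * * *  sa-learn --force-expire >/dev/null 2>&1
```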

--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
ISN: 1259*1300
| SECNAP Network Security Corporation

   * Certified SNORT Integrator
   * 2008-9 Hot Company Award Winner, World Executive Alliance
   * Five-Star Partner Program 2009, VARBusiness
   * Best in Email Security, 2010: Network Products Guide
   * King of Spam Filters, SC Magazine 2008

__
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
__  


Re: Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread Michael Scheidell



On 11/1/10 1:52 PM, Robert Blayzor wrote:

On Nov 1, 2010, at 10:38 AM, Michael Scheidell wrote:

Switch to the special MySQL bayes.  It will also allow you to expire based on 
time (with an added table).
Sync is dynamic, but don't forget the cron job to expire bayes daily.


Unfortunately mysql is not an option in our setup, and running a standalone 
instance of mysql on the server itself just to serve bayes seems like a lot of 
additional overhead.


Then you will probably always have delays.



Re: Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread Robert Blayzor
On Nov 1, 2010, at 1:54 PM, Michael Scheidell wrote:
 then you will probably always have delays.


Hence my suggestion of making copies of the database to be worked on during 
the sync/expire process.  Then there should be virtually no delay other than 
the lock/copy, which should take seconds instead of several minutes.







Re: Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread RW
On Mon, 1 Nov 2010 10:28:43 -0400
Robert Blayzor rblayzor.b...@inoc.net wrote:


 Would it make more sense that when you do a learn_to_journal and a
 sync to make a copy of the bayes_toks database, say to
 bayes_toks.new and merge/add tokens from the journal to that?
 Then, once the sync is complete you can lock and copy the .new to the
 current and continue.  This should only lockout the database from
 updates for only seconds (if that) rather than locking it out during
 the entire learn/add process.  

I don't think it's a matter of locking out updates - presumably
token updates that occur after the start of a sync should go into the
new journal file. 

As I understand it there are two locks here: one is a lock-file that
locks out other administrative actions (syncs, expires, backups, etc.);
the other is the reader-writer lock in the dbm library itself. I suspect
that the spamd children are locked out trying to get a read lock on the
DB rather than on the lock-file. Either way, I think what you are saying
makes sense. 
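As a rough illustration of the flock-style locking involved (a hypothetical Python sketch, not SA's actual Perl locking code, matching the lock_method flock setting in the OP's config), a reader can probe whether a shared lock is currently obtainable:

```python
import fcntl
import os

def try_read_lock(path):
    """Return True if a shared (read) flock can be taken without blocking."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # LOCK_NB makes this a probe instead of a blocking wait.
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        # EWOULDBLOCK: a writer currently holds the exclusive lock.
        return False
    finally:
        os.close(fd)
```

While a long sync holds LOCK_EX on the token DB, every such shared-lock attempt (i.e. every spamd child wanting to score mail) fails or blocks until the writer releases it - which is the stall described in this thread.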



Re: Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread Martin Gregorie
On Mon, 2010-11-01 at 21:15 +0000, RW wrote:
 I don't think it's a matter of locking-out updates - presumably
 token updates that occur after the start of a sync should go into the
 new journal file. 
 
It may easily be a locking problem: To quote from the MySQL manual:

MySQL uses table-level locking for MyISAM, MEMORY and MERGE tables,
page-level locking for BDB tables, and row-level locking for InnoDB
tables.

IOW, if the OP is *not* using InnoDB for his token tables, then all
locks are table locks, and every update access, regardless of whether it
hits a few rows (adding tokens, updating token counts) or processes the
entire table (expiring tokens, playing the journal), will lock the
entire table. As a result, lock contention will be the norm rather than
the exception and database performance will be sub-optimal.

Migrating the tables to InnoDB storage should help considerably by
minimising lock contention since table locks will be replaced by row
locks. 

It's probable that changing over to direct token updating will also help
performance, since it eliminates the periodic batch process that applies
a large list of pending changes to the table. This process is both
relatively long-running and, worse, if it is a single transaction, will
end up locking a fairly large proportion of the table. This in turn
increases the probability that a row requested by SA will be locked by
the batch update. OTOH, if SA applies the changes directly, individual
row locks are only held for the minimum time and the probability of
locking conflicts is minimised, because each update or query only
accesses a few rows in a large table.


Martin




Re: Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread RW
On Mon, 01 Nov 2010 22:09:03 +0000
Martin Gregorie mar...@gregorie.org wrote:

 On Mon, 2010-11-01 at 21:15 +0000, RW wrote:
  I don't think it's a matter of locking-out updates - presumably
  token updates that occur after the start of a sync should go into
  the new journal file. 
  
 It may easily be a locking problem: To quote from the MySQL manual:


SQL backends don't use journalling.


Re: Bayes dbm sync/expire speedup suggestion

2010-11-01 Thread Martin Gregorie
On Mon, 2010-11-01 at 23:05 +0000, RW wrote:
 On Mon, 01 Nov 2010 22:09:03 +0000
 Martin Gregorie mar...@gregorie.org wrote:
 
  On Mon, 2010-11-01 at 21:15 +0000, RW wrote:
   I don't think it's a matter of locking-out updates - presumably
   token updates that occur after the start of a sync should go into
   the new journal file. 
   
  It may easily be a locking problem: To quote from the MySQL manual:
 
 
 SQL backends don't use journalling.

But you're still using a batch operation to play the journal file into
the database and, if the OP isn't using InnoDB, it's going to hold a table
lock while it runs. That's going to block SA queries unless (ugh!) MySQL
defaults to allowing dirty reads.


Martin