Re: Bayes dbm sync/expire speedup suggestion
On Tue, Nov 02, 2010 at 02:03:40PM +, Martin Gregorie wrote:
> If you're already using MyISAM you should dump the table(s), recreate
> them as InnoDB and then reload them.

Um, alter table foo engine=InnoDB;
Re: Bayes dbm sync/expire speedup suggestion
> On 02.11.10 01:02, Martin Gregorie wrote:
> > But you're still using a batch operation to play the journal file into
> > the database and, if the OP isn't using InnoDB it's going to hold a table
> > lock while it runs. That's going to block SA queries unless (ugh!) MySQL
> > defaults to allowing dirty reads.
>
> iiuc, that meant there should be no journal file when using sql for bayes.

Partly correct. If your DB is using row locking, I think there's no point in using a periodic batch update. However, performance will be dire if you do that with table locking. As the only MySQL table storage engine that does row locking is InnoDB, you must create your table(s) using InnoDB and avoid MyISAM. If you're already using MyISAM you should dump the table(s), recreate them as InnoDB and then reload them.

Martin
Re: Bayes dbm sync/expire speedup suggestion
On Tue, 2010-11-02 at 08:40 +, RW wrote:
> On Tue, 02 Nov 2010 01:02:48 +, Martin Gregorie wrote:
> > On Mon, 2010-11-01 at 23:05 +, RW wrote:
> > > On Mon, 01 Nov 2010 22:09:03 +, Martin Gregorie wrote:
> > > > On Mon, 2010-11-01 at 21:15 +, RW wrote:
> > > > > I don't think it's a matter of locking-out updates - presumably
> > > > > token updates that occur after the start of a sync should go
> > > > > into the new journal file.
> > > >
> > > > It may easily be a locking problem: To quote from the MySQL
> > > > manual:
> > >
> > > SQL backends don't use journalling.
> >
> > But you're still using a batch operation to play the journal file into
> > the database and, if the OP isn't using InnoDB it's going to hold a
> > table lock while it runs. That's going to block SA queries unless
> > (ugh!) MySQL defaults to allowing dirty reads.
>
> The OP isn't using SQL, and AFAIK SA doesn't use a journal file with
> SQL.

I thought he said he was using a MySQL Bayes database. OK, I'll shut up now.

Martin
Re: Bayes dbm sync/expire speedup suggestion
> > > On Mon, 2010-11-01 at 21:15 +, RW wrote:
> > > > I don't think it's a matter of locking-out updates - presumably
> > > > token updates that occur after the start of a sync should go into
> > > > the new journal file.
>
> > On Mon, 01 Nov 2010 22:09:03 +, Martin Gregorie wrote:
> > > It may easily be a locking problem: To quote from the MySQL manual:
>
> On Mon, 2010-11-01 at 23:05 +, RW wrote:
> > SQL backends don't use journalling.

On 02.11.10 01:02, Martin Gregorie wrote:
> But you're still using a batch operation to play the journal file into
> the database and, if the OP isn't using InnoDB it's going to hold a table
> lock while it runs. That's going to block SA queries unless (ugh!) MySQL
> defaults to allowing dirty reads.

iiuc, that meant there should be no journal file when using sql for bayes.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Re: Bayes dbm sync/expire speedup suggestion
On Tue, 02 Nov 2010 01:02:48 +, Martin Gregorie wrote:
> On Mon, 2010-11-01 at 23:05 +, RW wrote:
> > On Mon, 01 Nov 2010 22:09:03 +, Martin Gregorie wrote:
> > > On Mon, 2010-11-01 at 21:15 +, RW wrote:
> > > > I don't think it's a matter of locking-out updates - presumably
> > > > token updates that occur after the start of a sync should go
> > > > into the new journal file.
> > >
> > > It may easily be a locking problem: To quote from the MySQL
> > > manual:
> >
> > SQL backends don't use journalling.
>
> But you're still using a batch operation to play the journal file into
> the database and, if the OP isn't using InnoDB it's going to hold a
> table lock while it runs. That's going to block SA queries unless
> (ugh!) MySQL defaults to allowing dirty reads.

The OP isn't using SQL, and AFAIK SA doesn't use a journal file with
SQL.
Re: Bayes dbm sync/expire speedup suggestion
On Mon, 2010-11-01 at 23:05 +, RW wrote:
> On Mon, 01 Nov 2010 22:09:03 +, Martin Gregorie wrote:
> > On Mon, 2010-11-01 at 21:15 +, RW wrote:
> > > I don't think it's a matter of locking-out updates - presumably
> > > token updates that occur after the start of a sync should go into
> > > the new journal file.
> >
> > It may easily be a locking problem: To quote from the MySQL manual:
>
> SQL backends don't use journalling.

But you're still using a batch operation to play the journal file into
the database and, if the OP isn't using InnoDB it's going to hold a
table lock while it runs. That's going to block SA queries unless (ugh!)
MySQL defaults to allowing dirty reads.

Martin
Re: Bayes dbm sync/expire speedup suggestion
On Mon, 01 Nov 2010 22:09:03 +, Martin Gregorie wrote:
> On Mon, 2010-11-01 at 21:15 +, RW wrote:
> > I don't think it's a matter of locking-out updates - presumably
> > token updates that occur after the start of a sync should go into
> > the new journal file.
>
> It may easily be a locking problem: To quote from the MySQL manual:

SQL backends don't use journalling.
Re: Bayes dbm sync/expire speedup suggestion
On Mon, 2010-11-01 at 21:15 +, RW wrote:
> I don't think it's a matter of locking-out updates - presumably
> token updates that occur after the start of a sync should go into the
> new journal file.

It may easily be a locking problem. To quote from the MySQL manual:

    MySQL uses table-level locking for MyISAM, MEMORY and MERGE tables,
    page-level locking for BDB tables, and row-level locking for InnoDB
    tables.

IOW, if the OP is *not* using InnoDB for his token tables, then all locks are table locks: every update access, whether it hits a few rows (adding tokens, updating token counts) or processes the entire table (expiring tokens, playing back the journal), will lock the whole table. As a result locking contention will be the norm rather than the exception and database performance will be sub-optimal.

Migrating the tables to InnoDB storage should help considerably by minimising lock contention, since table locks will be replaced by row locks.

It's probable that changing over to direct token updating will also help performance, since it eliminates the periodic batch process that applies a large list of pending changes to the table. That process is both relatively long-running and, worse, if it runs as a single transaction, will end up locking a fairly large proportion of the table. This in turn increases the probability that a row requested by SA will be locked by the batch update. OTOH, if SA applies the changes directly, individual row locks are only held for the minimum time and the probability of locking conflicts is minimised, because each update or query only touches a few rows in a large table.

Martin
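Martin's migration advice can also be done in place with ALTER TABLE rather than a dump/reload. A rough sketch, assuming the stock SA MySQL schema's table names (bayes_token, bayes_seen, bayes_vars - check your own schema first):

```sql
-- See which engine the Bayes tables currently use:
SELECT table_name, engine
  FROM information_schema.tables
 WHERE table_name LIKE 'bayes%';

-- Convert any MyISAM ones to InnoDB in place:
ALTER TABLE bayes_token ENGINE=InnoDB;
ALTER TABLE bayes_seen  ENGINE=InnoDB;
ALTER TABLE bayes_vars  ENGINE=InnoDB;
```

Note that ALTER TABLE itself holds a lock on the table while it rebuilds it, so do the conversion at a quiet time.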
Re: Bayes dbm sync/expire speedup suggestion
On Mon, 1 Nov 2010 10:28:43 -0400, Robert Blayzor wrote:
> Would it make more sense that when you do a learn_to_journal and a
> sync to make a copy of the bayes_toks database, say to
> "bayes_toks.new", and merge/add tokens from the journal to that?
> Then, once the sync is complete you can lock and copy the .new to the
> current and continue. This should only lock out the database from
> updates for seconds (if that) rather than locking it out during
> the entire learn/add process.

I don't think it's a matter of locking-out updates - presumably token
updates that occur after the start of a sync should go into the new
journal file.

As I understand it there are two locks here: one is a lock-file that
locks out other administrative actions (syncs, expires, backups etc);
the other is the reader-writer lock in gdbm. I suspect that the spamd
children are locked out trying to get a read-lock on the DB rather than
on the lock-file.

Either way, I think what you are saying makes sense.
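The reader starvation RW describes (a long exclusive hold blocking every shared-lock attempt) can be demonstrated directly with flock, which matches the lock_method flock in the OP's config. This is a generic illustration of shared vs. exclusive flock semantics, not SA's actual locking code; the lock-file path is whatever both sides agree on:

```python
import fcntl

def try_shared_while_exclusive(path):
    """Take an exclusive flock (as a long journal sync would), then test
    whether a shared (reader) flock on the same file can be granted."""
    with open(path, "w") as writer:
        fcntl.flock(writer, fcntl.LOCK_EX)      # the sync's lock
        with open(path, "r") as reader:
            try:
                # A spamd child would simply block here; LOCK_NB makes
                # the denial visible immediately instead of hanging.
                fcntl.flock(reader, fcntl.LOCK_SH | fcntl.LOCK_NB)
                return "reader got shared lock"
            except BlockingIOError:
                return "reader blocked by exclusive lock"
```

With flock there is no way for a reader to sneak in part-way: as long as the sync holds LOCK_EX, every LOCK_SH attempt waits, which is consistent with the 10-minutes-per-hour lockout the OP sees.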
Re: Bayes dbm sync/expire speedup suggestion
On Nov 1, 2010, at 1:54 PM, Michael Scheidell wrote:
> then you will probably always have delays.

Hence my suggestion for making copies of the database to be worked on
during the sync/expire process. Then there should be virtually no delay
other than the lock/copy step, which should take seconds instead of
several minutes.

-- 
Robert Blayzor
INOC, LLC
rblay...@inoc.net
http://www.inoc.net/~rblayzor/
Re: Bayes dbm sync/expire speedup suggestion
On 11/1/10 1:52 PM, Robert Blayzor wrote:
> On Nov 1, 2010, at 10:38 AM, Michael Scheidell wrote:
> > Switch to the special mysql bayes. it will also allow you to expire
> > based on time (with some added table). sync is dynamic but don't
> > forget the cronjob to expire bayes daily.
>
> Unfortunately mysql is not an option in our setup, and running a
> standalone instance of mysql on the server itself just to serve bayes
> seems like a lot of additional overhead.

then you will probably always have delays.

-- 
Michael Scheidell, CTO
o: 561-999-5000  d: 561-948-2259  ISN: 1259*1300
SECNAP Network Security Corporation
* Certified SNORT Integrator
* 2008-9 Hot Company Award Winner, World Executive Alliance
* Five-Star Partner Program 2009, VARBusiness
* Best in Email Security, 2010: Network Products Guide
* King of Spam Filters, SC Magazine 2008

This email has been scanned and certified safe by SpammerTrap(r).
For Information please see http://www.secnap.com/products/spammertrap/
Re: Bayes dbm sync/expire speedup suggestion
On 11/1/10 10:28 AM, Robert Blayzor wrote:
> lock_method flock

Switch to the special mysql bayes. It will also allow you to expire
based on time (with some added table). Sync is dynamic, but don't forget
the cronjob to expire bayes daily.

-- 
Michael Scheidell, CTO
Bayes dbm sync/expire speedup suggestion
For the past several months I have been trying to find a way to make maintaining the SpamAssassin bayes database more effective on our SA servers. We have several SA servers, all running bayes globally on the server, not per user. Bayes generally does a good job, but on a fairly busy server it can be less effective depending on how you set the database up to learn/expire, etc.

So far I've followed just about every suggestion on maintaining bayes effectively. While bayes still works, it's not without some major problems, the main one being that when large syncs happen, the bayes token database can be locked out from other SA children for up to 10 minutes per sync.

Basically we have set up our servers for "learn to journal" and we sync the journal to the main bayes database about once an hour. We've found that this process can take 8 to 10 minutes, give or take. We recently moved the bayes database onto a RAM disk to see if that would help; while reads/seeks have sped up considerably, sync has not. Expire does not seem to be a problem.

Correct me if I'm wrong, but when you have bayes_learn_to_journal enabled and then run a sync, sa-learn basically moves bayes_journal to bayes_journal.old and then starts merging/adding tokens into bayes_toks. When this happens, bayes_toks is locked for the entire time until the sync completes. For us that means the bayes database is locked for about 10 minutes an hour. Expires do not seem to run that long; in fact, expires finish in about a minute, which is acceptable.

Would it make more sense, when you do a learn_to_journal and a sync, to make a copy of the bayes_toks database, say to "bayes_toks.new", and merge/add tokens from the journal to that? Then, once the sync is complete, you can lock and copy the .new over the current one and continue. This should only lock the database out from updates for seconds (if that) rather than locking it out during the entire learn/add process.
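The copy-merge-swap idea above can be sketched roughly like this (a minimal illustration, not SA's implementation; merge_journal here is a trivial stand-in for sa-learn's real token merge, and the file names follow the OP's suggestion):

```python
import os
import shutil

def merge_journal(db_path, journal_path):
    # Stand-in for the real token merge: just append the journal's
    # bytes so the sketch runs end to end.
    with open(journal_path, "rb") as journal, open(db_path, "ab") as db:
        db.write(journal.read())

def sync_with_copy(db_path, journal_path):
    """Merge the journal into a copy of the token DB, then swap it in.

    Readers keep using the live db_path for the whole merge; the only
    step that needs exclusion is the final atomic rename.
    """
    old_journal = journal_path + ".old"
    new_db = db_path + ".new"

    # 1. Rotate the journal so token updates made during the sync land
    #    in a fresh journal file.
    os.rename(journal_path, old_journal)

    # 2. Do the slow merge on a copy; the live DB stays readable.
    shutil.copy2(db_path, new_db)
    merge_journal(new_db, old_journal)

    # 3. Atomic swap: readers see either the old or the new DB, never
    #    a half-merged one. Only this instant needs a lock.
    os.replace(new_db, db_path)
    os.remove(old_journal)
```

One caveat: any updates written to bayes_toks directly between the copy and the swap would be lost, which is why the journal rotation in step 1 has to capture all writes made during the sync.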
I assume an expire could use the same logic, for those of us running expire/sync manually from cron rather than via the auto methods. Thoughts? I guess my thought is to keep a read-only version of bayes_toks available almost the whole time, avoiding any lock contention from the database being synced/expired.

Our current bayes config:

use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 0
bayes_learn_to_journal 1
bayes_journal_max_size 0
bayes_expiry_max_db_size 100
lock_method flock

SA 3.3.1 on FreeBSD 6.4, Perl 5.10

-- 
Robert Blayzor
INOC, LLC
rblay...@inoc.net
http://www.inoc.net/~rblayzor/