RE: slow sql bayes store
Are you running dstat (or vmstat/iostat) on the SQL server? I'd be interested in seeing what the disk, memory, and processes are doing under load.

We don't have 9m rows in ours, but with 1m rows and a simple processor (HT) with 4 GB of RAM it works without any significant problems. At the outside we see 3 seconds to process a message under load (avg 5k), and an average of .5 seconds normally. The average is the same whether we have 1 message or 5 messages coming through (per server; we have 4 of them).

Is the database on the same box as SA? Ours is not, so our numbers include a little latency for that as well.

-----Original Message-----
From: David Morton [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 10, 2006 7:39 PM
To: Gary W. Smith
Cc: users@spamassassin.apache.org
Subject: Re: slow sql bayes store
Re: slow sql bayes store
This has been seen on a variety of systems, from my own small 1 GHz AMD system to dual Xeons with SCSI drives.

On my somewhat slowish system:

    sa-learn --dump magic
    0.000          0          3          0  non-token data: bayes db version
    0.000          0       2937          0  non-token data: nspam
    0.000          0      30745          0  non-token data: nham
    0.000          0     130608          0  non-token data: ntokens
    0.000          0 1148246665          0  non-token data: oldest atime
    0.000          0 1155262955          0  non-token data: newest atime
    0.000          0          0          0  non-token data: last journal sync atime
    0.000          0 1153091434          0  non-token data: last expiry atime
    0.000          0    4847556          0  non-token data: last expire atime delta
    0.000          0      13576          0  non-token data: last expire reduction count

On a fast system with 10k SATA Raptors:

    sa-learn --dump magic
    0.000          0          3          0  non-token data: bayes db version
    0.000          0    9685479          0  non-token data: nspam
    0.000          0     794330          0  non-token data: nham
    0.000          0     143002          0  non-token data: ntokens
    0.000          0 1155209840          0  non-token data: oldest atime
    0.000          0 1155260496          0  non-token data: newest atime
    0.000          0          0          0  non-token data: last journal sync atime
    0.000          0 1155253048          0  non-token data: last expiry atime
    0.000          0      43200          0  non-token data: last expire atime delta
    0.000          0          0          0  non-token data: last expire reduction count

We have experimented with auto expiry and with batch expiry at night. So far, I haven't found a suitable answer.

One factor I'm testing, though I don't think it made a difference: I reversed the order of the columns in the index, since all of our mail is stored under one user, so user id = 1 isn't much of an index. Still, it doesn't seem to help.

I'm intuitively guessing that it takes a while to write the new indexes, but I don't have anything to substantiate that.

--
David Morton
Maia Mailguard - http://www.maiamailguard.com
Morton Software Design and Consulting - http://www.dgrmm.net
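For anyone comparing dumps like the two in this message, here is a minimal Python sketch that reads `sa-learn --dump magic` output and turns the atime values into dates. It assumes the layout shown in this thread, with the number of interest in the third field of each "non-token data" line. The names `parse_magic` and `summarize` are made up for illustration and are not part of SpamAssassin.

```python
from datetime import datetime, timezone

# Sample copied from the "slowish system" dump in this thread.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       2937          0  non-token data: nspam
0.000          0      30745          0  non-token data: nham
0.000          0     130608          0  non-token data: ntokens
0.000          0 1148246665          0  non-token data: oldest atime
0.000          0 1155262955          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1153091434          0  non-token data: last expiry atime
0.000          0    4847556          0  non-token data: last expire atime delta
0.000          0      13576          0  non-token data: last expire reduction count
"""


def parse_magic(text):
    """Return {label: int} for every 'non-token data' line in the dump."""
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # skip the command line, blank lines, etc.
        fields = head.split()
        # Assumption: the third numeric field holds the value, as in the dumps above.
        values[label.strip()] = int(fields[2])
    return values


def summarize(values):
    """Convert the atime range to UTC dates and a span in days."""
    oldest = values["oldest atime"]
    newest = values["newest atime"]

    def to_date(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()

    return {
        "ntokens": values["ntokens"],
        "oldest": to_date(oldest),
        "newest": to_date(newest),
        "span_days": (newest - oldest) / 86400,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_magic(SAMPLE)).items():
        print(f"{key}: {value}")
```

Running the same summary on both dumps makes the difference in token age range easier to see than the raw epoch values do.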
RE: slow sql bayes store
In the past we have seen some slowness for bayes and AWL (mostly AWL). We found that after a couple million rows in AWL the system starts getting really slow. We set up a script to prune records that have a high bayes threshold but a count of 1 (usually anonymous spammers). They will not use that IP/sender combination again anyway. This keeps it nice and tidy.

As for the bayes, you might want to manually expire the old tokens. That might help.

But this is just a guess at this time. What would be more useful are things like the number of records you have in the db, the hardware of the DB server (memory, etc.), and any other information that might help make a better guess.

Gary Wayne Smith

> -----Original Message-----
> From: David Morton [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 10, 2006 2:28 PM
> To: users@spamassassin.apache.org
> Subject: slow sql bayes store
>
> Greetings...
>
> On the Maia Mailguard mailing list, we have encountered a number of folks
> (myself included) who are seeing some slow performance in the bayes storage
> when using MySQL (InnoDB engine), taking anywhere from .5 to 10 seconds to
> store/update all the tokens for a message. Has anyone else seen this?
>
> --
> David Morton
> Maia Mailguard - http://www.maiamailguard.com
> Morton Software Design and Consulting - http://www.dgrmm.net
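The AWL pruning described in the reply above could look roughly like the sketch below. This is not the actual script. It reads "high bayes threshold" as a high stored score, assumes an `awl` table with `count` and `totscore` columns, uses a made-up cutoff (`SCORE_THRESHOLD`), and runs against an in-memory SQLite database as a stand-in for MySQL so that it can be tried safely.

```python
import sqlite3

# Hypothetical cutoff; choose a value that fits your own scoring.
SCORE_THRESHOLD = 10.0


def prune_awl(conn, threshold=SCORE_THRESHOLD):
    """Delete AWL rows seen exactly once whose total score exceeds the threshold."""
    cur = conn.execute(
        'DELETE FROM awl WHERE "count" = 1 AND totscore > ?', (threshold,)
    )
    conn.commit()
    return cur.rowcount  # number of rows removed


def demo():
    """Build a tiny assumed AWL table in memory and prune it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE awl (username TEXT, email TEXT, ip TEXT, '
        '"count" INTEGER, totscore REAL)'
    )
    conn.executemany(
        "INSERT INTO awl VALUES (?, ?, ?, ?, ?)",
        [
            ("mail", "spammer@example.com", "198.51", 1, 25.0),   # one-off, high score
            ("mail", "friend@example.org", "192.0", 12, -6.0),    # regular sender
            ("mail", "newcomer@example.net", "203.0", 1, 0.5),    # one-off, low score
            ("mail", "bulk@example.com", "198.51", 4, 80.0),      # seen more than once
        ],
    )
    deleted = prune_awl(conn)
    remaining = [row[0] for row in conn.execute("SELECT email FROM awl ORDER BY email")]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Only the one-off, high-scoring row is removed. Senders seen more than once stay, even with a high score, because those entries are still doing useful work in the AWL.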