Re: SpamTips.org: Why run your own DNS server?
On 4-7-2011 20:35, Toni Mueller wrote: Hi, On Mon, 04.07.2011 at 13:23:42 +0200, Axbaxb.li...@gmail.com wrote: Well, together with the auth server it creates an ecosystem with some (limited) vendor lock-in capability. vendor lock-in ? be explicit, please. last I looked PowerDNS highlighted some custom DNS RR types (on the wire) that were not usable with any other software. Migrating from one DNS server package to another is imho difficult enough without that already. Kind regards, --Toni++ Well, if you're concerned about that then don't use the custom RR types that PowerDNS provides. I don't really see the fuss about it and saying it's vendor lock-in is simply not true. It's just some non-standard RR types and that's that. You're not being forced to use them or anything. Regards, Jorn
Re: Two more SA/MySQL questions.
Kris Deugau wrote: Steven W. Orr wrote: I. I am running MySQL version 5.067 which apparently is not enough to be allowed to run InnoDB. Strange; IIRC InnoDB has been available since some late 3.something versions, and most 4.x releases. The MySQL docs should be the authoritative reference for this though. Yes, up to my knowledge MySQL 5.0.xx supports InnoDB just fine. The questions is this: I thought that InnoDB was going to consume *more* resources because the purpose of it was to support transactions. Am I wrong? If I convert to a higher rev of MySQL and get InnoDB will I get *better* performance? Likely not; MySQL gets a great deal of its speed from... not doing transactions. Not having done any detailed performance tuning or testing, I can't say for sure - but from what I recall from my reading on this subject that's one of the points formally documented by the MySQL devs themselves. I can tell from experience that MyISAM is useless when it comes to Bayes. As pointed out by Benny Pedersen, MySQL will do nothing more than waiting on table locks. A single UPDATE query will take 30-90 seconds, and even more when you are on a busy site, not to mention the load of your MySQL server is going skyhigh. Your scantimes will increase dramatically to over 2-3 minutes of a single e-mail. If you, like me, get 25.000 emails a day to process you can't afford this. Using InnoDB is absolutely vital and you cannot use MyISAM at all from my experience. My bayes_token table is 12 million rows and increasing every day, and performance is still just fine. Hope this helps some. Jorn
Re: [Solved] Bad performance of Bayes with MySQL cluster
Jonas A. Larsen wrote: In case anybody else comes across the same, I've kicked out the MySQL cluster and now using MySQL with multi-master replication. There we can use InnoDB and this definitely solved all of the problems I had with bayes. Scantimes are now below 1 second. I don't have much load as of yet, so I expect this to increase somewhat during business hours, but all in all things look a lot more promising. I've used this howto: http://capttofu.livejournal.com/1752.html Hi Jorn I'm running the same kind of multimaster mysql replication scheme (my sql ndoes are in different datacenters so a cluster was not a good option for us) However I've seen several reports of people having problem with bayes and replication, since with multimaster replication you have to manually be sure there is no overlap in row id's etc. I cant remember 100% if it was an old issue or if it still should be an issue, I just thought I'd send you this warning, that replication might break if you habve a bayes DB on the dbms and the 2 masters write the same id/token whatever to the DB. So you might want to test thoroughly before you deploy in production, or at least if it breaks later you might remember this mail :) Hi Johas, Thanks a lot for your heads-up here. Actually I was realising this myself as well, especially since Bayes doesn't seem to be using MySQL's AUTO_INCREMENT property for its ids. I've seen a few articles where the different masters used different offsets for their AUTO_INCREMENT columns. Frankly this wasn't going to work for the bayes_token table at the very least. And even if it did work, you'd end up with different IDs, same token and different count values (purely theoretical if my logic isn't failing). For the sake of the KISS concept I decided to scrap the multi-master replication and made it a master-slave setup only, where all 3 boxes point to one master. Then the slave is present as a backup server only. I was expecting to need 2 masters because of performance reasons, but the single master works like a charm in production. I'm sticking to that for now and works surprisingly well with InnoDB and the my-4GB-huge-InnoDB.cnf (or whatever it's called). Cheers, Jorn Med venlig hilsen / Best regards Jonas Akrouh Larsen TechBiz ApS Laplandsgade 4, 2. sal 2300 København S Office: 7020 0979 Direct: 3336 9974 Mobile: 5120 1096 Fax:7020 0978 Web: www.techbiz.dk __ Information from ESET NOD32 Antivirus, version of virus signature database 4343 (20090817) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com __ Information from ESET NOD32 Antivirus, version of virus signature database 4343 (20090817) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com
Re: Bad performance of Bayes with MySQL cluster
Henrik K wrote: On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote: Hi All, I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes, and all three of them are sharing the same bayes DB using a MySQL cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 datanodes with a quadcore and 4 GB of memory. Everything is working fine, even the AWL in SQL, except for Bayes. The bayes database currently houses a bit less than 500k tokens and the database size is not very big either, as the datanodes have less than 1 GB of storage in use. I've followed the instructions from the Spamassassin wiki, and I also used the supplied bayes_mysql.sql file to create my tables. In case anyone is interested, you can find the cluster.ini and the my.cnf used on the SQL nodes here: http://www.wcborstel.com/web/mysql/my.cnf skip-innodb That's pretty much the reason. You _need_ to use InnoDB as it has row level locking. MyISAM just kills Bayes. Actually I'm using NDB and not MyISAM. I need a clustered storage engine, otherwise the bayes DB can't really be shared. If I create an InnoDB table on one SQL node, it doesn't show up at the other SQL node, while this is the case with an NDB storage engine. What I can do however, is point all mailservers to one SQL node. I just need to synchronize the bayes_token table to the other SQL node I guess. Do you have an idea about this? Now the problem at the first glance seems to be, from my perspective (please correct me if I'm wrong), the actual queries being done. For every mail being scanned by spamassassin, it seems to be doing the SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token query every time. This effectively requesting the entire bayes_token table What you are seeing are expiry runs. As you right now use MyISAM, the whole table is locked for such operations so you are pretty much hosed. In any case, you should use bayes_auto_expire 0 and run expire for example once every night when traffic is slower. Thanks for this, I was not aware of it. Running expiry runs manually is done by sa-learn --force-expiry, correct? It seems that the query cache is either not suitable for this or I am doing something majorly wrong :) You are right. Better to disable completely if there's nothing else running that uses it and save little CPU. Good to know. There will be other applications running on it as well so I'll reduce the size of the query cache for a good bit. Thanks a lot for your feedback. Jorn __ Information from ESET NOD32 Antivirus, version of virus signature database 4336 (20090814) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com
Re: [Solved] Bad performance of Bayes with MySQL cluster
Henrik K wrote: On Sat, Aug 15, 2009 at 09:50:41AM +0200, Jorn Argelo wrote: Henrik K wrote: On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote: Hi All, I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes, and all three of them are sharing the same bayes DB using a MySQL cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 datanodes with a quadcore and 4 GB of memory. Everything is working fine, even the AWL in SQL, except for Bayes. The bayes database currently houses a bit less than 500k tokens and the database size is not very big either, as the datanodes have less than 1 GB of storage in use. I've followed the instructions from the Spamassassin wiki, and I also used the supplied bayes_mysql.sql file to create my tables. In case anyone is interested, you can find the cluster.ini and the my.cnf used on the SQL nodes here: http://www.wcborstel.com/web/mysql/my.cnf skip-innodb That's pretty much the reason. You _need_ to use InnoDB as it has row level locking. MyISAM just kills Bayes. Actually I'm using NDB and not MyISAM. I need a clustered storage engine, otherwise the bayes DB can't really be shared. If I create an InnoDB table on one SQL node, it doesn't show up at the other SQL node, while this is the case with an NDB storage engine. Ah right sorry.. I have no idea on NDB and how it performs for SA. What I can do however, is point all mailservers to one SQL node. I just need to synchronize the bayes_token table to the other SQL node I guess. Do you have an idea about this? MySQL replication? Maybe search on spamassassin-users archives to find experiences. Thanks for this, I was not aware of it. Running expiry runs manually is done by sa-learn --force-expiry, correct? Yep. In case anybody else comes across the same, I've kicked out the MySQL cluster and now using MySQL with multi-master replication. There we can use InnoDB and this definitely solved all of the problems I had with bayes. Scantimes are now below 1 second. I don't have much load as of yet, so I expect this to increase somewhat during business hours, but all in all things look a lot more promising. I've used this howto: http://capttofu.livejournal.com/1752.html Thanks for the pointers, Henrik. Regards, Jorn __ Information from ESET NOD32 Antivirus, version of virus signature database 4336 (20090814) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com
Bad performance of Bayes with MySQL cluster
Hi All, I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes, and all three of them are sharing the same bayes DB using a MySQL cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 datanodes with a quadcore and 4 GB of memory. Everything is working fine, even the AWL in SQL, except for Bayes. The bayes database currently houses a bit less than 500k tokens and the database size is not very big either, as the datanodes have less than 1 GB of storage in use. I've followed the instructions from the Spamassassin wiki, and I also used the supplied bayes_mysql.sql file to create my tables. In case anyone is interested, you can find the cluster.ini and the my.cnf used on the SQL nodes here: http://www.wcborstel.com/web/mysql/my.cnf http://www.wcborstel.com/web/mysql/cluster.ini I've been doing quite a bit of research and so on. First I thought it were the settings of my cluster, as I knew there was a lot to be tuned. Things like query cache sizes, thread cache, table cache, specific NDB settings et cetera. Unfortunately that didn't have seemed to help. I came to the conclusion that the bayes table was simply too heavily used. I have scantimes of 30-200+ seconds with bayes enabled, while I have scantimes under 8 seconds when disabling bayes. Now the problem at the first glance seems to be, from my perspective (please correct me if I'm wrong), the actual queries being done. For every mail being scanned by spamassassin, it seems to be doing the SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token query every time. This effectively requesting the entire bayes_token table, which can take up to 10-20 seconds. Now one would think that this is a nice canidate to cache. I would agree, unfortunately the MySQL query cache is not very efficient here, seeing as the atime of a token is being updated continuously. In other words, the cache is pretty much invalid most of the time. My Qcache hits is also very low (I noticed 8k inserts with about 250 cache hits). It seems that the query cache is either not suitable for this or I am doing something majorly wrong :) Here is how I came to my findings. Note I removed some SELECT RPAD rows to avoid spammyness (they show essentially the same as the other rows anyway): mysql show processlist\G *** 1. row *** Id: 1 User: system user Host: db: Command: Daemon Time: 0 State: Waiting for event from ndbcluster Info: NULL FROM bayes_token *** 3. row *** Id: 1464 User: bayes Host: :::1.2.3.4:57082 db: spamd Command: Query Time: 13 State: Sending data Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token *** 5. row *** Id: 1479 User: bayes Host: :::1.2.3.4:57133 db: spamd Command: Query Time: 24 State: Searching rows for update Info: UPDATE bayes_token SET atime = '1250259027' WHERE id = '3' AND token IN ('e?5?U','?;?6','?e?F?','? *** 8. row *** Id: 1485 User: bayes Host: :::1.2.3.4:57148 db: spamd Command: Query Time: 18 State: Sending data Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token *** 9. row *** Id: 1487 User: bayes Host: :::1.2.3.4:57155 db: spamd Command: Query Time: 18 State: Sending data Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token 12 rows in set (0.00 sec) As you can see, row #9 has been executing for 18 seconds already. I was first playing around with trying to create some additional indexes, but I've seen a couple of SELECT queries where the indexes where actually used and that was pretty quick. Now I am by far not a MySQL guru, so again, if anyone has any info in regards to creating additional indexes I would love to hear them. Currently I don't have any indexes other than those provided by the bayes_mysql.sql file. Currently I'm running my mail servers without bayes where they are performing fine. Does anyone have any recommendations or experiences with this? Or perhaps is there more information needed? Also will adding more memory to my datanodes solve anything? Thanks a lot for any feedback. Best regards, Jorn Argelo __ Information from ESET NOD32 Antivirus, version of virus signature database 4336 (20090814) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com