Re: [Solved] Bad performance of Bayes with MySQL cluster

2009-10-30 Thread furban


John Hardin wrote:
> 
> Suggestion: open a feature request bug to allow bayes autolearn to use a 
> different database connection string than bayes scoring. That way you 
> could configure all the daemons' autolearns to write to the master, but 
> distribute their scoring queries across X number of replicated slaves...
> 
There are to feature requests active.
Bug 4508 asking for this problem here.
Bug 5998 asking for solving the problem with a master-master replication
database
In both Bugs there is a patch included to solve the problem. I just asked
the SpamAssassin Team if these patches will be included in SA3.3 but the
answer was:

--- Comment #6 from Justin Mason  2009-10-30 05:51:10 UTC ---
(In reply to comment #5)
> Bug 5998. This feature will be included in SA 3.3?
Nope -- Michael is about the nearest thing we have on the dev team to our
SQL
expert.  if he vetoes it, that's a big problem

> What about Bug 4508? I like more to have the possibility to use another
> server
> for writing data than to read from.
> I'm using SQLGrey in this way today.

without someone reviewing and approving the patches, we can't apply them

So it seemed that some more people need to test the patches...

-- 
View this message in context: 
http://old.nabble.com/Bad-performance-of-Bayes-with-MySQL-cluster-tp24975811p26129756.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: [Solved] Bad performance of Bayes with MySQL cluster

2009-08-17 Thread John Hardin

On Tue, 18 Aug 2009, Jorn Argelo wrote:

For the sake of the KISS concept I decided to scrap the multi-master 
replication and made it a master-slave setup only, where all 3 boxes 
point to one master. Then the slave is present as a backup server only. 
I was expecting to need 2 masters because of performance reasons, but 
the single master works like a charm in production. I'm sticking to that 
for now and works surprisingly well with InnoDB and the 
my-4GB-huge-InnoDB.cnf (or whatever it's called).


I assume you're using autolearning. Master+multislave can support 
distributed scoring if you're manually learning, as sa-learn can be 
configured to talk to the master while the SA daemons cal query the 
slave(s).


Suggestion: open a feature request bug to allow bayes autolearn to use a 
different database connection string than bayes scoring. That way you 
could configure all the daemons' autolearns to write to the master, but 
distribute their scoring queries across X number of replicated slaves...


--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The real opiate of the masses isn't religion; it's the belief that
  somewhere there is a benefit that can be delivered without a
  corresponding cost.   -- Tom of "Radio Free NJ"
---
 7 days until the 1930th anniversary of the destruction of Pompeii


Re: [Solved] Bad performance of Bayes with MySQL cluster

2009-08-17 Thread Jorn Argelo

Jonas A. Larsen wrote:

In case anybody else comes across the same, I've kicked out the MySQL
cluster and now using MySQL with multi-master replication. There we can
use InnoDB and this definitely solved all of the problems I had with
bayes. Scantimes are now below 1 second. I don't have much load as of
yet, so I expect this to increase somewhat during business hours, but
all in all things look a lot more promising. I've used this howto:
http://capttofu.livejournal.com/1752.html




Hi Jorn

I'm running the same kind of multimaster mysql replication scheme (my sql
ndoes are in different datacenters so a cluster was not a good option for
us)

However I've seen several reports of people having problem with bayes and
replication, since with multimaster replication you have to manually be sure
there is no overlap in row id's etc.

I cant remember 100% if it was an old issue or if it still should be an
issue, I just thought I'd send you this warning, that replication might
break if you habve a bayes DB on the dbms and the 2 masters write the same
id/token whatever to the DB.

So you might want to test thoroughly before you deploy in production, or at
least if it breaks later you might remember this mail :)
  

Hi Johas,

Thanks a lot for your heads-up here.

Actually I was realising this myself as well, especially since Bayes 
doesn't seem to be using MySQL's AUTO_INCREMENT property for its ids. 
I've seen a few articles where the different masters used different 
offsets for their AUTO_INCREMENT columns. Frankly this wasn't going to 
work for the bayes_token table at the very least. And even if it did 
work, you'd end up with different IDs, same token and different count 
values (purely theoretical if my logic isn't failing).


For the sake of the KISS concept I decided to scrap the multi-master 
replication and made it a master-slave setup only, where all 3 boxes 
point to one master. Then the slave is present as a backup server only. 
I was expecting to need 2 masters because of performance reasons, but 
the single master works like a charm in production. I'm sticking to that 
for now and works surprisingly well with InnoDB and the 
my-4GB-huge-InnoDB.cnf (or whatever it's called).


Cheers,
Jorn



Med venlig hilsen / Best regards
 
Jonas Akrouh Larsen
 
TechBiz ApS

Laplandsgade 4, 2. sal
2300 København S
 
Office: 7020 0979

Direct: 3336 9974
Mobile: 5120 1096
Fax:7020 0978
Web: www.techbiz.dk





__ Information from ESET NOD32 Antivirus, version of virus signature 
database 4343 (20090817) __

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com


  




__ Information from ESET NOD32 Antivirus, version of virus signature 
database 4343 (20090817) __

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com




RE: [Solved] Bad performance of Bayes with MySQL cluster

2009-08-17 Thread Jonas A. Larsen
> >
> In case anybody else comes across the same, I've kicked out the MySQL
> cluster and now using MySQL with multi-master replication. There we can
> use InnoDB and this definitely solved all of the problems I had with
> bayes. Scantimes are now below 1 second. I don't have much load as of
> yet, so I expect this to increase somewhat during business hours, but
> all in all things look a lot more promising. I've used this howto:
> http://capttofu.livejournal.com/1752.html
> 

Hi Jorn

I'm running the same kind of multimaster mysql replication scheme (my sql
ndoes are in different datacenters so a cluster was not a good option for
us)

However I've seen several reports of people having problem with bayes and
replication, since with multimaster replication you have to manually be sure
there is no overlap in row id's etc.

I cant remember 100% if it was an old issue or if it still should be an
issue, I just thought I'd send you this warning, that replication might
break if you habve a bayes DB on the dbms and the 2 masters write the same
id/token whatever to the DB.

So you might want to test thoroughly before you deploy in production, or at
least if it breaks later you might remember this mail :)



Med venlig hilsen / Best regards
 
Jonas Akrouh Larsen
 
TechBiz ApS
Laplandsgade 4, 2. sal
2300 København S
 
Office: 7020 0979
Direct: 3336 9974
Mobile: 5120 1096
Fax:7020 0978
Web: www.techbiz.dk





Re: [Solved] Bad performance of Bayes with MySQL cluster

2009-08-15 Thread Jorn Argelo

Henrik K wrote:

On Sat, Aug 15, 2009 at 09:50:41AM +0200, Jorn Argelo wrote:
  

Henrik K wrote:


On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote:
  
  

Hi All,

I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three 
boxes,  and all three of them are sharing the same bayes DB using a 
MySQL  cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 
datanodes  with a quadcore and 4 GB of memory. Everything is working 
fine, even the  AWL in SQL, except for Bayes. The bayes database 
currently houses a bit  less than 500k tokens and the database size 
is not very big either, as  the datanodes have less than 1 GB of 
storage in use. I've followed the  instructions from the Spamassassin 
wiki, and I also used the supplied  bayes_mysql.sql file to create my 
tables. In case anyone is interested,  you can find the cluster.ini 
and the my.cnf used on the SQL nodes here:


http://www.wcborstel.com/web/mysql/my.cnf



skip-innodb

That's pretty much the reason. You _need_ to use InnoDB as it has row level
locking. MyISAM just kills Bayes.
  
  
Actually I'm using NDB and not MyISAM. I need a clustered storage  
engine, otherwise the bayes DB can't really be shared. If I create an  
InnoDB table on one SQL node, it doesn't show up at the other SQL node,  
while this is the case with an NDB storage engine.



Ah right sorry.. I have no idea on NDB and how it performs for SA.

  
What I can do however, is point all mailservers to one SQL node. I just  
need to synchronize the bayes_token table to the other SQL node I guess.  
Do you have an idea about this?



MySQL replication? Maybe search on spamassassin-users archives to find
experiences.

  
Thanks for this, I was not aware of it. Running expiry runs manually is  
done by sa-learn --force-expiry, correct?



Yep.


  
In case anybody else comes across the same, I've kicked out the MySQL 
cluster and now using MySQL with multi-master replication. There we can 
use InnoDB and this definitely solved all of the problems I had with 
bayes. Scantimes are now below 1 second. I don't have much load as of 
yet, so I expect this to increase somewhat during business hours, but 
all in all things look a lot more promising. I've used this howto: 
http://capttofu.livejournal.com/1752.html


Thanks for the pointers, Henrik.

Regards,
Jorn



__ Information from ESET NOD32 Antivirus, version of virus signature 
database 4336 (20090814) __

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com