Tompkins Neil wrote:
Hi
We are looking to upgrade to the latest version of MySQL 5.  One of the
main features we are thinking about using is replication for our website
data.  Basically we have two websites, located in the UK and US, which
share similar information, and we plan to use replication to keep the
data up to date and in sync.

Based on your experiences, is there anything we should be aware of before
investigating this route and putting it into practice?

I've had to take servers out for bad RAID controllers, bad RAM, bad mobos; disks have been the least of my problems. So make sure your architecture tolerates taking members of your pool out without load-spiking the remaining members. And if you're doing filesystem snapshots from a master to a replicant, you'll need either a policy or extra servers available to maintain your uptime, because you have to interrupt the master to flush all the tables, sync the filesystem, and take an LVM snapshot; InnoDB would require a shutdown. Don't forget that LVM snapshots are copy-on-write, so when that master comes back up and starts processing queries that modify tables, you'll see amazing system load on a busy box as the filesystem starts madly copying extents into the snapshot volume.
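
For MyISAM, the interrupt-the-master dance is short but the lock has to be held the whole time. A minimal sketch (volume names and sizes are examples, assuming the datadir sits on an LVM logical volume):

  mysql> FLUSH TABLES WITH READ LOCK;
  mysql> SHOW MASTER STATUS;    -- note File/Position if you're seeding a replicant
  mysql> system sync
  mysql> system lvcreate --snapshot --size 5G --name mysql-snap /dev/vg00/mysql
  mysql> UNLOCK TABLES;

The "system" lines run from inside the mysql client precisely so the read lock is still held while the snapshot is created.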

Define a procedure for junior staff on how to properly take a pool member down and bring it back up. Like, if you get a disk-full on one member and it borks replication, what's the step-by-step for (a) determining whether replication can re-establish after you do a FLUSH LOGS, and (b) deciding under what conditions you have to re-copy all the data from one master to another because your replication window has expired and the logs have been purged. Replication binlogs get really big if you're regularly pushing large materialized views through replication; if your servers have fast disks but not enough space to hold more than a weekend, or even a whole day, of binlogs, that's all the neglect it takes.
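
The triage usually starts with SHOW SLAVE STATUS; a sketch of the two outcomes (log file name and positions are examples):

  mysql> SHOW SLAVE STATUS\G    -- check Slave_IO_Running, Slave_SQL_Running, Last_Error
  mysql> START SLAVE;           -- often enough once you've freed the disk
  -- but if the master has already purged the binlog the slave needs
  -- (the "Got fatal error 1236 from master" case), you're re-copying
  -- the data and re-pointing the slave at the new baseline:
  mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4;
  mysql> START SLAVE;

Setting expire_logs_days in my.cnf at least makes your replication window explicit instead of whenever-the-disk-fills.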

Define a procedure for checking your my.cnf files for correct auto-increment-* and server-id settings. Junior staff, and even senior staff, rarely add members to the pool, so these settings often get botched during a midnight maintenance hour. A procedure for adding members and changing master replication settings is very important; often your DBA is not the one racking and cabling the equipment.
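
For a two-site UK/US pair, if you end up doing master-master, the interlocking settings look roughly like this (values are examples; the point is unique server-ids and non-colliding auto-increments):

  # my.cnf on the UK box
  [mysqld]
  server-id                = 1
  auto_increment_increment = 2   # total number of masters
  auto_increment_offset    = 1   # this box's slot, 1..increment

  # my.cnf on the US box
  [mysqld]
  server-id                = 2
  auto_increment_increment = 2
  auto_increment_offset    = 2

One side hands out odd ids and the other even, so inserts can't collide.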

Make sure you have a good understanding of what kind of capacity you're growing at. I started a project with two four-core boxes with plenty of 15krpm disk, and when they got into production they regularly spiked to load 20 and 30. Not pretty. My old architecture had been refusing traffic to lighten the load, which masked the real demand; my new architecture didn't refuse anything. My data set was growing so fast that the sort-buffer settings tuned for the old servers were too small for the new ones. I ended up with four DL380s with 8 cores per box, and I really had to scramble to get the extra servers in there. The addition of two more read-only members really helped, with backups handled by replication to an off-site replicant.
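
If you add read-only members, it's worth enforcing that in my.cnf so a misconfigured web node can't write to them. A sketch (values are examples):

  [mysqld]
  server-id        = 3
  read_only        = 1    # the slave SQL thread and SUPER users can still write
  sort_buffer_size = 4M   # allocated per connection, so size it against max_connections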

Another load-capacity warning: if your traffic is very spiky and you get high-load conditions, I've seen reset/dropped connections and also plain old connection timeouts. So even if you have RAM for 1024 connections, you probably can't service 1024 connections once you've got table contention, and connections from your web nodes just start failing. If they fail for too long, you have to do a FLUSH HOSTS to reset the connection-attempt counters.
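
Concretely, once a web node racks up max_connect_errors failed attempts, mysqld blocks that host outright until you flush. A sketch (hostname and values are examples):

  -- web nodes start seeing:
  --   Host 'web01' is blocked because of many connection errors
  mysql> FLUSH HOSTS;

  # the my.cnf ceilings involved:
  max_connections    = 1024
  max_connect_errors = 10000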

I don't know what your application does, but I certainly monitor replication lag; load spikes can certainly increase lag. I've had to move from a single instance of mysqld to mysqld_multi and separate databases by replication rate. Your monitoring should also track the slave SQL and I/O threads. You might need to define a procedure for pooling-out members that fall too far behind in replication.
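
A crude cron-able lag check, as a sketch (host, threshold, and address are examples):

  #!/bin/sh
  LAG=$(mysql -h replica1 -e 'SHOW SLAVE STATUS\G' | awk '/Seconds_Behind_Master/ {print $2}')
  if [ "$LAG" = "NULL" ] || [ "$LAG" -gt 300 ]; then
      echo "replica1 behind by $LAG seconds" | mail -s "replication lag" oncall@example.com
  fi

Seconds_Behind_Master goes NULL when the SQL thread has died, which is exactly when you want the page.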

I've written an iptables script to block web-node connections but allow SQL pool-member connections. I use this to take a member out to run table repairs, or to lighten the load while it does replication catch-up.
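
The core of it is just a chain on port 3306, roughly (addresses are examples):

  iptables -N mysql-maint
  iptables -A INPUT -p tcp --dport 3306 -j mysql-maint
  iptables -A mysql-maint -s 10.0.1.0/24 -j ACCEPT                  # sql pool members
  iptables -A mysql-maint -p tcp -j REJECT --reject-with tcp-reset  # everyone else

The tcp-reset makes the web nodes fail fast instead of hanging in connect timeouts.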

WAN connectivity for replication is interesting! I did the site-to-site transfer using stunnel, and had to negotiate weird Cisco 5502 VPN behavior. Copying gigs of MyISAM files between sites would knock over my VPN, so I had to rate-limit with rsync --bwlimit. The bursting bandwidth charges were still brutal, though. Later we ended up configuring CBQ (search freshmeat.net for cbq-init) on my backup replicant to limit bandwidth so it wouldn't provoke bursting charges.
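
The rate-limited copy is just this (paths and the limit are examples; --bwlimit is in KB/s, so 2000 is about 16 Mbit/s sustained):

  rsync -av --bwlimit=2000 /var/lib/mysql/mydb/ replica.example.com:/var/lib/mysql/mydb/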

Jed

