All, I recently finished upgrading the core database pool for our site from MySQL 4.0.18 (32-bit) to 5.0.27 (64-bit), but am now experiencing intermittent replication instability.
We replicate ~20M DML statements/day across 18 DB nodes in three datacenters. About once a week I get a 2013 error ("error reading packet from server"), but only on the two slaves whose master is in a different datacenter; it has never happened among intra-datacenter nodes. That would make me suspicious of the network (at least the WAN links/devices), except that this never happened once in two years with 4.0.18.

When it happens I am able to fix it by doing a slave stop / change master (to the last executed position) / slave start, but I would like to find the root of the problem.

Is anyone aware of any reported replication stability issues with 5.0.27? Are there any my.cnf parameters I can change to minimize the frequency? Does this sound like a network issue, and if so, why did 4.0.18 not fail in this way? It's not critical at this point, but it's extremely annoying, so any advice would be appreciated.
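
For reference, the manual recovery I described amounts to roughly the following. This is only a sketch, not our actual tooling: the helper name `recovery_statements` and the sample file name and position are made up for illustration, and it assumes "last executed" means the `Relay_Master_Log_File` / `Exec_Master_Log_Pos` fields from `SHOW SLAVE STATUS` on the broken slave.

```python
def recovery_statements(status):
    """Build the stop / change master / start sequence for one slave.

    `status` is a dict holding one row of SHOW SLAVE STATUS output.
    Only the two fields describing the last executed master position are used.
    """
    log_file = status["Relay_Master_Log_File"]
    log_pos = int(status["Exec_Master_Log_Pos"])

    # Refuse anything that would break out of the quoted file name.
    if "'" in log_file or "\\" in log_file:
        raise ValueError("unexpected character in binlog file name")

    return [
        "STOP SLAVE",
        # Host, user, and port are left out so the existing settings are kept.
        "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d"
        % (log_file, log_pos),
        "START SLAVE",
    ]


if __name__ == "__main__":
    # Hypothetical values, standing in for real SHOW SLAVE STATUS output.
    sample = {
        "Relay_Master_Log_File": "master-bin.000123",
        "Exec_Master_Log_Pos": "4567",
    }
    for statement in recovery_statements(sample):
        print(statement + ";")
```

Repointing the slave at the last executed position (rather than the last position read) makes it fetch again anything it had downloaded but not yet applied, since the relay logs are discarded by the change master step.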