Looks like the problem is that the replication write queue was growing
wildly because it couldn't keep up with the reads. I've checked in a fix
that should prevent this problem, at the possible expense of replication
throughput.
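
For illustration only: the trade-off described above (a bounded queue that makes the reader wait instead of letting pending writes pile up) can be sketched in Python. This is not the actual fix, which lives in CouchDB's own code; the function name `replicate` and the `max_pending` parameter are hypothetical.

```python
import queue
import threading

def replicate(docs, max_pending=4):
    """Copy docs through a bounded queue; return (written docs, peak queue depth)."""
    # The maxsize bound is the whole point: an unbounded queue would grow
    # without limit whenever the reader outpaces the writer.
    pending = queue.Queue(maxsize=max_pending)
    written = []
    peak = 0
    done = object()  # sentinel telling the writer to stop

    def writer():
        while True:
            doc = pending.get()
            if doc is done:
                break
            written.append(doc)  # stand-in for the (slow) write to the target

    worker = threading.Thread(target=writer)
    worker.start()
    for doc in docs:
        # put() blocks while the queue is full, so the reader is throttled
        # to the writer's pace. That is where throughput may be lost.
        pending.put(doc)
        peak = max(peak, pending.qsize())
    pending.put(done)
    worker.join()
    return written, peak
```

Memory use stays bounded by `max_pending`, at the price of the reader sitting idle whenever the writer falls behind.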
-Damien
On Jul 16, 2008, at 1:51 PM, Chris Anderson wrote:
I've started digging into replication issues a little deeper. This is
what I've found so far while replicating a 2GB database into a fresh
(empty) target database.
I'm triggering the replication via curl, so there are no browser/proxy
timeout issues at play, which I've found can muddy the waters a bit.
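
As a rough sketch of what such a scripted trigger looks like, here is the request built in Python rather than curl. It assumes replication is started by POSTing a JSON body with "source" and "target" to `/_replicate`; the server URL and database names are placeholders, not the ones used in this test.

```python
import json
import urllib.request

def build_replication_request(server, source, target):
    """Build (but do not send) a POST asking the server to replicate source into target."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",  # assumed replication endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder server and database names.
req = build_replication_request("http://localhost:5984", "big_db", "fresh_db")
# urllib.request.urlopen(req) would actually send it; left out here so the
# sketch runs without a server.
```

Sending the request from a script like this keeps browsers and proxies out of the picture, same as curl.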
The replication starts fine, and as I query the target database, I can
see the doc count going up over time. Occasionally the doc count stays
the same for a few minutes. After that it will sometimes start back
up or, equally likely, I get a nasty crash with output that looks
like: http://friendpaste.com/g7zRzvPc
If I restart replication by rerunning the curl command, it seems to
pick up where it left off just fine, with the doc count moving up
smoothly for a while, before I get another error. The one I got just
now wasn't a crasher, just a replication-stopping failure:
http://friendpaste.com/BBfSPEZm
But I've seen this error as the first in a fresh replication, and the
crasher coming after a restarted replication, so I don't think the
order is significant.
Next I'll see if I can trigger the problem on a smaller dataset.
--
Chris Anderson
http://jchris.mfdz.com