Awesome, I'll give that a shot, John.
On Fri, Aug 16, 2013 at 8:39 AM, John DeSoi <de...@pgedit.com> wrote:

> On Aug 15, 2013, at 1:07 PM, Andrew Berman <rexx...@gmail.com> wrote:
>
> > I'm having an issue where streaming replication just randomly stops
> > working. I haven't been able to find anything in the logs which point to
> > an issue, but the Postgres process shows a "waiting" status on the slave:
> >
> > postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 000000010000053D0000003F waiting
> > postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process
> > postgres 5659 0.0 0.0 177524 788 ? Ss Aug14 0:03 postgres: stats collector process
> > postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730
> >
> > The replication works great for days, but randomly seems to lock up and
> > replication halts. I verified that the two databases were out of sync with
> > a query on both of them. Has anyone experienced this issue before?
> >
> > Here are some relevant config settings:
> >
> > Master:
> >
> > wal_level = hot_standby
> > checkpoint_segments = 32
> > checkpoint_completion_target = 0.9
> > archive_mode = on
> > archive_command = 'rsync -a %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
> > max_wal_senders = 2
> > wal_keep_segments = 32
>
> I recently posted about the same thing -- replication just stops after
> working OK for days or weeks, no errors in the logs on master or slave.
>
> It appears I solved it by adding --timeout=30 to my rsync command. My
> guess was some kind of network hang and then rsync would just wait forever
> and never return.
>
> John DeSoi, Ph.D.
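For reference, this sketch shows the master's archive_command from the quoted config with John's suggested flag applied. Only --timeout=30 is new; the user, host and path are copied unchanged from the original setting.

rsync's --timeout option sets a maximum I/O timeout in seconds. If the transfer stalls with no data moving for that long, rsync exits with a non-zero status instead of waiting forever. PostgreSQL treats a non-zero exit from archive_command as a failed archive and retries that segment later.

```
# postgresql.conf on the master (sketch: only --timeout=30 differs from the original)
archive_command = 'rsync -a --timeout=30 %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
```

archive_command is picked up on a configuration reload (for example pg_ctl reload, or SELECT pg_reload_conf();), so the master shouldn't need a restart for this change. As John notes, the network-hang explanation is his guess rather than a confirmed diagnosis.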