Hello,

We're running PG 8.3 in a warm standby configuration.  About 3 weeks ago we
had to fail over from the primary to the standby.  That worked fine, but
we're having problems getting standby mode set up again.  On the standby,
everything works fine for a little while: WALs are rsynced over and seem to
be getting processed correctly.  But every 65-75 minutes (very regularly), a
WAL file is copied that's actually a symlink.  When the standby tries to
read an rsynced symlink, it hangs indefinitely, presumably because the
target of the link doesn't exist on the standby.

In the primary's pg_xlog, I see the expected WAL files with increasing
numbers and recent modification dates, but every 65-75 files there's one of
these symlinks. For example:

Sep 28 16:13 0000000300000A5C00000070
Sep 28 16:15 0000000300000A5C00000071
Sep 28 16:12 0000000300000A5C00000072
Sep  5 01:00 0000000300000A5C00000073 -> /srv/db/chdbprod_wal_archives/00000001000009D6000000D6
Sep 28 16:21 0000000300000A5C00000074
Sep 28 16:19 0000000300000A5C00000075

The "/srv/db/chdbprod_wal_archives" directory is where incoming WAL files
used to go, back when the current primary server was the standby.  The
September 5 date you see above is shortly before the failover was done.  The
target of the symlinks is always the same.
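
In case it helps anyone reproduce the check, here is a minimal sketch of how the symlinks in pg_xlog can be listed along with whether each target actually exists on the machine where it is run. The function name is made up for illustration, and the directory path is whatever your pg_xlog location is; nothing here is specific to PostgreSQL itself.

```python
import os

def find_symlinked_wal(pg_xlog_dir):
    """Return (name, target, target_exists) for every symlink in pg_xlog_dir.

    Hypothetical helper for diagnosis only; it reads the directory and
    changes nothing.
    """
    results = []
    for name in sorted(os.listdir(pg_xlog_dir)):
        path = os.path.join(pg_xlog_dir, name)
        if os.path.islink(path):
            target = os.readlink(path)
            # os.path.exists() follows the link, so it is False when the
            # link is dangling (the target is missing on this host).
            results.append((name, target, os.path.exists(path)))
    return results
```

Run against the primary's pg_xlog, this should report entries like 0000000300000A5C00000073 together with the /srv/db/chdbprod_wal_archives target; run on the standby after the rsync, it would show whether the copied link is dangling there.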

pg_xlog also contains a 00000003.history file, which references the target
of the symlinks.  Here are its contents:

1       00000001000009D6000000D6        before transaction 0 at 2000-01-01 00:00:00+00
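
For reading the file names above, a small sketch of how a WAL segment name breaks down may be useful. It assumes the usual layout of 24 hex digits: 8 for the timeline ID, 8 for the log file number, and 8 for the segment number. The function name is hypothetical.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name: %r" % name)
    timeline = int(name[0:8], 16)   # first 8 hex digits: timeline ID
    log = int(name[8:16], 16)       # middle 8 hex digits: log file number
    segment = int(name[16:24], 16)  # last 8 hex digits: segment number
    return timeline, log, segment

# The symlink itself is named for timeline 3 ...
print(parse_wal_name("0000000300000A5C00000073"))  # (3, 2652, 115)
# ... while its target is a timeline 1 segment.
print(parse_wal_name("00000001000009D6000000D6"))  # (1, 2518, 214)
```

So the symlink carries a timeline 3 name but points at a timeline 1 segment, the same segment named in 00000003.history.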

I gather that my problems here are due to having a primary server that was
itself formerly a standby, but I'm not sure what action to take.  I don't
know enough about how the history files work and what the significance of
the symlinks is.  What purpose do the symlinks serve?  Why are they
recreated regularly at slightly more than hourly intervals?  Why do they
point to a directory that was only used back when the primary was a
standby?  (If it makes any difference, back when the primary server was a
standby, it was running pg_standby with the -l option.)  Does their presence
mean that something's wrong on the primary, or should they be ignored when
copying to the standby?

Thanks in advance for any information!
Chris
