does a file named 0000000100000000000000CD exist anywhere on your disk? -lee
On Fri, Feb 27, 2009 at 2:47 AM, Brad Wiemerslage <wieme...@yahoo.com> wrote: > > I'm attempting to get warm standby up and running with a pair of servers > running ubuntu 8.04 and postgresql 8.3. Been following the docs: > > http://www.postgresql.org/docs/8.3/static/warm-standby.html > http://www.postgresql.org/docs/current/static/pgstandby.html > > Also, basically following the ideas here in this blog post: > > http://scale-out-blog.blogspot.com/2009/02/simple-ha-with-postgresql-point-in-time.html > > I've customized the original script he refers to in the article, which is > here in its entirety for reference: > > https://s3.amazonaws.com/extras.continuent.com/standby.sh > > Here is the meat of my customized script, which runs on the standby. The > postgresql server on the standby is stopped first. > > start_backup="SELECT pg_start_backup('my_backup');" > stop_backup="SELECT pg_stop_backup();" > echo "$start_backup" | $psql -h$PRIMARY -U myuser -d mydb -e > rsync --delete -avz -e "ssh -i /path/to/key" myu...@$primary:$PG_DATA/ > $PG_DATA > echo "$stop_backup" | $psql -h $PRIMARY -U myuser -d mydb -e > > The files seem to copied over to the standby machine just fine. Success is > reported with respect to the backup commands. Permissions seem fine. > > Next, there are some steps which blow out some files. As I understand it, > you no longer need the files on the standby that were in pg_xlog on the > primary. > > rm -f $PG_DATA/recovery.* > rm -f $PG_DATA/8.3/main/logfile > rm -f $PG_DATA/8.3/main/postmaster.pid > rm -f $PG_DATA/8.3/main/pg_xlog/0* > rm -f $PG_DATA/8.3/main/pg_xlog/archive_status/0* > > This step seems to work fine. > > Then, the archives are pulled. They are pulled to /mnt/postgresql_archives > with this command: > > rsync --delete -avz -e "ssh -i /path/to/key" myu...@$primary:$PG_ARCHIVES/ > $PG_ARCHIVES > > Everything looks good. I end up with an up to date list of WAL files in > /mnt/postgresql_archives on the standby. Here is a listing: > > r...@standby:/mnt/postgresql_archives# ls > total 688996 > drwxr-xr-x 2 postgres postgres 4096 2009-02-27 01:13 . > drwxr-xr-x 14 root root 4096 2009-02-27 01:17 .. > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:19 > 0000000100000000000000CB > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:29 > 0000000100000000000000CC > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:38 > 0000000100000000000000CD > -rw-rw---- 1 postgres postgres 245 2009-02-27 00:38 > 0000000100000000000000CD.00000020.backup > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:48 > 0000000100000000000000CE > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:54 > 0000000100000000000000CF > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 00:58 > 0000000100000000000000D0 > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 01:01 > 0000000100000000000000D1 > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 01:03 > 0000000100000000000000D2 > -rw-rw---- 1 postgres postgres 16777216 2009-02-27 01:13 > 0000000100000000000000D3 > > Then, the recovery.conf is put in place. I've tried two different versions, > which end up giving me the same error. Here are the two different versions. > > #1: restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -c -d -s 2 -t > /mnt/postgresql_archives/pgsql.trigger /mnt/postgresql_archives %f %p >> > /mnt/postgresql_archives/standby.log 1>&2' > > #2: restore_command = 'cp /mnt/server/archivedir/%f "%p"' > > I don't believe that #2 is suitable for warm standby, but just tried it to > debug after #1 wouldn't work. Now, I try to start up the server. For it to > work in standby mode, additional archive files will be pulled from the > primary machine on a periodic basis. I'm using this command, which deletes > them on the primary when they are no longer necessary. It also seems to work > fine. > > rsync -avz -e "ssh -i /path/to/key" myu...@$primary:$PG_ARCHIVES/ $PG_ARCHIVES > > I guess I'm a little confused about exactly what is happening here when the > server comes up, but here is the error message I'm getting. It seems to be > looking for the files in pg_pxlog, which is cleared out. So, the error makes > sense. But isn't it supposed to be looking in /mnt/postgresql_archives per > the restore_command(s)? The files are available there. > > 2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,2,,2009-02-27 01:26:52 > EST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000100000000000000CD"" > (log file 0, segment 20 > 5): No such file or directory",,,,,,,, > 2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,3,,2009-02-27 01:26:52 > EST,,0,LOG,00000,"invalid checkpoint record",,,,,,,, > 2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,4,,2009-02-27 01:26:52 > EST,,0,PANIC,XX000,"could not locate required checkpoint record",,"If you are > not restoring from a > backup, try removing the file > ""/var/lib/postgresql/8.3/main/backup_label"".",,,,,, > 2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,1,,2009-02-27 01:26:51 > EST,,0,LOG,00000,"startup process (PID 7422) was terminated by signal 6: > Aborted",,,,,,,, > 2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,2,,2009-02-27 01:26:51 > EST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,, > > So, I tried copying the files in the /mnt/postgresql_archives over to > pg_xlog. This seemed to work, and the updates were applied. Never at any > point did I get a recovery.done file. Also, for whatever reason, I was never > able to get any debug info from pg_standby in standby.log. > > Anyhow, I've burned up a couple days trying to figure this out. Any help > would be much appreciated. > > Thanks, > Brad > > > > > -- > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin > -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin