Re: [HACKERS] Online base backup from the hot-standby

Steve Singer Thu, 19 Jan 2012 20:02:52 -0800

On 12-01-17 05:38 AM, Fujii Masao wrote:

On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masao<masao.fu...@gmail.com>  wrote:

The amount of code changes to allow pg_basebackup to make a backup from
the standby seems to be small. So I ended up merging that changes and the
infrastructure patch. WIP patch attached. But I'd happy to split the patch again
if you want.

Attached is the updated version of the patch. I wrote the limitations of
standby-only backup in the document and changed the error messages.

Here is my review of this verison of the patch. I think this patch hasbeen in every CF for 9.2 and I feel it is getting close to beingcommitted. The only issue of significants is a crash I encounteredwhile testing, see below.

I am fine with including the pg_basebackup changes in the patch it alsomakes testing some of the other changes possible.



The documentation updates you have are good

I don't see any issues looking at the code.



Testing Review
--------------------------------

I encountered this on my first replica (the one based on the master). Iam not sure if it is related to this patch, it happened after thepg_basebackup against the replica finished.

TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File:"twophase.c", Line: 1238)

LOG:  startup process (PID 12222) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

A little earlier this postmaster had printed.

LOG:  restored log file "00000001000000000000001F" from archive
LOG:  restored log file "000000010000000000000020" from archive

cp: cannot stat`/usr/local/pgsql92git/archive/000000010000000000000021': No such fileor directory

LOG:  unexpected pageaddr 0/19000000 in log file 0, segment 33, offset 0

cp: cannot stat`/usr/local/pgsql92git/archive/000000010000000000000021': No such fileor directory

I have NOT been able to replicate this error and I am not sure exactlywhat I had done in my testing prior to that point.



In another test run I had

- set full page writes=off and did a checkpoint
- Started the pg_basebackup

- set full_page_writes=on and did a HUP + some database activity thatmight have forced a checkpoint.


I got this message from pg_basebackup.
./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
pg_basebackup: could not get WAL end position from server

I point this out because the message is different than the normal "couldnot initiate base backup: FATAL: WAL generated withfull_page_writes=off" thatI normally see. We might want to add aPQerrorMessage(conn)) to pg_basebackup to print the error details.Since this patch didn't actually change pg_basebackup I don't think yourrequired to improve the error messages in it. I am just mentioning thisbecause it came up in testing.

The rest of the tests I did involving changing full_page_writeswith/without checkpoints and sighups and promoting the replica seemed towork as expected.

Regards,

Re: [HACKERS] Online base backup from the hot-standby

Reply via email to