(2013/03/06 16:50), Heikki Linnakangas wrote:
Hi,

Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord();
the attached patch records minRecoveryPoint.
[crash recovery -> record minRecoveryPoint in control file -> archive
recovery]
I think that this is the original intention of Heikki's patch.

Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, 
and false on error, and the patch makes it return 'true' on error, if archive recovery 
was requested but we're still in crash recovery. The real issue here is that I missed the 
two "return NULL;"s in ReadRecord(), so the code that I put in the 
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. 
Attached patch is the proper fix for this.

Thanks for creating the patch! I tested your patch on 9.2_STABLE, but the 
promote command does not work with it...
When XLogPageRead() returns false, it means the end of the standby loop, the 
crash recovery loop, and the archive recovery loop.
Your patch does not work for promoting a standby to master: it never leaves 
the standby loop.

So I made a new patch based on Heikki's and Horiguchi's patches.
I attach a test script, modified from Horiguchi's script. This script does not 
depend on the shell environment; you only need to fix PGPATH.
Please run this test script.


I also found a bug in the latest 9.2_stable. It does not get the latest
timeline and recovery history file in archive recovery when the master and
standby timelines are different.

Works for me.. Can you create a test script for that? Remember to set 
"recovery_target_timeline='latest'".
I set recovery_target_timeline=latest. hmm...

Here is my recovery.conf.
[mitsu-ko@localhost postgresql]$ cat Standby/recovery.conf
standby_mode = 'yes'
recovery_target_timeline='latest'
primary_conninfo='host=localhost port=65432'
restore_command='cp ../arc/%f %p'
And here are my system's log messages.
waiting for server to start....[Standby] LOG:  database system was shut down in recovery at 2013-03-07 02:56:05 EST
[Standby] LOG:  restored log file "00000002.history" from archive
cp: cannot stat `../arc/00000003.history': No such file or directory
[Standby] FATAL:  requested timeline 2 is not a child of database system timeline 1
[Standby] LOG:  startup process (PID 20941) exited with exit code 1
[Standby] LOG:  aborting startup due to startup process failure
It can be reproduced with my test script, too.
The last start of the master in my test script may not look like a typical
operation, but it is common in PostgreSQL systems managed by Pacemaker.
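
To check the question that my test script asks ("Was latest .history file archived?"), a small helper along these lines could report the newest timeline history file in the archive. This is only a sketch: the function name latest_history_tli is made up for illustration, and it assumes the archive directory holds history files named with an 8-hex-digit timeline ID, like the 00000002.history seen in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): print the highest timeline ID
# that has a <TLI>.history file in the given archive directory.
# Returns 1 and prints nothing if the directory has no history file.
latest_history_tli() {
    latest=""
    # The shell expands the glob in sorted order, and fixed-width hex names
    # sort in timeline order, so the last match is the newest timeline.
    for f in "$1"/*.history; do
        [ -e "$f" ] || continue      # glob matched nothing
        latest=${f##*/}              # strip the directory part
        latest=${latest%.history}    # strip the suffix, e.g. 00000002
    done
    [ -n "$latest" ] || return 1
    echo "$latest"
}
```

For example, running `latest_history_tli arc` after the promote step would show whether the history file of the new timeline has reached the archive before the standby is started.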


Best regards,
--
Mitsumasa KONDO
NTT OSS Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..2486683 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,15 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
+	{
+		/*
+		 * If archive recovery was requested when crash recovery failed, go to
+		 * the label next_record_is_invalid to switch to archive recovery.
+		 */
+		if (!InArchiveRecovery && ArchiveRecoveryRequested)
+			goto next_record_is_invalid;
 		return NULL;
+	}
 
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
@@ -4168,7 +4176,15 @@ retry:
 			}
 			/* Wait for the next page to become available */
 			if (!XLogPageRead(&pagelsn, emode, false, false))
+			{
+				/*
+				 * If archive recovery was requested when crash recovery failed, go to
+				 * the label next_record_is_invalid to switch to archive recovery.
+				 */
+				if (!InArchiveRecovery && ArchiveRecoveryRequested)
+					goto next_record_is_invalid;
 				return NULL;
+			}
 
 			/* Check that the continuation record looks valid */
 			if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..591e8c0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4446,7 +4462,7 @@ readTimeLineHistory(TimeLineID targetTLI)
 	if (targetTLI == 1)
 		return list_make1_int((int) targetTLI);
 
-	if (InArchiveRecovery)
+	if (ArchiveRecoveryRequested)
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =
#! /bin/sh
echo "#### initial settings ####"
PGPATH=`pwd`"/bin"
PATH=$PGPATH:$PATH
PGDATA0="Master"
PGDATA1="Standby"
PGARC="arc"
mkdir -p $PGARC
PGPORT0=65432
PGPORT1=65433
unset PGPORT
unset PGDATA
echo "Postgresql is \"`which postgres`\""
killall -9 postgres
rm -rf $PGDATA0 $PGDATA1 $PGARC/*
initdb $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
port=$PGPORT0
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p ../$PGARC/%f'
max_wal_senders = 3
hot_standby = on
log_min_messages = debug1
log_line_prefix = '[Master] '
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local  replication      all                     trust
host   replication      all     127.0.0.1/32    trust
host   replication      all     ::1/128         trust
EOF
echo "#### Startup master ####"
pg_ctl -D $PGDATA0 -w  start

echo "#### basebackup ####"
pg_basebackup -p $PGPORT0 -F p -X s -D $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = 'yes'
recovery_target_timeline='latest'
primary_conninfo='host=localhost port=$PGPORT0'
restore_command='cp ../$PGARC/%f %p'
EOF

cat >> $PGDATA1/postgresql.conf <<EOF
port=$PGPORT1
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p ../$PGARC/%f'
max_wal_senders = 3
hot_standby = on
log_min_messages = debug1
log_line_prefix = '[Standby] '
EOF


echo "#### Startup standby ####"
pg_ctl -D $PGDATA1 start
echo "#### Sleep for 5 seconds ####"
sleep 5

echo "#### Shutdown standby ####"
pg_ctl -D $PGDATA1 -w stop -m f

echo "#### Shutdown master in immediate mode ####"
pg_ctl -D $PGDATA0 -w stop -m i

cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = 'yes'
recovery_target_timeline='latest'
primary_conninfo='host=localhost port=$PGPORT1'
restore_command='cp ../$PGARC/%f %p'
EOF

echo "#### Starting master as a standby, then promoting it ####"
pg_ctl -w -D $PGDATA0 start
pg_ctl -w -D $PGDATA0 promote

echo "#### Sleep for 5 seconds ####"
sleep 5
echo "#### Was latest .history file archived ??? ####"
sleep 2
echo "#### Starting standby ####"
pg_ctl -w -D $PGDATA1 start