On 05/17/2014 12:28 AM, Jeff Janes wrote:
More fun with my torn page injection test program on 9.4.

24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
failed, calculated checksum 21100 but expected 3356
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified, bufpage.c:145
24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
34666 of relation base/16384/16405
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
bufmgr.c:483


I've seen this twice now, the checksum failure was both times for the block
labelled "next" in the redo record.  Is this another case where the block
needs to be reinitialized upon replay?

Hmm, it looks like I fumbled the numbering of the backup blocks in the b-tree split WAL record (in 9.4). I blame the comments; the comments where the record is generated numbers the backup blocks starting from 1, but XLR_BKP_BLOCK(x) and RestoreBackupBlock(...) used in replay number them starting from 0.

Attached is a patch that I think fixes them. In addition to the rnext-reference, clearing the incomplete-split flag in the child page, had a similar numbering mishap.

- Heikki
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index d64cbd9..ecee5ac 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -1299,7 +1299,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 
 			lastrdata->data = (char *) newitem;
 			lastrdata->len = MAXALIGN(newitemsz);
-			lastrdata->buffer = buf;	/* backup block 1 */
+			lastrdata->buffer = buf;	/* backup block 0 */
 			lastrdata->buffer_std = true;
 		}
 
@@ -1320,7 +1320,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			item = (IndexTuple) PageGetItem(origpage, itemid);
 			lastrdata->data = (char *) item;
 			lastrdata->len = MAXALIGN(IndexTupleSize(item));
-			lastrdata->buffer = buf;	/* backup block 1 */
+			lastrdata->buffer = buf;	/* backup block 0 */
 			lastrdata->buffer_std = true;
 		}
 
@@ -1333,11 +1333,11 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			 * Although we don't need to WAL-log anything on the left page, we
 			 * still need XLogInsert to consider storing a full-page image of
 			 * the left page, so make an empty entry referencing that buffer.
-			 * This also ensures that the left page is always backup block 1.
+			 * This also ensures that the left page is always backup block 0.
 			 */
 			lastrdata->data = NULL;
 			lastrdata->len = 0;
-			lastrdata->buffer = buf;	/* backup block 1 */
+			lastrdata->buffer = buf;	/* backup block 0 */
 			lastrdata->buffer_std = true;
 		}
 
@@ -1353,7 +1353,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			cblkno = BufferGetBlockNumber(cbuf);
 			lastrdata->data = (char *) &cblkno;
 			lastrdata->len = sizeof(BlockNumber);
-			lastrdata->buffer = cbuf;	/* backup block 2 */
+			lastrdata->buffer = cbuf;	/* backup block 1 */
 			lastrdata->buffer_std = true;
 		}
 
@@ -1386,7 +1386,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 
 			lastrdata->data = NULL;
 			lastrdata->len = 0;
-			lastrdata->buffer = sbuf;	/* bkp block 2 (leaf) or 3 (non-leaf) */
+			lastrdata->buffer = sbuf;	/* bkp block 1 (leaf) or 2 (non-leaf) */
 			lastrdata->buffer_std = true;
 		}
 
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 640639c..5f9fc49 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -300,8 +300,8 @@ btree_xlog_split(bool onleft, bool isroot,
 	 */
 	if (!isleaf)
 	{
-		if (record->xl_info & XLR_BKP_BLOCK(2))
-			(void) RestoreBackupBlock(lsn, record, 2, false, false);
+		if (record->xl_info & XLR_BKP_BLOCK(1))
+			(void) RestoreBackupBlock(lsn, record, 1, false, false);
 		else
 			_bt_clear_incomplete_split(lsn, record, xlrec->node, cblkno);
 	}
@@ -439,10 +439,10 @@ btree_xlog_split(bool onleft, bool isroot,
 	if (xlrec->rnext != P_NONE)
 	{
 		/*
-		 * the backup block containing right sibling is 2 or 3, depending
+		 * the backup block containing right sibling is 1 or 2, depending
 		 * whether this was a leaf or internal page.
 		 */
-		int			rnext_index = isleaf ? 2 : 3;
+		int			rnext_index = isleaf ? 1 : 2;
 
 		if (record->xl_info & XLR_BKP_BLOCK(rnext_index))
 			(void) RestoreBackupBlock(lsn, record, rnext_index, false, false);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to