Re: [HACKERS] 9.4 checksum error in recovery with btree index

2014-05-21 Thread Michael Paquier
On Tue, May 20, 2014 at 5:10 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 What would be a good way of doing that?  Mostly I've just been sharing it
 on google drive when I find something.  Should I fork the mirror from
 github (https://github.com/postgres/postgres)?  Forking the entire
 codebase just to have a 200 line patch and 2 driving scripts seems
 excessive, but I guess that that is the git philosophy.  Or is there a
 better way than that?

When forking a repository for development purposes, usually the whole
repository is not necessary: push only the master branch that you update
from time to time and create separate branches for each development patch.
If people would like to have a look at your patch, they would simply need
to add a remote link to your fork, fetching only the diffs introduced by
your additional branch(es) and then to checkout the branch on which your
patch is. This has the advantage to let the user merge himself the code
with newer code and not getting failures when applying a patch that will
get chunks of code rejected over time.

My 2c.
-- 
Michael


Re: [HACKERS] 9.4 checksum error in recovery with btree index

2014-05-19 Thread Heikki Linnakangas

On 05/18/2014 06:30 AM, Jeff Janes wrote:

On Saturday, May 17, 2014, Heikki Linnakangas hlinnakan...@vmware.com
wrote:


On 05/17/2014 12:28 AM, Jeff Janes wrote:


More fun with my torn page injection test program on 9.4.

24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
failed, calculated checksum 21100 but expected 3356
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified,
bufpage.c:145
24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
34666 of relation base/16384/16405
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
bufmgr.c:483


I've seen this twice now, the checksum failure was both times for the
block
labelled next in the redo record.  Is this another case where the block
needs to be reinitialized upon replay?



Hmm, it looks like I fumbled the numbering of the backup blocks in the
b-tree split WAL record (in 9.4). I blame the comments; the comments where
the record is generated numbers the backup blocks starting from 1, but
XLR_BKP_BLOCK(x) and RestoreBackupBlock(...) used in replay number them
starting from 0.

Attached is a patch that I think fixes them. In addition to the
rnext-reference, clearing the incomplete-split flag in the child page, had
a similar numbering mishap.



The seems to have fixed it.


Okay, thanks, committed.

Your torn-page generator seems to be very good at finding bugs - any 
chance you could publish it?


I wonder if it could've caught the similar mishap in the clearing of the 
incomplete-split flag. I think you'd a checkpoint to begin in the very 
narrow window between splitting a page and inserting the parent pointer.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 checksum error in recovery with btree index

2014-05-19 Thread Jeff Janes
On Mon, May 19, 2014 at 3:32 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 05/18/2014 06:30 AM, Jeff Janes wrote:

 On Saturday, May 17, 2014, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:



 The seems to have fixed it.


 Okay, thanks, committed.

 Your torn-page generator seems to be very good at finding bugs - any
 chance you could publish it?


What would be a good way of doing that?  Mostly I've just been sharing it
on google drive when I find something.  Should I fork the mirror from
github (https://github.com/postgres/postgres)?  Forking the entire codebase
just to have a 200 line patch and 2 driving scripts seems excessive, but I
guess that that is the git philosophy.  Or is there a better way than that?

This is a recent one on google drive here:
https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHMusp=sharing

It is the one that I used for testing gin indexes, not the one that was at
issue here.



 I wonder if it could've caught the similar mishap in the clearing of the
 incomplete-split flag. I think you'd a checkpoint to begin in the very
 narrow window between splitting a page and inserting the parent pointer.


I could re-introduce that bug and see if it would be found, but I've
already moved on to testing fk (which is what I was trying to test when I
found the btree bug) and I don't think I preserved the original code, so I
don't know how effective it would be as a test.  (I guess that speaks in
favor of using git to record the source code as of each interesting point.)

Cheers,

Jeff


Re: [HACKERS] 9.4 checksum error in recovery with btree index

2014-05-17 Thread Heikki Linnakangas

On 05/17/2014 12:28 AM, Jeff Janes wrote:

More fun with my torn page injection test program on 9.4.

24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
failed, calculated checksum 21100 but expected 3356
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified, bufpage.c:145
24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
34666 of relation base/16384/16405
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
bufmgr.c:483


I've seen this twice now, the checksum failure was both times for the block
labelled next in the redo record.  Is this another case where the block
needs to be reinitialized upon replay?


Hmm, it looks like I fumbled the numbering of the backup blocks in the 
b-tree split WAL record (in 9.4). I blame the comments; the comments 
where the record is generated numbers the backup blocks starting from 1, 
but XLR_BKP_BLOCK(x) and RestoreBackupBlock(...) used in replay number 
them starting from 0.


Attached is a patch that I think fixes them. In addition to the 
rnext-reference, clearing the incomplete-split flag in the child page, 
had a similar numbering mishap.


- Heikki
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index d64cbd9..ecee5ac 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -1299,7 +1299,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 
 			lastrdata-data = (char *) newitem;
 			lastrdata-len = MAXALIGN(newitemsz);
-			lastrdata-buffer = buf;	/* backup block 1 */
+			lastrdata-buffer = buf;	/* backup block 0 */
 			lastrdata-buffer_std = true;
 		}
 
@@ -1320,7 +1320,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			item = (IndexTuple) PageGetItem(origpage, itemid);
 			lastrdata-data = (char *) item;
 			lastrdata-len = MAXALIGN(IndexTupleSize(item));
-			lastrdata-buffer = buf;	/* backup block 1 */
+			lastrdata-buffer = buf;	/* backup block 0 */
 			lastrdata-buffer_std = true;
 		}
 
@@ -1333,11 +1333,11 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			 * Although we don't need to WAL-log anything on the left page, we
 			 * still need XLogInsert to consider storing a full-page image of
 			 * the left page, so make an empty entry referencing that buffer.
-			 * This also ensures that the left page is always backup block 1.
+			 * This also ensures that the left page is always backup block 0.
 			 */
 			lastrdata-data = NULL;
 			lastrdata-len = 0;
-			lastrdata-buffer = buf;	/* backup block 1 */
+			lastrdata-buffer = buf;	/* backup block 0 */
 			lastrdata-buffer_std = true;
 		}
 
@@ -1353,7 +1353,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 			cblkno = BufferGetBlockNumber(cbuf);
 			lastrdata-data = (char *) cblkno;
 			lastrdata-len = sizeof(BlockNumber);
-			lastrdata-buffer = cbuf;	/* backup block 2 */
+			lastrdata-buffer = cbuf;	/* backup block 1 */
 			lastrdata-buffer_std = true;
 		}
 
@@ -1386,7 +1386,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 
 			lastrdata-data = NULL;
 			lastrdata-len = 0;
-			lastrdata-buffer = sbuf;	/* bkp block 2 (leaf) or 3 (non-leaf) */
+			lastrdata-buffer = sbuf;	/* bkp block 1 (leaf) or 2 (non-leaf) */
 			lastrdata-buffer_std = true;
 		}
 
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 640639c..5f9fc49 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -300,8 +300,8 @@ btree_xlog_split(bool onleft, bool isroot,
 	 */
 	if (!isleaf)
 	{
-		if (record-xl_info  XLR_BKP_BLOCK(2))
-			(void) RestoreBackupBlock(lsn, record, 2, false, false);
+		if (record-xl_info  XLR_BKP_BLOCK(1))
+			(void) RestoreBackupBlock(lsn, record, 1, false, false);
 		else
 			_bt_clear_incomplete_split(lsn, record, xlrec-node, cblkno);
 	}
@@ -439,10 +439,10 @@ btree_xlog_split(bool onleft, bool isroot,
 	if (xlrec-rnext != P_NONE)
 	{
 		/*
-		 * the backup block containing right sibling is 2 or 3, depending
+		 * the backup block containing right sibling is 1 or 2, depending
 		 * whether this was a leaf or internal page.
 		 */
-		int			rnext_index = isleaf ? 2 : 3;
+		int			rnext_index = isleaf ? 1 : 2;
 
 		if (record-xl_info  XLR_BKP_BLOCK(rnext_index))
 			(void) RestoreBackupBlock(lsn, record, rnext_index, false, false);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 checksum error in recovery with btree index

2014-05-17 Thread Jeff Janes
On Saturday, May 17, 2014, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 On 05/17/2014 12:28 AM, Jeff Janes wrote:

 More fun with my torn page injection test program on 9.4.

 24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
 failed, calculated checksum 21100 but expected 3356
 24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
 192
 24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified,
 bufpage.c:145
 24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
 34666 of relation base/16384/16405
 24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
 192
 24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
 bufmgr.c:483


 I've seen this twice now, the checksum failure was both times for the
 block
 labelled next in the redo record.  Is this another case where the block
 needs to be reinitialized upon replay?


 Hmm, it looks like I fumbled the numbering of the backup blocks in the
 b-tree split WAL record (in 9.4). I blame the comments; the comments where
 the record is generated numbers the backup blocks starting from 1, but
 XLR_BKP_BLOCK(x) and RestoreBackupBlock(...) used in replay number them
 starting from 0.

 Attached is a patch that I think fixes them. In addition to the
 rnext-reference, clearing the incomplete-split flag in the child page, had
 a similar numbering mishap.


The seems to have fixed it.

Thanks,

Jeff


[HACKERS] 9.4 checksum error in recovery with btree index

2014-05-16 Thread Jeff Janes
More fun with my torn page injection test program on 9.4.

24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
failed, calculated checksum 21100 but expected 3356
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified, bufpage.c:145
24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
34666 of relation base/16384/16405
24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
192
24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
bufmgr.c:483


I've seen this twice now, the checksum failure was both times for the block
labelled next in the redo record.  Is this another case where the block
needs to be reinitialized upon replay?

Cheers,

Jeff