Andreas Seltenreich wrote:
> I wrote:
> 
> > Re-fuzzing now with your patch applied.
> 
> This so far yielded three BRIN core dumps on different machines with the
> same backtraces.  Crash recovery fails in these cases.
> 
> I've put the data directory here (before recovery):
> 
>     http://ansel.ydns.eu/~andreas/brincrash2-spidew.tar.xz (9.1M)
> 
> Corresponding backtraces of the backend and startup core files below.

Ah, of course.  These two crashes are two different bugs: the xlog one
is because I forgot to attach the new PageAddItem() flag to the
corresponding replay code; and the one in regular running is because I
neglected to have the patched PageAddItem() compute pd_lower correctly,
which in effect left pages as if empty.

I came up with a simple-minded test program (attached) which reproduces
the reported crashes in a couple of seconds; it can be improved further
for stronger BRIN testing, but at least with this I am confident that
the crashes at hand are gone.  If you can re-run sqlsmith and see if you
can find different bugs, I'd appreciate it.

I'm not going to push this just yet, in order to give others time to
comment on the new PageAddItemFlags API I'm adding.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 6fa8e8ce3812dd6be2ae962ffe076a06482906c9
Author:     Alvaro Herrera <alvhe...@alvh.no-ip.org>
AuthorDate: Fri May 27 18:14:01 2016 -0400
CommitDate: Fri May 27 18:14:01 2016 -0400

    Fix PageAddItem BRIN bug
    
    BRIN was relying on the ability of removing a tuple from an index page,
    then putting it back again later; but the PageAddItem interface doesn't
    actually allow this to happen for tuples beyond the first free one past
    the last used item.  To fix, create a new function PageAddItemFlags
    which is like PageAddItem except that it has a flags bitmap; the
    "overwrite" and "is_heap" boolean flags in the original become
    PAI_OVERWITE and PAI_IS_HEAP flags, and a new flag
    PAI_ALLOW_LARGE_OFFSET enables the behavior required by BRIN.
    PageAddItem() remains with the original signature, for compatibility
    with third-party modules (and other callers in core code are not
    modified, either).
    
    I also added a new error check in brinGetTupleForHeapBlock to raise an
    error if an offset found in the revmap is not actually marked as live by
    the page header, which would have detected this sooner (and, in any
    case, react with a "corrupted BRIN index" error, rather than a hard
    crash, suggesting a REINDEX.)
    
    Reported by Andreas Seltenreich as detected by his handy sqlsmith
    fuzzer.

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index d0ca485..4b1afce 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -179,8 +179,8 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
 
 		START_CRIT_SECTION();
 		PageIndexDeleteNoCompact(oldpage, &oldoff, 1);
-		if (PageAddItem(oldpage, (Item) newtup, newsz, oldoff, true,
-						false) == InvalidOffsetNumber)
+		if (PageAddItemFlags(oldpage, (Item) newtup, newsz, oldoff,
+							 PAI_OVERWRITE | PAI_ALLOW_LARGE_OFFSET) == InvalidOffsetNumber)
 			elog(ERROR, "failed to add BRIN tuple");
 		MarkBufferDirty(oldbuf);
 
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 812f76c..853181b 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -272,6 +272,10 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
 		/* If we land on a revmap page, start over */
 		if (BRIN_IS_REGULAR_PAGE(page))
 		{
+			if (*off > PageGetMaxOffsetNumber(page))
+				ereport(ERROR,
+						(errcode(ERRCODE_INDEX_CORRUPTED),
+						 errmsg_internal("corrupted BRIN index: inconsistent range map")));
 			lp = PageGetItemId(page, *off);
 			if (ItemIdIsUsed(lp))
 			{
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index deb7af4..3103775 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -193,7 +193,8 @@ brin_xlog_samepage_update(XLogReaderState *record)
 			elog(PANIC, "brin_xlog_samepage_update: invalid max offset number");
 
 		PageIndexDeleteNoCompact(page, &offnum, 1);
-		offnum = PageAddItem(page, (Item) brintuple, tuplen, offnum, true, false);
+		offnum = PageAddItemFlags(page, (Item) brintuple, tuplen, offnum,
+								  PAI_OVERWRITE | PAI_ALLOW_LARGE_OFFSET);
 		if (offnum == InvalidOffsetNumber)
 			elog(PANIC, "brin_xlog_samepage_update: failed to add tuple");
 
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 0189bc9..d227d4b 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -153,12 +153,12 @@ PageIsVerified(Page page, BlockNumber blkno)
 
 
 /*
- *	PageAddItem
+ *	PageAddItemFlags
  *
  *	Add an item to a page.  Return value is offset at which it was
  *	inserted, or InvalidOffsetNumber if there's not room to insert.
  *
- *	If overwrite is true, we just store the item at the specified
+ *	If flag PAI_OVERWRITE is set, we just store the item at the specified
  *	offsetNumber (which must be either a currently-unused item pointer,
  *	or one past the last existing item).  Otherwise,
  *	if offsetNumber is valid and <= current max offset in the page,
@@ -167,18 +167,20 @@ PageIsVerified(Page page, BlockNumber blkno)
  *	If offsetNumber is not valid, then assign one by finding the first
  *	one that is both unused and deallocated.
  *
- *	If is_heap is true, we enforce that there can't be more than
+ *	If flag PAI_IS_HEAP is set, we enforce that there can't be more than
  *	MaxHeapTuplesPerPage line pointers on the page.
  *
+ *	If flag PAI_ALLOW_LARGE_OFFSET is not set, we disallow placing items
+ *	beyond one past the last existing item.
+ *
  *	!!! EREPORT(ERROR) IS DISALLOWED HERE !!!
  */
 OffsetNumber
-PageAddItem(Page page,
-			Item item,
-			Size size,
-			OffsetNumber offsetNumber,
-			bool overwrite,
-			bool is_heap)
+PageAddItemFlags(Page page,
+				 Item item,
+				 Size size,
+				 OffsetNumber offsetNumber,
+				 int flags)
 {
 	PageHeader	phdr = (PageHeader) page;
 	Size		alignedSize;
@@ -209,7 +211,7 @@ PageAddItem(Page page,
 	if (OffsetNumberIsValid(offsetNumber))
 	{
 		/* yes, check it */
-		if (overwrite)
+		if ((flags & PAI_OVERWRITE) != 0)
 		{
 			if (offsetNumber < limit)
 			{
@@ -257,13 +259,18 @@ PageAddItem(Page page,
 		}
 	}
 
-	if (offsetNumber > limit)
+	/*
+	 * Reject placing items beyond the first unused line pointer, unless
+	 * caller asked for that behavior specifically.
+	 */
+	if (((flags & PAI_ALLOW_LARGE_OFFSET) == 0) && offsetNumber > limit)
 	{
 		elog(WARNING, "specified item offset is too large");
 		return InvalidOffsetNumber;
 	}
 
-	if (is_heap && offsetNumber > MaxHeapTuplesPerPage)
+	/* Reject placing items beyond heap boundary, if heap */
+	if (((flags & PAI_IS_HEAP) != 0) && offsetNumber > MaxHeapTuplesPerPage)
 	{
 		elog(WARNING, "can't put more than MaxHeapTuplesPerPage items in a heap page");
 		return InvalidOffsetNumber;
@@ -275,7 +282,10 @@ PageAddItem(Page page,
 	 * Note: do arithmetic as signed ints, to avoid mistakes if, say,
 	 * alignedSize > pd_upper.
 	 */
-	if (offsetNumber == limit || needshuffle)
+	if ((flags & PAI_ALLOW_LARGE_OFFSET) != 0)
+		lower = Max(phdr->pd_lower,
+					SizeOfPageHeaderData + sizeof(ItemIdData) * offsetNumber);
+	else if (offsetNumber == limit || needshuffle)
 		lower = phdr->pd_lower + sizeof(ItemIdData);
 	else
 		lower = phdr->pd_lower;
@@ -324,6 +334,36 @@ PageAddItem(Page page,
 }
 
 /*
+ *	PageAddItem, compatibility layer on top of PageAddItem.
+ *
+ *	Add an item to a page.  Return value is offset at which it was
+ *	inserted, or InvalidOffsetNumber if there's not room to insert.
+ *
+ *	If overwrite is true, we just store the item at the specified
+ *	offsetNumber (which must be either a currently-unused item pointer,
+ *	or one past the last existing item).  Otherwise,
+ *	if offsetNumber is valid and <= current max offset in the page,
+ *	insert item into the array at that position by shuffling ItemId's
+ *	down to make room.
+ *	If offsetNumber is not valid, then assign one by finding the first
+ *	one that is both unused and deallocated.
+ *
+ *	If is_heap is true, we enforce that there can't be more than
+ *	MaxHeapTuplesPerPage line pointers on the page.
+ */
+OffsetNumber
+PageAddItem(Page page, Item item, Size size, OffsetNumber offsetNumber,
+			bool overwrite, bool is_heap)
+{
+	int		flags = 0;
+
+	flags |= overwrite ? PAI_OVERWRITE : 0;
+	flags |= is_heap ? PAI_IS_HEAP : 0;
+
+	return PageAddItemFlags(page, item, size, offsetNumber, flags);
+}
+
+/*
  * PageGetTempPage
  *		Get a temporary page in local memory for special processing.
  *		The returned page is not initialized at all; caller must do that.
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 63053d4..5ca1db7 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -407,11 +407,16 @@ do { \
  *		extern declarations
  * ----------------------------------------------------------------
  */
+#define PAI_OVERWRITE			(1 << 0)
+#define PAI_IS_HEAP				(1 << 1)
+#define PAI_ALLOW_LARGE_OFFSET	(1 << 2)
 
 extern void PageInit(Page page, Size pageSize, Size specialSize);
 extern bool PageIsVerified(Page page, BlockNumber blkno);
 extern OffsetNumber PageAddItem(Page page, Item item, Size size,
 			OffsetNumber offsetNumber, bool overwrite, bool is_heap);
+extern OffsetNumber PageAddItemFlags(Page page, Item item, Size size,
+				 OffsetNumber offsetNumber, int flags);
 extern Page PageGetTempPage(Page page);
 extern Page PageGetTempPageCopy(Page page);
 extern Page PageGetTempPageCopySpecial(Page page);
#!/usr/bin/perl

use warnings;
use strict;

use DBI;

my ($conn, $insert, $update);

$conn = DBI->connect('dbi:Pg:dbname=alvherre port=55432 host=/tmp', '', '', {AutoCommit => 1});
$conn->{pg_prepare_now} = 1;

$conn->do("drop table if exists brintest");
$conn->do("create table brintest (id serial, a integer, b text, c text)");
$conn->do("create index brinidx on brintest using brin (a, b, c) with (pages_per_range=1)");

$insert = $conn->prepare("insert into brintest (a, b, c) values (?, ?, 'this is a very long text intended to make each brin tuple a bit larger than it should really be, because only because I am an annoying person') returning id");
$update = $conn->prepare("update brintest set a = a + ?, b = ? where id = ?");

my $total = 100000;
my $i = 0;
my $lastval;

my @letters = ("a" .. "z");

$insert->execute(int(rand(1000)), 'foo');
$lastval = $insert->fetch()->[0];

my $thresh = 1;
my $maxlength = 4;
while ($i++ < $total) {
	my $prob = rand(1);

	if ($prob < $thresh) {
		$insert->execute(int(rand(1000)), 'foo');
		$lastval = $insert->fetch()->[0];
		$thresh *= 0.99;
	} else {
		my $length = rand($maxlength);
		$maxlength++ if $length >= $maxlength / 2;
		my $b = "";
		while ($length-- > 0) {
			$b = $b . $letters[rand $#letters];
		}

		$update->execute(
			int(rand(100)), 
			$b,
			int(rand($lastval + 1)));
	}

	if ($prob > 0.95) {
		$conn->do('vacuum brintest');
	}
}
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to