Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-10 Thread Kyotaro Horiguchi
At Wed, 09 Sep 2020 14:40:37 +0900 (JST), Kyotaro Horiguchi  wrote in 
> Sure. relNode is filled with zeros so RelationInitPhysicalAddr() works
> correctly for RelationBuildDesc().
> 
> Both changes look good to me.

Thank you for committing this!

regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-08 Thread Kyotaro Horiguchi
Thank you for checking.

At Tue, 8 Sep 2020 21:05:43 -0700, Noah Misch  wrote in 
> On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:
> After looking closer, I've moved the test to reindex_catalog.sql; see that
> file's header comment for the reasons.  One now needs this command:
> 
>   printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
>   make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf

I missed the file. It is surely the best place.

> RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
> can stop calling RelFileNodeSkippingWAL().  The attached version makes it so,
> and I plan to push it.

Sure. relNode is filled with zeros so RelationInitPhysicalAddr() works
correctly for RelationBuildDesc().

Both changes look good to me.
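
To make that concrete, here is a minimal standalone sketch (stub types,
not the backend source) of the zero-fill argument: the relcache entry is
allocated zero-filled, so rd_node.relNode starts at 0 and the
"oldnode != relation->rd_node.relNode" test in RelationInitPhysicalAddr()
necessarily fires on first initialization.

#include <assert.h>
#include <string.h>

typedef unsigned int Oid;
typedef struct { struct { Oid relNode; } rd_node; } RelationData;

int
main(void)
{
	RelationData rel;
	Oid			oldnode;

	memset(&rel, 0, sizeof(rel));	/* as the zero-filled allocation does */
	oldnode = rel.rd_node.relNode;	/* 0, which is never a real filenode */

	rel.rd_node.relNode = 16384;	/* any valid filenode */
	assert(oldnode != rel.rd_node.relNode);	/* the fixup path is reached */
	return 0;
}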

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-08 Thread Noah Misch
On Mon, Sep 07, 2020 at 07:47:09PM -0700, Noah Misch wrote:
> On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:
> > At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi  wrote in 
> > > At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch  wrote in 
> > > > As a PoC, this looks promising.  Thanks.  Would you add a test case such that
> > > > the following demonstrates the bug in the absence of your PoC?
> > > > 
> > > >   printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> > > >   make check TEMP_CONFIG=/tmp/minimal.conf
> > > 
> > > Mmm. I was close to adding some tests to 018_wal_optimize.pl but your
> > > suggestion seems better.  I added several lines to create_index.sql.

After looking closer, I've moved the test to reindex_catalog.sql; see that
file's header comment for the reasons.  One now needs this command:

  printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
  make check-tests TESTS=reindex_catalog TEMP_CONFIG=/tmp/minimal.conf

> > > > Please have the test try both a nailed-and-mapped relation and a "nailed, but
> > > > not mapped" relation.  I am fairly confident that your PoC fixes the former
> > > > case, but the latter may need additional code.
> > > 
> > > Mmm. You're right. I chose pg_amproc_fam_proc_index as a
> > > nailed-but-not-mapped index.
> > 
> > I fixed a typo (s/staring/starting/).
> 
> At a glance, this looks reasonable.  If a closer look doesn't reveal problems,
> I'll push this.

RelationBuildDesc() calls RelationInitPhysicalAddr(), so RelationBuildDesc()
can stop calling RelFileNodeSkippingWAL().  The attached version makes it so,
and I plan to push it.
Author: Noah Misch 
Commit: Noah Misch 

Fix rd_firstRelfilenodeSubid for nailed relations, in parallel workers.

Move applicable code out of RelationBuildDesc(), which nailed relations
bypass.  Non-assert builds experienced no known problems.  Back-patch to
v13, where commit c6b92041d38512a4176ed76ad06f713d2e6c01a8 introduced
rd_firstRelfilenodeSubid.

Kyotaro Horiguchi.  Reported by Justin Pryzby.

Discussion: https://postgr.es/m/20200907023737.ga7...@telsasoft.com

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad0..9061af8 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1243,14 +1243,6 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 	if (insertIt)
 		RelationCacheInsert(relation, true);
 
-	/*
-	 * For RelationNeedsWAL() to answer correctly on parallel workers, restore
-	 * rd_firstRelfilenodeSubid.  No subtransactions start or end while in
-	 * parallel mode, so the specific SubTransactionId does not matter.
-	 */
-	if (IsParallelWorker() && RelFileNodeSkippingWAL(relation->rd_node))
-		relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
-
 	/* It's fully valid */
 	relation->rd_isvalid = true;
 
@@ -1273,6 +1265,8 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	Oid			oldnode = relation->rd_node.relNode;
+
 	/* these relations kinds never have storage */
 	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
 		return;
@@ -1330,6 +1324,19 @@ RelationInitPhysicalAddr(Relation relation)
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);
 	}
+
+	/*
+	 * For RelationNeedsWAL() to answer correctly on parallel workers, restore
+	 * rd_firstRelfilenodeSubid.  No subtransactions start or end while in
+	 * parallel mode, so the specific SubTransactionId does not matter.
+	 */
+	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+	{
+		if (RelFileNodeSkippingWAL(relation->rd_node))
+			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+		else
+			relation->rd_firstRelfilenodeSubid = InvalidSubTransactionId;
+	}
 }
 
 /*
diff --git a/src/test/regress/expected/reindex_catalog.out b/src/test/regress/expected/reindex_catalog.out
index 4b5fba4..204f056 100644
--- a/src/test/regress/expected/reindex_catalog.out
+++ b/src/test/regress/expected/reindex_catalog.out
@@ -36,3 +36,13 @@ REINDEX INDEX pg_index_indexrelid_index; -- non-mapped, non-shared, critical
 REINDEX INDEX pg_index_indrelid_index; -- non-mapped, non-shared, non-critical
 REINDEX INDEX pg_database_oid_index; -- mapped, shared, critical
 REINDEX INDEX pg_shdescription_o_c_index; -- mapped, shared, non-critical
+-- Check the same REINDEX INDEX statements under parallelism.

Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-07 Thread Noah Misch
On Tue, Sep 08, 2020 at 10:43:32AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi  wrote in 
> > At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch  wrote in 
> > > As a PoC, this looks promising.  Thanks.  Would you add a test case such that
> > > the following demonstrates the bug in the absence of your PoC?
> > > 
> > >   printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> > >   make check TEMP_CONFIG=/tmp/minimal.conf
> > 
> > Mmm. I was close to adding some tests to 018_wal_optimize.pl but your
> > suggestion seems better.  I added several lines to create_index.sql.
> > 
> > > Please have the test try both a nailed-and-mapped relation and a "nailed, but
> > > not mapped" relation.  I am fairly confident that your PoC fixes the former
> > > case, but the latter may need additional code.
> > 
> > Mmm. You're right. I chose pg_amproc_fam_proc_index as a
> > nailed-but-not-mapped index.
> 
> I fixed a typo (s/staring/starting/).

At a glance, this looks reasonable.  If a closer look doesn't reveal problems,
I'll push this.




Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-07 Thread Kyotaro Horiguchi
At Tue, 08 Sep 2020 09:13:53 +0900 (JST), Kyotaro Horiguchi  wrote in 
> At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch  wrote in 
> > As a PoC, this looks promising.  Thanks.  Would you add a test case such that
> > the following demonstrates the bug in the absence of your PoC?
> > 
> >   printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
> >   make check TEMP_CONFIG=/tmp/minimal.conf
> 
> Mmm. I was close to adding some tests to 018_wal_optimize.pl but your
> suggestion seems better.  I added several lines to create_index.sql.
> 
> > Please have the test try both a nailed-and-mapped relation and a "nailed, but
> > not mapped" relation.  I am fairly confident that your PoC fixes the former
> > case, but the latter may need additional code.
> 
> Mmm. You're right. I chose pg_amproc_fam_proc_index as a
> nailed-but-not-mapped index.

I fixed a typo (s/staring/starting/).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 44512f8672395dc40a001eb92b7ffca060eb411d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 7 Sep 2020 20:30:30 +0900
Subject: [PATCH v2] Fix assertion failure during reindex while wal_level=minimal.

When wal_level=minimal, parallel workers fail to set
rd_firstRelfilenodeSubid correctly for nailed relations. Fix it.
---
 src/backend/utils/cache/relcache.c         | 16 ++++++++++++++++
 src/test/regress/expected/create_index.out |  5 +++++
 src/test/regress/sql/create_index.sql      |  6 ++++++
 3 files changed, 27 insertions(+)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..8a61fd674e 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1273,6 +1273,8 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	Oid oldnode = relation->rd_node.relNode;
+
 	/* these relations kinds never have storage */
 	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
 		return;
@@ -1330,6 +1332,20 @@ RelationInitPhysicalAddr(Relation relation)
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);
 	}
+
+	/*
+	 * Our leader process might have replaced storage before starting this
+	 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+	 * sync hash in that case. Currently the information is not actually used
+	 * in worker processes, but make things tidy.
+	 */
+	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+	{
+		if (RelFileNodeSkippingWAL(relation->rd_node))
+			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+		else
+			relation->rd_firstRelfilenodeSubid = InvalidSubTransactionId;
+	}
 }
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 64c0c66859..66ec1465e1 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2732,6 +2732,11 @@ REINDEX TABLE pg_toast.pg_toast_1260;
 ERROR:  must be owner of table pg_toast_1260
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 ERROR:  must be owner of index pg_toast_1260_index
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index;	-- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; 		-- nailed but not mapped
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 37f7259da9..6b104a2389 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1191,6 +1191,12 @@ SET SESSION ROLE regress_reindexuser;
 REINDEX TABLE pg_toast.pg_toast_1260;
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index;	-- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; 		-- nailed but not mapped
+
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
-- 
2.18.4



Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-07 Thread Kyotaro Horiguchi
At Mon, 7 Sep 2020 02:32:55 -0700, Noah Misch  wrote in 
> On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi  wrote in 
> > > The cause is that the worker had received the pending-sync entry
> > > correctly but never created a relcache entry for the relation using
> > > RelationBuildDesc, so rd_firstRelfilenodeSubid is not correctly set.
> > > 
> > > I'm investigating it.
> > 
> > Relcaches are loaded from a file with old content at parallel worker
> > startup. The relcache entry is corrected by invalidation when a lock
> > is taken, but pending syncs are not considered.
> > 
> > Since parallel workers don't access the files, we could just ignore
> > the assertion safely, but I want to set the rd_firstRelfilenodeSubid
> > flag at invalidation, as in the attached PoC patch.
> 
> > [patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
> > rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
> > RelationBuildDesc() would do.]
> 
> As a PoC, this looks promising.  Thanks.  Would you add a test case such that
> the following demonstrates the bug in the absence of your PoC?
> 
>   printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
>   make check TEMP_CONFIG=/tmp/minimal.conf

Mmm. I was close to adding some tests to 018_wal_optimize.pl but your
suggestion seems better.  I added several lines to create_index.sql.

> Please have the test try both a nailed-and-mapped relation and a "nailed, but
> not mapped" relation.  I am fairly confident that your PoC fixes the former
> case, but the latter may need additional code.

Mmm. You're right. I chose pg_amproc_fam_proc_index as a
nailed-but-not-mapped index.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From c4d53acd64011f1ed5652177923d010f4c50fa89 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 7 Sep 2020 20:30:30 +0900
Subject: [PATCH] Fix assertion failure during reindex while wal_level=minimal.

When wal_level=minimal, parallel workers fail to set
rd_firstRelfilenodeSubid correctly for nailed relations. Fix it.
---
 src/backend/utils/cache/relcache.c         | 16 ++++++++++++++++
 src/test/regress/expected/create_index.out |  5 +++++
 src/test/regress/sql/create_index.sql      |  6 ++++++
 3 files changed, 27 insertions(+)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..cd6c8bbc70 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1273,6 +1273,8 @@ RelationBuildDesc(Oid targetRelId, bool insertIt)
 static void
 RelationInitPhysicalAddr(Relation relation)
 {
+	Oid oldnode = relation->rd_node.relNode;
+
 	/* these relations kinds never have storage */
 	if (!RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
 		return;
@@ -1330,6 +1332,20 @@ RelationInitPhysicalAddr(Relation relation)
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);
 	}
+
+	/*
+	 * Our leader process might have replaced storage before staring this
+	 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+	 * sync hash in that case. Currently the information is not actually used
+	 * in worker processes, but make things tidy.
+	 */
+	if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+	{
+		if (RelFileNodeSkippingWAL(relation->rd_node))
+			relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+		else
+			relation->rd_firstRelfilenodeSubid = InvalidSubTransactionId;
+	}
 }
 
 /*
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 814416d936..6468418639 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2598,6 +2598,11 @@ REINDEX TABLE pg_toast.pg_toast_1260;
 ERROR:  must be owner of table pg_toast_1260
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 ERROR:  must be owner of index pg_toast_1260_index
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index;	-- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; 		-- nailed but not mapped
 -- Clean up
 RESET ROLE;
 REVOKE USAGE ON SCHEMA pg_toast FROM regress_reindexuser;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index f3667bacdc..d9cb3f14f6 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1115,6 +1115,12 @@ SET SESSION ROLE regress_reindexuser;
 REINDEX TABLE pg_toast.pg_toast_1260;
 REINDEX INDEX pg_toast.pg_toast_1260_index;
 
+-- parallel reindex
+RESET ROLE;
+SET min_parallel_table_scan_size = 0;
+REINDEX INDEX pg_attribute_relid_attnum_index;	-- nailed and mapped
+REINDEX INDEX pg_amproc_fam_proc_index; 		-- nailed but not mapped
+
 -- Clean up
 RESET ROLE;
 

Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-07 Thread Noah Misch
On Mon, Sep 07, 2020 at 05:40:36PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi  wrote in 
> > The cause is that the worker had received the pending-sync entry
> > correctly but never created a relcache entry for the relation using
> > RelationBuildDesc, so rd_firstRelfilenodeSubid is not correctly set.
> > 
> > I'm investigating it.
> 
> Relcaches are loaded from a file with old content at parallel worker
> startup. The relcache entry is corrected by invalidation when a lock
> is taken, but pending syncs are not considered.
> 
> Since parallel workers don't access the files, we could just ignore
> the assertion safely, but I want to set the rd_firstRelfilenodeSubid
> flag at invalidation, as in the attached PoC patch.

> [patch: When RelationInitPhysicalAddr() handles a mapped relation, re-fill
> rd_firstRelfilenodeSubid from RelFileNodeSkippingWAL(), like
> RelationBuildDesc() would do.]

As a PoC, this looks promising.  Thanks.  Would you add a test case such that
the following demonstrates the bug in the absence of your PoC?

  printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf
  make check TEMP_CONFIG=/tmp/minimal.conf

Please have the test try both a nailed-and-mapped relation and a "nailed, but
not mapped" relation.  I am fairly confident that your PoC fixes the former
case, but the latter may need additional code.
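
One way to see the distinction (an illustrative standalone sketch, not
the backend source): a mapped relation stores InvalidOid in
pg_class.relfilenode and resolves its physical filenode through the
relation mapper, while a non-mapped relation stores the filenode
directly.  The PoC's fixup sits in the relation-mapper branch of
RelationInitPhysicalAddr(), so a nailed-but-not-mapped index never
reaches it.

#include <stdio.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* stand-in for the relation mapper (RelationMapOidToFilenode) */
static Oid
map_oid_to_filenode(Oid relid)
{
	return 16385;				/* illustrative value */
}

static Oid
physical_filenode(Oid relid, Oid pg_class_relfilenode)
{
	if (pg_class_relfilenode == InvalidOid)
		return map_oid_to_filenode(relid);	/* mapped branch */
	return pg_class_relfilenode;			/* non-mapped branch */
}

int
main(void)
{
	/* OIDs and filenodes below are illustrative, not real catalog values */
	printf("mapped:     %u\n", physical_filenode(1249, InvalidOid));
	printf("non-mapped: %u\n", physical_filenode(2655, 16999));
	return 0;
}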




Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-07 Thread Kyotaro Horiguchi
At Mon, 07 Sep 2020 13:45:28 +0900 (JST), Kyotaro Horiguchi  wrote in 
> The cause is that the worker had received the pending-sync entry
> correctly but never created a relcache entry for the relation using
> RelationBuildDesc, so rd_firstRelfilenodeSubid is not correctly set.
> 
> I'm investigating it.

Relcaches are loaded from a file with old content at parallel worker
startup. The relcache entry is corrected by invalidation when a lock
is taken, but pending syncs are not considered.

Since parallel workers don't access the files, we could just ignore
the assertion safely, but I want to set the rd_firstRelfilenodeSubid
flag at invalidation, as in the attached PoC patch.
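
To illustrate, a standalone simulation (stub types and a hard-coded
pending-sync lookup, not backend code) of what the PoC does: the
worker's stale entry receives a new filenode from the mapper, and that
change is the cue to recompute the WAL-skip marker.

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;
typedef unsigned int SubTransactionId;

#define InvalidSubTransactionId ((SubTransactionId) 0)
#define TopSubTransactionId		((SubTransactionId) 1)

#define NEW_FILENODE	16390	/* filenode the leader created, pending sync */

/* stand-in for RelFileNodeSkippingWAL(): consult the pending-sync "hash" */
static bool
skipping_wal(Oid relnode)
{
	return relnode == NEW_FILENODE;
}

typedef struct
{
	Oid			relNode;		/* physical filenode */
	SubTransactionId firstRelfilenodeSubid; /* WAL-skip marker */
} Rel;

/* refresh the marker whenever the mapper hands back a different filenode */
static void
init_physical_addr(Rel *rel, Oid mapped)
{
	Oid			oldnode = rel->relNode;

	rel->relNode = mapped;
	if (oldnode != rel->relNode)
		rel->firstRelfilenodeSubid = skipping_wal(rel->relNode) ?
			TopSubTransactionId : InvalidSubTransactionId;
}

int
main(void)
{
	/* stale entry from the relcache init file: old filenode, no marker */
	Rel			rel = {16384, InvalidSubTransactionId};

	init_physical_addr(&rel, NEW_FILENODE);
	printf("firstRelfilenodeSubid = %u\n", rel.firstRelfilenodeSubid);
	return 0;
}

Compiled with any C compiler, this prints 1 (TopSubTransactionId): the
refreshed entry is again known to be WAL-skipped.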

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..d66e914345 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1322,10 +1322,27 @@ RelationInitPhysicalAddr(Relation relation)
 	}
 	else
 	{
+		Oid oldnode = relation->rd_node.relNode;
+
 		/* Consult the relation mapper */
 		relation->rd_node.relNode =
 			RelationMapOidToFilenode(relation->rd_id,
 									 relation->rd_rel->relisshared);
+
+		/*
+		 * The leader process might have replaced storage before staring this
+		 * worker. We need to set rd_firstRelfilenodeSubid according to pending
+		 * sync hash in that case. Currently the information is not actually
+		 * used in worker processes, but make things tidy.
+		 */
+		if (IsParallelWorker() && oldnode != relation->rd_node.relNode)
+		{
+			if (RelFileNodeSkippingWAL(relation->rd_node))
+				relation->rd_firstRelfilenodeSubid = TopSubTransactionId;
+			else
+				relation->rd_firstRelfilenodeSubid = InvalidSubTransactionId;
+		}
+
 		if (!OidIsValid(relation->rd_node.relNode))
 			elog(ERROR, "could not find relation mapping for relation \"%s\", OID %u",
 				 RelationGetRelationName(relation), relation->rd_id);


Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-06 Thread Kyotaro Horiguchi
At Mon, 07 Sep 2020 00:33:03 -0400, Tom Lane  wrote in 
> Justin Pryzby  writes:
> > This is easily reproduced, at least on pg_attribute

Thanks for the repro.

> Yeah, duplicated here.  The assertion (not SIGSEGV) is in code
> added by c6b92041d, so it seems clearly an open issue for v13.
> 
> I did not check to see what happens in a non-assert build.

The cause is that the worker had received the pending-sync entry
correctly but never created a relcache entry for the relation using
RelationBuildDesc, so rd_firstRelfilenodeSubid is not correctly set.

I'm investigating it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-06 Thread Tom Lane
Justin Pryzby  writes:
> This is easily reproduced, at least on pg_attribute

Yeah, duplicated here.  The assertion (not SIGSEGV) is in code
added by c6b92041d, so it seems clearly an open issue for v13.

I did not check to see what happens in a non-assert build.

regards, tom lane




Re: v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-06 Thread Justin Pryzby
This is easily reproduced, at least on pg_attribute

[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/initdb -D pgsql13.dat
[pryzbyj@localhost ~]$ /usr/pgsql-13/bin/postgres -D pgsql13.dat -c logging_collector=off -c port=5678 -c unix_socket_directories=/tmp -c wal_level=minimal -c max_wal_senders=0&
[pryzbyj@localhost ~]$ psql -h /tmp -p 5678 postgres -c "SET min_parallel_table_scan_size=0" -c "SET client_min_messages=debug" -c "REINDEX INDEX pg_attribute_relid_attnum_index"
SET
SET
DEBUG:  building index "pg_attribute_relid_attnum_index" on table "pg_attribute" with request for 1 parallel worker
TRAP: FailedAssertion("relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", File: "relcache.c", Line: 2976)
postgres: parallel worker for PID 9637 (ExceptionalCondition+0x66)[0x921d86]
postgres: parallel worker for PID 9637 (AssertPendingSyncs_RelationCache+0x1db)[0x91a08b]
postgres: parallel worker for PID 9637 (smgrDoPendingSyncs+0x71)[0x58e591]
postgres: parallel worker for PID 9637 [0x53b1a4]
postgres: parallel worker for PID 9637 (EndParallelWorkerTransaction+0x16)[0x53b826]
postgres: parallel worker for PID 9637 (ParallelWorkerMain+0x437)[0x52fcf7]
postgres: parallel worker for PID 9637 (StartBackgroundWorker+0x25a)[0x76047a]
postgres: parallel worker for PID 9637 [0x76d88d]
postgres: parallel worker for PID 9637 [0x76e43e]
/lib64/libpthread.so.0(+0xf630)[0x7f46d4b47630]
/lib64/libc.so.6(__select+0x13)[0x7f46d26ba983]
postgres: parallel worker for PID 9637 [0x4887bc]
postgres: parallel worker for PID 9637 (PostmasterMain+0x1165)[0x76fb45]
postgres: parallel worker for PID 9637 (main+0x70d)[0x48a83d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f46d25e7555]

-- 
Justin




v13: CLUSTER segv with wal_level=minimal and parallel index creation

2020-09-06 Thread Justin Pryzby
Following a bulk load, a CLUSTER command run by a maintenance script crashed.
This is currently reproducible on that instance, so please suggest if I can
provide more info.

< 2020-09-06 15:44:16.369 MDT  >LOG:  background worker "parallel worker" (PID 2576) was terminated by signal 6: Aborted
< 2020-09-06 15:44:16.369 MDT  >DETAIL:  Failed process was running: CLUSTER pg_attribute USING pg_attribute_relid_attnam_index

The crash happens during:
ts=# REINDEX INDEX pg_attribute_relid_attnum_index;
..but not:
ts=# REINDEX INDEX pg_attribute_relid_attnam_index ;

 pg_catalog | pg_attribute_relid_attnam_index | index | postgres | pg_attribute | permanent   | 31 MB | 
 pg_catalog | pg_attribute_relid_attnum_index | index | postgres | pg_attribute | permanent   | 35 MB | 

I suspect
|commit c6b92041d Skip WAL for new relfilenodes, under wal_level=minimal.

In fact, I set wal_level=minimal for the bulk load.  Note also:
 override | data_checksums  | on
 configuration file   | checkpoint_timeout  | 60
 configuration file   | maintenance_work_mem| 1048576
 configuration file   | max_wal_senders | 0
 configuration file   | wal_compression | on
 configuration file   | wal_level   | minimal
 configuration file   | fsync   | off
 configuration file   | full_page_writes| off
 default  | server_version  | 13beta3

(gdb) bt
#0  0x7ffad387 in raise () from /lib64/libc.so.6
#1  0x7ffaea78 in abort () from /lib64/libc.so.6
#2  0x00921da5 in ExceptionalCondition (conditionName=conditionName@entry=0xad4078 "relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", errorType=errorType@entry=0x977f49 "FailedAssertion", fileName=fileName@entry=0xad3068 "relcache.c", lineNumber=lineNumber@entry=2976) at assert.c:67
#3  0x0091a08b in AssertPendingSyncConsistency (relation=0x7ff99c2a70b8) at relcache.c:2976
#4  AssertPendingSyncs_RelationCache () at relcache.c:3036
#5  0x0058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
#6  0x0053b1a4 in CommitTransaction () at xact.c:2118
#7  0x0053b826 in EndParallelWorkerTransaction () at xact.c:5300
#8  0x0052fcf7 in ParallelWorkerMain (main_arg=) at parallel.c:1479
#9  0x0076047a in StartBackgroundWorker () at bgworker.c:813
#10 0x0076d88d in do_start_bgworker (rw=0x23ac110) at postmaster.c:5865
#11 maybe_start_bgworkers () at postmaster.c:6091
#12 0x0076e43e in sigusr1_handler (postgres_signal_arg=) at postmaster.c:5260
#13 
#14 0x7ff999a6c983 in __select_nocancel () from /lib64/libc.so.6
#15 0x004887bc in ServerLoop () at postmaster.c:1691
#16 0x0076fb45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x237d280) at postmaster.c:1400
#17 0x0048a83d in main (argc=3, argv=0x237d280) at main.c:210

(gdb) bt f
...
#4  AssertPendingSyncs_RelationCache () at relcache.c:3036
status = {hashp = 0x23cba50, curBucket = 449, curEntry = 0x0}
locallock = 
rels = 0x23ff018
maxrels = 
nrels = 0
idhentry = 
i = 
#5  0x0058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
pending = 
nrels = 0
maxrels = 0
srels = 0x0
scan = {hashp = 0x23edf60, curBucket = 9633000, curEntry = 0xe01600 }
pendingsync = 
#6  0x0053b1a4 in CommitTransaction () at xact.c:2118
s = 0xe01600 
latestXid = 
is_parallel_worker = true
__func__ = "CommitTransaction"