Re: security_context_t marked as deprecated in libselinux 3.1
On Thu, Aug 13, 2020 at 01:29:35AM -0400, Tom Lane wrote:
> Well, "you get a compiler warning" isn't a reason to consider the version
> unsupported.  There are probably going to be a few other warnings you get
> when building on an ancient platform --- as long as it works, I think
> we're fine.  So based on this, no objection, and I think no need to
> change our statement about what's supported.

Okay, thanks for confirming.  Let's do so then, I'll just wait a bit to
see if there are more comments from others.
--
Michael
Re: security_context_t marked as deprecated in libselinux 3.1
Michael Paquier writes:
> On Wed, Aug 12, 2020 at 10:50:21PM -0400, Tom Lane wrote:
>> Ummm ... aren't you going to get some cast-away-const warnings now?

> Let me see..  The function signatures we use have been visibly changed
> in 9eb9c932, which comes down to a point between 2.2.2 and 2.3, and
> there are two of them we care about, both use now "const char *":
> - security_check_context_raw()
> - security_compute_create_name_raw()

OK, it's all good then.

> Based on this information, what if we increased the minimum support to
> 2.3 then?  That's a release from 2014, and maintaining such legacy
> code does not seem much worth the effort IMO.

Well, "you get a compiler warning" isn't a reason to consider the version
unsupported.  There are probably going to be a few other warnings you get
when building on an ancient platform --- as long as it works, I think
we're fine.  So based on this, no objection, and I think no need to
change our statement about what's supported.

			regards, tom lane
Re: run pgindent on a regular basis / scripted manner
On Wed, Aug 12, 2020 at 10:14 PM Tom Lane wrote:
> Noah Misch writes:
> > ...  Another advantage of master-only is a guarantee against
> > disrupting time-critical patches.  (It would be ugly to push back
> > branches and sort out the master push later, but it doesn't obstruct
> > the mission.)
>
> Hm, doesn't it?  I had the idea that "git push" is atomic --- either all
> the per-branch commits succeed, or they all fail.  I might be wrong.

As of Git 2.28, atomic pushes are not turned on by default.  That means
"git push my-remote foo bar" _can_ result in partial success.  I'm that
paranoid who does "git push --atomic my-remote foo bar fizz".

Cheers,
Jesse
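[The partial-success behavior described here can be reproduced against throwaway local repositories.  This is only a sketch: the branch names (foo, bar) and the per-ref update hook that rejects bar are contrivances standing in for whatever would make one ref update fail in practice.]

```shell
# Two bare "remotes", each with an update hook that rejects branch "bar".
# The update hook runs once per ref, which makes a per-ref failure easy to fake.
tmp=$(mktemp -d)
for r in r1 r2; do
    git init -q --bare "$tmp/$r.git"
    cat > "$tmp/$r.git/hooks/update" <<'EOF'
#!/bin/sh
# args: refname old-sha new-sha; a nonzero exit rejects this one ref
[ "$1" = "refs/heads/bar" ] && exit 1
exit 0
EOF
    chmod +x "$tmp/$r.git/hooks/update"
done

# A work repo with two branches to push.
git init -q "$tmp/work"
cd "$tmp/work"
git checkout -q -b foo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init
git branch bar

# Default (non-atomic) push: "foo" lands even though "bar" is rejected,
# and the push as a whole exits nonzero.
git push -q "$tmp/r1.git" foo bar 2>/dev/null || true

# Atomic push: the rejection of "bar" aborts the whole transaction.
git push -q --atomic "$tmp/r2.git" foo bar 2>/dev/null || true

git -C "$tmp/r1.git" for-each-ref --format='r1 %(refname)'   # r1 refs/heads/foo
git -C "$tmp/r2.git" for-each-ref --format='r2 %(refname)'   # (no output)
```

With server support (receive-pack from Git 2.4 onward), --atomic turns the partial success into an all-or-nothing transaction, which is the behavior Tom assumed was the default.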
Re: security_context_t marked as deprecated in libselinux 3.1
On Wed, Aug 12, 2020 at 10:50:21PM -0400, Tom Lane wrote:
> Ummm ... aren't you going to get some cast-away-const warnings now?
> Or are all of the called functions declared as taking "const char *"
> not just "char *"?

Let me see..  The function signatures we use have been visibly changed
in 9eb9c932, which comes down to a point between 2.2.2 and 2.3, and
there are two of them we care about, both use now "const char *":
- security_check_context_raw()
- security_compute_create_name_raw()

We claim in the docs that the minimum version of libselinux supported
is 2.1.10 (7a86fe1a from March 2012).  Then, the only buildfarm animal
I know of testing selinux is rhinoceros, that uses CentOS 7.1, and this
visibly already bundles libselinux 2.5 that was released in 2016
(2b69984), per the RPM list here:
http://mirror.centos.org/centos/7/

Joe, what's the version of libselinux used in rhinoceros?  2.5?

Based on this information, what if we increased the minimum support to
2.3 then?  That's a release from 2014, and maintaining such legacy
code does not seem much worth the effort IMO.
--
Michael
Re: run pgindent on a regular basis / scripted manner
On Thu, Aug 13, 2020 at 01:14:33AM -0400, Tom Lane wrote:
> Noah Misch writes:
> > ...  Another advantage of master-only is a guarantee against
> > disrupting time-critical patches.  (It would be ugly to push back
> > branches and sort out the master push later, but it doesn't obstruct
> > the mission.)
>
> Hm, doesn't it?  I had the idea that "git push" is atomic --- either all
> the per-branch commits succeed, or they all fail.  I might be wrong.

Atomicity is good.  I just meant that you could issue something like
"git push origin $(cd .git/refs/heads && ls REL*)" to defer the
complaint about master.
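[One caveat with globbing the loose-ref directory: branches that git has packed into .git/packed-refs (which "git pack-refs" and "git gc" do) leave no file under .git/refs/heads, so the "ls REL*" trick can silently come up empty.  A small demonstration with made-up branch names, plus a plumbing-level alternative:]

```shell
# Build a throwaway repo with two REL* branches, then pack its refs.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git checkout -q -b master
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init
git branch REL_12_STABLE
git branch REL_13_STABLE
git pack-refs --all    # refs move to .git/packed-refs; loose files vanish

ls .git/refs/heads     # (no output: nothing left to glob)
git for-each-ref --format='%(refname:short)' 'refs/heads/REL*'   # still lists both
```

A packed-ref-safe spelling of the same push would therefore be something like "git push origin $(git for-each-ref --format='%(refname:short)' 'refs/heads/REL*')".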
Re: run pgindent on a regular basis / scripted manner
Noah Misch writes:
> ...  Another advantage of master-only is a guarantee against
> disrupting time-critical patches.  (It would be ugly to push back
> branches and sort out the master push later, but it doesn't obstruct
> the mission.)

Hm, doesn't it?  I had the idea that "git push" is atomic --- either all
the per-branch commits succeed, or they all fail.  I might be wrong.

			regards, tom lane
Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE
Hi

On Thu, Aug 13, 2020 at 6:31 AM, Mikhail Titov wrote:
> Hello!
>
> According to the docs[1], one may use DEFAULT keyword while inserting
> into generated columns (stored and identity).  However, currently it
> works only for a single VALUES sublist with DEFAULT for a generated
> column but not for the case when VALUES RTE is used.  This is not being
> tested and it is broken.
>
> I am attaching two patches.  One for tests and another one with the
> proposed changes based on ideas from Andrew on IRC.  So if all good
> there goes the credit where credit is due.  If patch is no good, then
> it is likely my misunderstanding how to put words into code :-)
>
> This is my only second patch to PostgreSQL (the first one was rejected)
> so don't be too harsh :-)  It may not be perfect but I am open for a
> feedback and this is just to get the ball rolling and to let the
> community know about this issue.
>
> Before you ask why would I want to insert DEFAULTs ... well, there are
> ORMs[2] that still need to be patched and current situation contradicts
> documentation[1].

Please assign your patch to the commitfest application:
https://commitfest.postgresql.org/29/

Regards

Pavel

> Footnotes:
> [1] https://www.postgresql.org/docs/devel/ddl-generated-columns.html
>
> [2] https://github.com/rails/rails/pull/39368#issuecomment-670351379
>
> --
> Mikhail
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes:
> Makes sense.  I tested this version on a primary and a replica and
> verified that parallel workers launch, but I saw that autovacuum
> workers still can't start without something like this:

> @@ -2463,7 +2463,8 @@ canAcceptConnections(int backend_type)
>       * be returned until we have checked for too many children.
>       */
>      if (smartShutState != SMART_NORMAL_USAGE &&
> -        backend_type != BACKEND_TYPE_BGWORKER)
> +        backend_type != BACKEND_TYPE_BGWORKER &&
> +        backend_type != BACKEND_TYPE_AUTOVAC)

Hmmm ... maybe that should be more like

    if (smartShutState != SMART_NORMAL_USAGE &&
        backend_type == BACKEND_TYPE_NORMAL)

(the adjacent comment needs adjustment too of course).

			regards, tom lane
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 2:37 PM Tom Lane wrote:
> I experimented with separating the shutdown-in-progress state into a
> separate variable, letting us actually reduce not increase the number of
> pmStates.  This way, PM_RUN and other states still apply until we're
> ready to pull the shutdown trigger, so that we don't need to complicate
> state-based decisions about launching auxiliary processes.  This patch
> also unifies the signal-sending for the smart and fast shutdown paths,
> which seems like a nice improvement.  I kind of like this, though I'm
> not in love with the particular variable name I used here
> (smartShutState).

Makes sense.  I tested this version on a primary and a replica and
verified that parallel workers launch, but I saw that autovacuum workers
still can't start without something like this:

@@ -2463,7 +2463,8 @@ canAcceptConnections(int backend_type)
      * be returned until we have checked for too many children.
      */
     if (smartShutState != SMART_NORMAL_USAGE &&
-        backend_type != BACKEND_TYPE_BGWORKER)
+        backend_type != BACKEND_TYPE_BGWORKER &&
+        backend_type != BACKEND_TYPE_AUTOVAC)
     {
         if (smartShutState == SMART_SUPERUSER_ONLY)
             result = CAC_WAITBACKUP;    /* allow superusers only */
@@ -2471,7 +2472,8 @@
             return CAC_SHUTDOWN;    /* shutdown is pending */
     }
     if (pmState != PM_RUN &&
-        backend_type != BACKEND_TYPE_BGWORKER)
+        backend_type != BACKEND_TYPE_BGWORKER &&
+        backend_type != BACKEND_TYPE_AUTOVAC)
     {
         if (Shutdown > NoShutdown)
             return CAC_SHUTDOWN;    /* shutdown is pending */
Re: Switch to multi-inserts for pg_depend
On Wed, Aug 12, 2020 at 07:52:42PM -0400, Alvaro Herrera wrote:
> Yeah.  As I understand, the only reason to have this number is to avoid
> an arbitrarily large number of entries created as a single multi-insert
> WAL record ... but does that really ever happen?  I guess if you create
> a table with some really complicated schema you might get, say, a
> hundred pg_depend rows at once.  But to fill eight complete pages of
> pg_depend entries sounds astoundingly ridiculous already -- I'd say it's
> just an easy way to spell "infinity" for this.  Tweaking one infinity
> value to become some other infinity value sounds useless.
>
> So I agree with what Andres said.  Let's have just one such define and
> be done with it.

Okay.  Would src/include/catalog/catalog.h be a suitable location for
this flag, or does somebody have a better idea?
--
Michael
REINDEX SCHEMA/DATABASE/SYSTEM weak with dropped relations
Hi all,

While working on support for REINDEX for partitioned relations, I have
noticed an old bug in the logic of ReindexMultipleTables(): the list of
relations to process is built in a first transaction, and then each
table is done in an independent transaction, but we don't actually check
that the relation still exists when doing the real work.

I think that a complete fix involves two things:

1) Addition of one SearchSysCacheExists1() at the beginning of the loop
processing each item in the list in ReindexMultipleTables().  This would
protect from a relation dropped, but that would not be enough if
ReindexMultipleTables() is looking at a relation dropped before we lock
it in reindex_table() or ReindexRelationConcurrently().  Still that's
simple, cheaper, and would protect from most problems.

2) Be completely water-proof and adopt a logic close to what we do for
VACUUM where we try to open a relation, but leave if it does not exist.
This can be achieved with just some try_relation_open() calls with the
correct lock used, and we also need to have a new REINDEXOPT_* flag to
control this behavior conditionally.

2) and 1) are complementary, but 2) is invasive, so based on the lack of
complaints we have seen that does not seem really worth backpatching to
me, and I think that we could also just have 1) on stable branches as a
minimal fix, to take care of most of the problems that could show up to
users.

Attached is a patch to fix all that, with a cheap isolation test that
fails on HEAD with a cache lookup failure.  I am adding that to the next
CF for now, and I would rather fix this issue before moving on with
REINDEX for partitioned relations as fixing this issue reduces the use
of session locks for partition trees.

Any thoughts?
-- Michael diff --git a/src/include/access/table.h b/src/include/access/table.h index cf0ef7b337..68dc4d62c0 100644 --- a/src/include/access/table.h +++ b/src/include/access/table.h @@ -22,6 +22,7 @@ extern Relation table_open(Oid relationId, LOCKMODE lockmode); extern Relation table_openrv(const RangeVar *relation, LOCKMODE lockmode); extern Relation table_openrv_extended(const RangeVar *relation, LOCKMODE lockmode, bool missing_ok); +extern Relation try_table_open(Oid relationId, LOCKMODE lockmode); extern void table_close(Relation relation, LOCKMODE lockmode); #endif /* TABLE_H */ diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 151bcdb7ef..f6ffee7e53 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -3351,6 +3351,7 @@ typedef struct ConstraintsSetStmt /* Reindex options */ #define REINDEXOPT_VERBOSE (1 << 0) /* print progress info */ #define REINDEXOPT_REPORT_PROGRESS (1 << 1) /* report pgstat progress */ +#define REINDEXOPT_MISSING_OK (2 << 1) /* skip missing relations */ typedef enum ReindexObjectType { diff --git a/src/backend/access/table/table.c b/src/backend/access/table/table.c index 1aa01a54b3..7c29091e6c 100644 --- a/src/backend/access/table/table.c +++ b/src/backend/access/table/table.c @@ -57,6 +57,40 @@ table_open(Oid relationId, LOCKMODE lockmode) return r; } + +/* + * try_table_open - open a table relation by relation OID + * + * Same as table_open, except return NULL instead of failing + * if the relation does not exist. 
+ * + */ +Relation +try_table_open(Oid relationId, LOCKMODE lockmode) +{ + Relation r; + + r = try_relation_open(relationId, lockmode); + + /* leave if table does not exist */ + if (!r) + return NULL; + + if (r->rd_rel->relkind == RELKIND_INDEX || + r->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + ereport(ERROR, +(errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is an index", + RelationGetRelationName(r; + else if (r->rd_rel->relkind == RELKIND_COMPOSITE_TYPE) + ereport(ERROR, +(errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is a composite type", + RelationGetRelationName(r; + + return r; +} + /* * table_openrv - open a table relation specified * by a RangeVar node diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1be27eec52..4e7b3eb6a2 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -3423,8 +3423,20 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, * Open and lock the parent heap relation. ShareLock is sufficient since * we only need to be sure no schema or data changes are going on. */ - heapId = IndexGetRelation(indexId, false); - heapRelation = table_open(heapId, ShareLock); + heapId = IndexGetRelation(indexId, + (options & REINDEXOPT_MISSING_OK) != 0); + /* if relation is missing, leave */ + if (!OidIsValid(heapId)) + return; + + if ((options & REINDEXOPT_MISSING_OK) != 0) + heapRelation = try_table_open(heapId, ShareLock); + else + heapRelation = table_open(heapId, ShareLock); + + /* if relation is gone, leave */ + if
[bug+patch] Inserting DEFAULT into generated columns from VALUES RTE
Hello! According to the docs[1], one may use DEFAULT keyword while inserting into generated columns (stored and identity). However, currently it works only for a single VALUES sublist with DEFAULT for a generated column but not for the case when VALUES RTE is used. This is not being tested and it is broken. I am attaching two patches. One for tests and another one with the proposed changes based on ideas from Andrew on IRC. So if all good there goes the credit where credit is due. If patch is no good, then it is likely my misunderstanding how to put words into code :-) This is my only second patch to PostgreSQL (the first one was rejected) so don't be too harsh :-) It may not be perfect but I am open for a feedback and this is just to get the ball rolling and to let the community know about this issue. Before you ask why would I want to insert DEFAULTs ... well, there are ORMs[2] that still need to be patched and current situation contradicts documentation[1]. Footnotes: [1] https://www.postgresql.org/docs/devel/ddl-generated-columns.html [2] https://github.com/rails/rails/pull/39368#issuecomment-670351379 -- Mikhail From d606c4f952a6080dff6fb621ea034bfce2865f7b Mon Sep 17 00:00:00 2001 From: Mikhail Titov Date: Wed, 12 Aug 2020 22:42:37 -0500 Subject: [PATCH 1/2] Test DEFAULT in VALUES RTE for generated columns --- src/test/regress/expected/generated.out | 9 + src/test/regress/expected/identity.out | 13 + src/test/regress/sql/generated.sql | 4 src/test/regress/sql/identity.sql | 1 + 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out index 7ccc3c65ed..31525ef2a6 100644 --- a/src/test/regress/expected/generated.out +++ b/src/test/regress/expected/generated.out @@ -90,6 +90,15 @@ CREATE TABLE gtest_err_8 (a int PRIMARY KEY, b int GENERATED BY DEFAULT AS (a * ERROR: for a generated column, GENERATED ALWAYS must be specified LINE 1: ...E gtest_err_8 (a int PRIMARY KEY, b int 
GENERATED BY DEFAULT... ^ +-- test VALUES RTE with defaults +INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT); +SELECT * FROM gtest0; + a | b +---+ + 1 | 55 + 2 | 55 +(2 rows) + INSERT INTO gtest1 VALUES (1); INSERT INTO gtest1 VALUES (2, DEFAULT); INSERT INTO gtest1 VALUES (3, 33); -- error diff --git a/src/test/regress/expected/identity.out b/src/test/regress/expected/identity.out index 7ac9df767f..ca27b7ff73 100644 --- a/src/test/regress/expected/identity.out +++ b/src/test/regress/expected/identity.out @@ -76,6 +76,7 @@ INSERT INTO itest3 DEFAULT VALUES; INSERT INTO itest3 DEFAULT VALUES; INSERT INTO itest4 DEFAULT VALUES; INSERT INTO itest4 DEFAULT VALUES; +INSERT INTO itest4 (a, b) VALUES (DEFAULT, 'foo'), (DEFAULT, 'bar'); SELECT * FROM itest1; a | b ---+--- @@ -99,10 +100,12 @@ SELECT * FROM itest3; SELECT * FROM itest4; a | b +--- +---+- 1 | 2 | -(2 rows) + 3 | foo + 4 | bar +(4 rows) -- VALUES RTEs INSERT INTO itest3 VALUES (DEFAULT, 'a'); @@ -211,11 +214,13 @@ ALTER TABLE itest4 ALTER COLUMN a DROP NOT NULL; INSERT INTO itest4 DEFAULT VALUES; SELECT * FROM itest4; a | b +--- +---+- 1 | 2 | + 3 | foo + 4 | bar | -(3 rows) +(5 rows) -- check that sequence is removed SELECT sequence_name FROM itest4_a_seq; diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql index 4cff1279c7..18bb1d3c78 100644 --- a/src/test/regress/sql/generated.sql +++ b/src/test/regress/sql/generated.sql @@ -40,6 +40,10 @@ CREATE TABLE gtest_err_7d (a int PRIMARY KEY, b int GENERATED ALWAYS AS (generat -- GENERATED BY DEFAULT not allowed CREATE TABLE gtest_err_8 (a int PRIMARY KEY, b int GENERATED BY DEFAULT AS (a * 2) STORED); +-- test VALUES RTE with defaults +INSERT INTO gtest0 VALUES (1, DEFAULT), (2, DEFAULT); +SELECT * FROM gtest0; + INSERT INTO gtest1 VALUES (1); INSERT INTO gtest1 VALUES (2, DEFAULT); INSERT INTO gtest1 VALUES (3, 33); -- error diff --git a/src/test/regress/sql/identity.sql b/src/test/regress/sql/identity.sql index 
1bf2a976eb..b3d99583c2 100644 --- a/src/test/regress/sql/identity.sql +++ b/src/test/regress/sql/identity.sql @@ -47,6 +47,7 @@ INSERT INTO itest3 DEFAULT VALUES; INSERT INTO itest3 DEFAULT VALUES; INSERT INTO itest4 DEFAULT VALUES; INSERT INTO itest4 DEFAULT VALUES; +INSERT INTO itest4 (a, b) VALUES (DEFAULT, 'foo'), (DEFAULT, 'bar'); SELECT * FROM itest1; SELECT * FROM itest2; -- 2.28.0.windows.1 From 7a187f698d31638c65da10a71e97060793a29c7f Mon Sep 17 00:00:00 2001 From: Mikhail Titov Date: Wed, 12 Aug 2020 21:07:22 -0500 Subject: [PATCH 2/2] Allow DEFAULT in VALUES RTE for generated columns --- src/backend/parser/parse_relation.c | 50 ++-- src/backend/rewrite/rewriteHandler.c | 13 +++- 2 files changed, 51 insertions(+), 12 deletions(-) diff --git
Re: run pgindent on a regular basis / scripted manner
On Thu, Aug 13, 2020 at 12:08:36AM -0400, Tom Lane wrote:
> Noah Misch writes:
> > On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
> >> If the workflow is commit first and re-indent later, then we can always
> >> revert the pgindent commit and clean things up manually; but an auto
> >> re-indent during commit wouldn't provide that history.
>
> > There are competing implementations of assuring pgindent-cleanliness
> > at every check-in:
>
> > 1. After each push, an automated followup commit appears, restoring
> >    pgindent-cleanliness.
> > 2. "git push" results in a commit that melds pgindent changes into
> >    what the committer tried to push.
> > 3. "git push" fails, for the master branch, if the pushed commit
> >    disrupts pgindent-cleanliness.
>
> > Were you thinking of (2)?
>
> I was objecting to (2).  (1) would perhaps work.  (3) could be pretty
> darn annoying,

Right.  I think of three use cases here:

a) I'm a committer who wants to push clean code.  I want (3).
b) I'm a committer who wants to ignore pgindent.  I get some email
   complaints under (1), which I ignore.  Under (3), I'm forced to
   become (a).
c) I'm reading the history.  I want (3).

> I hadn't thought about the angle of HEAD versus back-branch patches,
> but that does seem like something to ponder.  The back branches don't
> have the same pgindent rules necessarily, plus the patch versions
> might be different in more than just whitespace.  My own habit when
> back-patching has been to indent the HEAD patch per-current-rules and
> then preserve that layout as much as possible in the back branches,
> but I doubt we could get a tool to do that with any reliability.

Similar habit here.  Another advantage of master-only is a guarantee
against disrupting time-critical patches.  (It would be ugly to push
back branches and sort out the master push later, but it doesn't
obstruct the mission.)
Re: run pgindent on a regular basis / scripted manner
Noah Misch writes:
> On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
>> If the workflow is commit first and re-indent later, then we can always
>> revert the pgindent commit and clean things up manually; but an auto
>> re-indent during commit wouldn't provide that history.

> There are competing implementations of assuring pgindent-cleanliness
> at every check-in:
> 1. After each push, an automated followup commit appears, restoring
>    pgindent-cleanliness.
> 2. "git push" results in a commit that melds pgindent changes into
>    what the committer tried to push.
> 3. "git push" fails, for the master branch, if the pushed commit
>    disrupts pgindent-cleanliness.
> Were you thinking of (2)?

I was objecting to (2).  (1) would perhaps work.  (3) could be pretty
darn annoying, especially if it blocks a time-critical security patch.

> Regarding typedefs.list, I would use the in-tree one, like you
> discussed here:

Yeah, after thinking about that more, it seems like automated
typedefs.list updates would be far riskier than automated reindent based
on the existing typedefs.list.  The latter could at least be expected
not to change code unrelated to the immediate commit.  typedefs.list
updates need some amount of adult supervision.  (I'd still vote for
nag-mail to the committer whose work got reindented, in case the bot
made things a lot worse.)

I hadn't thought about the angle of HEAD versus back-branch patches,
but that does seem like something to ponder.  The back branches don't
have the same pgindent rules necessarily, plus the patch versions
might be different in more than just whitespace.  My own habit when
back-patching has been to indent the HEAD patch per-current-rules and
then preserve that layout as much as possible in the back branches,
but I doubt we could get a tool to do that with any reliability.

Of course, there's also the possibility of forcibly reindenting all the
active back branches to current rules.  But I think we've rejected that
idea already, because it would cause so much pain for forks that are
following a back branch.

			regards, tom lane
Re: run pgindent on a regular basis / scripted manner
On Wed, Aug 12, 2020 at 07:47:01PM -0400, Tom Lane wrote:
> Jesse Zhang writes:
> > On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote:
> >> Is there any reason we don't just automatically run pgindent
> >> regularly?  Like once a week?  And also update typedefs.list
> >> automatically, while we're at it?
>
> > You know what's better than weekly?  Every check-in.
>
> I'm not in favor of unsupervised pgindent runs, really.  It can do a lot
> of damage to code that was written without thinking about it --- in
> particular, it'll make a hash of comment blocks that were manually
> formatted and not protected with dashes.
>
> If the workflow is commit first and re-indent later, then we can always
> revert the pgindent commit and clean things up manually; but an auto
> re-indent during commit wouldn't provide that history.

There are competing implementations of assuring pgindent-cleanliness at
every check-in:

1. After each push, an automated followup commit appears, restoring
   pgindent-cleanliness.
2. "git push" results in a commit that melds pgindent changes into what
   the committer tried to push.
3. "git push" fails, for the master branch, if the pushed commit
   disrupts pgindent-cleanliness.

Were you thinking of (2)?  (1) doesn't have the lack-of-history problem,
but it does have the unexpected-damage problem, and it makes gitweb
noisier.  (3) has neither problem, and I'd prefer it over (1), (2), or
$SUBJECT.

Regarding typedefs.list, I would use the in-tree one, like you discussed
here:

On Wed, Aug 12, 2020 at 07:57:29PM -0400, Tom Lane wrote:
> Maybe the secret is to not allow automated adoption of new typedefs.list
> entries, but to require somebody to add entries to that file by hand,
> even if they're basing it on the buildfarm results.  (This would
> encourage the habit some people have adopted of updating typedefs.list
> along with commits that add typedefs.  I've never done that, but would
> be willing to change if there's good motivation to.)
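[Option (3) would ordinarily be a server-side pre-receive or update hook.  The sketch below only exercises the hook's control flow; INDENT_CHECK is a made-up placeholder for a real cleanliness test (e.g. running pgindent over the pushed tree and diffing), which would be site-specific:]

```shell
# A pre-receive hook receives one "old-sha new-sha refname" line per pushed
# ref on stdin; exiting nonzero rejects the entire push.
hook=$(mktemp)
cat > "$hook" <<'EOF'
#!/bin/sh
# INDENT_CHECK is injectable here so the flow can be tested; a real hook
# would hard-code its cleanliness test against the pushed commit ($new).
check=${INDENT_CHECK:-true}
while read old new ref; do
    if [ "$ref" = "refs/heads/master" ] && ! $check "$new"; then
        echo "rejected: $new would leave master not pgindent-clean" >&2
        exit 1
    fi
done
exit 0
EOF
chmod +x "$hook"

# Simulated pushes: only master is policed, matching the master-only idea,
# so back branches are never blocked.
echo "0000 1111 refs/heads/master" | INDENT_CHECK=false "$hook" \
    || echo "master push rejected"
echo "0000 1111 refs/heads/REL_13_STABLE" | INDENT_CHECK=false "$hook" \
    && echo "back-branch push accepted"
```

Because the hook declines the push before any ref moves, it has neither the lack-of-history problem of (2) nor the noise of (1).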
Re: Handing off SLRU fsyncs to the checkpointer
On Wed, Aug 12, 2020 at 6:06 PM Thomas Munro wrote: > [patch] Bitrot, rebased, no changes. > Yeah, the combined effect of these two patches is better than I > expected. To be clear though, I was only measuring the time between > the "redo starts at ..." and "redo done at ..." messages, since I've > been staring at the main recovery code, but there are also some more > fsyncs before (SyncDataDirectory()) and after (RemoveOldXlogFiles()) > that are unaffected. I think it's probably possible to do something > about those too, but that's another topic. ... and of course the end-of-recovery checkpoint; in my tests this wasn't materially changed since there isn't actually very much CLOG, it's just that we avoided syncing it block at a time and getting rescheduled. FWIW I put a very simple test here: https://github.com/macdice/redo-bench, YMMV. From 32c1c16d2800467d1d179678b66d1042d07c Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 4 Aug 2020 17:57:18 +1200 Subject: [PATCH v2] Defer flushing of SLRU files. Previously, we called fsync() after writing out pg_xact, multixact and commit_ts pages, leading to an I/O stall in user backends and recovery. Collapse requests for the same file into a single system call as part of the next checkpoint, as we do for relation files. 
Discussion: https://postgr.es/m/CA+hUKGLJ=84yt+nvhkeedauutvhmfq9i-n7k_o50jmq6rpj...@mail.gmail.com --- src/backend/access/transam/clog.c | 13 +++- src/backend/access/transam/commit_ts.c | 12 ++- src/backend/access/transam/multixact.c | 24 +- src/backend/access/transam/slru.c | 101 +++-- src/backend/access/transam/subtrans.c | 4 +- src/backend/commands/async.c | 5 +- src/backend/storage/lmgr/predicate.c | 4 +- src/backend/storage/sync/sync.c| 28 ++- src/include/access/clog.h | 3 + src/include/access/commit_ts.h | 3 + src/include/access/multixact.h | 4 + src/include/access/slru.h | 12 ++- src/include/storage/sync.h | 7 +- 13 files changed, 174 insertions(+), 46 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index dd2f4d5bc7..3eb33aea01 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -42,6 +42,7 @@ #include "pg_trace.h" #include "pgstat.h" #include "storage/proc.h" +#include "storage/sync.h" /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -692,7 +693,8 @@ CLOGShmemInit(void) { XactCtl->PagePrecedes = CLOGPagePrecedes; SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, - XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER); + XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER, + SYNC_HANDLER_CLOG); } /* @@ -1034,3 +1036,12 @@ clog_redo(XLogReaderState *record) else elog(PANIC, "clog_redo: unknown op code %u", info); } + +/* + * Entrypoint for sync.c to sync clog files. 
+ */ +int +clogsyncfiletag(const FileTag *ftag, char *path) +{ + return slrusyncfiletag(XactCtl, ftag, path); +} diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 5244b06a2b..913ec9e48d 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -555,7 +555,8 @@ CommitTsShmemInit(void) CommitTsCtl->PagePrecedes = CommitTsPagePrecedes; SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0, CommitTsSLRULock, "pg_commit_ts", - LWTRANCHE_COMMITTS_BUFFER); + LWTRANCHE_COMMITTS_BUFFER, + SYNC_HANDLER_COMMIT_TS); commitTsShared = ShmemInitStruct("CommitTs shared", sizeof(CommitTimestampShared), @@ -1083,3 +1084,12 @@ commit_ts_redo(XLogReaderState *record) else elog(PANIC, "commit_ts_redo: unknown op code %u", info); } + +/* + * Entrypoint for sync.c to sync commit_ts files. + */ +int +committssyncfiletag(const FileTag *ftag, char *path) +{ + return slrusyncfiletag(CommitTsCtl, ftag, path); +} diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index b8bedca04a..344006b0f5 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1831,11 +1831,13 @@ MultiXactShmemInit(void) SimpleLruInit(MultiXactOffsetCtl, "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0, MultiXactOffsetSLRULock, "pg_multixact/offsets", - LWTRANCHE_MULTIXACTOFFSET_BUFFER); + LWTRANCHE_MULTIXACTOFFSET_BUFFER, + SYNC_HANDLER_MULTIXACT_OFFSET); SimpleLruInit(MultiXactMemberCtl, "MultiXactMember", NUM_MULTIXACTMEMBER_BUFFERS, 0, MultiXactMemberSLRULock, "pg_multixact/members", - LWTRANCHE_MULTIXACTMEMBER_BUFFER); + LWTRANCHE_MULTIXACTMEMBER_BUFFER, + SYNC_HANDLER_MULTIXACT_MEMBER); /* Initialize our shared state struct */ MultiXactState = ShmemInitStruct("Shared MultiXact State", @@ -3386,3 +3388,21 @@
Autonomous database is coming to Postgres?
Hello,

I'm not sure if I should have posted this to pgsql-advocacy, but this is
being developed so I posted here.  Does anyone know if this development
will come to open source Postgres, or only to the cloud services of
Microsoft and Google?  (I wonder if this will become another reason that
Postgres won't incorporate an optimizer hint feature.)

Data systems that learn to be better
http://news.mit.edu/2020/mit-data-systems-learn-be-better-tsunami-bao-0810

[Quote]
--
As a first step toward this vision, Kraska and colleagues developed
Tsunami and Bao.  Tsunami uses machine learning to automatically
re-organize a dataset’s storage layout based on the types of queries
that its users make.  Tests show that it can run queries up to 10 times
faster than state-of-the-art systems.  What’s more, its datasets can be
organized via a series of "learned indexes" that are up to 100 times
smaller than the indexes used in traditional systems.

Bao, meanwhile, focuses on improving the efficiency of query
optimization through machine learning.  ...  Traditional query
optimizers take years to build, are very hard to maintain, and, most
importantly, do not learn from their mistakes.  Bao is the first
learning-based approach to query optimization that has been fully
integrated into the popular database management system PostgreSQL.  Lead
author Ryan Marcus, a postdoc in Kraska’s group, says that Bao produces
query plans that run up to 50 percent faster than those created by the
PostgreSQL optimizer, meaning that it could help to significantly reduce
the cost of cloud services, like Amazon’s Redshift, that are based on
PostgreSQL.

Kraska says that in contrast to other learning-based approaches to query
optimization, Bao learns much faster and can outperform open-source and
commercial optimizers with as little as one hour of training time.  In
the future, his team aims to integrate Bao into cloud systems to improve
resource utilization in environments where disk, RAM, and CPU time are
scarce resources.  ...

The work was done as part of the Data System and AI Lab (DSAIL@CSAIL),
which is sponsored by Intel, Google, Microsoft, and the U.S. National
Science Foundation.
--

Regards
Takayuki Tsunakawa
Re: remove some ancient port hacks
On Wed, Aug 12, 2020 at 09:12:07AM +0200, Peter Eisentraut wrote: > There are two ancient hacks in the cygwin and solaris ports that appear to > have been solved more than 10 years ago, so I think we can remove them. See > attached patches. +1 for removing these. >10y age is not sufficient justification by itself; if systems that shipped with the defect were not yet EOL, that would tend to justify waiting longer. For these particular hacks, though, affected systems are both old and EOL.
Re: security_context_t marked as deprecated in libselinux 3.1
Michael Paquier writes: > Per the following commit in upstream SELinux, security_context_t has > been marked as deprecated, generating complaints with > -Wdeprecated-declarations: > https://github.com/SELinuxProject/selinux/commit/7a124ca2758136f49cc38efc26fb1a2d385ecfd9 Huh. Apparently it's been considered legacy for a good while, too. > This can be seen with Debian SID when building contrib/sepgsql/, as > we have libselinux 3.1 there. Per the upstream repo, > security_context_t maps to char * in include/selinux/selinux.h, so we > can easily get rid of the warnings with the attached patch, which replaces > the references to security_context_t. Ummm ... aren't you going to get some cast-away-const warnings now? Or are all of the called functions declared as taking "const char *" not just "char *"? regards, tom lane
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes: > On Thu, Aug 13, 2020 at 10:21 AM Tom Lane wrote: >> Also, the state before PM_WAIT_READONLY could have been >> PM_RECOVERY or PM_STARTUP, in which case we don't really want to think >> it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart >> case should be accepted. That suggests that we need yet another pmState, >> or else a more thoroughgoing refactoring of how the postmaster's state >> is represented. > Hmm. I experimented with separating the shutdown-in-progress state into a separate variable, letting us actually reduce not increase the number of pmStates. This way, PM_RUN and other states still apply until we're ready to pull the shutdown trigger, so that we don't need to complicate state-based decisions about launching auxiliary processes. This patch also unifies the signal-sending for the smart and fast shutdown paths, which seems like a nice improvement. I kind of like this, though I'm not in love with the particular variable name I used here (smartShutState). If we go this way, CAC_WAITBACKUP ought to be renamed since the PMState it's named after no longer exists. I left that alone pending making final naming choices, though. regards, tom lane diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index e31275a04e..3946fa52ea 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -185,8 +185,8 @@ PostgreSQL documentation stop mode shuts down the server that is running in the specified data directory. Three different shutdown methods can be selected with the -m - option. Smart mode waits for all active - clients to disconnect and any online backup to finish. + option. Smart mode disallows new connections, then waits + for all existing clients to disconnect and any online backup to finish. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected. 
Fast mode (the default) does not wait for clients to disconnect and diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 38e2c16ac2..66b402e7f7 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -148,8 +148,6 @@ #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */ #define BACKEND_TYPE_ALL 0x000F /* OR of all the above */ -#define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER) - /* * List of active backends (or child processes anyway; we don't actually * know whether a given child has become a backend or is still in the @@ -304,8 +302,7 @@ static bool FatalError = false; /* T if recovering from backend crash */ * and we switch to PM_RUN state. * * Normal child backends can only be launched when we are in PM_RUN or - * PM_HOT_STANDBY state. (We also allow launch of normal - * child backends in PM_WAIT_BACKUP state, but only for superusers.) + * PM_HOT_STANDBY state. (smartShutState can also restrict launching.) * In other states we handle connection requests by launching "dead_end" * child processes, which will simply send the client an error message and * quit. (We track these in the BackendList so that we can know when they @@ -319,10 +316,10 @@ static bool FatalError = false; /* T if recovering from backend crash */ * * Notice that this state variable does not distinguish *why* we entered * states later than PM_RUN --- Shutdown and FatalError must be consulted - * to find that out. FatalError is never true in PM_RECOVERY_* or PM_RUN - * states, nor in PM_SHUTDOWN states (because we don't enter those states - * when trying to recover from a crash). It can be true in PM_STARTUP state, - * because we don't clear it until we've successfully started WAL redo. + * to find that out. FatalError is never true in PM_RECOVERY, PM_HOT_STANDBY, + * or PM_RUN states, nor in PM_SHUTDOWN states (because we don't enter those + * states when trying to recover from a crash). 
It can be true in PM_STARTUP + * state, because we don't clear it until we've successfully started WAL redo. */ typedef enum { @@ -331,8 +328,7 @@ typedef enum PM_RECOVERY,/* in archive recovery mode */ PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ - PM_WAIT_BACKUP,/* waiting for online backup mode to end */ - PM_WAIT_READONLY, /* waiting for read only backends to exit */ + PM_STOP_BACKENDS, /* need to stop remaining backends */ PM_WAIT_BACKENDS, /* waiting for live backends to exit */ PM_SHUTDOWN,/* waiting for checkpointer to do shutdown * ckpt */ @@ -344,6 +340,21 @@ typedef enum static PMState pmState = PM_INIT; +/* + * While performing a "smart shutdown", we restrict new connections but stay + * in PM_RUN or PM_HOT_STANDBY state until all the client backends are gone. + * smartShutState is a sub-state indicator showing the
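The design sketched in this patch — keep pmState in PM_RUN (or PM_HOT_STANDBY) during a smart shutdown and track the shutdown itself in a separate sub-state variable — can be illustrated with a toy state machine. The names below echo the patch's smartShutState idea, but the class and methods are purely illustrative, not the actual postmaster code:

```python
from enum import Enum, auto

class PMState(Enum):
    PM_RUN = auto()
    PM_HOT_STANDBY = auto()
    PM_WAIT_BACKENDS = auto()
    PM_SHUTDOWN = auto()

class SmartShutState(Enum):
    SMART_NORMAL = auto()        # no smart shutdown in progress
    SMART_WAIT_CLIENTS = auto()  # new connections refused, waiting for clients

class ToyPostmaster:
    def __init__(self):
        self.pm_state = PMState.PM_RUN
        self.smart_shut = SmartShutState.SMART_NORMAL
        self.client_backends = 2

    def accepts_connections(self):
        # Launch decisions still key off pm_state; the sub-state only
        # vetoes new client connections during a smart shutdown.
        return (self.pm_state in (PMState.PM_RUN, PMState.PM_HOT_STANDBY)
                and self.smart_shut is SmartShutState.SMART_NORMAL)

    def request_smart_shutdown(self):
        self.smart_shut = SmartShutState.SMART_WAIT_CLIENTS

    def client_disconnected(self):
        self.client_backends -= 1
        if (self.smart_shut is SmartShutState.SMART_WAIT_CLIENTS
                and self.client_backends == 0):
            # Only now do we pull the shutdown trigger.
            self.pm_state = PMState.PM_WAIT_BACKENDS

pm = ToyPostmaster()
pm.request_smart_shutdown()
print(pm.accepts_connections())  # False: new connections refused
print(pm.pm_state.name)          # PM_RUN: state unchanged until clients leave
pm.client_disconnected(); pm.client_disconnected()
print(pm.pm_state.name)          # PM_WAIT_BACKENDS
```

The point of the split is visible here: auxiliary-process launch logic that tests pm_state needs no extra cases, because a smart shutdown never introduces a new top-level state.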
Re: run pgindent on a regular basis / scripted manner
On Wed, Aug 12, 2020 at 06:53:25PM -0400, Alvaro Herrera wrote: > On 2020-Aug-12, Andres Freund wrote: >> Is there any reason we don't just automatically run pgindent regularly? >> Like once a week? And also update typedefs.list automatically, while >> we're at it? > > Seconded. Thirded. -- Michael signature.asc Description: PGP signature
security_context_t marked as deprecated in libselinux 3.1
Hi all, Per the following commit in upstream SELinux, security_context_t has been marked as deprecated, generating complaints with -Wdeprecated-declarations: https://github.com/SELinuxProject/selinux/commit/7a124ca2758136f49cc38efc26fb1a2d385ecfd9 This can be seen with Debian SID when building contrib/sepgsql/, as we have libselinux 3.1 there. Per the upstream repo, security_context_t maps to char * in include/selinux/selinux.h, so we can easily get rid of the warnings with the attached patch, which replaces the references to security_context_t. Funnily, our code already mixes both definitions, see for example sepgsql_set_client_label, so this clarifies things. Any thoughts? -- Michael diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c index 32e405530b..b00b91df5a 100644 --- a/contrib/sepgsql/label.c +++ b/contrib/sepgsql/label.c @@ -120,7 +120,7 @@ sepgsql_set_client_label(const char *new_label) tcontext = client_label_peer; else { - if (security_check_context_raw((security_context_t) new_label) < 0) + if (security_check_context_raw(new_label) < 0) ereport(ERROR, (errcode(ERRCODE_INVALID_NAME), errmsg("SELinux: invalid security label: \"%s\"", @@ -453,9 +453,9 @@ sepgsql_get_label(Oid classId, Oid objectId, int32 subId) object.objectSubId = subId; label = GetSecurityLabel(&object, SEPGSQL_LABEL_TAG); - if (!label || security_check_context_raw((security_context_t) label)) + if (!label || security_check_context_raw(label)) { - security_context_t unlabeled; + char *unlabeled; if (security_get_initial_context_raw("unlabeled", &unlabeled) < 0) ereport(ERROR, @@ -487,7 +487,7 @@ sepgsql_object_relabel(const ObjectAddress *object, const char *seclabel) * context of selinux. 
*/ if (seclabel && - security_check_context_raw((security_context_t) seclabel) < 0) + security_check_context_raw(seclabel) < 0) ereport(ERROR, (errcode(ERRCODE_INVALID_NAME), errmsg("SELinux: invalid security label: \"%s\"", seclabel))); @@ -725,7 +725,7 @@ exec_object_restorecon(struct selabel_handle *sehnd, Oid catalogId) char *objname; int objtype = 1234; ObjectAddress object; - security_context_t context; + char *context; /* * The way to determine object name depends on object classes. So, any diff --git a/contrib/sepgsql/selinux.c b/contrib/sepgsql/selinux.c index 9fdc810f2e..2695e88f23 100644 --- a/contrib/sepgsql/selinux.c +++ b/contrib/sepgsql/selinux.c @@ -768,8 +768,8 @@ sepgsql_compute_avd(const char *scontext, * Ask SELinux what is allowed set of permissions on a pair of the * security contexts and the given object class. */ - if (security_compute_av_flags_raw((security_context_t) scontext, - (security_context_t) tcontext, + if (security_compute_av_flags_raw(scontext, + tcontext, tclass_ex, 0, &avd_ex) < 0) ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), @@ -838,7 +838,7 @@ sepgsql_compute_create(const char *scontext, uint16 tclass, const char *objname) { - security_context_t ncontext; + char *ncontext; security_class_t tclass_ex; const char *tclass_name; char *result; @@ -853,8 +853,8 @@ sepgsql_compute_create(const char *scontext, * Ask SELinux what is the default context for the given object class on a * pair of security contexts */ - if (security_compute_create_name_raw((security_context_t) scontext, - (security_context_t) tcontext, + if (security_compute_create_name_raw(scontext, + tcontext, tclass_ex, objname, &ncontext) < 0) diff --git a/contrib/sepgsql/uavc.c b/contrib/sepgsql/uavc.c index 639a52c556..97189b7c46 100644 --- a/contrib/sepgsql/uavc.c +++ b/contrib/sepgsql/uavc.c @@ -171,7 +171,7 @@ sepgsql_avc_unlabeled(void) { if (!avc_unlabeled) { - security_context_t unlabeled; + char *unlabeled; if (security_get_initial_context_raw("unlabeled", &unlabeled) < 0) 
ereport(ERROR, @@ -216,7 +216,7 @@ sepgsql_avc_compute(const char *scontext, const char *tcontext, uint16 tclass) * policy is reloaded, validation status shall be kept, so we also cache * whether the supplied security context was valid, or not. */ - if (security_check_context_raw((security_context_t) tcontext) != 0) + if (security_check_context_raw(tcontext) != 0) ucontext = sepgsql_avc_unlabeled(); /* signature.asc Description: PGP signature
Re: Add LWLock blocker(s) information
On Wed, Aug 12, 2020 at 5:39 PM Andres Freund wrote: > Attached. Needed one python3 fix, and to be adapted so it works with > futex based semaphores. Seems to work for both sysv and posix semaphores > now, based a very short test. Great, thanks! -- Peter Geoghegan
Re: [HACKERS] Dynamic instrumentation of lwlock wait times (lwlock flamegraphs)
Hi, On 2017-06-22 14:08:45 -0700, Andres Freund wrote: > At pgcon some people were talking about the difficulty of instrumenting > the time actually spent waiting for lwlocks and related measurements. > I'd mentioned that linux these days provides infrastructure to measure > such things in unmodified binaries. > > Attached is a prototype of a script that measures the time spent inside > PGSemaphoreLock(), aggregates that inside the kernel, grouped by pid and > stacktrace. That allows one to generate nice flame graphs showing which > part of the code waits how long for lwlocks. > > The attached script clearly needs improvements, but I thought I'd show > some of the results it can get. To run it you need the python > library of the 'bcc' project [1], and a sufficiently new kernel. Some > distributions, e.g. newer debian versions, package this as python-bpfcc > and similar. > > The output of the script can be turned into a flamegraph with > https://github.com/brendangregg/FlameGraph 's flamegraph.pl. The script has bitrotted slightly, due to python3 and postgres changes (the move to posix semaphores). Updated version attached. 
Based on the discussion in https://www.postgresql.org/message-id/20200813003934.yrm4qqngfygr6ii7%40alap3.anarazel.de Greetings, Andres Freund #!/usr/bin/python from __future__ import print_function from bcc import BPF from time import sleep import argparse import signal def positive_int(val): try: ival = int(val) except ValueError: raise argparse.ArgumentTypeError("must be an integer") if ival < 0: raise argparse.ArgumentTypeError("must be positive") return ival def positive_nonzero_int(val): ival = positive_int(val) if ival == 0: raise argparse.ArgumentTypeError("must be nonzero") return ival bpf_text = """ #include #include struct stats_key_t { u32 pid; int user_stack_id; }; struct stats_value_t { u64 total_time; }; struct start_key_t { u32 pid; }; struct start_value_t { u64 last_start; }; // map_type, key_type, leaf_type, table_name, num_entry BPF_HASH(stats, struct stats_key_t, struct stats_value_t); BPF_HASH(start, struct start_key_t, struct start_value_t); BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE); int trace_sem_entry(struct pt_regs *ctx) { u32 pid = bpf_get_current_pid_tgid(); struct start_key_t start_key = {}; struct start_value_t start_value = {}; if (!(THREAD_FILTER)) { return 0; } start_key.pid = pid; start_value.last_start = bpf_ktime_get_ns(); start.update(&start_key, &start_value); return 0; } int trace_sem_return(struct pt_regs *ctx) { u32 pid = bpf_get_current_pid_tgid(); struct stats_key_t stats_key = {}; struct start_key_t start_key = {}; struct stats_value_t zero = {}; struct stats_value_t *stats_value; struct start_value_t *start_value; u64 delta; if (!(THREAD_FILTER)) { return 0; } start_key.pid = pid; start_value = start.lookup(&start_key); if (!start_value) return 0; /* missed start */; delta = bpf_ktime_get_ns() - start_value->last_start; start.delete(&start_key); stats_key.pid = pid; stats_key.user_stack_id = stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID | BPF_F_USER_STACK); stats_value = stats.lookup_or_init(&stats_key, &zero); stats_value->total_time += delta; 
return 0; } """ examples = """examples: ./pgsemwait.py -x BINARY# trace postgres lwlock wait time until Ctrl-C ./pgsemwait.py -x BINARY 5 # trace for 5 seconds only ./pgsemwait.py -x BINARY -p PID # trace PID """ parser = argparse.ArgumentParser( description="Measure Postgres LWLock Wait Time", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=examples) parser.add_argument("-x", "--binary", metavar="BINARY", dest="binary", required = True, help="path to postgres binary") parser.add_argument("-p", "--pid", metavar="PID", dest="pid", help="trace this PID only", type=positive_int) parser.add_argument("-f", "--folded", action="store_true", help="output folded format") parser.add_argument("duration", nargs="?", default=, type=positive_nonzero_int, help="duration of trace, in seconds") parser.add_argument("--stack-storage-size", default=1024, type=positive_nonzero_int, help="the number of unique stack traces that can be stored and " "displayed (default 1024)") args = parser.parse_args() folded = args.folded duration = int(args.duration) # set stack storage size bpf_text = bpf_text.replace('STACK_STORAGE_SIZE', str(args.stack_storage_size)) # setup pid filter thread_filter = '1' if args.pid is not None: thread_filter = 'pid == %d' % args.pid bpf_text = bpf_text.replace('THREAD_FILTER', thread_filter) binary = args.binary b = BPF(text=bpf_text) libpath = BPF.find_exe(binary) if not libpath: bail("can't resolve library %s" % library) library = libpath
Re: Add LWLock blocker(s) information
Hi, On 2020-08-12 16:47:13 -0700, Peter Geoghegan wrote: > On Mon, Aug 10, 2020 at 5:41 PM Andres Freund wrote: > > Most of the cases where this kind of information really is interesting > > seem to benefit a lot from having stack information available. That > > obviously has overhead, so we don't want the cost all the > > time. The script at > > https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de > > can give you results like e.g. > > https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg > > It seems to have bitrot. Do you have a more recent version of the script? Attached. Needed one python3 fix, and to be adapted so it works with futex based semaphores. Seems to work for both sysv and posix semaphores now, based a very short test. sudo python3 ./pgsemwait.py -x /home/andres/build/postgres/dev-optimize/vpath/src/backend/postgres -f 3|~/src/flamegraph/flamegraph.pl Will add a note to the other thread. Greetings, Andres Freund #!/usr/bin/python from __future__ import print_function from bcc import BPF from time import sleep import argparse import signal def positive_int(val): try: ival = int(val) except ValueError: raise argparse.ArgumentTypeError("must be an integer") if ival < 0: raise argparse.ArgumentTypeError("must be positive") return ival def positive_nonzero_int(val): ival = positive_int(val) if ival == 0: raise argparse.ArgumentTypeError("must be nonzero") return ival bpf_text = """ #include #include struct stats_key_t { u32 pid; int user_stack_id; }; struct stats_value_t { u64 total_time; }; struct start_key_t { u32 pid; }; struct start_value_t { u64 last_start; }; // map_type, key_type, leaf_type, table_name, num_entry BPF_HASH(stats, struct stats_key_t, struct stats_value_t); BPF_HASH(start, struct start_key_t, struct start_value_t); BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE); int trace_sem_entry(struct pt_regs *ctx) { u32 pid = bpf_get_current_pid_tgid(); struct start_key_t start_key = {}; struct start_value_t start_value = 
{}; if (!(THREAD_FILTER)) { return 0; } start_key.pid = pid; start_value.last_start = bpf_ktime_get_ns(); start.update(&start_key, &start_value); return 0; } int trace_sem_return(struct pt_regs *ctx) { u32 pid = bpf_get_current_pid_tgid(); struct stats_key_t stats_key = {}; struct start_key_t start_key = {}; struct stats_value_t zero = {}; struct stats_value_t *stats_value; struct start_value_t *start_value; u64 delta; if (!(THREAD_FILTER)) { return 0; } start_key.pid = pid; start_value = start.lookup(&start_key); if (!start_value) return 0; /* missed start */; delta = bpf_ktime_get_ns() - start_value->last_start; start.delete(&start_key); stats_key.pid = pid; stats_key.user_stack_id = stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID | BPF_F_USER_STACK); stats_value = stats.lookup_or_init(&stats_key, &zero); stats_value->total_time += delta; return 0; } """ examples = """examples: ./pgsemwait.py -x BINARY# trace postgres lwlock wait time until Ctrl-C ./pgsemwait.py -x BINARY 5 # trace for 5 seconds only ./pgsemwait.py -x BINARY -p PID # trace PID """ parser = argparse.ArgumentParser( description="Measure Postgres LWLock Wait Time", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=examples) parser.add_argument("-x", "--binary", metavar="BINARY", dest="binary", required = True, help="path to postgres binary") parser.add_argument("-p", "--pid", metavar="PID", dest="pid", help="trace this PID only", type=positive_int) parser.add_argument("-f", "--folded", action="store_true", help="output folded format") parser.add_argument("duration", nargs="?", default=, type=positive_nonzero_int, help="duration of trace, in seconds") parser.add_argument("--stack-storage-size", default=1024, type=positive_nonzero_int, help="the number of unique stack traces that can be stored and " "displayed (default 1024)") args = parser.parse_args() folded = args.folded duration = int(args.duration) # set stack storage size bpf_text = bpf_text.replace('STACK_STORAGE_SIZE', str(args.stack_storage_size)) # setup pid 
filter thread_filter = '1' if args.pid is not None: thread_filter = 'pid == %d' % args.pid bpf_text = bpf_text.replace('THREAD_FILTER', thread_filter) binary = args.binary b = BPF(text=bpf_text) libpath = BPF.find_exe(binary) if not libpath: bail("can't resolve library %s" % library) library = libpath b.attach_uprobe(name=library, sym_re='PGSemaphoreLock|futex', fn_name="trace_sem_entry", pid = -1) b.attach_uretprobe(name=library, sym_re='PGSemaphoreLock|futex', fn_name="trace_sem_return", pid = -1) matched = b.num_open_uprobes() if matched == 0: print("error: 0 functions traced. Exiting.", file=stderr) exit(1) #
Re: run pgindent on a regular basis / scripted manner
Andres Freund writes: > Unfortunately that is, with the current tooling, not entirely trivial to > do so completely. The way we generate the list of known typedefs > unfortunately depends on the platform a build is run on. Therefore the > buildfarm collects a number of the generated list of typedefs from > different platforms, and then we use that combined list to run pgindent. Yeah, it's hard to see how to avoid that given that the set of typedefs provided by system headers is not fixed. (Windows vs not Windows is the worst case of course, but Unixen aren't all alike either.) Another gotcha that we have to keep our eyes on is that sometimes the process finds spurious names that we don't want to treat as typedefs because it results in messing up too much code. There's a reject list in pgindent that takes care of those, but it has to be maintained manually. So I'm not sure how that could work in conjunction with unsupervised reindents --- by the time you notice a problem, the git history will already be cluttered with bogus reindentations. Maybe the secret is to not allow automated adoption of new typedefs.list entries, but to require somebody to add entries to that file by hand, even if they're basing it on the buildfarm results. (This would encourage the habit some people have adopted of updating typedefs.list along with commits that add typedefs. I've never done that, but would be willing to change if there's good motivation to.) regards, tom lane
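The workflow Tom describes — union the per-platform typedef lists, then subtract a manually maintained reject list — can be approximated in a few lines. The file contents and the reject entries below are made up for illustration; the real reject list lives inside pgindent itself:

```python
def merge_typedefs(per_platform_lists, reject_list):
    """Union the typedef names collected on each platform, drop known
    false positives, and return a sorted list suitable for pgindent."""
    merged = set()
    for names in per_platform_lists:
        merged.update(names)
    return sorted(merged - set(reject_list))

# Hypothetical per-platform results from the buildfarm:
linux_names = ["Oid", "XLogRecPtr", "abs"]      # "abs" is a spurious hit
windows_names = ["Oid", "DWORD", "XLogRecPtr"]  # Windows adds its own types
rejects = ["abs"]                               # maintained by hand

print(merge_typedefs([linux_names, windows_names], rejects))
# ['DWORD', 'Oid', 'XLogRecPtr']
```

Tom's suggestion amounts to making the reject-list subtraction (and perhaps the union itself) a human-reviewed step rather than an automated one.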
Re: Switch to multi-inserts for pg_depend
On 2020-Aug-11, Robert Haas wrote: > On Tue, Aug 11, 2020 at 1:59 AM Michael Paquier wrote: > > On Mon, Aug 10, 2020 at 05:32:21PM -0700, Andres Freund wrote: > > > Do we really want to end up with several separate defines for different > > > type of catalog batch inserts? That doesn't seem like a good > > > thing. Think there should be a single define for all catalog bulk > > > inserts. > > > > Unlikely so, but I kept them separate to potentially lower the > > threshold of 64kB for catalog rows that have a lower average size than > > pg_attribute. > > Uh, why would we want to do that? Yeah. As I understand, the only reason to have this number is to avoid an arbitrarily large number of entries created as a single multi-insert WAL record ... but does that really ever happen? I guess if you create a table with some really complicated schema you might get, say, a hundred pg_depend rows at once. But to fill eight complete pages of pg_depend entries sounds astoundingly ridiculous already -- I'd say it's just an easy way to spell "infinity" for this. Tweaking one infinity value to become some other infinity value sounds useless. So I agree with what Andres said. Let's have just one such define and be done with it. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
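The "eight complete pages" remark is simple arithmetic: with the default 8 kB page size, a 64 kB cap spans eight pages, and at a rough 100-byte average row (the row size here is an illustrative guess, not a measured figure) the cap only matters once a single command creates several hundred catalog rows:

```python
BLCKSZ = 8192            # default PostgreSQL block (page) size, bytes
CAP = 64 * 1024          # the 64 kB multi-insert threshold under discussion
AVG_ROW = 100            # assumed average catalog row size, bytes

pages = CAP // BLCKSZ
rows_to_hit_cap = CAP // AVG_ROW
print(pages)             # 8 -- eight full pages of entries
print(rows_to_hit_cap)   # 655 -- rows before the cap bites
```

Which is the point being made: for realistic schemas the threshold is effectively infinity, so tuning it per catalog buys nothing.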
Re: Add LWLock blocker(s) information
On Mon, Aug 10, 2020 at 5:41 PM Andres Freund wrote: > Most of the cases where this kind of information really is interesting > seem to benefit a lot from having stack information available. That > obviously has overhead, so we don't want the cost all the > time. The script at > https://postgr.es/m/20170622210845.d2hsbqv6rxu2tiye%40alap3.anarazel.de > can give you results like e.g. > https://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg It seems to have bitrot. Do you have a more recent version of the script? -- Peter Geoghegan
Re: run pgindent on a regular basis / scripted manner
Jesse Zhang writes: > On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote: >> Is there any reason we don't just automatically run pgindent regularly? >> Like once a week? And also update typedefs.list automatically, while >> we're at it? > You know what's better than weekly? Every check-in. I'm not in favor of unsupervised pgindent runs, really. It can do a lot of damage to code that was written without thinking about it --- in particular, it'll make a hash of comment blocks that were manually formatted and not protected with dashes. If the workflow is commit first and re-indent later, then we can always revert the pgindent commit and clean things up manually; but an auto re-indent during commit wouldn't provide that history. I do like the idea of more frequent, smaller pgindent runs instead of doing just one giant run per year. With the giant runs it's necessary to invest a fair amount of time eyeballing all the changes; if we did it every couple weeks then the pain would be a lot less. Another idea would be to have a bot that runs pgindent *without* committing the results, and emails the people who have made commits into files that changed, saying "if you don't like these diffs then you'd better clean up your commit before it happens for real". With some warning like that, it might be okay to do automatic reindenting a little bit later. Plus, nagging committers who habitually commit improperly-indented code might persuade them to clean up their acts ;-) regards, tom lane
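Tom's warn-don't-commit bot could be a thin wrapper around pgindent: run it in a scratch checkout, and if the working tree changes, notify whoever committed to the touched files. A rough sketch — the pgindent invocation and the notification step are placeholders, not an existing tool:

```python
import subprocess

def indent_diff(repo_dir):
    """Run pgindent in a scratch checkout and return the resulting diff.
    An empty string means the tree is already correctly indented."""
    subprocess.run(["src/tools/pgindent/pgindent"], cwd=repo_dir, check=True)
    out = subprocess.run(["git", "diff"], cwd=repo_dir,
                         capture_output=True, text=True, check=True)
    return out.stdout

def notify(diff):
    """Placeholder for mailing the diff to the relevant committers;
    here it just reports whether action is needed."""
    if not diff:
        msg = "tree is clean"
    else:
        msg = "pgindent would change %d lines" % diff.count("\n")
    print(msg)
    return msg

# The reporting step works standalone:
notify("")                      # prints "tree is clean"
notify("-foo\n+\tfoo\n")        # prints "pgindent would change 2 lines"
```

The actual re-indent would still be a supervised commit some days later, after committers have had a chance to react to the warning.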
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 10:21 AM Tom Lane wrote: > Thomas Munro writes: > > @@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime > > start_time) > > + case PM_WAIT_READONLY: > > + case PM_WAIT_CLIENTS: > > case PM_RUN: > > So the question here is whether time-based bgworkers should be allowed to > restart in this scenario. I'm not quite sure --- depending on what the > bgworker's purpose is, you could make an argument either way, I think. > Do we need some way to control that? I'm not sure why any bgworker would actually want to be shut down or not restarted during the twilight zone of a smart shutdown though -- if users can do arbitrary stuff, why can't supporting workers carry on? For example, a hypothetical extension that triggers vacuum freeze at smarter times, or a wait event sampling extension, an FDW that uses an extra worker to maintain a connection to something, etc etc could all be things that a user is indirectly relying on to do their normal work, and I struggle to think of an example of something that you explicitly don't want running just because (in some sense) the server *plans* to shut down, when the users get around to logging off. But maybe I lack imagination. > In any case, we'd want to treat PM_WAIT_READONLY like PM_HOT_STANDBY not > PM_RUN, no? Yeah, you're right. > Also, the state before PM_WAIT_READONLY could have been > PM_RECOVERY or PM_STARTUP, in which case we don't really want to think > it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart > case should be accepted. That suggests that we need yet another pmState, > or else a more thoroughgoing refactoring of how the postmaster's state > is represented. Hmm.
Re: run pgindent on a regular basis / scripted manner
Hi, On 2020-08-12 16:08:50 -0700, Jesse Zhang wrote: > On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote: > > > > Hi, > > > > When developing patches I find it fairly painful that I cannot re-indent > > patches with pgindent without also seeing a lot of indentation changes > > in unmodified parts of files. It is easy enough ([1]) to only re-indent > > files that I have modified, but there's often a lot of independent > > indentation changes in the files that I did modify. > > > > I e.g. just re-indented patch 0001 of my GetSnapshotData() series and > > most of the hunks were entirely unrelated. Despite the development > > window for 14 having only relatively recently opened. Based on my > > experience it tends to get worse over time. > > How bad was it right after branching 13? I wonder if we have any > empirical measure of badness over time -- assuming there was a point in > the recent past where everything was good, and the bad just crept in. Well, just after branching it was perfect, because pgindent is customarily run just before branching. After that it incrementally gets worse. > > Is there any reason we don't just automatically run pgindent regularly? > > Like once a week? And also update typedefs.list automatically, while > > we're at it? > > You know what's better than weekly? Every check-in. I for one would love > it if we can just format the entire codebase, and ensure that new > check-ins are also formatted. We _do_ need some form of continuous > integration to catch us when we have fallen short (again, once HEAD > reaches a "known good" state, it's conceivably cheap to keep it in the > good state. Unfortunately that is, with the current tooling, not entirely trivial to do so completely. The way we generate the list of known typedefs unfortunately depends on the platform a build is run on. Therefore the buildfarm collects a number of the generated list of typedefs from different platforms, and then we use that combined list to run pgindent. 
We surely can improve further, but I think having any automation around this already would be a huge step. Greetings, Andres Freund
Re: Dependencies for partitioned indexes are still a mess
Andres Freund writes: > Given that pg_dump already re-orders the columns for DDL, could we make > it apply that re-ordering not just during the CREATE TABLE, but also > when dumping the table contents? Hm, possibly. I think when this was last looked at, we didn't have any way to get COPY to output the columns in non-physical order, but now we do. regards, tom lane
Re: run pgindent on a regular basis / scripted manner
Hi Andres, On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote: > > Hi, > > When developing patches I find it fairly painful that I cannot re-indent > patches with pgindent without also seeing a lot of indentation changes > in unmodified parts of files. It is easy enough ([1]) to only re-indent > files that I have modified, but there's often a lot of independent > indentation changes in the files that I did modify. > > I e.g. just re-indented patch 0001 of my GetSnapshotData() series and > most of the hunks were entirely unrelated. Despite the development > window for 14 having only relatively recently opened. Based on my > experience it tends to get worse over time. How bad was it right after branching 13? I wonder if we have any empirical measure of badness over time -- assuming there was a point in the recent past where everything was good, and the bad just crept in. > > > Is there any reason we don't just automatically run pgindent regularly? > Like once a week? And also update typedefs.list automatically, while > we're at it? You know what's better than weekly? Every check-in. I for one would love it if we can just format the entire codebase, and ensure that new check-ins are also formatted. We _do_ need some form of continuous integration to catch us when we have fallen short (again, once HEAD reaches a "known good" state, it's conceivably cheap to keep it in the good state. Cheers, Jesse
Re: run pgindent on a regular basis / scripted manner
On 2020-Aug-12, Andres Freund wrote: > Is there any reason we don't just automatically run pgindent regularly? > Like once a week? And also update typedefs.list automatically, while > we're at it? Seconded. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Dependencies for partitioned indexes are still a mess
On 2020-Jul-15, Tom Lane wrote: > Issue #2: parallel restore does not work > > 1. dropdb r2; createdb r2 > 2. pg_restore -j8 -d r2 regression.dump > > This is fairly timing-dependent, but some attempts fail with messages > like > > pg_restore: while PROCESSING TOC: > pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey > postgres > pg_restore: error: could not execute query: ERROR: there is no unique > constraint matching given keys for referenced table "pk" > Command was: ALTER TABLE fkpart3.fk > ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a); Hmm, we do make the FK constraint depend on the ATTACH for the direct children; what I think we're lacking is dependencies on descendants twice-removed (?) or higher. This mock patch seems to fix this problem by adding dependencies recursively on all children of the index; I no longer see this problem with it. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 42391b0b2c..c78bbd7d00 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -7412,6 +7412,24 @@ getExtendedStatistics(Archive *fout) destroyPQExpBuffer(query); } +/* recursive bit of getConstraints */ +static void +addConstrChildIdxDeps(DumpableObject *dobj, IndxInfo *refidx) +{ + SimplePtrListCell *cell; + + for (cell = refidx->partattaches.head; cell; cell = cell->next) + { + DumpableObject *childobj = (DumpableObject *) cell->ptr; + IndexAttachInfo *attach = (IndexAttachInfo *) childobj; + + addObjectDependency(dobj, childobj->dumpId); + + if (attach->partitionIdx->partattaches.head != NULL) + addConstrChildIdxDeps(dobj, attach->partitionIdx); + } +} + /* * getConstraints * @@ -7517,25 +7535,20 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables) reftable = findTableByOid(constrinfo[j].confrelid); if (reftable && reftable->relkind == 
RELKIND_PARTITIONED_TABLE) { -IndxInfo *refidx; Oid indexOid = atooid(PQgetvalue(res, j, i_conindid)); if (indexOid != InvalidOid) { for (int k = 0; k < reftable->numIndexes; k++) { - SimplePtrListCell *cell; + IndxInfo *refidx; /* not our index? */ if (reftable->indexes[k].dobj.catId.oid != indexOid) continue; refidx = &reftable->indexes[k]; - for (cell = refidx->partattaches.head; cell; - cell = cell->next) - addObjectDependency(&constrinfo[j].dobj, -((DumpableObject *) - cell->ptr)->dumpId); + addConstrChildIdxDeps(&constrinfo[j].dobj, refidx); break; } }
Re: Dependencies for partitioned indexes are still a mess
Hi, On 2020-08-12 18:29:16 -0400, Tom Lane wrote: > Andres Freund writes: > > I've attached the diff between first.sql and second.sql. Here's the > > highlights: > > As I recall, the differences in b_star etc are expected, because > pg_dump reorders that table's columns to match its inheritance parent, > which they don't to start with because of ALTER TABLE operations. Ugh. Obviously applications shouldn't use INSERT or SELECT without a target list, but that still seems somewhat nasty. I guess we could script it so that we don't compare the "original" with a restored database, but instead compare the restored version with one restored from that. But that seems likely to hide bugs. Given that pg_dump already re-orders the columns for DDL, could we make it apply that re-ordering not just during the CREATE TABLE, but also when dumping the table contents? Greetings, Andres Freund
run pgindent on a regular basis / scripted manner
Hi, When developing patches I find it fairly painful that I cannot re-indent patches with pgindent without also seeing a lot of indentation changes in unmodified parts of files. It is easy enough ([1]) to only re-indent files that I have modified, but there's often a lot of independent indentation changes in the files that I did modify. I e.g. just re-indented patch 0001 of my GetSnapshotData() series and most of the hunks were entirely unrelated, despite the development window for 14 having only relatively recently opened. Based on my experience it tends to get worse over time. Is there any reason we don't just automatically run pgindent regularly? Like once a week? And also update typedefs.list automatically, while we're at it? Currently the yearly pgindent runs are somewhat painful for patches that didn't make it into the release, but if we were to reindent on a more regular basis, that should be much less the case. It'd also help newer developers who we sometimes tell to use pgindent - which doesn't really work. Greetings, Andres Freund [1] ./src/tools/pgindent/pgindent $(git diff-tree --no-commit-id --name-only -r upstream/master..HEAD|grep -v src/test|grep -v README|grep -v typedefs.list)
Re: Dependencies for partitioned indexes are still a mess
Andres Freund writes: > I've attached the diff between first.sql and second.sql. Here's the > highlights: As I recall, the differences in b_star etc are expected, because pg_dump reorders that table's columns to match its inheritance parent, which they don't to start with because of ALTER TABLE operations. I'm pretty sure we set it up that way deliberately ages ago, because pg_dump used to have bugs when contending with such cases. Not sure about a good way to mechanize recognizing that these diffs are expected. Dunno about test_type_diff2, but it might be a newer instance of the same thing. regards, tom lane
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes: > I think we also need: > + else if (Shutdown <= SmartShutdown && > +backend_type == BACKEND_TYPE_AUTOVAC) > + result = CAC_OK; Hm, ok. > Retesting the original complaint, I think we need: > @@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime > start_time) > + case PM_WAIT_READONLY: > + case PM_WAIT_CLIENTS: > case PM_RUN: So the question here is whether time-based bgworkers should be allowed to restart in this scenario. I'm not quite sure --- depending on what the bgworker's purpose is, you could make an argument either way, I think. Do we need some way to control that? In any case, we'd want to treat PM_WAIT_READONLY like PM_HOT_STANDBY not PM_RUN, no? Also, the state before PM_WAIT_READONLY could have been PM_RECOVERY or PM_STARTUP, in which case we don't really want to think it's like PM_HOT_STANDBY either; only the BgWorkerStart_PostmasterStart case should be accepted. That suggests that we need yet another pmState, or else a more thoroughgoing refactoring of how the postmaster's state is represented. regards, tom lane
Re: Dependencies for partitioned indexes are still a mess
Hi, On 2020-07-15 15:52:03 -0400, Tom Lane wrote: > I've been experimenting with trying to dump-and-restore the > regression database, which is a test case that for some reason > we don't cover in the buildfarm (pg_upgrade is not the same thing). Yea, we really should have that. IIRC I was trying to add that, and tests that compare dumps from primary / standby, and failed due to some differences that were hard to fix. A quick test with pg_dumpall shows some odd differences after: 1) create new cluster 2) installcheck-parallel 3) drop table gtest30_1, gtest1_1; 4) pg_dumpall > first.sql 5) recreate cluster 6) psql < first.sql > first.sql.log 7) pg_dumpall > second.sql I've attached the diff between first.sql and second.sql. Here's the highlights: @@ -15392,9 +15392,9 @@ -- CREATE TABLE public.test_type_diff2_c1 ( +int_two smallint, int_four bigint, -int_eight bigint, -int_two smallint +int_eight bigint ) INHERITS (public.test_type_diff2); ... @@ -39194,10 +39194,10 @@ -- Data for Name: b_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.b_star (class, aa, bb, a) FROM stdin; -b 3 mumble \N +COPY public.b_star (class, aa, a, bb) FROM stdin; +b 3 \N mumble b 4 \N \N -b \N bumble \N +b \N \N bumble b \N \N \N \. @@ -323780,7 +323780,7 @@ -- Data for Name: renamecolumnanother; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.renamecolumnanother (d, a, c, w) FROM stdin; +COPY public.renamecolumnanother (d, w, a, c) FROM stdin; \. The primary / standby differences are caused by sequence logging. I wonder if there's some good way to hide those, or to force them to be the same between primary / standby, without hiding bugs. 
Greetings, Andres Freund --- /tmp/first.sql 2020-08-12 15:01:11.810862861 -0700 +++ /tmp/second.sql 2020-08-12 15:02:05.877709572 -0700 @@ -15392,9 +15392,9 @@ -- CREATE TABLE public.test_type_diff2_c1 ( +int_two smallint, int_four bigint, -int_eight bigint, -int_two smallint +int_eight bigint ) INHERITS (public.test_type_diff2); @@ -15406,9 +15406,9 @@ -- CREATE TABLE public.test_type_diff2_c2 ( -int_eight bigint, int_two smallint, -int_four bigint +int_four bigint, +int_eight bigint ) INHERITS (public.test_type_diff2); @@ -39194,10 +39194,10 @@ -- Data for Name: b_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.b_star (class, aa, bb, a) FROM stdin; -b 3 mumble \N +COPY public.b_star (class, aa, a, bb) FROM stdin; +b 3 \N mumble b 4 \N \N -b \N bumble \N +b \N \N bumble b \N \N \N \. @@ -91102,10 +91102,10 @@ -- Data for Name: c_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.c_star (class, aa, cc, a) FROM stdin; -c 5 hi mom \N +COPY public.c_star (class, aa, a, cc) FROM stdin; +c 5 \N hi mom c 6 \N \N -c \N hi paul \N +c \N \N hi paul c \N \N \N \. 
@@ -91381,22 +91381,22 @@ -- Data for Name: d_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.d_star (class, aa, bb, cc, dd, a) FROM stdin; -d 7 grumble hi sunita 0 \N -d 8 stumble hi koko \N \N -d 9 rumble \N 1.1 \N -d 10 \N hi kristin 10.01 \N -d \N crumble hi boris 100.001 \N -d 11 fumble \N \N \N -d 12 \N hi avi \N \N -d 13 \N \N 1000.0001 \N -d \N tumble hi andrew \N \N -d \N humble \N 1.1 \N -d \N \N hi ginger 10.01 \N +COPY public.d_star (class, aa, a, bb, cc, dd) FROM stdin; +d 7 \N grumble hi sunita 0 +d 8 \N stumble hi koko \N +d 9 \N rumble \N 1.1 +d 10 \N \N hi kristin 10.01 +d \N \N crumble hi boris 100.001 +d 11 \N fumble \N \N +d 12 \N \N hi avi \N +d 13 \N \N \N 1000.0001 +d \N \N tumble hi andrew \N +d \N \N humble \N 1.1 +d \N \N \N hi ginger 10.01 d 14 \N \N \N \N -d \N jumble \N \N \N -d \N \N hi jolly \N \N -d \N \N \N 100.001 \N +d \N \N jumble \N \N +d \N \N \N hi jolly \N +d \N \N \N \N 100.001 d \N \N \N \N \N \. @@ -103095,14 +103095,14 @@ -- Data for Name: e_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.e_star (class, aa, cc, ee, e, a) FROM stdin; -e 15 hi carol -1 \N \N -e 16 hi bob \N \N \N -e 17 \N -2 \N \N -e \N hi michelle -3 \N \N +COPY public.e_star (class, aa, a, cc, ee, e) FROM stdin; +e 15 \N hi carol -1 \N +e 16 \N hi bob \N \N +e 17 \N \N -2 \N +e \N \N hi michelle -3 \N e 18 \N \N \N \N -e \N hi elisa \N \N \N -e \N \N -4 \N \N +e \N \N hi elisa \N \N +e \N \N \N -4 \N \. @@ -103174,23 +103174,23 @@ -- Data for Name: f_star; Type: TABLE DATA; Schema: public; Owner: andres -- -COPY public.f_star (class, aa, cc, ee, ff, f, e, a) FROM stdin; -f 19 hi claire -5 ((1,3),(2,4)) 10 \N \N -f 20 hi mike -6 \N 10 \N \N -f 21 hi marcel \N ((11,44),(22,55),(33,66)) 10 \N \N -f 22 \N -7 ((111,555),(222,666),(333,777),(444,888)) 10 \N \N -f \N hi keith -8 ((,),(,)) 10 \N \N -f 24 hi marc \N \N 10 \N \N
Re: Dependencies for partitioned indexes are still a mess
On 2020-Jul-15, Tom Lane wrote: > Issue #1: "--clean" does not work > > 1. createdb r2 > 2. pg_restore -d r2 regression.dump > 3. pg_restore --clean -d r2 regression.dump > > pg_restore: while PROCESSING TOC: > pg_restore: from TOC entry 6606; 1259 35458 INDEX idxpart32_a_idx postgres > pg_restore: error: could not execute query: ERROR: cannot drop index > public.idxpart32_a_idx because index public.idxpart3_a_idx requires it > HINT: You can drop index public.idxpart3_a_idx instead. > Command was: DROP INDEX public.idxpart32_a_idx; I think this problem is just that we're trying to drop a partition index that's not droppable. This seems fixed with just leaving the dropStmt empty, as in the attached. One issue is that if you previously restored only that particular partition and its indexes, but not the ATTACH command that would make it dependent on the parent index, there would not be a DROP command to get rid of it. Do we need to be concerned about that case? I'm inclined to think not. > (There seem to be some other problems as well, but most of the 54 complaints > are related to partitioned indexes/constraints.) In my run of it there's a good dozen remaining problems, all alike: we do DROP TYPE widget CASCADE (which works) followed by DROP FUNCTION public.widget_out(widget), which fails complaining that type widget doesn't exist. But in reality the error is innocuous, since that function was dropped by the DROP TYPE CASCADE anyway. You could say that the same thing is happening with these noisy DROP INDEX of index partitions: the complaints are right in that each partition's DROP INDEX command doesn't actually work, but the indexes are dropped later anyway, so the effect is the same. > Issue #2: parallel restore does not work Looking. 
-- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 8334445705c53bb0abff407ebb92ac67975a5898 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 12 Aug 2020 17:36:37 -0400 Subject: [PATCH] Don't dump DROP stmts for index partitions They would fail to run in --clean mode anyway --- src/bin/pg_dump/pg_dump.c | 14 +- src/bin/pg_dump/pg_dump.h | 2 +- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 9c8436dde6..42391b0b2c 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -16416,7 +16416,8 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo) qindxname); } - appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname); + if (indxinfo->parentidx == InvalidOid) + appendPQExpBuffer(delq, "DROP INDEX %s;\n", qqindxname); if (indxinfo->dobj.dump & DUMP_COMPONENT_DEFINITION) ArchiveEntry(fout, indxinfo->dobj.catId, indxinfo->dobj.dumpId, @@ -16695,10 +16696,13 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo) "pg_catalog.pg_class", "INDEX", fmtQualifiedDumpable(indxinfo)); - appendPQExpBuffer(delq, "ALTER %sTABLE ONLY %s ", foreign, - fmtQualifiedDumpable(tbinfo)); - appendPQExpBuffer(delq, "DROP CONSTRAINT %s;\n", - fmtId(coninfo->dobj.name)); + if (indxinfo->parentidx == InvalidOid) + { + appendPQExpBuffer(delq, "ALTER %sTABLE ONLY %s ", foreign, + fmtQualifiedDumpable(tbinfo)); + appendPQExpBuffer(delq, "DROP CONSTRAINT %s;\n", + fmtId(coninfo->dobj.name)); + } tag = psprintf("%s %s", tbinfo->dobj.name, coninfo->dobj.name); diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h index da97b731b1..2f051b83d9 100644 --- a/src/bin/pg_dump/pg_dump.h +++ b/src/bin/pg_dump/pg_dump.h @@ -368,7 +368,7 @@ typedef struct _indxInfo * contains both key and nonkey attributes */ bool indisclustered; bool indisreplident; - Oid parentidx; /* if partitioned, parent index OID */ + Oid parentidx; /* if a partition, 
parent index OID */ SimplePtrList partattaches; /* if partitioned, partition attach objects */ /* if there is an associated constraint object, its dumpId: */ -- 2.20.1
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 8:59 AM Tom Lane wrote: > I wrote: > > Oh, excellent point! I'd not thought to look at tests of the Shutdown > > variable, but yeah, those should be <= SmartShutdown if we want autovac > > to continue to operate in this state. > > On looking closer, there's another problem: setting start_autovac_launcher > isn't enough to get the AV launcher to run, because ServerLoop() won't > launch it except in PM_RUN state. Likewise, the other "relaunch a dead > process" checks in ServerLoop() need to be generalized to support > relaunching background processes while we're waiting out the foreground > clients. So that leads me to the attached v3. I had to re-instantiate > PM_WAIT_READONLY as an alternate state to PM_WAIT_CLIENTS; these states > are about the same so far as PostmasterStateMachine is concerned, but > some of the should-we-launch-FOO checks care about the difference. I think we also need: @@ -2459,6 +2459,9 @@ canAcceptConnections(int backend_type) { if (pmState == PM_WAIT_BACKUP) result = CAC_WAITBACKUP;/* allow superusers only */ + else if (Shutdown <= SmartShutdown && +backend_type == BACKEND_TYPE_AUTOVAC) + result = CAC_OK; else if (Shutdown > NoShutdown) return CAC_SHUTDOWN;/* shutdown is pending */ else if (!FatalError && Retesting the original complaint, I think we need: @@ -5911,11 +5912,11 @@ bgworker_should_start_now(BgWorkerStartTime start_time) case PM_SHUTDOWN_2: case PM_SHUTDOWN: case PM_WAIT_BACKENDS: - case PM_WAIT_READONLY: - case PM_WAIT_CLIENTS: case PM_WAIT_BACKUP: break; + case PM_WAIT_READONLY: + case PM_WAIT_CLIENTS: case PM_RUN: if (start_time == BgWorkerStart_RecoveryFinished) return true;
Re: Parallel query hangs after a smart shutdown is issued
I wrote: > Oh, excellent point! I'd not thought to look at tests of the Shutdown > variable, but yeah, those should be <= SmartShutdown if we want autovac > to continue to operate in this state. On looking closer, there's another problem: setting start_autovac_launcher isn't enough to get the AV launcher to run, because ServerLoop() won't launch it except in PM_RUN state. Likewise, the other "relaunch a dead process" checks in ServerLoop() need to be generalized to support relaunching background processes while we're waiting out the foreground clients. So that leads me to the attached v3. I had to re-instantiate PM_WAIT_READONLY as an alternate state to PM_WAIT_CLIENTS; these states are about the same so far as PostmasterStateMachine is concerned, but some of the should-we-launch-FOO checks care about the difference. The various pmState tests are getting messy enough to cry out for refactorization, but I've not attempted that here. There's enough variance in the conditions for launching different subprocesses that I'm not very sure what would be a nicer-looking way to write them. regards, tom lane diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index e31275a04e..3946fa52ea 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -185,8 +185,8 @@ PostgreSQL documentation stop mode shuts down the server that is running in the specified data directory. Three different shutdown methods can be selected with the -m - option. Smart mode waits for all active - clients to disconnect and any online backup to finish. + option. Smart mode disallows new connections, then waits + for all existing clients to disconnect and any online backup to finish. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected. 
Fast mode (the default) does not wait for clients to disconnect and diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 38e2c16ac2..d134dade53 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -148,8 +148,6 @@ #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */ #define BACKEND_TYPE_ALL 0x000F /* OR of all the above */ -#define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER) - /* * List of active backends (or child processes anyway; we don't actually * know whether a given child has become a backend or is still in the @@ -319,10 +317,10 @@ static bool FatalError = false; /* T if recovering from backend crash */ * * Notice that this state variable does not distinguish *why* we entered * states later than PM_RUN --- Shutdown and FatalError must be consulted - * to find that out. FatalError is never true in PM_RECOVERY_* or PM_RUN - * states, nor in PM_SHUTDOWN states (because we don't enter those states - * when trying to recover from a crash). It can be true in PM_STARTUP state, - * because we don't clear it until we've successfully started WAL redo. + * to find that out. FatalError is never true in PM_RECOVERY, PM_HOT_STANDBY, + * or PM_RUN states, nor in PM_SHUTDOWN states (because we don't enter those + * states when trying to recover from a crash). It can be true in PM_STARTUP + * state, because we don't clear it until we've successfully started WAL redo. 
*/ typedef enum { @@ -332,8 +330,9 @@ typedef enum PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ PM_WAIT_BACKUP,/* waiting for online backup mode to end */ - PM_WAIT_READONLY, /* waiting for read only backends to exit */ - PM_WAIT_BACKENDS, /* waiting for live backends to exit */ + PM_WAIT_CLIENTS, /* waiting for normal backends to exit */ + PM_WAIT_READONLY, /* likewise, when we had been in a RO state */ + PM_WAIT_BACKENDS, /* waiting for all backends to exit */ PM_SHUTDOWN,/* waiting for checkpointer to do shutdown * ckpt */ PM_SHUTDOWN_2,/* waiting for archiver and walsenders to @@ -437,9 +436,10 @@ static void InitPostmasterDeathWatchHandle(void); * even during recovery. */ #define PgArchStartupAllowed() \ - ((XLogArchivingActive() && pmState == PM_RUN) || \ + ((XLogArchivingActive() && \ + (pmState == PM_RUN || pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_CLIENTS)) || \ (XLogArchivingAlways() && \ - (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY))) + (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY))) #ifdef EXEC_BACKEND @@ -1750,7 +1750,8 @@ ServerLoop(void) * fails, we'll just try again later. Likewise for the checkpointer. */ if (pmState == PM_RUN || pmState == PM_RECOVERY || - pmState == PM_HOT_STANDBY) + pmState == PM_HOT_STANDBY || pmState == PM_WAIT_BACKUP || + pmState == PM_WAIT_CLIENTS || pmState == PM_WAIT_READONLY) { if (CheckpointerPID == 0)
Re: [BUG] Error in BRIN summarization
On 2020-Aug-12, Alvaro Herrera wrote: > 'anyvisible' mode is not required AFAICS; reading the code, I think this > could also hit REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY, which > do not use that flag. I didn't try to reproduce it there, though. > Anyway, I'm going to remove that Assert() I added. So this is what I propose as the final form of the fix. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From e0abe5d957155285980a40fb33c192100699e8c0 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 12 Aug 2020 14:02:58 -0400 Subject: [PATCH v4] fix HOT tuples while scanning for index builds --- src/backend/access/heap/heapam_handler.c | 20 src/backend/access/heap/pruneheap.c | 5 +++-- 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c index 267a6ee25a..ba44e30035 100644 --- a/src/backend/access/heap/heapam_handler.c +++ b/src/backend/access/heap/heapam_handler.c @@ -1324,6 +1324,12 @@ heapam_index_build_range_scan(Relation heapRelation, * buffer continuously while visiting the page, so no pruning * operation can occur either. * + * In cases with only ShareUpdateExclusiveLock on the table, it's + * possible for some HOT tuples to appear that we didn't know about + * when we first read the page. To handle that case, we re-obtain the + * list of root offsets when a HOT tuple points to a root item that we + * don't know about. + * * Also, although our opinions about tuple liveness could change while * we scan the page (due to concurrent transaction commits/aborts), * the chain root locations won't, so this info doesn't need to be @@ -1625,6 +1631,20 @@ heapam_index_build_range_scan(Relation heapRelation, offnum = ItemPointerGetOffsetNumber(>t_self); + /* + * If a HOT tuple points to a root that we don't know + * about, obtain root items afresh. If that still fails, + * report it as corruption. 
+ */ + if (root_offsets[offnum - 1] == InvalidOffsetNumber) + { +Page page = BufferGetPage(hscan->rs_cbuf); + +LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); +heap_get_root_tuples(page, root_offsets); +LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_UNLOCK); + } + if (!OffsetNumberIsValid(root_offsets[offnum - 1])) ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c index 256df4de10..7e3d44dfd6 100644 --- a/src/backend/access/heap/pruneheap.c +++ b/src/backend/access/heap/pruneheap.c @@ -732,7 +732,7 @@ heap_page_prune_execute(Buffer buffer, * root_offsets[k - 1] = j. * * The passed-in root_offsets array must have MaxHeapTuplesPerPage entries. - * We zero out all unused entries. + * Unused entries are filled with InvalidOffsetNumber (zero). * * The function must be called with at least share lock on the buffer, to * prevent concurrent prune operations. @@ -747,7 +747,8 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets) OffsetNumber offnum, maxoff; - MemSet(root_offsets, 0, MaxHeapTuplesPerPage * sizeof(OffsetNumber)); + MemSet(root_offsets, InvalidOffsetNumber, + MaxHeapTuplesPerPage * sizeof(OffsetNumber)); maxoff = PageGetMaxOffsetNumber(page); for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum)) -- 2.20.1
Re: use pg_get_functiondef() in pg_dump
On Wed, Aug 12, 2020 at 4:25 AM Peter Eisentraut wrote: > Here is a patch to have pg_dump use pg_get_functiondef() instead of > assembling the CREATE FUNCTION/PROCEDURE commands itself. This should > save on maintenance effort in the future. It's also a prerequisite for > being able to dump functions with SQL-standard function body discussed > in [0]. > > pg_get_functiondef() was meant for psql's \ef command, so its defaults > are slightly different from what pg_dump would like, so this adds a few > optional parameters for tweaking the behavior. The naming of the > parameters is up for discussion. One problem with this, which I think Tom pointed out before, is that it might make it hard to handle some forward-compatibility problems. In other words, if something that the server is generating needs to be modified for compatibility with a future release, it's not easy to do that. Like if we needed to quote something we weren't previously quoting, for example. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes: > On Thu, Aug 13, 2020 at 6:00 AM Tom Lane wrote: >> One other thing I changed here was to remove PM_WAIT_READONLY from the >> set of states in which we'll allow promotion to occur or a new walreceiver >> to start. I'm not convinced that either of those behaviors aren't >> bugs; although if someone thinks they're right, we can certainly put >> back PM_WAIT_CLIENTS in those checks. (But, for example, it does not >> appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with >> Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks >> like confusingly dead code to me. If we do want to allow restarting >> the walreceiver in this state, the Shutdown condition needs fixed.) > If a walreceiver is allowed to run, why should it not be allowed to > restart? I'd come to about the same conclusion after thinking more, so v2 attached undoes that change. I think putting off promotion is fine though; it'll get handled at the next postmaster start. (It looks like the state machine would just proceed to exit anyway if we allowed the promotion, but that's a hard-to-test state transition that we could do without.) > Yeah, I suppose that other test'd need to be Shutdown <= > SmartShutdown, just like we do in SIGHUP_handler(). Looking at other > places where we test Shutdown == NoShutdown, one that jumps out is the > autovacuum wraparound defence stuff and the nearby > PMSIGNAL_START_AUTOVAC_WORKER code. Oh, excellent point! I'd not thought to look at tests of the Shutdown variable, but yeah, those should be <= SmartShutdown if we want autovac to continue to operate in this state. I also noticed that where reaper() is dealing with startup process exit(3), it unconditionally sets Shutdown = SmartShutdown which seems pretty bogus; that variable's value should never be allowed to decrease, but this could cause it. 
In the attached I did StartupStatus = STARTUP_NOT_RUNNING; -Shutdown = SmartShutdown; +Shutdown = Max(Shutdown, SmartShutdown); TerminateChildren(SIGTERM); But given that it's forcing immediate termination of all backends, I wonder if that's not more like a FastShutdown? (Scary here is that the coverage report shows we're not testing this path, so who knows if it works at all.) regards, tom lane diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index e31275a04e..3946fa52ea 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -185,8 +185,8 @@ PostgreSQL documentation stop mode shuts down the server that is running in the specified data directory. Three different shutdown methods can be selected with the -m - option. Smart mode waits for all active - clients to disconnect and any online backup to finish. + option. Smart mode disallows new connections, then waits + for all existing clients to disconnect and any online backup to finish. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected. 
Fast mode (the default) does not wait for clients to disconnect and diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 38e2c16ac2..e8ad4b67a3 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -148,8 +148,6 @@ #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */ #define BACKEND_TYPE_ALL 0x000F /* OR of all the above */ -#define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER) - /* * List of active backends (or child processes anyway; we don't actually * know whether a given child has become a backend or is still in the @@ -319,7 +317,7 @@ static bool FatalError = false; /* T if recovering from backend crash */ * * Notice that this state variable does not distinguish *why* we entered * states later than PM_RUN --- Shutdown and FatalError must be consulted - * to find that out. FatalError is never true in PM_RECOVERY_* or PM_RUN + * to find that out. FatalError is never true in PM_RECOVERY or PM_RUN * states, nor in PM_SHUTDOWN states (because we don't enter those states * when trying to recover from a crash). It can be true in PM_STARTUP state, * because we don't clear it until we've successfully started WAL redo. @@ -332,8 +330,8 @@ typedef enum PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ PM_WAIT_BACKUP,/* waiting for online backup mode to end */ - PM_WAIT_READONLY, /* waiting for read only backends to exit */ - PM_WAIT_BACKENDS, /* waiting for live backends to exit */ + PM_WAIT_CLIENTS, /* waiting for normal backends to exit */ + PM_WAIT_BACKENDS, /* waiting for all backends to exit */ PM_SHUTDOWN,/* waiting for checkpointer to do shutdown * ckpt */ PM_SHUTDOWN_2,/* waiting for archiver and
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 6:00 AM Tom Lane wrote: > Thomas Munro writes: > > On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy > > wrote: > >> After a smart shutdown is issued(with pg_ctl), run a parallel query, > >> then the query hangs. > > > Yeah, the current situation is not good. I think your option 2 sounds > > better, because the documented behaviour of smart shutdown is that it > > "lets existing sessions end their work normally". I think that means > > that a query that is already running or allowed to start should be > > able to start new workers and not have its existing workers > > terminated. Arseny Sher wrote a couple of different patches to try > > that last year, but they fell through the cracks: > > https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com > > I already commented on this in the other thread that Bharath started [1]. > I think the real issue here is why is the postmaster's SIGTERM handler > doing *anything* other than disallowing new connections? It seems quite > premature to kill support processes of any sort, not only parallel > workers. The documentation says existing clients are allowed to end > their work, not that their performance is going to be crippled until > they end. Right. It's pretty strange that during smart shutdown, you could run for hours with no autovacuum, walwriter, bgwriter. I guess Arseny and I were looking for a minimal change to fix a bug, but clearly there's a more general problem and this change works out cleaner anyway. > So I looked at moving the kills of all the support processes to happen > after we detect that there are no remaining regular backends, and it > seems to not be too hard. I realized that the existing PM_WAIT_READONLY > state is doing that already, but just for a subset of support processes > that it thinks might be active in hot standby mode. 
So what I did in the > attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the > right thing in either regular or hot standby mode. Make sense, works as expected and passes check-world. > One other thing I changed here was to remove PM_WAIT_READONLY from the > set of states in which we'll allow promotion to occur or a new walreceiver > to start. I'm not convinced that either of those behaviors aren't > bugs; although if someone thinks they're right, we can certainly put > back PM_WAIT_CLIENTS in those checks. (But, for example, it does not > appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with > Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks > like confusingly dead code to me. If we do want to allow restarting > the walreceiver in this state, the Shutdown condition needs fixed.) If a walreceiver is allowed to run, why should it not be allowed to restart? Yeah, I suppose that other test'd need to be Shutdown <= SmartShutdown, just like we do in SIGHUP_handler(). Looking at other places where we test Shutdown == NoShutdown, one that jumps out is the autovacuum wraparound defence stuff and the nearby PMSIGNAL_START_AUTOVAC_WORKER code.
ltree_plpython test failure on Cygwin for 12.4
I am finally trying to move from python 2.7 to python 3.x, for both 3.7 and 3.8. I have (attached log):

2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython ERROR: incompatible library "/pub/devel/postgresql/prova38/postgresql-12.4-1.x86_64/build/tmp_install/usr/lib/postgresql/plpython3.dll": missing magic block
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython HINT: Extension libraries are required to use the PG_MODULE_MAGIC macro.
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE EXTENSION ltree_plpython3u CASCADE;
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython ERROR: language "plpython3u" does not exist
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython HINT: Use CREATE EXTENSION to load the language into the database.
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE FUNCTION test1(val ltree) RETURNS int LANGUAGE plpython3u TRANSFORM FOR TYPE ltree AS $$ plpy.info(repr(val)) return len(val) $$;

Only the python tests fail:

$ grep FAIL postgresql-12.4-1-check.log
test python3/hstore_plpython ... FAILED 423 ms
test python3/jsonb_plpython ... FAILED 172 ms
test python3/ltree_plpython ... FAILED 163 ms

I never had problems with python 2.7. Suggestions?
Regards
Marco

2020-08-12 18:35:43.891 CEST [10397] LOG: starting PostgreSQL 12.4 on x86_64-pc-cygwin, compiled by gcc (GCC) 9.3.0, 64-bit
2020-08-12 18:35:43.905 CEST [10397] LOG: listening on Unix socket "/tmp/pg_regress-RPb7cl/.s.PGSQL.54468"
2020-08-12 18:35:43.987 CEST [10400] LOG: database system was shut down at 2020-08-12 18:35:43 CEST
2020-08-12 18:35:44.017 CEST [10401] [unknown] FATAL: the database system is starting up
2020-08-12 18:35:44.192 CEST [10397] LOG: database system is ready to accept connections
2020-08-12 18:35:45.366 CEST [10402] LOG: checkpoint starting: immediate force wait flush-all
2020-08-12 18:35:45.370 CEST [10402] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.000 s, total=0.004 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1 kB, estimate=1 kB
2020-08-12 18:35:46.503 CEST [10402] LOG: checkpoint starting: immediate force wait
2020-08-12 18:35:46.507 CEST [10402] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=1 kB
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython ERROR: incompatible library "/pub/devel/postgresql/prova38/postgresql-12.4-1.x86_64/build/tmp_install/usr/lib/postgresql/plpython3.dll": missing magic block
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython HINT: Extension libraries are required to use the PG_MODULE_MAGIC macro.
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE EXTENSION ltree_plpython3u CASCADE;
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython ERROR: language "plpython3u" does not exist
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython HINT: Use CREATE EXTENSION to load the language into the database.
2020-08-12 18:35:47.433 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE FUNCTION test1(val ltree) RETURNS int LANGUAGE plpython3u TRANSFORM FOR TYPE ltree AS $$ plpy.info(repr(val)) return len(val) $$;
2020-08-12 18:35:47.434 CEST [10418] pg_regress/python3/ltree_plpython ERROR: function test1(ltree) does not exist at character 8
2020-08-12 18:35:47.434 CEST [10418] pg_regress/python3/ltree_plpython HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2020-08-12 18:35:47.434 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: SELECT test1('aa.bb.cc'::ltree);
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython ERROR: language "plpython3u" does not exist
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython HINT: Use CREATE EXTENSION to load the language into the database.
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython STATEMENT: CREATE FUNCTION test1n(val ltree) RETURNS int LANGUAGE plpython3u TRANSFORM FOR TYPE ltree AS $$ plpy.info(repr(val)) return len(val) $$;
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython ERROR: function test1n(ltree) does not exist at character 8
2020-08-12 18:35:47.435 CEST [10418] pg_regress/python3/ltree_plpython HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2020-08-12 18:35:47.435 CEST [10418]
Re: [BUG] Error in BRIN summarization
On 2020-Aug-11, Alvaro Herrera wrote: > A much more troubling thought is what happens if the range is > desummarized, then the index item is used for the summary of a different > range. Then the index might end up returning corrupt results. Actually, this is not a concern because the brin tuple's bt_blkno is rechecked before returning it, and if it doesn't match what we're searching for, the loop is restarted. It becomes an infinite loop problem if the revmap is pointing to a tuple that's labelled with a different range's blkno. So I think my patch as posted is a sufficient fix for this problem. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Parallel query hangs after a smart shutdown is issued
Thomas Munro writes: > On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy > wrote: >> After a smart shutdown is issued(with pg_ctl), run a parallel query, >> then the query hangs. > Yeah, the current situation is not good. I think your option 2 sounds > better, because the documented behaviour of smart shutdown is that it > "lets existing sessions end their work normally". I think that means > that a query that is already running or allowed to start should be > able to start new workers and not have its existing workers > terminated. Arseny Sher wrote a couple of different patches to try > that last year, but they fell through the cracks: > https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com I already commented on this in the other thread that Bharath started [1]. I think the real issue here is why is the postmaster's SIGTERM handler doing *anything* other than disallowing new connections? It seems quite premature to kill support processes of any sort, not only parallel workers. The documentation says existing clients are allowed to end their work, not that their performance is going to be crippled until they end. So I looked at moving the kills of all the support processes to happen after we detect that there are no remaining regular backends, and it seems to not be too hard. I realized that the existing PM_WAIT_READONLY state is doing that already, but just for a subset of support processes that it thinks might be active in hot standby mode. So what I did in the attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the right thing in either regular or hot standby mode. One other thing I changed here was to remove PM_WAIT_READONLY from the set of states in which we'll allow promotion to occur or a new walreceiver to start. I'm not convinced that either of those behaviors aren't bugs; although if someone thinks they're right, we can certainly put back PM_WAIT_CLIENTS in those checks. 
(But, for example, it does not appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks like confusingly dead code to me. If we do want to allow restarting the walreceiver in this state, the Shutdown condition needs fixed.) regards, tom lane [1] https://www.postgresql.org/message-id/65189.1597181322%40sss.pgh.pa.us diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index e31275a04e..3946fa52ea 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -185,8 +185,8 @@ PostgreSQL documentation stop mode shuts down the server that is running in the specified data directory. Three different shutdown methods can be selected with the -m - option. Smart mode waits for all active - clients to disconnect and any online backup to finish. + option. Smart mode disallows new connections, then waits + for all existing clients to disconnect and any online backup to finish. If the server is in hot standby, recovery and streaming replication will be terminated once all clients have disconnected. 
Fast mode (the default) does not wait for clients to disconnect and diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 38e2c16ac2..790948a4b2 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -148,8 +148,6 @@ #define BACKEND_TYPE_BGWORKER 0x0008 /* bgworker process */ #define BACKEND_TYPE_ALL 0x000F /* OR of all the above */ -#define BACKEND_TYPE_WORKER (BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER) - /* * List of active backends (or child processes anyway; we don't actually * know whether a given child has become a backend or is still in the @@ -332,8 +330,8 @@ typedef enum PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ PM_WAIT_BACKUP,/* waiting for online backup mode to end */ - PM_WAIT_READONLY, /* waiting for read only backends to exit */ - PM_WAIT_BACKENDS, /* waiting for live backends to exit */ + PM_WAIT_CLIENTS, /* waiting for normal backends to exit */ + PM_WAIT_BACKENDS, /* waiting for all backends to exit */ PM_SHUTDOWN,/* waiting for checkpointer to do shutdown * ckpt */ PM_SHUTDOWN_2,/* waiting for archiver and walsenders to @@ -2793,35 +2791,19 @@ pmdie(SIGNAL_ARGS) sd_notify(0, "STOPPING=1"); #endif - if (pmState == PM_RUN || pmState == PM_RECOVERY || -pmState == PM_HOT_STANDBY || pmState == PM_STARTUP) - { -/* autovac workers are told to shut down immediately */ -/* and bgworkers too; does this need tweaking? */ -SignalSomeChildren(SIGTERM, - BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER); -/* and the autovac launcher too */ -if (AutoVacPID != 0) - signal_child(AutoVacPID, SIGTERM); -/* and the bgwriter
Re: EDB builds Postgres 13 with an obsolete ICU version
On Tue, 11 Aug 2020 at 13:45, Thomas Kellerer wrote: > > Jaime Casanova schrieb am 11.08.2020 um 20:39: > >> As a follow-up to bug #16570 [1] and other previous discussions > >> on the mailing-lists, I'm checking out PG13 beta for Windows > >> from: > >> https://www.enterprisedb.com/postgresql-early-experience > >> and it ships with the same obsolete ICU 53 that was used > >> for PG 10,11,12. > >> Besides not having the latest Unicode features and fixes, ICU 53 > >> ignores the BCP 47 tags syntax in collations used as examples > >> in Postgres documentation, which leads to confusion and > >> false bug reports. > >> The current version is ICU 67. > >> > > > > Sadly, that is managed by EDB and not by the community. > > > > You can try > > https://www.2ndquadrant.com/en/resources/postgresql-installer-2ndquadrant/ > > which uses ICU-62.2, is not the latest but should allow you to follow > > the examples in the documentation. > > > One of the reasons I prefer the EDB builds is, that they provide a ZIP file > without the installer overhead. > Any chance 2ndQuadrant can supply something like that as well? > I don't think so; an unattended install mode is the closest. -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Parallel query hangs after a smart shutdown is issued
On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy wrote: > After a smart shutdown is issued(with pg_ctl), run a parallel query, > then the query hangs. The postmaster doesn't inform backends about the > smart shutdown(see pmdie() -> SIGTERM -> BACKEND_TYPE_NORMAL are not > informed), so if they request parallel workers, the postmaster is > unable to fork any workers as it's status(pmState) gets changed to > PM_WAIT_BACKENDS(see maybe_start_bgworkers() --> > bgworker_should_start_now() returns false). > > Few ways we could solve this: > 1. Do we want to disallow parallelism when there is a pending smart > shutdown? - If yes, then, we can let the postmaster know the regular > backends whenever a smart shutdown is received and the backends use > this info to not consider parallelism. If we use SIGTERM to notify, > since the backends have die() as handlers, they just cancel the > queries which is again an inconsistent behaviour[1]. Would any other > signal like SIGUSR2(I think it's currently ignored by backends) be > used here? If the signals are overloaded, can we multiplex SIGTERM > similar to SIGUSR1? If we don't want to use signals at all, the > postmaster can make an entry of it's status in bg worker shared memory > i.e. BackgroundWorkerData, RegisterDynamicBackgroundWorker() can > simply return, without requesting the postmaster for parallel workers. > > 2. If we want to allow parallelism, then, we can tweak > bgworker_should_start_now(), detect that the pending bg worker fork > requests are for parallelism, and let the postmaster start the > workers. > > Thoughts? Hello Bharath, Yeah, the current situation is not good. I think your option 2 sounds better, because the documented behaviour of smart shutdown is that it "lets existing sessions end their work normally". I think that means that a query that is already running or allowed to start should be able to start new workers and not have its existing workers terminated. 
Arseny Sher wrote a couple of different patches to try that last year, but they fell through the cracks: https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com
pg_stat_statements and "IN" conditions
Hi,

I would like to start another thread to follow up on [1], mostly to bump the topic. As a reminder, it's about how pg_stat_statements jumbles an ArrayExpr in queries like:

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

The current implementation produces a different jumble hash for every different number of arguments, for what is essentially the same query. Unfortunately a lot of ORMs like to generate these types of queries, which in turn leads to pg_stat_statements pollution. Ideally we want to prevent this and have only one record for such a query. As a result of [1] I've identified two main approaches to improve this situation:

* Reduce the generated ArrayExpr to an array Const immediately, in cases where all the inputs are Consts.
* Make repeating Consts contribute nothing to the resulting hash.

I've tried to prototype both approaches to find out the pros and cons and be more specific. The attached patches cannot be considered completed work, but they seem to work, mostly pass the tests, and demonstrate the point. I would like to get some high-level input on them, and ideally settle which solution is preferred going forward.

# Reducing ArrayExpr to an array Const

IIUC this requires producing a Const with an ArrayType constvalue in transformAExprIn for ScalarArrayOpExpr. This could be a general improvement, since apparently it's being done later anyway. But it deals only with Consts, which leaves more on the table, e.g. Params and other similar kinds of duplication we observe when repeating constants are wrapped into VALUES. Another point is that this approach will quite possibly still require corresponding changes in pg_stat_statements, since just hiding the duplicates also loses information. Ideally we also need some understanding of how many elements are actually there, and display it, e.g. for the case where a single outlier query contains a huge IN list.
# Contribute nothing to the hash

I guess there could be multiple ways of doing this, but the first idea I had in mind is to skip jumbling when necessary. This can also be implemented in a more centralized way for different types of queries (although the attached patch covers only Const & Values). In the simplest case we just identify a sequence of constants of the same type, which ignores any cases where the content is mixed. But I believe mixed content can be considered a rare corner case, and it's better to start with the simplest solution. Having said that, I find the second approach, contributing nothing to the hash, more appealing, but I would love to hear other opinions.

[1]: https://www.postgresql.org/message-id/flat/CAF42k%3DJCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ%40mail.gmail.com

0001-Reduce-ArrayExpr-into-const-array.patch Description: Binary data
0001-Limit-jumbling-for-repeating-constants.patch Description: Binary data
Re: [BUG] Error in BRIN summarization
On 2020-Aug-11, Alvaro Herrera wrote: > I think this is more complicated than necessary. It seems easier to > solve this problem by just checking whether the given root pointer is > set to InvalidOffsetNumber, which is already done in the existing coding > of heap_get_root_tuples (only they spell it "0" rather than > InvalidOffsetNumber, which I propose to change). AFAIR this should only > happen in the 'anyvisible' mode, so I added that in an assert. 'anyvisible' mode is not required AFAICS; reading the code, I think this could also hit REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY, which do not use that flag. I didn't try to reproduce it there, though. Anyway, I'm going to remove that Assert() I added. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [BUG] Error in BRIN summarization
On 2020-Jul-28, Peter Geoghegan wrote: > On Mon, Jul 27, 2020 at 10:25 AM Alvaro Herrera > wrote: > > (I was also considering whether it needs to be a loop to reobtain root > > tuples, in case a concurrent transaction can create a new item while > > we're checking that item; but I don't think that can really happen for > > one individual tuple.) > > I wonder if something like that is the underlying problem in a recent > problem case involving a "REINDEX index > pg_class_tblspc_relfilenode_index" command that runs concurrently with > the regression tests: > > https://postgr.es/m/CAH2-WzmBxu4o=pmsniur+bwhqcgcmv_aolkuk6buu7nga6e...@mail.gmail.com > > We see a violation of the HOT invariant in this case, though only for > a system catalog index, and only in fairly particular circumstances > involving concurrency. Hmm. As far as I understand, the bug Anastasia reports can only hit an index build that occurs concurrently with heap updates; and that cannot happen for a regular index build, only for CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY. So unless I'm missing something, it's not related to that other bug. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Parallel query hangs after a smart shutdown is issued
Hi,

After a smart shutdown is issued (with pg_ctl), run a parallel query: the query hangs. The postmaster doesn't inform backends about the smart shutdown (see pmdie() -> SIGTERM: BACKEND_TYPE_NORMAL processes are not informed), so when they request parallel workers, the postmaster is unable to fork any, as its status (pmState) has changed to PM_WAIT_BACKENDS (see maybe_start_bgworkers() --> bgworker_should_start_now(), which returns false).

A few ways we could solve this:

1. Do we want to disallow parallelism when there is a pending smart shutdown? If yes, we can have the postmaster notify the regular backends whenever a smart shutdown is received, and the backends can use this info to not consider parallelism. If we use SIGTERM to notify, then, since the backends have die() as their handler, they just cancel the queries, which is again inconsistent behaviour [1]. Could some other signal, like SIGUSR2 (I think it's currently ignored by backends), be used here? If the signals are overloaded, can we multiplex SIGTERM similarly to SIGUSR1? If we don't want to use signals at all, the postmaster can record its status in the bgworker shared memory, i.e. BackgroundWorkerData, and RegisterDynamicBackgroundWorker() can simply return without requesting parallel workers from the postmaster.

2. If we want to allow parallelism, we can tweak bgworker_should_start_now() to detect that the pending bgworker fork requests are for parallelism, and let the postmaster start the workers.

Thoughts?

Note: this issue was identified while working on [1].

[1] - https://www.postgresql.org/message-id/CALj2ACWTAQ2uWgj4yRFLQ6t15MMYV_uc3GCT5F5p8R9pzrd7yQ%40mail.gmail.com

With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: 回复:how to create index concurrently on partitioned table
On Wed, Aug 12, 2020 at 12:28:20AM -0500, Justin Pryzby wrote: > On Wed, Aug 12, 2020 at 01:54:38PM +0900, Michael Paquier wrote: >> +++ b/src/backend/catalog/index.c >> @@ -3661,20 +3662,12 @@ reindex_relation(Oid relid, int flags, int options) >> +elog(ERROR, "unsupported relation kind for relation \"%s\"", >> + RelationGetRelationName(rel)); > > I guess it should show the relkind(%c) in the message, like these: Sure, but I don't see much the point in adding the relkind here knowing that we know which one it is. > ISTM reindex_index is missing that, too: > > 8b08f7d4820fd7a8ef6152a9dd8c6e3cb01e5f99 > + if (iRel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) > + elog(ERROR, "unsupported relation kind for index \"%s\"", > +RelationGetRelationName(iRel)); The error string does not follow the usual project style either, so I have updated both. >> >> - Reindexing partitioned tables or partitioned indexes is not supported. >> - Each individual partition can be reindexed separately instead. >> + Reindexing partitioned indexes or partitioned tables is supported >> + with respectively REINDEX INDEX or >> + REINDEX TABLE. > > Should say "..with REINDEX INDEX or REINDEX TABLE, respectively". > >> + Each partition of the partitioned >> + relation defined is rebuilt in its own transaction. > > => Each partition of the specified partitioned relation is reindexed in a > separate transaction. Thanks, good idea. I have been able to work more on this patch today, and finally added an error context for the transaction block check, as that's cleaner. In my manual tests, I have also bumped into a case that failed with the original patch (where there were no session locks), and created an isolation test based on it: drop of a partition leaf concurrently to REINDEX done on the parent. 
One last thing I have spotted is that we failed to discard properly foreign tables defined as leaves of a partition tree, causing a reindex to fail, so reindex_partitions() ought to just use RELKIND_HAS_STORAGE() to do its filtering work. I am leaving this patch alone for a couple of days now, and I'll try to come back to it after and potentially commit it. The attached has been indented by the way. -- Michael diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h index c26a102b17..22db3f660d 100644 --- a/src/include/commands/defrem.h +++ b/src/include/commands/defrem.h @@ -34,8 +34,10 @@ extern ObjectAddress DefineIndex(Oid relationId, bool check_not_in_use, bool skip_build, bool quiet); -extern void ReindexIndex(RangeVar *indexRelation, int options, bool concurrent); -extern Oid ReindexTable(RangeVar *relation, int options, bool concurrent); +extern void ReindexIndex(RangeVar *indexRelation, int options, bool concurrent, + bool isTopLevel); +extern Oid ReindexTable(RangeVar *relation, int options, bool concurrent, + bool isTopLevel); extern void ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, int options, bool concurrent); extern char *makeObjectName(const char *name1, const char *name2, diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1be27eec52..14e3464d0e 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -77,6 +77,7 @@ #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/pg_rusage.h" +#include "utils/rel.h" #include "utils/snapmgr.h" #include "utils/syscache.h" #include "utils/tuplesort.h" @@ -3447,11 +3448,12 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence, iRel->rd_rel->relam); /* - * The case of reindexing partitioned tables and indexes is handled - * differently by upper layers, so this case shouldn't arise. + * Partitioned indexes should never get processed here, as they have no + * physical storage. 
*/ if (iRel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) - elog(ERROR, "unsupported relation kind for index \"%s\"", + elog(ERROR, "cannot reindex partitioned index \"%s.%s\"", + get_namespace_name(RelationGetNamespace(iRel)), RelationGetRelationName(iRel)); /* @@ -3661,20 +3663,13 @@ reindex_relation(Oid relid, int flags, int options) rel = table_open(relid, ShareLock); /* - * This may be useful when implemented someday; but that day is not today. - * For now, avoid erroring out when called in a multi-table context - * (REINDEX SCHEMA) and happen to come across a partitioned table. The - * partitions may be reindexed on their own anyway. + * Partitioned tables should never get processed here, as they have no + * physical storage. */ if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - { - ereport(WARNING, -(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("REINDEX of partitioned tables is not yet implemented, skipping \"%s\"", -
Re: recovering from "found xmin ... from before relfrozenxid ..."
Thanks Robert for the review. Please find my comments inline below: On Fri, Aug 7, 2020 at 9:21 PM Robert Haas wrote: > > On Thu, Aug 6, 2020 at 9:23 AM Ashutosh Sharma wrote: > > Attached v4 patch fixes the latest comments from Robert and Masahiko-san. > > Compiler warning: > > heap_surgery.c:136:13: error: comparison of unsigned expression < 0 is > always false [-Werror,-Wtautological-compare] > if (blkno < 0 || blkno >= nblocks) > ~ ^ ~ > Fixed. > There's a certain inconsistency to these messages: > > rhaas=# create table foo (a int); > CREATE TABLE > rhaas=# insert into foo values (1); > INSERT 0 1 > rhaas=# select heap_force_kill('foo'::regclass, array['(0,2)'::tid]); > NOTICE: skipping tid (0, 2) because it contains an invalid offset > heap_force_kill > - > > (1 row) > > rhaas=# select heap_force_kill('foo'::regclass, array['(1,0)'::tid]); > ERROR: invalid item pointer > LOCATION: tids_same_page_fetch_offnums, heap_surgery.c:347 > rhaas=# select heap_force_kill('foo'::regclass, array['(1,1)'::tid]); > ERROR: block number 1 is out of range for relation "foo" > > From a user perspective it seems like I've made three very similar > mistakes: in the first case, the offset is too high, in the second > case it's too low, and in the third case the block number is out of > range. But in one case I get a NOTICE and in the other two cases I get > an ERROR. In one case I get the relation name and in the other two > cases I don't. The two complaints about an invalid offset are phrased > completely differently from each other. For example, suppose you do > this: > > ERROR: tid (%u, %u) is invalid for relation "%s" because the block > number is out of range (%u..%u) > ERROR: tid (%u, %u) is invalid for relation "%s" because the item > number is out of range for this block (%u..%u) > ERROR: tid (%u, %u) is invalid for relation "%s" because the item is unused > ERROR: tid (%u, %u) is invalid for relation "%s" because the item is dead > Corrected. 
> I think I misled you when I said to use pg_class_aclcheck. I think it > should actually be pg_class_ownercheck. > okay, I've changed it to pg_class_ownercheck. > I think the relkind sanity check should permit RELKIND_MATVIEW also. > Yeah, actually we should allow MATVIEW, don't know why I thought of blocking it earlier. > It's unclear to me why the freeze logic here shouldn't do this part > what heap_prepare_freeze_tuple() does when freezing xmax: > > frz->t_infomask2 &= ~HEAP_HOT_UPDATED; > frz->t_infomask2 &= ~HEAP_KEYS_UPDATED; > Yeah, we should have these changes when freezing the xmax. > Likewise, why should we not freeze or invalidate xvac in the case > where tuple->t_infomask & HEAP_MOVED, as heap_prepare_freeze_tuple() > would do? > Again, we should have this as well. Apart from above, this time I've also added the documentation on pg_surgery module and added a few more test-cases. Attached patch with above changes. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com From 2aa56b9632cf1d291f4433afd972dc647d354dcb Mon Sep 17 00:00:00 2001 From: ashu Date: Wed, 12 Aug 2020 14:38:14 +0530 Subject: [PATCH] Add contrib/pg_surgery to perform surgery on a damaged heap table. This contrib module basically adds a couple of functions named heap_force_kill and heap_force_freeze that can be used in the scripts to perform surgery on the damaged heap tables. Ashutosh Sharma. 
--- contrib/Makefile | 1 + contrib/pg_surgery/Makefile| 23 ++ contrib/pg_surgery/expected/pg_surgery.out | 161 + contrib/pg_surgery/heap_surgery.c | 375 + contrib/pg_surgery/pg_surgery--1.0.sql | 18 ++ contrib/pg_surgery/pg_surgery.control | 5 + contrib/pg_surgery/sql/pg_surgery.sql | 89 +++ doc/src/sgml/contrib.sgml | 1 + doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/pgsurgery.sgml| 94 10 files changed, 768 insertions(+) create mode 100644 contrib/pg_surgery/Makefile create mode 100644 contrib/pg_surgery/expected/pg_surgery.out create mode 100644 contrib/pg_surgery/heap_surgery.c create mode 100644 contrib/pg_surgery/pg_surgery--1.0.sql create mode 100644 contrib/pg_surgery/pg_surgery.control create mode 100644 contrib/pg_surgery/sql/pg_surgery.sql create mode 100644 doc/src/sgml/pgsurgery.sgml diff --git a/contrib/Makefile b/contrib/Makefile index 1846d41..07d5734 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -35,6 +35,7 @@ SUBDIRS = \ pg_standby \ pg_stat_statements \ pg_trgm \ + pg_surgery \ pgcrypto \ pgrowlocks \ pgstattuple \ diff --git a/contrib/pg_surgery/Makefile b/contrib/pg_surgery/Makefile new file mode 100644 index 000..ecf2e20 --- /dev/null +++ b/contrib/pg_surgery/Makefile @@ -0,0 +1,23 @@ +# contrib/pg_surgery/Makefile + +MODULE_big = pg_surgery
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Wed, Aug 12, 2020 at 8:21 PM Dave Cramer wrote: > > > On Wed, 12 Aug 2020 at 08:14, Andy Fan wrote: > >> >> >> On Wed, Aug 12, 2020 at 8:11 PM Andy Fan >> wrote: >> >>> >>> >>> On Wed, Aug 12, 2020 at 5:54 PM Dave Cramer >>> wrote: >>> On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: > > > On Mon, Jul 27, 2020 at 11:57 AM Andy Fan > wrote: > >> >>> 2. Currently I want to add a new GUC parameter, if set it to true, >>> server will >>> create a holdable portal, or else nothing changed. Then let the >>> user set >>> it to true in the above case and reset it to false afterward. Is >>> there any issue >>> with this method? >>> >>> >> I forget to say in this case, the user has to drop the holdable >> portal explicitly. >> >> >> > After some days's hack and testing, I found more issues to support the > following case > > rs = prepared_stmt.execute(1); > while(rs.next()) > { > // do something with the result (mainly DML ) > conn.commit(); or conn.rollback(); > > // commit / rollback to avoid the long lock holding. > } > > The holdable portal is still be dropped in transaction > aborted/rollbacked case since > the HoldPortal doesn't happens before that and "abort/rollabck" means > something > wrong so it is risk to hold it again. What I did to fix this issue is > HoldPortal just after > we define a Holdable portal. However, that's bad for performance. > Originally, we just > needed to scan the result when needed, now we have to hold all the > results and then fetch > and the data one by one. > > The above user case looks reasonable to me IMO, I would say it is > kind of "tech debt" > in postgres. To support this completely, looks we have to decouple > the snapshot/locking > management with transaction? If so, it looks like a huge change. I > wonder if anybody > tried to resolve this issue and where do we get to that point? 
> > -- > Best Regards > Andy Fan > I think if you set the fetch size the driver will use a named cursor and this should work >>> If the drivers can use the tempfile as an extra store, then things will >>> be better than the server. >>> >> >> Maybe not much better, just the same as each other. Both need to >> store all of them first and fetch them from the temp store again. >> >> > Ya I thought about this after I answered it. If you have a resultset that > you requested in a transaction and then you commit the transaction I think > it is reasonable to expect that the resultset is no longer valid. > > I checked JDBC, the resultset only uses memory to cache the resultset. so we can't set an inf+ fetch size with the hope that the client's resultset can cache all of them for us. Basically I will use my above hack. -- Best Regards Andy Fan
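The tradeoff described in this thread (a held portal must run the query to completion and materialize every row so the results survive transaction end, instead of producing rows lazily on demand) can be sketched in miniature. All names below are hypothetical illustrations, not PostgreSQL's actual portal code:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct MiniPortal
{
	int			nrows;			/* rows the query would produce */
	int			next;			/* next row to return */
	int		   *held;			/* non-NULL once the portal has been held */
} MiniPortal;

/* Lazy fetch: compute the next row on demand.  Cheap, but only valid
 * while the originating transaction (and its snapshot) is still open. */
int
mini_fetch_lazy(MiniPortal *p)
{
	return p->next++;
}

/* "Hold" the portal: execute it to completion and store every row, so
 * the results survive a later commit or rollback. */
void
mini_hold(MiniPortal *p)
{
	p->held = malloc(sizeof(int) * p->nrows);
	for (int i = 0; i < p->nrows; i++)
		p->held[i] = i;			/* stand-in for actually running the query */
}

/* Fetch from the held store after the transaction has ended. */
int
mini_fetch_held(MiniPortal *p)
{
	return p->held[p->next++];
}
```

Holding pays the full materialization cost up front, which is the performance regression noted above for calling HoldPortal immediately after defining a holdable portal.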
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Wed, 12 Aug 2020 at 08:14, Andy Fan wrote: > > > On Wed, Aug 12, 2020 at 8:11 PM Andy Fan wrote: > >> >> >> On Wed, Aug 12, 2020 at 5:54 PM Dave Cramer >> wrote: >> >>> >>> >>> >>> On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: >>> On Mon, Jul 27, 2020 at 11:57 AM Andy Fan wrote: > >> 2. Currently I want to add a new GUC parameter, if set it to true, >> server will >> create a holdable portal, or else nothing changed. Then let the user >> set >> it to true in the above case and reset it to false afterward. Is >> there any issue >> with this method? >> >> > I forget to say in this case, the user has to drop the holdable > portal explicitly. > > > After some days's hack and testing, I found more issues to support the following case rs = prepared_stmt.execute(1); while(rs.next()) { // do something with the result (mainly DML ) conn.commit(); or conn.rollback(); // commit / rollback to avoid the long lock holding. } The holdable portal is still be dropped in transaction aborted/rollbacked case since the HoldPortal doesn't happens before that and "abort/rollabck" means something wrong so it is risk to hold it again. What I did to fix this issue is HoldPortal just after we define a Holdable portal. However, that's bad for performance. Originally, we just needed to scan the result when needed, now we have to hold all the results and then fetch and the data one by one. The above user case looks reasonable to me IMO, I would say it is kind of "tech debt" in postgres. To support this completely, looks we have to decouple the snapshot/locking management with transaction? If so, it looks like a huge change. I wonder if anybody tried to resolve this issue and where do we get to that point? -- Best Regards Andy Fan >>> >>> >>> I think if you set the fetch size the driver will use a named cursor and >>> this should work >>> >>> >> If the drivers can use the tempfile as an extra store, then things will >> be better than the server. 
>> > > Maybe not much better, just the same as each other. Both need to > store all of them first and fetch them from the temp store again. > > Yeah, I thought about this after I answered it. If you have a resultset that you requested in a transaction and then you commit the transaction, I think it is reasonable to expect that the resultset is no longer valid. Dave Cramer www.postgres.rocks
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Wed, Aug 12, 2020 at 8:11 PM Andy Fan wrote: > > > On Wed, Aug 12, 2020 at 5:54 PM Dave Cramer > wrote: > >> >> >> >> On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: >> >>> >>> >>> On Mon, Jul 27, 2020 at 11:57 AM Andy Fan >>> wrote: >>> > 2. Currently I want to add a new GUC parameter, if set it to true, > server will > create a holdable portal, or else nothing changed. Then let the user > set > it to true in the above case and reset it to false afterward. Is > there any issue > with this method? > > I forget to say in this case, the user has to drop the holdable portal explicitly. >>> After some days's hack and testing, I found more issues to support the >>> following case >>> >>> rs = prepared_stmt.execute(1); >>> while(rs.next()) >>> { >>> // do something with the result (mainly DML ) >>> conn.commit(); or conn.rollback(); >>> >>> // commit / rollback to avoid the long lock holding. >>> } >>> >>> The holdable portal is still be dropped in transaction >>> aborted/rollbacked case since >>> the HoldPortal doesn't happens before that and "abort/rollabck" means >>> something >>> wrong so it is risk to hold it again. What I did to fix this issue is >>> HoldPortal just after >>> we define a Holdable portal. However, that's bad for performance. >>> Originally, we just >>> needed to scan the result when needed, now we have to hold all the >>> results and then fetch >>> and the data one by one. >>> >>> The above user case looks reasonable to me IMO, I would say it is kind >>> of "tech debt" >>> in postgres. To support this completely, looks we have to decouple the >>> snapshot/locking >>> management with transaction? If so, it looks like a huge change. I >>> wonder if anybody >>> tried to resolve this issue and where do we get to that point? 
>>> >>> -- >>> Best Regards >>> Andy Fan >>> >> >> >> I think if you set the fetch size the driver will use a named cursor and >> this should work >> >> > If the drivers can use the tempfile as an extra store, then things will be > better than the server. > Maybe not much better, just the same as each other. Both need to store all of them first and fetch them from the temp store again. -- Best Regards Andy Fan
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Wed, Aug 12, 2020 at 5:54 PM Dave Cramer wrote: > > > > On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: > >> >> >> On Mon, Jul 27, 2020 at 11:57 AM Andy Fan >> wrote: >> >>> 2. Currently I want to add a new GUC parameter, if set it to true, server will create a holdable portal, or else nothing changed. Then let the user set it to true in the above case and reset it to false afterward. Is there any issue with this method? >>> I forget to say in this case, the user has to drop the holdable >>> portal explicitly. >>> >>> >>> >> After some days's hack and testing, I found more issues to support the >> following case >> >> rs = prepared_stmt.execute(1); >> while(rs.next()) >> { >> // do something with the result (mainly DML ) >> conn.commit(); or conn.rollback(); >> >> // commit / rollback to avoid the long lock holding. >> } >> >> The holdable portal is still be dropped in transaction aborted/rollbacked >> case since >> the HoldPortal doesn't happens before that and "abort/rollabck" means >> something >> wrong so it is risk to hold it again. What I did to fix this issue is >> HoldPortal just after >> we define a Holdable portal. However, that's bad for performance. >> Originally, we just >> needed to scan the result when needed, now we have to hold all the >> results and then fetch >> and the data one by one. >> >> The above user case looks reasonable to me IMO, I would say it is kind >> of "tech debt" >> in postgres. To support this completely, looks we have to decouple the >> snapshot/locking >> management with transaction? If so, it looks like a huge change. I wonder >> if anybody >> tried to resolve this issue and where do we get to that point? >> >> -- >> Best Regards >> Andy Fan >> > > > I think if you set the fetch size the driver will use a named cursor and > this should work > > I knew this point before working on that, but I heard from my customer that the size would be pretty big, so I gave up on this idea (too early). 
However, after working on a holdable solution, I see there is very little difference between caching the result on the server or on the client. If the drivers can use a temp file as an extra store, then things will be better than on the server. Otherwise, things will still be complex. Thanks for your reminder! -- Best Regards Andy Fan
Re: Make contrib modules' installation scripts more secure.
Re: Tom Lane > > (The Debian regression tests remove plpgsql before testing all > > extensions in turn.) > > Meh. I think that's testing a case that we don't guarantee to work. > There was already a plpgsql dependency in hstore--1.1--1.2.sql, > which I just cribbed from to make these fixes. The key difference is that hstore--1.1--1.2.sql was never required for installing an extension from scratch, only for upgrades. The practical relevance of this distinction is that the upgrade scripts are only run once, while install-time scripts (including the upgrade scripts for versions that do not have a direct creation script) are also required for dump-restore cycles. As an admin, I'd very much hate databases that couldn't be restored without extra fiddling. The thing that maybe saves us here is that while hstore is trusted, so any user can create it, plpgsql is trusted as well, but owned by postgres, so even database owners can't drop it from beneath hstore. Only superusers can "mess up" a database in that way. But still. > A band-aid sort of fix would be to roll up the base install scripts > for these modules to the latest version, so that a plain install from > scratch doesn't need to execute any of the catalog adjustments in > their update scripts. That's not terribly attractive from a maintenance > or testing standpoint, though. That's a pretty small price compared to the dump-reload inconsistencies. I can see the extra maintenance effort, but how many extensions would require rewriting as direct-install.sql scripts? I guess it's only a few that need plpgsql for upgrades. Christoph
Re: Allows Extend Protocol support CURSOR_OPT_HOLD with prepared stmt.
On Tue, 11 Aug 2020 at 22:33, Andy Fan wrote: > > > On Mon, Jul 27, 2020 at 11:57 AM Andy Fan > wrote: > >> >>> 2. Currently I want to add a new GUC parameter, if set it to true, >>> server will >>> create a holdable portal, or else nothing changed. Then let the user >>> set >>> it to true in the above case and reset it to false afterward. Is there >>> any issue >>> with this method? >>> >>> >> I forget to say in this case, the user has to drop the holdable >> portal explicitly. >> >> >> > After some days's hack and testing, I found more issues to support the > following case > > rs = prepared_stmt.execute(1); > while(rs.next()) > { > // do something with the result (mainly DML ) > conn.commit(); or conn.rollback(); > > // commit / rollback to avoid the long lock holding. > } > > The holdable portal is still be dropped in transaction aborted/rollbacked > case since > the HoldPortal doesn't happens before that and "abort/rollabck" means > something > wrong so it is risk to hold it again. What I did to fix this issue is > HoldPortal just after > we define a Holdable portal. However, that's bad for performance. > Originally, we just > needed to scan the result when needed, now we have to hold all the results > and then fetch > and the data one by one. > > The above user case looks reasonable to me IMO, I would say it is kind of > "tech debt" > in postgres. To support this completely, looks we have to decouple the > snapshot/locking > management with transaction? If so, it looks like a huge change. I wonder > if anybody > tried to resolve this issue and where do we get to that point? > > -- > Best Regards > Andy Fan > I think if you set the fetch size the driver will use a named cursor and this should work Dave Cramer www.postgres.rocks
use pg_get_functiondef() in pg_dump
Here is a patch to have pg_dump use pg_get_functiondef() instead of assembling the CREATE FUNCTION/PROCEDURE commands itself. This should save on maintenance effort in the future. It's also a prerequisite for being able to dump functions with SQL-standard function body discussed in [0]. pg_get_functiondef() was meant for psql's \ef command, so its defaults are slightly different from what pg_dump would like, so this adds a few optional parameters for tweaking the behavior. The naming of the parameters is up for discussion. [0]: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c0...@2ndquadrant.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 6d87daace8d5e9cec80d1b3b5d1a03ff9c2206ee Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Aug 2020 09:19:58 +0200 Subject: [PATCH v1 1/2] Drop final newline from pg_get_functiondef result This makes it more consistent with other pg_get_*def() functions. psql (for \ef) already ensures that a trailing newline is added as necessary, so this change doesn't affect psql's functionality. 
--- src/backend/utils/adt/ruleutils.c | 2 -- src/test/regress/expected/create_function_3.out | 12 src/test/regress/expected/create_procedure.out | 3 +-- src/test/regress/expected/rules.out | 3 +-- 4 files changed, 6 insertions(+), 14 deletions(-) diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 60dd80c23c..877f99cba3 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -2832,8 +2832,6 @@ pg_get_functiondef(PG_FUNCTION_ARGS) appendStringInfoString(, prosrc); appendBinaryStringInfo(, dq.data, dq.len); - appendStringInfoChar(, '\n'); - ReleaseSysCache(proctup); PG_RETURN_TEXT_P(string_to_text(buf.data)); diff --git a/src/test/regress/expected/create_function_3.out b/src/test/regress/expected/create_function_3.out index ba260df996..5d0be17282 100644 --- a/src/test/regress/expected/create_function_3.out +++ b/src/test/regress/expected/create_function_3.out @@ -200,8 +200,7 @@ SELECT pg_get_functiondef('functest_A_1'::regproc); CREATE OR REPLACE FUNCTION temp_func_test.functest_a_1(text, date)+ RETURNS boolean + LANGUAGE sql + - AS $function$SELECT $1 = 'abcd' AND $2 > '2001-01-01'$function$ + - + AS $function$SELECT $1 = 'abcd' AND $2 > '2001-01-01'$function$ (1 row) SELECT pg_get_functiondef('functest_B_3'::regproc); @@ -211,8 +210,7 @@ SELECT pg_get_functiondef('functest_B_3'::regproc); RETURNS boolean + LANGUAGE sql + STABLE+ - AS $function$SELECT $1 = 0$function$ + - + AS $function$SELECT $1 = 0$function$ (1 row) SELECT pg_get_functiondef('functest_C_3'::regproc); @@ -222,8 +220,7 @@ SELECT pg_get_functiondef('functest_C_3'::regproc); RETURNS boolean + LANGUAGE sql + SECURITY DEFINER + - AS $function$SELECT $1 < 0$function$ + - + AS $function$SELECT $1 < 0$function$ (1 row) SELECT pg_get_functiondef('functest_F_2'::regproc); @@ -233,8 +230,7 @@ SELECT pg_get_functiondef('functest_F_2'::regproc); RETURNS boolean + LANGUAGE sql + STRICT+ - AS $function$SELECT $1 = 50$function$ + - + AS 
$function$SELECT $1 = 50$function$ (1 row) -- information_schema tests diff --git a/src/test/regress/expected/create_procedure.out b/src/test/regress/expected/create_procedure.out index 211a42cefa..2c94b9a4b6 100644 --- a/src/test/regress/expected/create_procedure.out +++ b/src/test/regress/expected/create_procedure.out @@ -29,8 +29,7 @@ SELECT pg_get_functiondef('ptest1'::regproc); LANGUAGE sql+ AS $procedure$ + INSERT INTO cp_test VALUES (1, x); + - $procedure$ + - + $procedure$ (1 row) -- show only normal functions diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 601734a6f1..be434c97ac 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -3402,8 +3402,7 @@ SELECT pg_get_functiondef('func_with_set_params()'::regprocedure); SET work_mem TO '4MB'
Re: remove some ancient port hacks
On 12.08.2020 09:12, Peter Eisentraut wrote: There are two ancient hacks in the cygwin and solaris ports that appear to have been solved more than 10 years ago, so I think we can remove them. See attached patches. Hi Peter, This is really archaeology: the check is for b20.1, which was released in 1998. No problem at all to remove it. Regards Marco Cygwin Package Maintainer
remove some ancient port hacks
There are two ancient hacks in the cygwin and solaris ports that appear to have been solved more than 10 years ago, so I think we can remove them. See attached patches. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 6322cb1d9579ab8e2fd66139aa6a0f342925b7f0 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Aug 2020 09:09:02 +0200 Subject: [PATCH 1/2] Remove obsolete HAVE_BUGGY_SOLARIS_STRTOD Fixed more than 10 years ago. --- src/backend/utils/adt/float.c | 24 src/include/port/solaris.h| 12 2 files changed, 36 deletions(-) diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index ffd1ce8c76..429c9280c0 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -271,18 +271,6 @@ float4in(PG_FUNCTION_ARGS) errmsg("invalid input syntax for type %s: \"%s\"", "real", orig_num))); } -#ifdef HAVE_BUGGY_SOLARIS_STRTOD - else - { - /* -* Many versions of Solaris have a bug wherein strtod sets endptr to -* point one byte beyond the end of the string when given "inf" or -* "infinity". -*/ - if (endptr != num && endptr[-1] == '\0') - endptr--; - } -#endif /* HAVE_BUGGY_SOLARIS_STRTOD */ /* skip trailing whitespace */ while (*endptr != '\0' && isspace((unsigned char) *endptr)) @@ -499,18 +487,6 @@ float8in_internal_opt_error(char *num, char **endptr_p, type_name, orig_string))), have_error); } -#ifdef HAVE_BUGGY_SOLARIS_STRTOD - else - { - /* -* Many versions of Solaris have a bug wherein strtod sets endptr to -* point one byte beyond the end of the string when given "inf" or -* "infinity". 
-*/ - if (endptr != num && endptr[-1] == '\0') - endptr--; - } -#endif /* HAVE_BUGGY_SOLARIS_STRTOD */ /* skip trailing whitespace */ while (*endptr != '\0' && isspace((unsigned char) *endptr)) diff --git a/src/include/port/solaris.h b/src/include/port/solaris.h index eeb1a320bd..e63a3bd824 100644 --- a/src/include/port/solaris.h +++ b/src/include/port/solaris.h @@ -24,15 +24,3 @@ #if defined(__i386__) #include #endif - -/* - * Many versions of Solaris have broken strtod() --- see bug #4751182. - * This has been fixed in current versions of Solaris: - * - * http://sunsolve.sun.com/search/document.do?assetkey=1-21-108993-62-1=108993-62 - * http://sunsolve.sun.com/search/document.do?assetkey=1-21-112874-34-1=112874-34 - * - * However, many people might not have patched versions, so - * still use our own fix for the buggy version. - */ -#define HAVE_BUGGY_SOLARIS_STRTOD -- 2.28.0 From 71a0e828454f2c473c65824cfdbf6a14640ffcf6 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Aug 2020 09:09:02 +0200 Subject: [PATCH 2/2] Remove obsolete cygwin.h hack The version being checked for is 20 years old. --- src/include/port/cygwin.h | 9 - 1 file changed, 9 deletions(-) diff --git a/src/include/port/cygwin.h b/src/include/port/cygwin.h index f1fc1a93d7..64d69936e5 100644 --- a/src/include/port/cygwin.h +++ b/src/include/port/cygwin.h @@ -1,14 +1,5 @@ /* src/include/port/cygwin.h */ -#include - -/* - * Check for b20.1 and disable AF_UNIX family socket support. - */ -#if CYGWIN_VERSION_DLL_MAJOR < 1001 -#undef HAVE_UNIX_SOCKETS -#endif - #ifdef BUILDING_DLL #define PGDLLIMPORT __declspec (dllexport) #else -- 2.28.0
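The bug the removed HAVE_BUGGY_SOLARIS_STRTOD workaround guarded against was strtod() on old Solaris setting endptr one byte beyond the end of the string for "inf" or "infinity". On a conforming libc, endptr lands exactly on the first unconsumed character, which is what the callers in float.c rely on. A small sketch, assuming a C99-conforming strtod:

```c
#include <assert.h>
#include <stdlib.h>

/* Return how many characters strtod() consumed from num.  On a correct
 * libc, endptr points just past the last accepted character; the old
 * Solaris bug made it overshoot by one byte for "inf"/"infinity". */
size_t
consumed_by_strtod(const char *num)
{
	char	   *endptr;

	(void) strtod(num, &endptr);
	return (size_t) (endptr - num);
}
```

With a fixed strtod, "inf" consumes exactly 3 characters and "infinity" exactly 8, so no endptr-correction hack is needed.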
Re: SyncRepLock acquired exclusively in default configuration
On Wed, 12 Aug 2020 at 14:06, Asim Praveen wrote: > > > > > On 11-Aug-2020, at 8:57 PM, Robert Haas wrote: > > > > I think this gets to the root of the issue. If we check the flag > > without a lock, we might see a slightly stale value. But, considering > > that there's no particular amount of time within which configuration > > changes are guaranteed to take effect, maybe that's OK. However, there > > is one potential gotcha here: if the walsender declares the standby to > > be synchronous, a user can see that, right? So maybe there's this > > problem: a user sees that the standby is synchronous and expects a > > transaction committing afterward to provoke a wait, but really it > > doesn't. Now the user is unhappy, feeling that the system didn't > > perform according to expectations. > > Yes, pg_stat_replication reports a standby in sync as soon as walsender > updates priority of the standby to something other than 0. > > The potential gotcha referred above doesn’t seem too severe. What is the > likelihood of someone setting synchronous_standby_names GUC with either “*” > or a standby name and then immediately promoting that standby? If the > standby is promoted before the checkpointer on master gets a chance to update > sync_standbys_defined in shared memory, commits made during this interval on > master may not make it to standby. Upon promotion, those commits may be lost. I think that if the standby is far behind the primary and the primary crashes, the likelihood of losing commits gets higher. The user can see via pg_stat_replication that the standby became a synchronous standby, but a commit completes without waiting because the checkpointer hasn't updated sync_standbys_defined yet. If the primary crashes before the standby catches up and the user fails over, the committed transaction will be lost, even though the user expected that the transaction commit had been replicated to the standby synchronously. And this can happen even without the patch, right?
Regards, -- Masahiko Sawada  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
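The unlocked-read pattern under discussion (a backend reading sync_standbys_defined without taking SyncRepLock and tolerating a briefly stale value) looks roughly like this in portable C11. This is an illustrative sketch only, not PostgreSQL's actual shared-memory code, which has its own atomics and locking layers:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for a shared-memory flag like
 * sync_standbys_defined: one process publishes it, others read it
 * without a lock and may briefly observe a stale value. */
static atomic_bool sync_flag = false;

/* Checkpointer side: publish the value recomputed from the GUC. */
void
publish_sync_standbys(bool defined)
{
	atomic_store_explicit(&sync_flag, defined, memory_order_release);
}

/* Backend side: cheap unlocked read.  A slightly stale answer is
 * tolerable here because configuration changes carry no hard
 * visibility deadline anyway. */
bool
sync_rep_needed(void)
{
	return atomic_load_explicit(&sync_flag, memory_order_acquire);
}
```

The release/acquire pairing guarantees that once a reader sees the flag set, it also sees everything the writer did before publishing it; what it does not guarantee (and what the thread debates) is any bound on how long a reader may keep seeing the old value.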
Re: Handing off SLRU fsyncs to the checkpointer
On Sat, Aug 8, 2020 at 2:44 AM Robert Haas wrote: > On Wed, Aug 5, 2020 at 2:01 AM Thomas Munro wrote: > > * Master is around 11% faster than last week before commit c5315f4f > > "Cache smgrnblocks() results in recovery." > > * This patch gives a similar speedup, bringing the total to around 25% > > faster than last week (the time is ~20% less, the WAL processing speed > > is ~1.25x). > > Dang, that's pretty nice, especially for the relatively small amount > of code that it seems to require. Yeah, the combined effect of these two patches is better than I expected. To be clear though, I was only measuring the time between the "redo starts at ..." and "redo done at ..." messages, since I've been staring at the main recovery code, but there are also some more fsyncs before (SyncDataDirectory()) and after (RemoveOldXlogFiles()) that are unaffected. I think it's probably possible to do something about those too, but that's another topic. I spotted a small problem: if the transaction IDs wrap all the way around between checkpoints, then you might have cancelled requests for a removed SLRU segment from the previous epoch, so we'd better uncancel them if we see that. That's a one-line fix, done in the attached. I also adjusted the commit message to be a little clearer (this work deferment/collapsing scheme works in crash recovery too, not just when there is a checkpointer process to hand the work to). From 63235aca6b2525716f8f29caf07e0a0e6a26965e Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 4 Aug 2020 17:57:18 +1200 Subject: [PATCH] Defer flushing of SLRU files. Previously, we called fsync() after writing out pg_xact, multixact and commit_ts pages, leading to an I/O stall in user backends and recovery. Collapse requests for the same file into a single system call as part of the next checkpoint, as we do for relation files.
Discussion: https://postgr.es/m/CA+hUKGLJ=84yt+nvhkeedauutvhmfq9i-n7k_o50jmq6rpj...@mail.gmail.com --- src/backend/access/transam/clog.c | 13 +++- src/backend/access/transam/commit_ts.c | 12 ++- src/backend/access/transam/multixact.c | 24 +- src/backend/access/transam/slru.c | 101 +++-- src/backend/access/transam/subtrans.c | 4 +- src/backend/commands/async.c | 5 +- src/backend/storage/lmgr/predicate.c | 4 +- src/backend/storage/sync/sync.c| 28 ++- src/include/access/clog.h | 3 + src/include/access/commit_ts.h | 3 + src/include/access/multixact.h | 4 + src/include/access/slru.h | 12 ++- src/include/storage/sync.h | 7 +- 13 files changed, 174 insertions(+), 46 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index f3da40ae01..0c5b7a525e 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -42,6 +42,7 @@ #include "pg_trace.h" #include "pgstat.h" #include "storage/proc.h" +#include "storage/sync.h" /* * Defines for CLOG page sizes. A page is the same BLCKSZ as is used @@ -692,7 +693,8 @@ CLOGShmemInit(void) { XactCtl->PagePrecedes = CLOGPagePrecedes; SimpleLruInit(XactCtl, "Xact", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE, - XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER); + XactSLRULock, "pg_xact", LWTRANCHE_XACT_BUFFER, + SYNC_HANDLER_CLOG); } /* @@ -1034,3 +1036,12 @@ clog_redo(XLogReaderState *record) else elog(PANIC, "clog_redo: unknown op code %u", info); } + +/* + * Entrypoint for sync.c to sync clog files. 
+ */ +int +clogsyncfiletag(const FileTag *ftag, char *path) +{ + return slrusyncfiletag(XactCtl, ftag, path); +} diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 903280ae92..b4edbdb4e3 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -555,7 +555,8 @@ CommitTsShmemInit(void) CommitTsCtl->PagePrecedes = CommitTsPagePrecedes; SimpleLruInit(CommitTsCtl, "CommitTs", CommitTsShmemBuffers(), 0, CommitTsSLRULock, "pg_commit_ts", - LWTRANCHE_COMMITTS_BUFFER); + LWTRANCHE_COMMITTS_BUFFER, + SYNC_HANDLER_COMMIT_TS); commitTsShared = ShmemInitStruct("CommitTs shared", sizeof(CommitTimestampShared), @@ -1083,3 +1084,12 @@ commit_ts_redo(XLogReaderState *record) else elog(PANIC, "commit_ts_redo: unknown op code %u", info); } + +/* + * Entrypoint for sync.c to sync commit_ts files. + */ +int +committssyncfiletag(const FileTag *ftag, char *path) +{ + return slrusyncfiletag(CommitTsCtl, ftag, path); +} diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 475f5ed861..27ae2edbdc 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1831,11 +1831,13 @@ MultiXactShmemInit(void) SimpleLruInit(MultiXactOffsetCtl, "MultiXactOffset", NUM_MULTIXACTOFFSET_BUFFERS, 0,