Re: Commitfest manager for 2022-03

2022-02-27 Thread Julien Rouhaud
On Sat, Feb 26, 2022 at 06:37:21PM -0600, Justin Pryzby wrote:
> Can I suggest to update the CF APP to allow:
> | Target version: 16
> 
> I also suggest to update patches to indicate which are (not) being considered
> for v15.

I don't really understand what that field is supposed to mean.  But now that
we're in the final pg15 commit fest, wouldn't it be simpler to actually move
the patches for which there's an agreement that they can't make it to pg15?
Tagging them and letting them rot for a month isn't helpful for the authors or
the CFMs, especially when there are already 250 patches that need to be
handled.

There's already an ongoing discussion for the PGTT patchset; maybe the same
should happen for some of the threads you mentioned.




Re: Commitfest manager for 2022-03

2022-02-27 Thread Michael Paquier
On Sun, Feb 27, 2022 at 04:51:16PM +0800, Julien Rouhaud wrote:
> I don't really understand what that field is supposed to mean.  But now that
> we're in the final pg15 commit fest, wouldn't it be simpler to actually move
> the patches for which there's an agreement that they can't make it to pg15?
> Tagging them and letting them rot for a month isn't helpful for the authors or
> the CFMs, especially when there are already 250 patches that need to be
> handled.

Yes, I don't see any point in having a new tag just to mark patches
that will have to be moved to the next CF anyway.  These should just
be moved to the July one rather than stay in the March one.
--
Michael


signature.asc
Description: PGP signature


Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad

2022-02-27 Thread Michael Paquier
On Sat, Feb 26, 2022 at 01:39:42PM -0800, Andres Freund wrote:
> I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd
> require adding a new enum value to WaitEventTimeout in 14. Which probably is
> fine?

We've added wait events in back-branches in the past, so this does not
strike me as a problem as long as you add the new entry at the end of
the enum, while keeping things ordered on HEAD.  In recent memory, I
think that only some of the extensions published by PostgresPro rely
on the enums in this area. 
--
Michael




Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad

2022-02-27 Thread Julien Rouhaud
On Sun, Feb 27, 2022 at 06:10:45PM +0900, Michael Paquier wrote:
> On Sat, Feb 26, 2022 at 01:39:42PM -0800, Andres Freund wrote:
> > I suspect the easiest is to just convert that usleep to a WaitLatch().
> > That'd require adding a new enum value to WaitEventTimeout in 14. Which
> > probably is fine?
> 
> We've added wait events in back-branches in the past, so this does not
> strike me as a problem as long as you add the new entry at the end of
> the enum, while keeping things ordered on HEAD.

+1

> In recent memory, I
> think that only some of the extensions published by PostgresPro rely
> on the enums in this area. 

Indeed, I only know of pg_wait_sampling which uses it.  Note that it relies on
the pgstat_get_wait_event* functions, so it should only return "???" /
"unknown wait event" until it's recompiled against the newer minor version,
rather than reporting errors or crashing.  All other extensions I know of
simply use whatever pg_stat_activity returns, so no impact.




Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

2022-02-27 Thread Bharath Rupireddy
On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy wrote:
>
> > On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi wrote:
> > > Here's the v2 patch. Please provide your thoughts.
> >
> > Thanks!  I have three comments on this version.
>
> Thanks for your review.
>
> > - I still think "acquire/release" logs on creation/dropping should be
> >   silenced.  Adding the third parameter to ReplicationSlotAcquire()
> >   that can mute the acquiring (and as well as the corresponding
> >   releasing) log will work.
>
> Done.
>
> > can piggy-back on log_replication_commands for the purpose, changing
> > its meaning slightly to "log replication commands and related
> > activity".
> > - Need to mute the logs by log_replication_commands.  (We could add
> >   another choice for the variable for this purpose but I think we
> >   don't need it.)
>
> Done.
>
> > - The messages are not translatable as the variable parts are
> >   adjectives. They need to consist of static sentences.  The
> >   combinations of the two properties are 6 (note that persistence is
> >   tristate) but I don't think we accept that complexity for the
> >   information.  So I recommend to just remove the variable parts from
> >   the messages.
>
> Done.
>
> Here's v3, please review it further.

Here's the rebased v4 patch.

Regards,
Bharath Rupireddy.


v4-0001-Add-LOG-messages-when-replication-slots-become-ac.patch
Description: Binary data


Re: Some optimisations for numeric division

2022-02-27 Thread Dean Rasheed
On Fri, 25 Feb 2022 at 18:30, Tom Lane  wrote:
>
> Dean Rasheed  writes:
> > And another update following feedback from the cfbot.
>
> This patchset LGTM.  On my machine there doesn't seem to be any
> measurable performance change for the numeric regression test,
> but numeric_big gets about 15% faster.
>

Yes, that matches my observations. Thanks for reviewing and testing.

> The only additional thought I have, based on your comments about
> 0001, is that maybe in mul_var we should declare var1digit as
> being NumericDigit not int.  I tried that here and didn't see
> any material change in the generated assembly code (using gcc
> 8.5.0), so you're right that modern gcc doesn't need any help
> there; but I wonder if it could help on other compiler versions.
>

Yes, that makes sense. Done that way.

Regards,
Dean




pg_stat_statements: remove redundant function call in pg_stat_statements_internal

2022-02-27 Thread Dong Wook Lee
Hi,

I found a redundant function call in
pg_stat_statements.c's pg_stat_statements_internal():
there is no need to call GetUserId() again because the value was
previously obtained, so I propose a patch to fix it.

--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1508,7 +1508,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
pgssEntry  *entry;

/* Superusers or members of pg_read_all_stats members are allowed */
-   is_allowed_role = is_member_of_role(GetUserId(),
ROLE_PG_READ_ALL_STATS);
+   is_allowed_role = is_member_of_role(userid, ROLE_PG_READ_ALL_STATS);

/* hash table must exist already */
if (!pgss || !pgss_hash)

Regards,
Lee Dong Wook.


v1_tiny_improvement_pg_stat_statements.patch
Description: Binary data


Re: pg_stat_statements: remove redundant function call in pg_stat_statements_internal

2022-02-27 Thread Julien Rouhaud
Hi,

On Sun, Feb 27, 2022 at 08:45:13PM +0900, Dong Wook Lee wrote:
> 
> I found a redundant function call in
> pg_stat_statements.c's pg_stat_statements_internal():
> there is no need to call GetUserId() again because the value was
> previously obtained.

Indeed.  I doubt it will make any real difference but it doesn't hurt to fix
it.

Patch looks good to me.




Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-27 Thread Michael Paquier
On Sat, Feb 26, 2022 at 09:55:20AM +0100, Gunnar "Nick" Bluth wrote:
> On 26.02.22 at 06:51, Michael Paquier wrote:
>> Shouldn't this one use appendShellString() on config_file?
> 
> It probably should, yes. I don't fancy this repetitive code myself.
> But then, pg_rewind as a whole could use an overhaul. I don't see any use of
> PQExpBuffer in it, but a lot of char* ...

Having a lot of char* does not necessarily mean that all of them have
to be changed to accommodate PQExpBuffer.  My guess is that this is
case-by-case, and we should apply it in places where it makes the
code cleaner to follow.  In the case of your patch, though, the two
areas changed would actually make the logic correct, because we have
to apply correct quoting rules to any commands executed.

> GSOC project? ;-)

It does not seem so, though there are surely more areas that could
gain in clarity.
--
Michael




Re: pg_stat_statements: remove redundant function call in pg_stat_statements_internal

2022-02-27 Thread Michael Paquier
On Sun, Feb 27, 2022 at 07:55:26PM +0800, Julien Rouhaud wrote:
> Indeed.  I doubt it will make any real difference but it doesn't hurt to fix
> it.
> 
> Patch looks good to me.

Yes, let's clean up that on HEAD.  No objections from here.  I'll do
that tomorrow or so.
--
Michael




Support for grabbing multiple consecutive values with nextval()

2022-02-27 Thread Jille Timmermans

Hi,

First time PostgreSQL contributor here :)

I wanted to be able to allocate a bunch of numbers from a sequence at 
once. Multiple people seem to be struggling with this 
(https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, 
https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).


I propose to add an extra argument to nextval() that specifies how many 
numbers you want to allocate (default 1).


The attached patch (based on master) passes `./configure 
--enable-cassert --enable-debug && make && make check`, including the 
newly added regression tests.


It does change the signature of nextval_internal(); not sure if that's
considered a backwards-compatibility break (for extensions?).


--
Jille

From 403993dfea71068070185dd14fa3f5ff26d5f791 Mon Sep 17 00:00:00 2001
From: Jille Timmermans 
Date: Sun, 27 Feb 2022 10:20:22 +0100
Subject: [PATCH] Add an argument to nextval() to grab multiple consecutive
 sequence numbers

---
 doc/src/sgml/func.sgml |  8 +++--
 src/backend/commands/sequence.c| 46 +-
 src/backend/executor/execExprInterp.c  |  2 +-
 src/backend/optimizer/util/clauses.c   |  2 +-
 src/include/catalog/pg_proc.dat|  3 ++
 src/include/commands/sequence.h|  2 +-
 src/test/regress/expected/sequence.out | 41 +++
 src/test/regress/sql/sequence.sql  | 12 +++
 8 files changed, 102 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df3cd5987b..5923ecc38e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17605,7 +17605,7 @@ $.* ? (@ like_regex "^\\d+$")
 
  nextval
 
-nextval ( regclass )
+nextval ( regclass , bigint  )
 bigint


@@ -17618,7 +17618,11 @@ $.* ? (@ like_regex "^\\d+$")
 values beginning with 1.  Other behaviors can be obtained by using
 appropriate parameters in the 
 command.
-  
+   
+   
+To grab multiple values you can pass an integer to nextval.
+It will allocate that many consecutive numbers from the sequence and return the last value.
+   

 This function requires USAGE
 or UPDATE privilege on the sequence.
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index ab592ce2f1..79e2a1e7c0 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -570,7 +570,7 @@ nextval(PG_FUNCTION_ARGS)
 	 */
 	relid = RangeVarGetRelid(sequence, NoLock, false);
 
-	PG_RETURN_INT64(nextval_internal(relid, true));
+	PG_RETURN_INT64(nextval_internal(relid, true, 1));
 }
 
 Datum
@@ -578,11 +578,20 @@ nextval_oid(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 
-	PG_RETURN_INT64(nextval_internal(relid, true));
+	PG_RETURN_INT64(nextval_internal(relid, true, 1));
+}
+
+Datum
+nextval_oid_num(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	int64		num = PG_GETARG_INT64(1);
+
+	PG_RETURN_INT64(nextval_internal(relid, true, num));
 }
 
 int64
-nextval_internal(Oid relid, bool check_permissions)
+nextval_internal(Oid relid, bool check_permissions, int64 request)
 {
 	SeqTable	elm;
 	Relation	seqrel;
@@ -605,6 +614,17 @@ nextval_internal(Oid relid, bool check_permissions)
 	bool		cycle;
 	bool		logit = false;
 
+	if (request < 1)
+	{
+		char		buf[100];
+
+		snprintf(buf, sizeof(buf), INT64_FORMAT, request);
+		ereport(ERROR,
+(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("nextval: can't request %s values from a sequence",
+		buf)));
+	}
+
 	/* open and lock sequence */
 	init_sequence(relid, &elm, &seqrel);
 
@@ -627,11 +647,10 @@ nextval_internal(Oid relid, bool check_permissions)
 	 */
 	PreventCommandIfParallelMode("nextval()");
 
-	if (elm->last != elm->cached)	/* some numbers were cached */
+	if (elm->increment != 0 && (elm->cached - elm->last) / elm->increment >= request)	/* enough numbers were cached */
 	{
 		Assert(elm->last_valid);
-		Assert(elm->increment != 0);
-		elm->last += elm->increment;
+		elm->last += elm->increment * request;
 		relation_close(seqrel, NoLock);
 		last_used_seq = elm;
 		return elm->last;
@@ -652,8 +671,17 @@ nextval_internal(Oid relid, bool check_permissions)
 	seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
 	page = BufferGetPage(buf);
 
+	if (elm->cached != elm->last && elm->cached == seq->last_value) {
+		/*
+		 * There are some numbers in the cache, and we can grab the numbers directly following those.
+		 * We can fetch fewer new numbers and claim the numbers from the cache.
+		 */
+		request -= elm->cached - elm->last;
+	}
+
 	elm->increment = incby;
 	last = next = result = seq->last_value;
+	cache += request-1;
 	fetch = cache;
 	log = seq->log_cnt;
 
@@ -703,7 +731,7 @@ nextval_internal(Oid relid, bool check_permissions)
 			if ((maxv >= 0 && next > maxv - incby) ||
 (maxv < 0 && next + incby > maxv))
 			{
-		

Re: Support for grabbing multiple consecutive values with nextval()

2022-02-27 Thread Julien Rouhaud
Hi,

On Sun, Feb 27, 2022 at 10:42:25AM +0100, Jille Timmermans wrote:
>
> First time PostgreSQL contributor here :)

Welcome!

> I wanted to be able to allocate a bunch of numbers from a sequence at once.
> Multiple people seem to be struggling with this 
> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence,
> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).
>
> I propose to add an extra argument to nextval() that specifies how many
> numbers you want to allocate (default 1).
>
> The attached patch (based on master) passes `./configure --enable-cassert
> --enable-debug && make && make check`, including the newly added regression
> tests.
>
> It does change the signature of nextval_internal(), not sure if that's
> considered backwards compatibility breaking (for extensions?).

Please register this patch to the next commit fest (and last for pg15
inclusion) at https://commitfest.postgresql.org/37/ if not done already.




Re: PATCH: add "--config-file=" option to pg_rewind

2022-02-27 Thread Gunnar "Nick" Bluth

On 27.02.22 at 13:06, Michael Paquier wrote:
> On Sat, Feb 26, 2022 at 09:55:20AM +0100, Gunnar "Nick" Bluth wrote:
>> On 26.02.22 at 06:51, Michael Paquier wrote:
>>> Shouldn't this one use appendShellString() on config_file?
>>
>> It probably should, yes. I don't fancy this repetitive code myself.
>> But then, pg_rewind as a whole could use an overhaul. I don't see any use of
>> PQExpBuffer in it, but a lot of char* ...
>
> Having a lot of char* does not necessarily mean that all of them have
> to be changed to accommodate PQExpBuffer.  My guess is that this is
> case-by-case, and we should apply it in places where it makes the
> code cleaner to follow.  In the case of your patch, though, the two
> areas changed would actually make the logic correct, because we have
> to apply correct quoting rules to any commands executed.

Alas! v3 attached.

>> GSOC project? ;-)
>
> It does not seem so, though there are surely more areas that could
> gain in clarity.

That's universally true ;-)


Best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..31a1a1dedb 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --config-file=filepath
+  
+   
+In case the postgresql.conf of your target cluster is not in the 
+target pgdata and you want to use the -c / --restore-target-wal option,
+provide a (relative or absolute) path to the postgresql.conf using this option.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index efb82a4034..b8c92c5590 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -23,6 +23,7 @@
 #include "common/restricted_token.h"
 #include "common/string.h"
 #include "fe_utils/recovery_gen.h"
+#include "fe_utils/string_utils.h"
 #include "file_ops.h"
 #include "filemap.h"
 #include "getopt_long.h"
@@ -60,6 +61,7 @@ char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -86,6 +88,7 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal   use restore_command in target configuration to\n"
 			 " retrieve WAL files from archives\n"));
+	printf(_("  --config-file=FILE path to postgresql.conf if outside target-pgdata (for -c)\n"));
 	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
 	printf(_("  --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
 	printf(_("  --source-server=CONNSTRsource server to synchronize with\n"));
@@ -120,6 +123,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
 		{"debug", no_argument, NULL, 3},
+		{"config-file", required_argument, NULL, 5},
 		{NULL, 0, NULL, 0}
 	};
 	int			option_index;
@@ -204,6 +208,10 @@ main(int argc, char **argv)
 			case 4:
 no_ensure_shutdown = true;
 break;
+
+			case 5:
+config_file = pg_strdup(optarg);
+break;
 		}
 	}
 
@@ -1016,8 +1024,8 @@ getRestoreCommand(const char *argv0)
 {
 	int			rc;
 	char		postgres_exec_path[MAXPGPATH],
-postgres_cmd[MAXPGPATH],
 cmd_output[MAXPGPATH];
+	PQExpBuffer	postgres_cmd;
 
 	if (!restore_wal)
 		return;
@@ -1051,11 +1059,21 @@ getRestoreCommand(const char *argv0)
 	 * Build a command able to retrieve the value of GUC parameter
 	 * restore_command, if set.
 	 */
-	snprintf(postgres_cmd, sizeof(postgres_cmd),
-			 "\"%s\" -D \"%s\" -C restore_command",
-			 postgres_exec_path, datadir_target);
+	postgres_cmd = createPQExpBuffer();
 
-	if (!pipe_read_line(postgres_cmd, cmd_output, sizeof(cmd_output)))
+	/* path to postgres, properly quoted */
+	appendShellString(postgres_cmd, postgres_exec_path);
+
+	appendPQExpBufferStr(postgres_cmd, " -D ");
+	appendShellString(postgres_cmd, datadir_target);
+	if (config_file != NULL)
+	{
+		appendPQExpBufferStr(postgres_cmd, " --config_file=");
+		appendShellString(postgres_cmd, config_file);
+	}
+	appendPQExpBufferStr(postgres_cmd, " -C restore_command");
+
+	if (!pipe_read_line(postgres_cmd->data, cmd_output, sizeof(cmd_output)))
 		exit(1);
 
 	(void) pg_strip_crlf(cmd_output);
diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index db9201f38e..3c395ece12 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -190,5 +190,6 @@ in primary, before promotion
 run_test('local');
 run_test('remote');
 run_te

CREATE DATABASE IF NOT EXISTS in PostgreSQL

2022-02-27 Thread Japin Li

Hi, hackers

When I try to use CREATE DATABASE IF NOT EXISTS in PostgreSQL, it complains
this syntax is not supported.  We can use the following command to achieve
this, however, it's not straightforward.

 SELECT 'CREATE DATABASE mydb'
 WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'mydb')\gexec

Why doesn't PostgreSQL support CREATE DATABASE IF NOT EXISTS syntax?

I created a patch for this; any suggestions?

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

From 7971893868e6eedc7229d28442f07890f251c42b Mon Sep 17 00:00:00 2001
From: Japin Li 
Date: Sun, 27 Feb 2022 22:02:59 +0800
Subject: [PATCH v1] Add CREATE DATABASE IF NOT EXISTS syntax

---
 doc/src/sgml/ref/create_database.sgml |  2 +-
 src/backend/commands/dbcommands.c | 19 +++
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/parser/gram.y |  9 +
 src/include/nodes/parsenodes.h|  1 +
 6 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index f70d0c75b4..74af9c586e 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE DATABASE name
+CREATE DATABASE name [ IF NOT EXISTS ]
 [ [ WITH ] [ OWNER [=] user_name ]
[ TEMPLATE [=] template ]
[ ENCODING [=] encoding ]
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index c37e3c9a9a..f3e5e930f9 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -557,10 +557,21 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 * message than "unique index violation".  There's a race condition but
 	 * we're willing to accept the less friendly message in that case.
 	 */
-	if (OidIsValid(get_database_oid(dbname, true)))
-		ereport(ERROR,
-(errcode(ERRCODE_DUPLICATE_DATABASE),
- errmsg("database \"%s\" already exists", dbname)));
+	{
+		Oid	check_dboid = get_database_oid(dbname, true);
+		if (OidIsValid(check_dboid))
+		{
+			if (!stmt->if_not_exists)
+ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_DATABASE),
+	 errmsg("database \"%s\" already exists", dbname)));
+
+			ereport(NOTICE,
+	(errcode(ERRCODE_DUPLICATE_DATABASE),
+	 errmsg("database \"%s\" already exists, skipping", dbname)));
+			return check_dboid;
+		}
+	}
 
 	/*
 	 * The source DB can't have any active backends, except this one
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d4f8455a2b..83ead22931 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4045,6 +4045,7 @@ _copyCreatedbStmt(const CreatedbStmt *from)
 
 	COPY_STRING_FIELD(dbname);
 	COPY_NODE_FIELD(options);
+	COPY_SCALAR_FIELD(if_not_exists);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index f1002afe7a..f9a89fa4a8 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1722,6 +1722,7 @@ _equalCreatedbStmt(const CreatedbStmt *a, const CreatedbStmt *b)
 {
 	COMPARE_STRING_FIELD(dbname);
 	COMPARE_NODE_FIELD(options);
+	COMPARE_SCALAR_FIELD(if_not_exists);
 
 	return true;
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a03b33b53b..72e4f642d9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10395,6 +10395,15 @@ CreatedbStmt:
 	CreatedbStmt *n = makeNode(CreatedbStmt);
 	n->dbname = $3;
 	n->options = $5;
+	n->if_not_exists = false;
+	$$ = (Node *)n;
+}
+			| CREATE DATABASE IF_P NOT EXISTS name opt_with createdb_opt_list
+{
+	CreatedbStmt *n = makeNode(CreatedbStmt);
+	n->dbname = $6;
+	n->options = $8;
+	n->if_not_exists = true;
 	$$ = (Node *)n;
 }
 		;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 1617702d9d..71c828ac4d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3295,6 +3295,7 @@ typedef struct CreatedbStmt
 	NodeTag		type;
 	char	   *dbname;			/* name of database to create */
 	List	   *options;		/* List of DefElem nodes */
+	bool		if_not_exists;	/* just do nothing if it already exists? */
 } CreatedbStmt;
 
 /* --
-- 
2.25.1



Re: Support for grabbing multiple consecutive values with nextval()

2022-02-27 Thread Jille Timmermans

On 2022-02-27 14:22, Julien Rouhaud wrote:
> Hi,
>
> On Sun, Feb 27, 2022 at 10:42:25AM +0100, Jille Timmermans wrote:
>>
>> First time PostgreSQL contributor here :)
>
> Welcome!

Thanks!

>> I wanted to be able to allocate a bunch of numbers from a sequence at once.
>> Multiple people seem to be struggling with this
>> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence,
>> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).
>>
>> I propose to add an extra argument to nextval() that specifies how many
>> numbers you want to allocate (default 1).
>>
>> The attached patch (based on master) passes `./configure --enable-cassert
>> --enable-debug && make && make check`, including the newly added regression
>> tests.
>>
>> It does change the signature of nextval_internal(), not sure if that's
>> considered backwards compatibility breaking (for extensions?).
>
> Please register this patch to the next commit fest (and last for pg15
> inclusion) at https://commitfest.postgresql.org/37/ if not done already.

Done: https://commitfest.postgresql.org/37/3577/ (I was waiting for
mailman approval before I got the thread id.)





Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-27 Thread Bharath Rupireddy
On Fri, Feb 25, 2022 at 8:38 PM Nitin Jadhav wrote:

Had a quick look over the v3 patch. I'm not sure if it's the best way
to have pg_stat_get_progress_checkpoint_type,
pg_stat_get_progress_checkpoint_kind and
pg_stat_get_progress_checkpoint_start_time just for printing info in
readable format in pg_stat_progress_checkpoint. I don't think these
functions will ever be useful for the users.

1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
or checkpoint instead of having a new function
pg_stat_get_progress_checkpoint_type?

2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks
directly instead of new function pg_stat_get_progress_checkpoint_kind?
+ snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s",
+ (flags == 0) ? "unknown" : "",
+ (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
+ (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
+ (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
+ (flags & CHECKPOINT_FORCE) ? "force " : "",
+ (flags & CHECKPOINT_WAIT) ? "wait " : "",
+ (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
+ (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
+ (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");

3) Why do we need this extra calculation for start_lsn? Do you ever
see a negative LSN or something here?
+('0/0'::pg_lsn + (
+CASE
+WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
+ELSE (0)::numeric
+END + (s.param3)::numeric)) AS start_lsn,

4) Can't you use timestamptz_in(to_char(s.param4))  instead of
pg_stat_get_progress_checkpoint_start_time? I don't quite understand
the reasoning for having this function and it's named as *checkpoint*
when it doesn't do anything specific to the checkpoint at all?

Having 3 unnecessary functions that aren't useful to the users at all
in proc.dat will simply eat up function OIDs, IMO.  Hence, I suggest we
try to do without the extra functions.

Regards,
Bharath Rupireddy.




Re: Add index scan progress to pg_stat_progress_vacuum

2022-02-27 Thread Imseih (AWS), Sami
> On Wed, Feb 23, 2022 at 10:02 AM Imseih (AWS), Sami wrote:
>> If the failsafe kicks in midway through a vacuum, the number indexes_total
>> will not be reset to 0. If INDEX_CLEANUP is turned off, then the value will
>> be 0 at the start of the vacuum.
>
> The way that this works with num_index_scans is that we "round up"
> when there has been non-zero work in lazy_vacuum_all_indexes(), but
> not if the precheck in lazy_vacuum_all_indexes() fails. That seems
> like a good model to generalize from here. Note that this makes
> INDEX_CLEANUP=off affect num_index_scans in much the same way as a
> VACUUM where the failsafe kicks in very early, during the initial heap
> pass. That is, if the failsafe kicks in before we reach lazy_vacuum()
> for the first time (which is not unlikely), or even in the
> lazy_vacuum_all_indexes() precheck, then num_index_scans will remain
> at 0, just like INDEX_CLEANUP=off.
>
> The actual failsafe WARNING shows num_index_scans, possibly before it
> gets incremented one last time (by "rounding up"). So it's reasonably
> clear how this all works from that context (assuming that the
> autovacuum logging stuff, which reports num_index_scans, outputs a
> report for a table where the failsafe kicked in).

> I am confused.  If failsafe kicks in during the middle of a vacuum, I
> (perhaps naively) would expect indexes_total and indexes_processed to go to
> zero, and I'd expect to no longer see the "vacuuming indexes" and "cleaning
> up indexes" phases.  Otherwise, how would I know that we are now skipping
> indexes?  Of course, you won't have any historical context about the index
> work done before failsafe kicked in, but IMO it is misleading to still
> include it in the progress view.

Failsafe occurring in the middle of a vacuum and resetting "indexes_total" to 0
would be misleading. I am thinking that it is a better idea to expose only one
column, "indexes_remaining".

If index_cleanup is set to OFF, the value of indexes_remaining will be 0 at
the start of the vacuum.
If failsafe kicks in during a vacuum in progress, "indexes_remaining" will be
calculated as 0.

This approach will report progress based on how many indexes remain, with
no ambiguity.

--
Sami Imseih
Amazon Web Services



Re: support for MERGE

2022-02-27 Thread Alvaro Herrera
On 2022-Jan-28, Andres Freund wrote:

> Any chance you could split this into something more reviewable? The overall
> diff stats are: 102 files changed, 8589 insertions(+), 234 deletions(-); that's
> pretty hard to really review. Incremental commits don't really help with that.

I'll work on splitting this next week.

> One thing from skimming: There's not enough documentation about the approach
> imo - it's a complicated enough feature to deserve more than the one paragraph
> in src/backend/executor/README.

Sure, I'll have a look at that.

> I'm still [1],[2] uncomfortable with execMerge.c kinda-but-not-really being an
> executor node.

I think we should make a decision on code arrangement here.  From my
perspective, MERGE isn't its own executor node; rather it's just another
"method" in ModifyTable.  Which makes sense, given that all it does is
call parts of INSERT, UPDATE, DELETE which are the other ModifyTable
methods.  Having a separate file doesn't strike me as great, but on the
other hand it's true that merely moving all the execMerge.c code into
nodeModifyTable.c makes the latter too large.  However I don't want to
create a .h file that means exposing all those internal functions to the
outside world.  My ideal would be to have each INSERT, UPDATE, DELETE,
MERGE as its own separate .c file, which would be #included from
nodeModifyTable.c.  We don't use that pattern anywhere though.  Any
opposition to that?  (The prototypes for each file would have to live in
nodeModifyTable.c itself.)


> > diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> > index 1a9c1ac290..280ac40e63 100644
> > --- a/src/backend/commands/trigger.c
> > +++ b/src/backend/commands/trigger.c
> 
> This stuff seems like one candidate for splitting out.

Yeah, I had done that.  It's now posted as 0001.

> > +   /*
> > +* We maintain separate transition tables for UPDATE/INSERT/DELETE since
> > +* MERGE can run all three actions in a single statement. Note that 
> > UPDATE
> > +* needs both old and new transition tables whereas INSERT needs only 
> > new,
> > +* and DELETE needs only old.
> > +*/
> > +
> > +   /* "old" transition table for UPDATE, if any */
> > +   Tuplestorestate *old_upd_tuplestore;
> > +   /* "new" transition table for UPDATE, if any */
> > +   Tuplestorestate *new_upd_tuplestore;
> > +   /* "old" transition table for DELETE, if any */
> > +   Tuplestorestate *old_del_tuplestore;
> > +   /* "new" transition table INSERT, if any */
> > +   Tuplestorestate *new_ins_tuplestore;
> > +
> > TupleTableSlot *storeslot;  /* for converting to tuplestore's 
> > format */
> >  };
> 
> Do we need to do something slightly clever about the memory limits for the
> tuplestore? Each of them having a separate work_mem makes nodeModifyTable.c
> one memory hungry node (the worst of all maybe?).

Not sure how that would work.  The memory handling is inside
tuplestore.c IIRC ... I think that would require some largish
refactoring.  Merely dividing by four doesn't seem like a great answer
either.  Perhaps we could divide by two and be optimistic about it.
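To make the trade-off concrete, here is a minimal standalone sketch — per_store_limit_kb is a hypothetical helper, not part of the patch — of how one shared work_mem budget could be split across the transition tuplestores instead of letting each claim a full work_mem (worst case 4 x work_mem with four stores):

```c
#include <assert.h>

/*
 * Hypothetical sketch (not PostgreSQL code): with four transition
 * tuplestores each allowed a full work_mem, the node's worst case is
 * 4 x work_mem.  Dividing the budget up front bounds the total;
 * "divide by two and be optimistic" halves the per-store limit
 * instead of quartering it.
 */
static int
per_store_limit_kb(int work_mem_kb, int divisor)
{
	int			limit = work_mem_kb / divisor;

	return (limit > 0) ? limit : 1; /* never hand out a zero budget */
}
```

In the real code the computed limit would presumably be passed as the maxKBytes argument when each tuplestore is created, rather than the plain work_mem used today.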

> > +   /*
> > +* Project the tuple.  In case of a partitioned 
> > table, the
> > +* projection was already built to use the 
> > root's descriptor,
> > +* so we don't need to map the tuple here.
> > +*/
> > +   actionInfo.actionState = action;
> > +   insert_slot = ExecProject(action->mas_proj);
> > +
> > +   (void) ExecInsert(mtstate, rootRelInfo,
> > + insert_slot, 
> > slot,
> > + estate, 
> > &actionInfo,
> > + 
> > mtstate->canSetTag);
> > +   InstrCountFiltered1(&mtstate->ps, 1);
> 
> What happens if somebody concurrently inserts a conflicting row?

An open question to which I don't have any good answer RN.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)




Re: support for MERGE

2022-02-27 Thread Tom Lane
Alvaro Herrera  writes:
> I think we should make a decision on code arrangement here.  From my
> perspective, MERGE isn't its own executor node; rather it's just another
> "method" in ModifyTable.  Which makes sense, given that all it does is
> call parts of INSERT, UPDATE, DELETE which are the other ModifyTable
> methods.  Having a separate file doesn't strike me as great, but on the
> other hand it's true that merely moving all the execMerge.c code into
> nodeModifyTable.c makes the latter too large.  However I don't want to
> create a .h file that means exposing all those internal functions to the
> outside world.  My ideal would be to have each INSERT, UPDATE, DELETE,
> MERGE as its own separate .c file, which would be #included from
> nodeModifyTable.c.  We don't use that pattern anywhere though.  Any
> opposition to that?  (The prototypes for each file would have to live in
> nodeModifyTable.c itself.)

Yeah, I don't like that.  The point of having separate .c files is that
you know that there's no interactions of non-global symbols across files.
This pattern breaks that assumption, making it harder to see what's
connected to what; and what's it buying exactly?  I'd rather keep all the
ModifyTable code in one .c file, even if that one is bigger than our
usual practice.

regards, tom lane




Re: CREATE DATABASE IF NOT EXISTS in PostgreSQL

2022-02-27 Thread Tom Lane
Japin Li  writes:
> Why don't support CREATE DATABASE IF NOT EXISTS syntax in PostgreSQL?

FWIW, I'm generally hostile to CREATE IF NOT EXISTS semantics across
the board, because of its exceedingly squishy semantics: it ensures
that an object by that name exists, but you have exactly no guarantees
about its properties or contents.  The more complex the object, the
bigger that problem becomes ... and a whole database is the most
complex sort of object we have.  So IMV, the fact that we don't have
this "feature" is a good thing.

We do have DROP DATABASE IF EXISTS, and I think using that followed
by CREATE is a much better-defined approach.

regards, tom lane




Re: Commitfest manager for 2022-03

2022-02-27 Thread Euler Taveira
On Sat, Feb 26, 2022, at 9:37 PM, Justin Pryzby wrote:
> |https://commitfest.postgresql.org/37/2906/
> |Row filtering for logical replication
> If I'm not wrong, this is merged and should be closed?
I think Amit forgot to mark it as committed. Done.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: support for MERGE

2022-02-27 Thread Zhihong Yu
On Sun, Feb 27, 2022 at 9:25 AM Alvaro Herrera  wrote:

> I attach v12 of MERGE.  Significant effort went into splitting
> ExecUpdate and ExecDelete into parts that can be reused from the MERGE
> routines in a way that doesn't involve jumping out in the middle of
> TM_Result processing to merge-specific EvalPlanQual handling.  So in
> terms of code flow, this is much cleaner.  It does mean that MERGE now
> has to call three routines instead of just one, but that doesn't seem a
> big deal.
>
> So now the main ExecUpdate first has to call ExecUpdatePrologue, then
> ExecUpdateAct, and complete with ExecUpdateEpilogue.  In the middle of
> all this, it does its own thing to deal with foreign tables, and result
> processing for regular tables including EvalPlanQual.  ExecUpdate has
> been split similarly.
>
> So MERGE now does these three things, and then its own result
> processing.
>
> Now, naively patching in that way results in absurd argument lists for
> these routines, so I introduced a new ModifyTableContext struct to carry
> the context for each operation.  I think this is good, but perhaps it
> could stand some more improvement in terms of what goes in there and
> what doesn't.  It took me a while to find an arrangement that really
> worked.  (Initially I had resultRelInfo and the tuple slot and some
> flags for DELETE, but it turned out to be better to keep them out.)
>
> Regarding code arrangement: I decided that exporting all those
> ModifyTable internal functions was a bad idea, so I made it all static
> again.  I think the MERGE routines are merely additional ModifyTable
> methods; trying to make it a separate executor node doesn't seem like
> it'd be an improvement.  For now, I just made nodeModifyTable.c #include
> execMerge.c, though I'm not sure if moving all the code into
> nodeModifyTable.c would be a good idea, or whether we should create
> separate files for each of those methods.  Generally speaking I'm not
> enthusiastic about creating an exported API of these methods.  If we
> think nodeModifyTable.c is too large, maybe we can split it in smaller
> files and smash them together with #include "foo.c".  (If we do expose
> it in some .h then the ModifyTableContext would have to be exported as
> well, which doesn't sound too exciting.)
>
>
> Sadly, because I've been spending a lot of time preparing for an
> international move, I didn't have time to produce a split patch that
> would first restructure nodeModifyTable.c making this more reviewable,
> but I'll do that as soon as I'm able.  There is also a bit of a bug in a
> corner case of an UPDATE that involves a cross-partition tuple migration
> with another concurrent update.  I was unable to figure out how to fix
> that, so I commented out the affected line from merge-update.spec.
> Again, I'll get that fixed as soon as the dust has settled here.
>
> There are some review points that are still unaddressed, such as
> Andres' question of what happens in case of a concurrent INSERT.
>
> Thanks
>
> --
> Álvaro Herrera   39°49'30"S 73°17'W  —
> https://www.EnterpriseDB.com/
> "La fuerza no está en los medios físicos
> sino que reside en una voluntad indomable" (Gandhi)
>

Hi,

+   else if (node->operation == CMD_MERGE)
+   {
+   /* EXPLAIN ANALYZE display of actual outcome for each tuple
proposed */

I know the comment was copied. But it seems `proposed` should be
`processed`.

For ExecMergeNotMatched():

+   foreach(l, actionStates)
+   {
...
+   break;
+   }

If there is only one state in the List, maybe add an assertion for the
length of the actionStates List.

+   ModifyTableContext sc;

It seems the variable name could be more intuitive. How about naming it mtc
(or context, as the later code uses)?

Cheers


Re: support for MERGE

2022-02-27 Thread Daniel Gustafsson
> On 27 Feb 2022, at 18:42, Tom Lane  wrote:

> I'd rather keep all the ModifyTable code in one .c file, even if that one is
> bigger than our usual practice.

Agreed, I also prefer a (too) large file over a set of .c #include's.

--
Daniel Gustafsson   https://vmware.com/





Re: Separate the result of \watch for each query execution (psql)

2022-02-27 Thread Noboru Saito
Hi,

Fri, Feb 25, 2022 at 13:42 Pavel Stehule :
>
>
>
> On Fri, Feb 25, 2022 at 5:23 AM Noboru Saito  wrote:
>>
>> Hi,
>>
>> Pavel Stehule :
>> > > I strongly agree. It was a lot of work to find a workable solution for 
>> > > pspg. Special chars that starting result and maybe other, that ending 
>> > > result can significantly increase robustness and can reduce code. I 
>> > > think it can be better to use form feed at the end of form - like it is 
>> > > semantic of form feed. You know, at this moment, the result is complete. 
>> > > https://en.wikipedia.org/wiki/Page_break
>> >
>> > It's easier to print a form feed before the result, but it's okay at the 
>> > end.
>> >
>> > > I don't think using it by default can be the best. Lot of people don't 
>> > > use specialized pagers, but it can be set by \pset. Form feed should be 
>> > > used on end
>> > >
>> > > \pset formfeed [on, off]
>> >
>> > I think it's a good idea to be able to switch with \pset.
>>
>> I have created a patch that allows you to turn it on and off in \pset.
>> The attached patch adds the following features.
>>
>>
>> Formfeed can be turned on with the command line option or \pset.
>> Formfeed (\f\n) is output after the query execution result by \watch.
>>
>> I think the considerations are as follows.
>>
>> * Is formfeed output after the result, not before?
>
>
> We are talking about the first iteration. In the second and other iteration 
> this question has no sense. You know the starting point. You don't know the 
> endpoint. So I think so using formfeed on the end is good idea.

Yes. However, as you know, in the case of \watch, there is a
difference in waiting before and after the specified time.

>> * Is the formfeed output only "\f\n"?
>
>
> yes
>
>>
>> * Is the formfeed output only executed by \watch?
>
>
> This is a good question. I think the implementation for \watch is a good 
> start. But it can help with normal results too. In pspg it can be very useful 
> for incremental load or for streaming mode. But if it will be used 
> everywhere, then it should be used just for some specified pagers.
>
>
>>
>> * Is the name "formfeed" appropriate?
>
>
> If it will do work of formfeed, then the formfeed is good name.
>
>>
>>
>> If the formfeed is output before the query result,
>> it will be better if the screen is reset when the formfeed is read.
>
>
> I think, so you propose another feature - reset terminal sequence - it can be 
> a good idea. Not for pspg, but generally, why not.

Yes, it is. I was wondering if I could just add formfeed or add more features.

Thank you.
Following your advice, I will register this for the commitfest
 (I am registering for the commitfest for the first time. Thank you
for your cooperation).
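For a pager on the receiving end, the proposed separator makes result boundaries trivial to detect. Below is a minimal standalone sketch — count_watch_results is a hypothetical helper, written only to illustrate the idea under the assumption that psql emits exactly "\f\n" after each \watch result:

```c
#include <string.h>

/*
 * Count complete \watch results in a captured buffer, assuming the
 * proposed behavior: psql emits "\f\n" after each query result.
 * Hypothetical helper for illustration only.
 */
static int
count_watch_results(const char *buf)
{
	int			n = 0;
	const char *p = buf;

	while ((p = strstr(p, "\f\n")) != NULL)
	{
		n++;
		p += 2;					/* skip past the separator */
	}
	return n;
}
```

A pager such as pspg could use the same separator check to know when a result is complete instead of guessing from the table layout.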




Re: Missed condition-variable wakeups on FreeBSD

2022-02-27 Thread Tom Lane
Andres Freund  writes:
> On 2022-02-26 14:07:05 -0500, Tom Lane wrote:
>> I have observed this three times in the REL_11 branch, once
>> in REL_12, and a couple of times last summer before it occurred
>> to me to start keeping notes.  Over that time the machine has
>> been running various patchlevels of FreeBSD 13.0.

> It's certainly interesting that it appears to happen only in the branches
> using poll rather than kqueue to implement latches. That changed between 12
> and 13.

Yeah, and there was no PHJ in v10, so that's a pretty good theory as
to why I've only seen it in those two branches.

> Have you tried running the core regression tests with force_parallel_mode =
> on, or with the parallel costs lowered, to see if that makes the problem
> appear more often?
> The next time this happens / if you still have this open, perhaps it could be
> worth checking if there's a byte in the self pipe?
> Besides trying to make the issue more likely as suggested above, it might be
> worth checking if signalling the stuck processes with SIGUSR1 gets them
> unstuck.

I've now wasted a bunch of kilowatt-hours fruitlessly trying to
reproduce this outside the confines of the buildfarm script.
I'm at a loss to figure out what the buildfarm is doing differently,
but apparently there's something.  I'm going to re-enable the
machine's buildfarm job and just wait for it to hang up again.
More info eventually ...

regards, tom lane




Re: Proposal: Support custom authentication methods using hooks

2022-02-27 Thread Jeff Davis
On Fri, 2022-02-25 at 14:10 -0500, Tom Lane wrote:
> I'm happy to add support for custom auth methods if they can use
> a protocol that's safer than cleartext-password.  But if that's the
> only feasible option, then we're just encouraging people to use
> insecure methods.

FWIW, I'd like to be able to use a token in the password field, and
then authenticate that token. So I didn't intend to send an actual
password in plaintext.

Maybe we should have a new "token" connection parameter so that libpq
can allow sending a token plaintext but refuse to send a password in
plaintext?

> I also have in mind here that there has been discussion of giving
> libpq a feature to refuse, on the client side, to send cleartext
> passwords.

I am generally in favor of that idea, but I'm not sure that will
completely resolve the issue. For instance, should it also refuse MD5?

Regards,
Jeff Davis






Re: Missed condition-variable wakeups on FreeBSD

2022-02-27 Thread Thomas Munro
On Sun, Feb 27, 2022 at 8:07 AM Tom Lane  wrote:
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,

Yeah, at least FreeBSD and macOS interrupt system calls when you send
SIGSTOP + SIGCONT directly, or when gdb and lldb do something similar
via ptrace.  The same applies on Linux, except that Linux restarts the
system call automatically (even though the man page has this in the
list of system calls that never restart) so you don't usually notice
(though you can see it with strace).  It's not really clear to me what
should happen given the language around restarts is all about signal
handlers, and stop + cont are system magic, not signal handlers.
Anyway...

> suggesting that we lost an interrupt somewhere.

So it's happening on an i386 kernel, with WAIT_USE_POLL (not
WAIT_USE_KQUEUE), but only under the build farm script (not when you
ran it manually in a loop) ... hmmm.




Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad

2022-02-27 Thread Thomas Munro
On Sun, Feb 27, 2022 at 10:29 PM Julien Rouhaud  wrote:
> On Sun, Feb 27, 2022 at 06:10:45PM +0900, Michael Paquier wrote:
> > On Sat, Feb 26, 2022 at 01:39:42PM -0800, Andres Freund wrote:
> > > I suspect the easiest is to just convert that usleep to a WaitLatch().
> > > That'd require adding a new enum value to WaitEventTimeout in 14. Which
> > > probably is fine?
> >
> > We've added wait events in back-branches in the past, so this does not
> > strike me as a problem as long as you add the new entry at the end of
> > the enum, while keeping things ordered on HEAD.
>
> +1

+1

Sleeps like these are also really bad for ProcSignalBarrier, which I
was expecting to be the impetus for fixing any remaining loops like
this.

With the attached, 027_stream_regress.pl drops from ~29.5s to ~19.6s
on my FreeBSD workstation!

It seems a little strange to introduce a new wait event that will very
often appear into a stable branch, but ... it is actually telling the
truth, so there is that.

The sleep/poll loop in RegisterSyncRequest() may also have another
problem.  The comment explains that it was a deliberate choice not to
do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't
think there's an excuse to ignore postmaster death in a loop that
presumably becomes infinite if the checkpointer exits.  I guess we
could do:

-   pg_usleep(10000L);
+   WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10,
WAIT_EVENT_SYNC_REQUEST);

But... really, this should be waiting on a condition variable that the
checkpointer broadcasts on when the queue goes from full to not full,
no?  Perhaps for master only?
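The shape of that handshake can be sketched in a standalone toy model — the names and the SyncQueue struct below are invented for illustration, and the real change would use PostgreSQL's ConditionVariable API (ConditionVariableSleep, ConditionVariableBroadcast) on the shared checkpointer request queue rather than anything like this:

```c
/*
 * Toy model (not PostgreSQL code) of the proposed handshake: backends
 * that find the sync request queue full would sleep on a condition
 * variable, and the checkpointer would broadcast exactly when the
 * queue transitions from full to not full.
 */
#define SYNC_QUEUE_MAX 4

typedef struct SyncQueue
{
	int			nrequests;
} SyncQueue;

/* backend side: returns 1 if queued, 0 if the caller should sleep on the CV */
static int
sq_try_insert(SyncQueue *q)
{
	if (q->nrequests >= SYNC_QUEUE_MAX)
		return 0;
	q->nrequests++;
	return 1;
}

/*
 * checkpointer side: absorb one request; returns 1 exactly when the
 * queue just went from full to not full -- the moment to broadcast so
 * waiting backends retry, instead of polling on a fixed timeout.
 */
static int
sq_absorb(SyncQueue *q)
{
	int			was_full = (q->nrequests >= SYNC_QUEUE_MAX);

	if (q->nrequests > 0)
		q->nrequests--;
	return was_full && q->nrequests < SYNC_QUEUE_MAX;
}
```

The point of broadcasting only on the full-to-not-full transition is that it wakes waiters exactly when retrying can succeed, which is what the sleep/poll loop in RegisterSyncRequest() cannot do.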
From 23abdf95dea74b914dc56a46739a63ff86035f51 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 28 Feb 2022 11:27:05 +1300
Subject: [PATCH] Wake up for latches in CheckpointWriteDelay().

The checkpointer shouldn't ignore its latch.  Other backends may be
waiting for it to drain the request queue.  Hopefully real systems don't
have a full queue often, but the condition is reached easily
when shared_buffers is very small.

This involves defining a new wait event, which will appear in the
pg_stat_activity view often due to spread checkpoints.

Back-patch only to 14.  Even though the problem exists in earlier
branches too, it's hard to hit there.  In 14 we stopped using signal
handlers for latches on Linux, *BSD and macOS, which were previously
hiding this problem by interrupting the sleep (though not reliably, as
the signal could arrive before the sleep begins; precisely the problem
latches address).

Reported-by: Andres Freund 
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml| 4 
 src/backend/postmaster/checkpointer.c   | 5 -
 src/backend/utils/activity/wait_event.c | 3 +++
 src/include/utils/wait_event.h  | 1 +
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf7625d988..9dc3978f35 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2236,6 +2236,10 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   BaseBackupThrottle
   Waiting during base backup when throttling activity.
  
+ 
+  CheckpointWriteDelay
+  Waiting between writes while performing a checkpoint.
+ 
  
   PgSleep
   Waiting due to a call to pg_sleep or
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4488e3a443..9d9aad166e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -726,7 +726,10 @@ CheckpointWriteDelay(int flags, double progress)
 		 * Checkpointer and bgwriter are no longer related so take the Big
 		 * Sleep.
 		 */
-		pg_usleep(100000L);
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT,
+  100,
+  WAIT_EVENT_CHECKPOINT_WRITE_DELAY);
+		ResetLatch(MyLatch);
 	}
 	else if (--absorb_counter <= 0)
 	{
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 60972c3a75..0706e922b5 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -485,6 +485,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
 		case WAIT_EVENT_BASE_BACKUP_THROTTLE:
 			event_name = "BaseBackupThrottle";
 			break;
+		case WAIT_EVENT_CHECKPOINT_WRITE_DELAY:
+			event_name = "CheckpointWriteDelay";
+			break;
 		case WAIT_EVENT_PG_SLEEP:
 			event_name = "PgSleep";
 			break;
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 395d325c5f..d0345c6b49 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -141,6 +141,7 @@ typedef enum
 typedef enum
 {
 	WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
+	WAIT_EVENT_CHECKPOINT_WRITE_DELAY,
 	WAIT_EVENT_PG_SLEEP,
 	WAIT_EVENT_RECOVERY_APPLY_D

Re: CREATE DATABASE IF NOT EXISTS in PostgreSQL

2022-02-27 Thread Japin Li


On Mon, 28 Feb 2022 at 01:53, Tom Lane  wrote:
> Japin Li  writes:
>> Why don't support CREATE DATABASE IF NOT EXISTS syntax in PostgreSQL?
>
> FWIW, I'm generally hostile to CREATE IF NOT EXISTS semantics across
> the board, because of its exceedingly squishy semantics: it ensures
> that an object by that name exists, but you have exactly no guarantees
> about its properties or contents.

Thanks for the explanation!  I think it is the database user who should
guarantee the properties and contents of a database.
CREATE IF NOT EXISTS is just syntactic sugar.

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Masahiko Sawada
On Sat, Feb 26, 2022 at 11:51 AM Amit Kapila  wrote:
>
> On Thu, Feb 24, 2022 at 9:20 PM Masahiko Sawada  wrote:
> >
>
> I have reviewed the latest version and made a few changes along with
> fixing some of the pending comments by Peter Smith.

Thank you for updating the patch!

> The changes are as
> follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as
> that is not required now; (b) changed the struct name
> PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it
> similar to DropDb; (c) changed the view name to
> pg_stat_subscription_stats, we can reconsider it in future if there is
> a consensus on some other name, accordingly changed the reset function
> name to pg_stat_reset_subscription_stats; (d) moved some of the newly
> added subscription stats functions adjacent to slots to maintain the
> consistency in code; (e) changed comments at few places;

Agreed.

> (f) added
> LATERAL back to system_views query as we refer pg_subscription's oid
> in the function call, previously that was not clear.

I think LATERAL is still unnecessary as you pointed out before. The
documentation[1] says,

LATERAL can also precede a function-call FROM item, but in this case
it is a noise word, because the function expression can refer to
earlier FROM items in any case.

The rest looks good to me.

Regards,

[1] https://www.postgresql.org/docs/devel/sql-select.html


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad

2022-02-27 Thread Andres Freund
Hi, 

On February 27, 2022 4:19:21 PM PST, Thomas Munro  wrote:
>With the attached, 027_stream_regress.pl drops from ~29.5s to ~19.6s
>on my FreeBSD workstation!

That's impressive - wouldn't have guessed it to make that much of a difference. 
I assume that running the tests on freebsd for an older pg with a similar s_b & 
max_wal_size  doesn't benefit as much? I wonder how much windows will improve.


>It seems a little strange to introduce a new wait event that will very
>often appear into a stable branch, but ... it is actually telling the
>truth, so there is that.

In the back branches it needs to be at the end of the enum - I assume you 
intended that just to be for HEAD.

I wonder whether in HEAD we shouldn't make that sleep duration be computed from 
the calculation in IsOnSchedule...


>The sleep/poll loop in RegisterSyncRequest() may also have another
>problem.  The comment explains that it was a deliberate choice not to
>do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't
>think there's an excuse to ignore postmaster death in a loop that
>presumably becomes infinite if the checkpointer exits.  I guess we
>could do:
>
>-   pg_usleep(10000L);
>+   WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10,
>WAIT_EVENT_SYNC_REQUEST);
>
>But... really, this should be waiting on a condition variable that the
>checkpointer broadcasts on when the queue goes from full to not full,
>no?  Perhaps for master only?

Looks worth improving, but yes, I'd not do it in the back branches. 

I do think it's worth giving that sleep a proper wait event though, even in the 
back branches.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?

2022-02-27 Thread Kyotaro Horiguchi
At Sat, 26 Feb 2022 12:11:15 +0900, Michael Paquier  wrote in
> On Fri, Feb 25, 2022 at 01:09:53PM -0800, Nathan Bossart wrote:
> > This one has been quiet for a while.  Should we mark it as
> > returned-with-feedback?
> 
> Yes, that's my feeling and I got cold feet about this change.  This
> patch would bring some extra visibility for something that's not
> incorrect either on HEAD, as end-of-recovery checkpoints are the same
> things as shutdown checkpoints.  And there is an extra argument where
> back-patching would become a bit more tricky in an area that's already
> a lot sensitive.

That sounds like we should reject the patch as we don't agree to its
objective.  If someday end-of-recovery checkpoints functionally
diverge from shutdown checkpoints but leave (somehow) the transition
alone, we may revisit this, but that would be another proposal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?

2022-02-27 Thread Michael Paquier
On Mon, Feb 28, 2022 at 10:51:06AM +0900, Kyotaro Horiguchi wrote:
> That sounds like we should reject the patch as we don't agree to its
> objective.  If someday end-of-recovery checkpoints functionally
> diverge from shutdown checkpoints but leave (somehow) the transition
> alone, we may visit this again but it would be another proposal.

The patch has been already withdrawn in the CF app.
--
Michael




Re: pg_stat_statements: remove redundant function call in pg_stat_statements_internal

2022-02-27 Thread Michael Paquier
On Sun, Feb 27, 2022 at 09:08:56PM +0900, Michael Paquier wrote:
> Yes, let's clean up that on HEAD.  No objections from here.  I'll do
> that tomorrow or so.

And done.
--
Michael




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Saturday, February 26, 2022 11:51 AM Amit Kapila  wrote:
> > I have reviewed the latest version and made a few changes along with fixing
> > some of the pending comments by Peter Smith. The changes are as
> > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as that is
> > not required now; (b) changed the struct name PgStat_MsgSubscriptionPurge
> > to PgStat_MsgSubscriptionDrop to make it similar to DropDb; (c) changed the
> > view name to pg_stat_subscription_stats, we can reconsider it in future if
> > there is a consensus on some other name, accordingly changed the reset
> > function name to pg_stat_reset_subscription_stats; (d) moved some of the
> > newly added subscription stats functions adjacent to slots to maintain the
> > consistency in code; (e) changed comments at few places; (f) added LATERAL
> > back to system_views query as we refer pg_subscription's oid in the
> > function call, previously that was not clear.
> >
> > Do let me know what you think of the attached?
> Hi, thank you for updating the patch !
>
>
> I have a couple of comments on v4.
>
> (1)
>
> I'm not sure if I'm correct, but I'd say the sync_error_count
> can come next to the subname as the order of columns.
> I felt there's case that the column order is somewhat
> related to the time/processing order (I imagined
> pg_stat_replication's LSN related columns).
> If this was right, table sync related column could be the
> first column as a counter within this patch.
>

I am not sure if there is such a correlation, but even if there is, it
doesn't seem to fit here completely, as sync errors can happen after
apply errors in multiple ways, like via Alter Subscription ... Refresh
...

So, I don't see the need to change the order here. What do you or others think?

>
> (2) doc/src/sgml/monitoring.sgml
>
> +Resets statistics for a single subscription shown in the
> +pg_stat_subscription_stats view to zero. If
> +the argument is NULL, reset statistics for all
> +subscriptions.
> 
>
> I felt we could improve the first sentence.
>
> From:
> Resets statistics for a single subscription shown in the..
>
> To(idea1):
> Resets statistics for a single subscription defined by the argument to zero.
>

Okay, I can use this one.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Masahiko Sawada
On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila  wrote:
>
> On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Saturday, February 26, 2022 11:51 AM Amit Kapila  wrote:
> > > I have reviewed the latest version and made a few changes along with
> > > fixing some of the pending comments by Peter Smith. The changes are as
> > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as that
> > > is not required now; (b) changed the struct name
> > > PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it
> > > similar to DropDb; (c) changed the view name to
> > > pg_stat_subscription_stats, we can reconsider it in future if there is
> > > a consensus on some other name, accordingly changed the reset function
> > > name to pg_stat_reset_subscription_stats; (d) moved some of the newly
> > > added subscription stats functions adjacent to slots to maintain the
> > > consistency in code; (e) changed comments at few places; (f) added
> > > LATERAL back to system_views query as we refer pg_subscription's oid
> > > in the function call, previously that was not clear.
> > >
> > > Do let me know what you think of the attached?
> > Hi, thank you for updating the patch !
> >
> >
> > I have a couple of comments on v4.
> >
> > (1)
> >
> > I'm not sure if I'm correct, but I'd say the sync_error_count
> > can come next to the subname as the order of columns.
> > I felt there's case that the column order is somewhat
> > related to the time/processing order (I imagined
> > pg_stat_replication's LSN related columns).
> > If this was right, table sync related column could be the
> > first column as a counter within this patch.
> >
>
> I am not sure if there is such a correlation but even if it is there
> it doesn't seem to fit here completely as sync errors can happen after
> apply errors in multiple ways like via Alter Subscription ... Refresh
> ...
>
> So, I don't see the need to change the order here. What do you or others think?

I'm also not sure about it; both sound good to me. Probably we can
change the order later.

>
> >
> > (2) doc/src/sgml/monitoring.sgml
> >
> > +Resets statistics for a single subscription shown in the
> > +pg_stat_subscription_stats view to zero. If
> > +the argument is NULL, reset statistics for all
> > +subscriptions.
> > 
> >
> > I felt we could improve the first sentence.
> >
> > From:
> > Resets statistics for a single subscription shown in the..
> >
> > To(idea1):
> > Resets statistics for a single subscription defined by the argument to zero.
> >
>
> Okay, I can use this one.

Are you going to remove the part "shown in the
pg_stat_subscription_stats view"? I think it's better to keep it in
order to make it clear which statistics the function resets as we have
pg_stat_subscription and pg_stat_subscription_stats.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada  wrote:
>
> On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila  wrote:
> >
> > >
> > > (2) doc/src/sgml/monitoring.sgml
> > >
> > > +Resets statistics for a single subscription shown in the
> > > +pg_stat_subscription_stats view to zero. If
> > > +the argument is NULL, reset statistics for all
> > > +subscriptions.
> > > 
> > >
> > > I felt we could improve the first sentence.
> > >
> > > From:
> > > Resets statistics for a single subscription shown in the..
> > >
> > > To(idea1):
> > > Resets statistics for a single subscription defined by the argument to 
> > > zero.
> > >
> >
> > Okay, I can use this one.
>
> Are you going to remove the part "shown in the
> pg_stat_subsctiption_stats view"? I think it's better to keep it in
> order to make it clear which statistics the function resets as we have
> pg_stat_subscription and pg_stat_subscription_stats.
>

How about the following:
"Resets statistics for a single subscription defined by the argument
shown in the pg_stat_subscription_stats view
to zero. If the argument is NULL, reset statistics
for all subscriptions."

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Masahiko Sawada
On Mon, Feb 28, 2022 at 11:52 AM Amit Kapila  wrote:
>
> On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada  wrote:
> >
> > On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila  
> > wrote:
> > >
> > > >
> > > > (2) doc/src/sgml/monitoring.sgml
> > > >
> > > > +Resets statistics for a single subscription shown in the
> > > > +pg_stat_subscription_stats view to 
> > > > zero. If
> > > > +the argument is NULL, reset statistics for 
> > > > all
> > > > +subscriptions.
> > > > 
> > > >
> > > > I felt we could improve the first sentence.
> > > >
> > > > From:
> > > > Resets statistics for a single subscription shown in the..
> > > >
> > > > To(idea1):
> > > > Resets statistics for a single subscription defined by the argument to 
> > > > zero.
> > > >
> > >
> > > Okay, I can use this one.
> >
> > Are you going to remove the part "shown in the
> > pg_stat_subscription_stats view"? I think it's better to keep it in
> > order to make it clear which statistics the function resets as we have
> > pg_stat_subscription and pg_stat_subscription_stats.
> >
>
> How about the following:
> "Resets statistics for a single subscription defined by the argument
> shown in the pg_stat_subscription_stats view
> to zero. If the argument is NULL, reset statistics
> for all subscriptions."

Sounds good but I'm not sure it's correct in terms of English grammar.
Shouldn't it be something like "subscription that is defined by the
argument and shown in the pg_stat_subscription_stats"?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread osumi.takami...@fujitsu.com
On Monday, February 28, 2022 11:34 AM Amit Kapila  
wrote:
> On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Saturday, February 26, 2022 11:51 AM Amit Kapila
>  wrote:
> > > I have reviewed the latest version and made a few changes along with
> > > fixing some of the pending comments by Peter Smith. The changes are
> > > as
> > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as
> > > that is not required now; (b) changed the struct name
> > > PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it
> > > similar to DropDb; (c) changed the view name to
> > > pg_stat_subscription_stats, we can reconsider it in future if there
> > > is a consensus on some other name, accordingly changed the reset
> > > function name to pg_stat_reset_subscription_stats; (d) moved some of
> > > the newly added subscription stats functions adjacent to slots to
> > > maintain consistency in the code; (e) changed comments at a few places;
> > > (f) added LATERAL back to system_views query as we refer
> pg_subscription's oid in the function call, previously that was not clear.
> > >
> > > Do let me know what you think of the attached?
> > Hi, thank you for updating the patch !
> > I have a couple of comments on v4.
> >
> > (1)
> >
> > I'm not sure if I'm correct, but I'd say the sync_error_count could
> > come right after the subname in the column order.
> > I felt there are cases where the column order is somewhat related to the
> > time/processing order (I imagined pg_stat_replication's LSN-related
> > columns).
> > If that is right, the table-sync-related column could be the first
> > counter column in this patch.
> >
> 
> I am not sure if there is such a correlation but even if it is there it 
> doesn't seem
> to fit here completely as sync errors can happen after apply errors in 
> multiple
> ways like via Alter Subscription ... Refresh ...
> 
> So, I don't see the need to change the order here. What do you or others 
> think?
In the alter subscription case, any errors after the table sync would increment
apply_error_count.

I mentioned this because this point of view would affect the documentation
read by users and the internal source code for developers.
I had a concern that when we extend and add a lot of statistics (not only
for this view, but also other statistics in general), writing the docs for
statistics needs some alignment for better readability.

*But*, as you mentioned, if we don't have such a correlation, I'm okay
with the current patch.
Thank you for replying.


Best Regards,
Takamichi Osumi



Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Peter Smith
Below are my comments for the v4 patch.

These are only nitpicking comments now; otherwise, it LGTM.

(Sorry, now I see there are some overlaps with comments posted in the
last 20 mins so take or leave these as you wish)

==

1. doc/src/sgml/monitoring.sgml

-  
-   OID of the relation that the worker was processing when the
-   error occurred
+   Number of times an error occurred during the application of changes
   
  

BEFORE
Number of times an error occurred during the application of changes
SUGGESTED
Number of times an error occurred while applying changes

~~~

2. doc/src/sgml/monitoring.sgml

+Resets statistics for a single subscription shown in the
+pg_stat_subscription_stats view to zero. If
+the argument is NULL, reset statistics for all
+subscriptions.


SUGGESTED (simpler description, more similar to pg_stat_reset_replication_slot)
Reset statistics to zero for a single subscription. If the argument is
NULL, reset statistics for all subscriptions.

~~~

3. src/backend/replication/logical/worker.c - comment

+ /* Report the worker failed during the application of the change */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

BEFORE
Report the worker failed during the application of the change
SUGGESTED
Report the worker failed while applying changes

~~~

4. src/include/pgstat.h - comment

+typedef struct PgStat_MsgResetsubcounter
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_subid; /* InvalidOid for clearing all subscription
+ * stats */
+} PgStat_MsgResetsubcounter;

BEFORE
InvalidOid for clearing all subscription stats
SUGGESTED
InvalidOid means reset all subscription stats

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Commitfest manager for 2022-03

2022-02-27 Thread Amit Kapila
On Sun, Feb 27, 2022 at 11:37 PM Euler Taveira  wrote:
>
> On Sat, Feb 26, 2022, at 9:37 PM, Justin Pryzby wrote:
>
> |https://commitfest.postgresql.org/37/2906/
> |Row filtering for logical replication
> If I'm not wrong, this is merged and should be closed?
>
> I think Amit forgot to mark it as committed. Done.
>

I was waiting for some build farm cycles to complete but it is fine to
mark it now.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Mon, Feb 28, 2022 at 8:59 AM Peter Smith  wrote:
>
>
> 2. doc/src/sgml/monitoring.sgml
>
> +Resets statistics for a single subscription shown in the
> +pg_stat_subscription_stats view to zero. If
> +the argument is NULL, reset statistics for all
> +subscriptions.
> 
>
> SUGGESTED (simpler description, more similar to 
> pg_stat_reset_replication_slot)
> Reset statistics to zero for a single subscription. If the argument is
> NULL, reset statistics for all subscriptions.
>

As discussed, it is better to keep the view name in this description,
as we have another view (pg_stat_subscription) as well. So,
I am planning to retain the current wording.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Mon, Feb 28, 2022 at 8:49 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, February 28, 2022 11:34 AM Amit Kapila  
> wrote:
> > On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Saturday, February 26, 2022 11:51 AM Amit Kapila
> >  wrote:
> > > > I have reviewed the latest version and made a few changes along with
> > > > fixing some of the pending comments by Peter Smith. The changes are
> > > > as
> > > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as
> > > > that is not required now; (b) changed the struct name
> > > > PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it
> > > > similar to DropDb; (c) changed the view name to
> > > > pg_stat_subscription_stats, we can reconsider it in future if there
> > > > is a consensus on some other name, accordingly changed the reset
> > > > function name to pg_stat_reset_subscription_stats; (d) moved some of
> > > > the newly added subscription stats functions adjacent to slots to
> > > > maintain consistency in the code; (e) changed comments at a few places;
> > > > (f) added LATERAL back to system_views query as we refer
> > pg_subscription's oid in the function call, previously that was not clear.
> > > >
> > > > Do let me know what you think of the attached?
> > > Hi, thank you for updating the patch !
> > > I have a couple of comments on v4.
> > >
> > > (1)
> > >
> > > I'm not sure if I'm correct, but I'd say the sync_error_count could
> > > come right after the subname in the column order.
> > > I felt there are cases where the column order is somewhat related to the
> > > time/processing order (I imagined pg_stat_replication's LSN-related
> > > columns).
> > > If that is right, the table-sync-related column could be the first
> > > counter column in this patch.
> > >
> >
> > I am not sure if there is such a correlation but even if it is there it 
> > doesn't seem
> > to fit here completely as sync errors can happen after apply errors in 
> > multiple
> > ways like via Alter Subscription ... Refresh ...
> >
> > So, I don't see the need to change the order here. What do you or others 
> > think?
> In the alter subscription case, any errors after the table sync would 
> increment
> apply_error_count.
>

Sure, but the point I was trying to explain was that there is no
certainty in the order of these errors.

-- 
With Regards,
Amit Kapila.




RE: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread osumi.takami...@fujitsu.com
On Monday, February 28, 2022 12:57 PM Amit Kapila 
> On Mon, Feb 28, 2022 at 8:49 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Monday, February 28, 2022 11:34 AM Amit Kapila
>  wrote:
> > > On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> > > > On Saturday, February 26, 2022 11:51 AM Amit Kapila
> > >  wrote:
> > > > > I have reviewed the latest version and made a few changes along
> > > > > with fixing some of the pending comments by Peter Smith. The
> > > > > changes are as
> > > > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError
> > > > > as that is not required now; (b) changed the struct name
> > > > > PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to
> > > > > make it similar to DropDb; (c) changed the view name to
> > > > > pg_stat_subscription_stats, we can reconsider it in future if
> > > > > there is a consensus on some other name, accordingly changed the
> > > > > reset function name to pg_stat_reset_subscription_stats; (d)
> > > > > moved some of the newly added subscription stats functions
> > > > > adjacent to slots to maintain consistency in the code; (e) changed
> > > > > comments at a few places;
> > > > > (f) added LATERAL back to system_views query as we refer
> > > pg_subscription's oid in the function call, previously that was not clear.
> > > > >
> > > > > Do let me know what you think of the attached?
> > > > Hi, thank you for updating the patch !
> > > > I have a couple of comments on v4.
> > > >
> > > > (1)
> > > >
> > > > I'm not sure if I'm correct, but I'd say the sync_error_count could
> > > > come right after the subname in the column order.
> > > > I felt there are cases where the column order is somewhat related to
> > > > the time/processing order (I imagined pg_stat_replication's
> > > > LSN-related columns).
> > > > If that is right, the table-sync-related column could be the first
> > > > counter column in this patch.
> > > >
> > >
> > > I am not sure if there is such a correlation but even if it is there
> > > it doesn't seem to fit here completely as sync errors can happen
> > > after apply errors in multiple ways like via Alter Subscription ... 
> > > Refresh ...
> > >
> > > So, I don't see the need to change the order here. What do you or others
> think?
> > In the alter subscription case, any errors after the table sync would
> > increment apply_error_count.
> >
> 
> Sure, but the point I was trying to explain was that there is no certainty in 
> the
> order of these errors.
I got it. Thank you so much for your explanation.


I don't have other new comments on this patch.
It looks good to me as well.


Best Regards,
Takamichi Osumi



Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada  wrote:
>
> On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila  wrote:
> >
> > >
> > > (2) doc/src/sgml/monitoring.sgml
> > >
> > > +Resets statistics for a single subscription shown in the
> > > +pg_stat_subscription_stats view to 
> > > zero. If
> > > +the argument is NULL, reset statistics for all
> > > +subscriptions.
> > > 
> > >
> > > I felt we could improve the first sentence.
> > >
> > > From:
> > > Resets statistics for a single subscription shown in the..
> > >
> > > To(idea1):
> > > Resets statistics for a single subscription defined by the argument to 
> > > zero.
> > >
> >
> > Okay, I can use this one.
>
> Are you going to remove the part "shown in the
> pg_stat_subscription_stats view"? I think it's better to keep it in
> order to make it clear which statistics the function resets as we have
> pg_stat_subscription and pg_stat_subscription_stats.
>

I decided to keep this part of the docs as it is and fixed a few other
minor comments raised by you and Peter. Additionally, I have bumped
the PGSTAT_FILE_FORMAT_ID. I'll push this patch tomorrow unless there
are any other major comments.

-- 
With Regards,
Amit Kapila.


v5-0001-Reconsider-pg_stat_subscription_workers-view.patch
Description: Binary data


Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-27 Thread Bharath Rupireddy
On Sun, Feb 27, 2022 at 8:44 PM Bharath Rupireddy
 wrote:
>
> On Fri, Feb 25, 2022 at 8:38 PM Nitin Jadhav
>  wrote:
>
> Had a quick look over the v3 patch. I'm not sure if it's the best way
> to have pg_stat_get_progress_checkpoint_type,
> pg_stat_get_progress_checkpoint_kind and
> pg_stat_get_progress_checkpoint_start_time just for printing info in
> readable format in pg_stat_progress_checkpoint. I don't think these
> functions will ever be useful for the users.
>
> 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> or checkpoint instead of having a new function
> pg_stat_get_progress_checkpoint_type?
>
> 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks
> directly instead of new function pg_stat_get_progress_checkpoint_kind?
> + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s",
> + (flags == 0) ? "unknown" : "",
> + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
> + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
> + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
> + (flags & CHECKPOINT_FORCE) ? "force " : "",
> + (flags & CHECKPOINT_WAIT) ? "wait " : "",
> + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
> + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
> + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");
>
> 3) Why do we need this extra calculation for start_lsn? Do you ever
> see a negative LSN or something here?
> +('0/0'::pg_lsn + (
> +CASE
> +WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
> +ELSE (0)::numeric
> +END + (s.param3)::numeric)) AS start_lsn,
>
> 4) Can't you use timestamptz_in(to_char(s.param4)) instead of
> pg_stat_get_progress_checkpoint_start_time? I don't quite understand
> the reasoning for having this function, and it's named *checkpoint*
> when it doesn't do anything specific to the checkpoint at all?
>
> Having 3 unnecessary functions that aren't useful to the users at all
> in proc.dat will simply eat up function OIDs, IMO. Hence, I suggest
> we try to do without the extra functions.

Another thought for my review comment:
> 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> or checkpoint instead of having a new function
> pg_stat_get_progress_checkpoint_type?

I don't think using pg_is_in_recovery works here, as it is taken after
the checkpoint has started. So, I think the right way here is to send
1 in CreateCheckPoint and 2 in CreateRestartPoint and use
CASE-WHEN-ELSE-END to show "checkpoint" for 1 and "restartpoint" for 2.

Continuing my review:

5) Do we need a special phase for this checkpoint operation? I'm not
sure in which cases it will take a long time, but it looks like
there's a wait loop here.
vxids = GetVirtualXIDsDelayingChkpt(&nvxids);
if (nvxids > 0)
{
do
{
pg_usleep(10000L); /* wait for 10 msec */
} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids));
}

Also, how about special phases for SyncPostCheckpoint(),
SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
that might increase in the future (?)), TruncateSUBTRANS()?

6) SLRU (Simple LRU) isn't a phase here, you can just say
PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES.
+
+ pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
+ PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES);
  CheckPointPredicate();

And :s/checkpointing SLRU pages/checkpointing predicate lock pages
+  WHEN 9 THEN 'checkpointing SLRU pages'


7) 
:s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS

And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN
'processing file sync requests'

8) :s/Finalizing/finalizing
+  WHEN 14 THEN 'Finalizing'

9) :s/checkpointing snapshots/checkpointing logical replication snapshot files
+  WHEN 3 THEN 'checkpointing snapshots'
:s/checkpointing logical rewrite mappings/checkpointing logical
replication rewrite mapping files
+  WHEN 4 THEN 'checkpointing logical rewrite mappings'

10) I'm not sure if it's been discussed, but how about adding the number of
snapshot/mapping files the checkpoint has processed so far in the
file-processing while loops of
CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
be many logical snapshot or mapping files and users may be interested
in knowing the so-far-processed file count.

11) I think it's been discussed already: are we going to add the pid of the
checkpoint requestor?

Regards,
Bharath Rupireddy.




Re: Synchronizing slots from primary to standby

2022-02-27 Thread Bharath Rupireddy
On Thu, Feb 24, 2022 at 12:46 AM James Coleman  wrote:
> I've been working on adding test coverage to prove this out, but I've
> encountered the problem reported in [1].
>
> My assumption, but Andres please correct me if I'm wrong, that we
> should see issues with the following steps (given the primary,
> physical replica, and logical subscriber already created in the test):
>
> 1. Ensure both logical subscriber and physical replica are caught up
> 2. Disable logical subscription
> 3. Make a catalog change on the primary (currently renaming the
> primary key column)
> 4. Vacuum pg_class
> 5. Ensure physical replication is caught up
> 6. Stop primary and promote the replica
> 7. Write to the changed table
> 8. Update subscription to point to promoted replica
> 9. Re-enable logical subscription
>
> I'm attaching my test as an additional patch in the series for
> reference. Currently I have steps 3 and 4 commented out to show that
> the issues in [1] occur without any attempt to trigger the catalog
> xmin problem.
>
> Given that this error seems pretty significant in terms of indicating a
> fundamental lack of test coverage (the primary stated benefit of the
> patch is physical failover), it is currently a blocker to testing
> more deeply.

A few of my initial concerns specified at [1] are these:

1) Instead of a new LIST_SLOT command, can't we use
READ_REPLICATION_SLOT (slight modifications need to be made to make
it support logical replication slots and to get more information from
the subscriber).

2) How frequently is the new bg worker going to sync the slot info?
How can it ensure that the latest information exists, say, when the
subscriber goes down/crashes before it picks up the latest slot
information?

4) IIUC, the proposal works only for logical replication slots but do
you also see the need for supporting some kind of synchronization of
physical replication slots as well? IMO, we need a better and
consistent way for both types of replication slots. If the walsender
can somehow push the slot info from the primary (for physical
replication slots)/publisher (for logical replication slots) to the
standby/subscribers, this will be a more consistent and simpler
design. However, I'm not sure if this design is doable at all.

Can anyone help clarify these?

[1] 
https://www.postgresql.org/message-id/CALj2ACUGNGfWRtwwZwT-Y6feEP8EtOMhVTE87rdeY14mBpsRUA%40mail.gmail.com

Regards,
Bharath Rupireddy.




Re: why do hash index builds use smgrextend() for new splitpoint pages

2022-02-27 Thread Amit Kapila
On Sat, Feb 26, 2022 at 9:17 PM Melanie Plageman
 wrote:
>
> On Fri, Feb 25, 2022 at 11:17 PM Amit Kapila  wrote:
> >
> > On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman
> >  wrote:
> > >
> > > Since _hash_alloc_buckets() WAL-logs the last page of the
> > > splitpoint, is it safe to skip the smgrimmedsync()? What if the last
> > > page of the splitpoint doesn't end up having any tuples added to it
> > > during the index build and the redo pointer is moved past the WAL for
> > > this page and then later there is a crash sometime before this page
> > > makes it to permanent storage. Does it matter that this page is lost? If
> > > not, then why bother WAL-logging it?
> > >
> >
> > I think we don't care if the page is lost before we update the
> > meta-page in the caller because we will try to reallocate in that
> > case. But we do care after meta page update (having the updated
> > information about this extension via different masks) in which case we
> > won't lose this last page because it would have registered the sync
> > request for it via smgrextend before the meta page update.
>
> and could it happen that during smgrextend() for the last page, a
> checkpoint starts and finishes between FileWrite() and
> register_dirty_segment(), then index build finishes, and then a crash
> occurs before another checkpoint completes the pending fsync for that
> last page?
>

Yeah, this seems to be possible, and then the problem could be that
the index's idea and smgr's idea of the EOF differ, which could
lead to a problem when we try to get a new page via _hash_getnewbuf().
If this theory turns out to be true then probably, we can get an error
either because of disk full or the index might request a block that is
beyond EOF as determined by RelationGetNumberOfBlocksInFork() in
_hash_getnewbuf().

Can we try to reproduce this scenario with the help of a debugger to
see if we are missing something?

-- 
With Regards,
Amit Kapila.




Re: BufferAlloc: don't take two simultaneous locks

2022-02-27 Thread Yura Sokolov
On Fri, 25/02/2022 at 09:01 -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-02-25 12:51:22 +0300, Yura Sokolov wrote:
> > > > +* The usage_count starts out at 1 so that the buffer can 
> > > > survive one
> > > > +* clock-sweep pass.
> > > > +*
> > > > +* We use direct atomic OR instead of Lock+Unlock since no 
> > > > other backend
> > > > +* could be interested in the buffer. But StrategyGetBuffer,
> > > > +* Flush*Buffers, Drop*Buffers are scanning all buffers and 
> > > > locks them to
> > > > +* compare tag, and UnlockBufHdr does raw write to state. So we 
> > > > have to
> > > > +* spin if we found buffer locked.
> > > 
> > > So basically the first half of of the paragraph is wrong, because no, we
> > > can't?
> > 
> > Logically, there are no backends that could be interested in the buffer.
> > Physically they do LockBufHdr/UnlockBufHdr just to check they are not
> > interested.
> 
> Yea, but that's still being interested in the buffer...
> 
> 
> > > > +* Note that we write tag unlocked. It is also safe since there 
> > > > is always
> > > > +* check for BM_VALID when tag is compared.
> > > >  */
> > > > buf->tag = newTag;
> > > > -   buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
> > > > -  BM_CHECKPOINT_NEEDED | BM_IO_ERROR | 
> > > > BM_PERMANENT |
> > > > -  BUF_USAGECOUNT_MASK);
> > > > if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == 
> > > > INIT_FORKNUM)
> > > > -   buf_state |= BM_TAG_VALID | BM_PERMANENT | 
> > > > BUF_USAGECOUNT_ONE;
> > > > +   new_bits = BM_TAG_VALID | BM_PERMANENT | 
> > > > BUF_USAGECOUNT_ONE;
> > > > else
> > > > -   buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;
> > > > -
> > > > -   UnlockBufHdr(buf, buf_state);
> > > > +   new_bits = BM_TAG_VALID | BUF_USAGECOUNT_ONE;
> > > >  
> > > > -   if (oldPartitionLock != NULL)
> > > > +   buf_state = pg_atomic_fetch_or_u32(&buf->state, new_bits);
> > > > +   while (unlikely(buf_state & BM_LOCKED))
> > > 
> > > I don't think it's safe to atomic in arbitrary bits. If somebody else has
> > > locked the buffer header in this moment, it'll lead to completely bogus
> > > results, because unlocking overwrites concurrently written contents (which
> > > there shouldn't be any, but here there are)...
> > 
> > That is why there is a safety loop for the case where buf->state was locked
> > just after the first optimistic atomic_fetch_or. 99.999% of the time this
> > loop has no work to do. But in case another backend did lock buf->state,
> > the loop waits until it releases the lock and retries atomic_fetch_or.
> > > And or'ing contents in also doesn't make sense because we it doesn't work 
> > > to
> > > actually unset any contents?
> > 
> > Sorry, I didn't understand sentence :((
> 
> You're OR'ing multiple bits into buf->state. LockBufHdr() only ORs in
> BM_LOCKED. ORing BM_LOCKED is fine:
> Either the buffer is not already locked, in which case it just sets the
> BM_LOCKED bit, acquiring the lock. Or it doesn't change anything, because
> BM_LOCKED already was set.
> 
> But OR'ing in multiple bits is *not* fine, because it'll actually change the
> contents of ->state while the buffer header is locked.

First, both states are valid: before the atomic_or and after.
Second, there are no checks of buffer->state while the buffer header is locked.
All LockBufHdr users use the result of LockBufHdr. (I just checked that.)

> > > Why don't you just use LockBufHdr/UnlockBufHdr?
> > 
> > This pair makes two atomic writes to memory. Two writes are heavier than
> > one write in this version (if the optimistic case succeeds).
> 
> UnlockBufHdr doesn't use a locked atomic op. It uses a write barrier and an
> unlocked write.

A write barrier is not free on any platform.

Well, while I don't see a problem with modifying buffer->state, there is a
problem with modifying buffer->tag: I missed that Drop*Buffers doesn't check
the BM_TAG_VALID flag. Therefore I either have to add this check to those
places, or return to the LockBufHdr+UnlockBufHdr pair.

For patch simplicity I'll return to the Lock+UnlockBufHdr pair. But it has a
measurable impact at low connection counts on many-socket machines.

> 
> Greetings,
> 
> Andres Freund





Re: generic plans and "initial" pruning

2022-02-27 Thread Amit Langote
Hi Andres,

On Fri, Feb 11, 2022 at 10:29 AM Andres Freund  wrote:
> On 2022-02-10 17:13:52 +0900, Amit Langote wrote:
> > The attached patch implements this idea.  Sorry for the delay in
> > getting this out and thanks to Robert for the off-list discussions on
> > this.
>
> I did not follow this thread at all. And I only skimmed the patch. So I'm
> probably wrong.

Thanks for your interest in this and sorry about the delay in replying
(have been away due to illness).

> I'm wary of this increasing executor overhead even in cases where it won't
> help. Without this patch, for simple queries, I see small allocations
> noticeably in profiles. This adds a bunch more, even if
> !context->stmt->usesPreExecPruning:

Ah, if any new stuff added by the patch runs in
!context->stmt->usesPreExecPruning paths, then it's just poor coding
on my part, which I'm now looking to fix.  Maybe not all of it is
avoidable, but I think whatever isn't should be trivial...

> - makeNode(ExecPrepContext)
> - makeNode(ExecPrepOutput)
> - palloc0(sizeof(PlanPrepOutput *) * result->numPlanNodes)
> - stmt_execprep_list = lappend(stmt_execprep_list, execprep);
> - AllocSetContextCreate(CurrentMemoryContext,
>   "CachedPlan execprep list", ...
> - ...
>
> That's a lot of extra for something that's already a bottleneck.

If all these allocations are limited to the usesPreExecPruning path,
IMO, they would amount to trivial overhead compared to what is going
to be avoided -- locking say 1000 partitions when only 1 will be
scanned.  Although, maybe there's a way to code this to have even less
overhead than what's in the patch now.
--
Amit Langote
EDB: http://www.enterprisedb.com




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-02-27 Thread Julien Rouhaud
Hi,

On Mon, Feb 28, 2022 at 10:21:23AM +0530, Bharath Rupireddy wrote:
> 
> Another thought for my review comment:
> > 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> > or checkpoint instead of having a new function
> > pg_stat_get_progress_checkpoint_type?
> 
> I don't think using pg_is_in_recovery works here, as it is taken after
> the checkpoint has started. So, I think the right way here is to send
> 1 in CreateCheckPoint and 2 in CreateRestartPoint and use
> CASE-WHEN-ELSE-END to show "checkpoint" for 1 and "restartpoint" for 2.

I suggested upthread to store the starting timeline instead.  This way you can
deduce whether it's a restartpoint or a checkpoint, but you can also deduce
other information, like what was the starting WAL.

> 11) I think it's discussed, are we going to add the pid of the
> checkpoint requestor?

As mentioned upthread, there can be multiple backends that request a
checkpoint, so unless we want to store an array of pids, we should store the
number of backends that are waiting for a new checkpoint.




Re: parse/analyze API refactoring

2022-02-27 Thread Peter Eisentraut
You set this commit fest entry to Waiting on Author, but there were no 
reviews posted and the patch still applies and builds AFAICT, so I don't 
know what you meant by that.



On 13.01.22 00:49, Bossart, Nathan wrote:

On 12/28/21, 8:25 AM, "Peter Eisentraut"  
wrote:

(The "withcb" naming maybe isn't great; better ideas welcome.)


FWIW I immediately understood that this meant "with callback," so it
might be okay.


Not included in this patch set, but food for further thought:  The
pg_analyze_and_rewrite_*() functions aren't all that useful (anymore).
One might as well write

  pg_rewrite_query(parse_analyze_xxx(...))


I had a similar thought while reading through the patches.  If further
deduplication isn't too much trouble, I'd vote for that.

Nathan







Re: support for MERGE

2022-02-27 Thread Julien Rouhaud
On Sun, Feb 27, 2022 at 09:17:13PM +0100, Daniel Gustafsson wrote:
> > On 27 Feb 2022, at 18:42, Tom Lane  wrote:
> 
> > I'd rather keep all the ModifyTable code in one .c file, even if that one is
> > bigger than our usual practice.
> 
> Agreed, I also prefer a (too) large file over a set of .c #include's.

+1.  Also, if a split is really needed, isn't our usual approach to create some
*_private.h files so that third-party code is aware that this is in no way an
API meant for them?




RE: Logical replication timeout problem

2022-02-27 Thread wangw.f...@fujitsu.com
On Wed, Feb 22, 2022 at 4:56 PM Amit Kapila  wrote:
>
Thanks for your review.

> On Tue, Feb 22, 2022 at 9:17 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Fri, Feb 18, 2022 at 10:51 AM Ajin Cherian  wrote:
> > > Some comments:
> > Thanks for your review.
> >
> > >  I see you only track skipped Inserts/Updates and Deletes. What about
> > > DDL operations that are skipped, what about truncate.
> > > What about changes made to unpublished tables? I wonder if you could
> > > create a test script that only did DDL operations
> > > and truncates, would this timeout happen?
> > According to your suggestion, I tested with DDL and truncate.
> > While testing, I ran only 20,000 DDLs and 10,000 truncations in one
> > transaction.
> > If I set wal_sender_timeout and wal_receiver_timeout to 30s, it will time 
> > out.
> > And if I use the default values, it will not time out.
> > IMHO there should not be long transactions that only contain DDL and
> > truncation. I'm not quite sure, do we need to handle this kind of use case?
> >
> 
> I think it is better to handle such cases as well and changes related
> to unpublished tables as well. BTW, it seems Kuroda-San has also given
> some comments [1] which I am not sure are addressed.
Added handling of these related use cases.

> I think instead of keeping the skipping threshold w.r.t
> wal_sender_timeout, we can use some conservative number like 1 or
> so which we are sure won't impact performance and won't lead to
> timeouts.
Yes, it would be better. Set the threshold conservatively to 1.

> *
> + /*
> + * skipped_changes_count is reset when processing changes that do not need
> to
> + * be skipped.
> + */
> + skipped_changes_count = 0
> 
> When the skipped_changes_count is reset, the sendTime should also be
> reset. Can we reset it whenever the UpdateProgress function is called
> with send_keep_alive as false?
Fixed.

Attached a new patch that addresses following improvements I have got so far as
comments:
1. Consider other changes that need to be skipped (truncate, DDL and the
function call pg_logical_emit_message). [suggestion by Ajin, Amit]
(Add a new function SendKeepaliveIfNecessary that tries to send a keepalive
message.)
2. Set the threshold conservatively to a static value of 1. [suggestion by
Amit, Kuroda-San]
3. Reset sendTime in function WalSndUpdateProgress when send_keep_alive is
false. [suggestion by Amit]
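The skip-counter approach described above (check the clock only after a run of
consecutively skipped changes, and send a keepalive when the interval has
elapsed) can be sketched standalone.  The threshold value, state names, and
interval below are illustrative stand-ins, not the patch's actual code:

```c
#include <stdbool.h>
#include <time.h>

/* Illustrative threshold: consult the clock only this often, to avoid one
 * GetCurrentTimestamp()-style call per skipped change. */
#define SKIPPED_CHANGES_THRESHOLD 10000

static long skipped_changes_count = 0;
static time_t last_keepalive_time = 0;
/* Stand-in for something derived from wal_sender_timeout. */
static const int keepalive_interval_secs = 10;

/* Returns true when a keepalive should be sent for this skipped change. */
static bool
keepalive_needed_for_skipped_change(time_t now)
{
    if (++skipped_changes_count < SKIPPED_CHANGES_THRESHOLD)
        return false;           /* cheap path: no clock read */

    skipped_changes_count = 0;  /* threshold reached: consult the clock */
    if (now - last_keepalive_time >= keepalive_interval_secs)
    {
        last_keepalive_time = now;
        return true;            /* caller sends the keepalive message */
    }
    return false;
}
```

The counter would also be reset whenever a change is actually sent, so only
long runs of skipped changes ever reach the clock check.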

Regards,
Wang wei


0001-Fix-the-timeout-of-subscriber-in-long-transactions.patch
Description:  0001-Fix-the-timeout-of-subscriber-in-long-transactions.patch


RE: Logical replication timeout problem

2022-02-27 Thread wangw.f...@fujitsu.com
On Thur, Feb 24, 2022 at 4:06 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> Dear Wang,
Thanks for your review.

> > According to our discussion, we need to send keepalive messages to
> > subscriber when skipping changes.
> > One approach is that **for each skipped change**, we try to send
> > keepalive message by calculating whether a timeout will occur based on
> > the current time and the last time the keepalive was sent.  But this will
> > bring slight overhead.
> > So I want to try another approach: after **constantly skipping some
> > changes**, we try to send keepalive message by calculating whether a
> > timeout will occur based on the current time and the last time the keepalive
> was sent.
> 
> You meant that calls to system functions like GetCurrentTimestamp() should be
> reduced, right?  I'm not sure how much it affects performance, but it seems
> reasonable.
Yes.  There is no need to invoke it frequently, as it brings overhead.

> > IMO, we should send keepalive message after skipping a certain number
> > of changes constantly.
> > And I want to calculate the threshold dynamically by using a fixed
> > value to avoid adding too much code.
> > In addition, different users have different machine performance, and
> > users can modify wal_sender_timeout, so the threshold should be
> > dynamically calculated according to wal_sender_timeout.
> 
> Your experiment seems not bad, but the background cannot be understood
> from the code comments.  I prefer a static threshold because it's simpler,
> as Amit also said in the following:
> 
> https://www.postgresql.org/message-id/CAA4eK1%2B-
> p_K_j%3DNiGGD6tCYXiJH0ypT4REX5PBKJ4AcUoF3gZQ%40mail.gmail.com
Yes, you are right. Fixed.
And I set the threshold to 1.

> BTW, this patch cannot be applied to current master.
Thanks for the reminder.  Rebased it.
Kindly have a look at the new patch shared in [1].

[1] 
https://www.postgresql.org/message-id/OS3PR01MB6275FEB9F83081F1C87539B99E019%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei


RE: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread tanghy.f...@fujitsu.com
On Mon, Feb 28, 2022 12:32 PM Amit Kapila  wrote:
> 
> On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada 
> wrote:
> >
> > On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila 
> wrote:
> > >
> > > >
> > > > (2) doc/src/sgml/monitoring.sgml
> > > >
> > > > +Resets statistics for a single subscription shown in the
> > > > +pg_stat_subscription_stats view to 
> > > > zero. If
> > > > +the argument is NULL, reset statistics for 
> > > > all
> > > > +subscriptions.
> > > > 
> > > >
> > > > I felt we could improve the first sentence.
> > > >
> > > > From:
> > > > Resets statistics for a single subscription shown in the..
> > > >
> > > > To(idea1):
> > > > Resets statistics for a single subscription defined by the argument to 
> > > > zero.
> > > >
> > >
> > > Okay, I can use this one.
> >
> > Are you going to remove the part "shown in the
> > pg_stat_subscription_stats view"? I think it's better to keep it in
> > order to make it clear which statistics the function resets as we have
> > pg_stat_subscription and pg_stat_subscription_stats.
> >
> 
> I decided to keep this part of the docs as it is and fixed a few other
> minor comments raised by you and Peter. Additionally, I have bumped
> the PGSTAT_FILE_FORMAT_ID. I'll push this patch tomorrow unless there
> are any other major comments.
> 

Thanks for your patch. I have finished the review/test for this patch.
The patch LGTM.

Regards,
Tang


Re: make tuplestore helper function

2022-02-27 Thread Michael Paquier
On Thu, Feb 24, 2022 at 08:25:06PM +0900, Michael Paquier wrote:
> This is the remaining piece, as attached, that I have not been able to
> poke much at yet.

So, I have finally poked at this last part of the patch set, and I
found that we can be more aggressive with the refactoring, by moving
into MakeFuncResultTuplestore() the parts where we save the tuplestore
and the tupledesc in the per-query memory context.  There are two
pieces that matter once things are reshaped:
- The tuple descriptor may need some extra validation via
BlessTupleDesc() when it comes from a transient record datatype,
something that happens for most of the subroutines related to the JSON
functions.
- expectedDesc is sometimes required by the caller, though most of the
time the descriptor just needs to be built with the more expensive
get_call_result_type().

In order to keep things pluggable at will, MakeFuncResultTuplestore()
has been changed to accept a set of bits32 flags that control the two
options above.  With this facility in place, I have been able to cut
much more code than the initial patch, roughly twice as much:
 24 files changed, 157 insertions(+), 893 deletions(-)
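The flag-driven consolidation can be illustrated with a standalone mock.  The
flag names, types, and helpers below are invented for the sketch only; they
are not the real funcapi.h API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int bits32;

/* Hypothetical flags controlling the two optional steps described above. */
#define MAKE_RESULT_BLESS          (1 << 0)  /* validate a transient record type */
#define MAKE_RESULT_EXPECTED_DESC  (1 << 1)  /* reuse the caller-supplied tupdesc */

typedef struct TupDesc { bool blessed; } TupDesc;

typedef struct CallContext
{
    TupDesc *expected_desc;     /* may be reused instead of rebuilding */
} CallContext;

/* Stand-in for the expensive get_call_result_type() path. */
static TupDesc *
build_result_desc(void)
{
    TupDesc *d = malloc(sizeof(TupDesc));
    d->blessed = false;
    return d;
}

/*
 * One helper replaces the per-call-site boilerplate: pick (or build) the
 * tuple descriptor, then optionally "bless" it, depending on flags.
 */
static TupDesc *
make_func_result_desc(CallContext *ctx, bits32 flags)
{
    TupDesc *desc;

    if ((flags & MAKE_RESULT_EXPECTED_DESC) && ctx->expected_desc != NULL)
        desc = ctx->expected_desc;      /* cheap: reuse expectedDesc */
    else
        desc = build_result_desc();     /* expensive path */

    if (flags & MAKE_RESULT_BLESS)
        desc->blessed = true;           /* stand-in for BlessTupleDesc() */

    return desc;
}
```

Each former call site then passes only the flags it needs, which is what lets
the repeated setup blocks across those 24 files collapse into one call.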

This seems in rather good shape to me, the changes are
straightforward and the code cut is really good, so I'd like to move
on with that.  0001 is the initial patch, and 0002 is the extra
refactoring I have been working on.  The plan would be to merge both,
but I am sending a split to ease any checks on what I have changed.

Comments or objections?
--
Michael
From d45e9a7031017e13e5429ff985b36ddafdfdb443 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 24 Feb 2022 17:27:55 +0900
Subject: [PATCH v9 1/2] Introduce MakeFuncResultTuplestore()

This is the main patch from Melanie, that I have tweaked a bit.  Note
that I am not completely done with it ;p
---
 src/include/funcapi.h |  2 +
 src/backend/access/transam/xlogfuncs.c| 16 +---
 src/backend/commands/event_trigger.c  | 32 +---
 src/backend/commands/extension.c  | 48 +---
 src/backend/commands/prepare.c| 17 +
 src/backend/foreign/foreign.c | 14 +---
 src/backend/libpq/hba.c   | 19 +
 src/backend/replication/logical/launcher.c| 16 +---
 .../replication/logical/logicalfuncs.c| 16 +---
 src/backend/replication/logical/origin.c  | 19 +
 src/backend/replication/slotfuncs.c   | 16 +---
 src/backend/replication/walsender.c   | 16 +---
 src/backend/storage/ipc/shmem.c   | 16 +---
 src/backend/utils/adt/datetime.c  | 17 +
 src/backend/utils/adt/genfile.c   | 31 +---
 src/backend/utils/adt/jsonfuncs.c | 73 +++
 src/backend/utils/adt/mcxtfuncs.c | 16 +---
 src/backend/utils/adt/misc.c  | 14 +---
 src/backend/utils/adt/pgstatfuncs.c   | 48 +---
 src/backend/utils/adt/varlena.c   | 14 +---
 src/backend/utils/fmgr/funcapi.c  | 38 ++
 src/backend/utils/misc/guc.c  | 15 +---
 src/backend/utils/misc/pg_config.c| 13 +---
 src/backend/utils/mmgr/portalmem.c| 17 +
 24 files changed, 82 insertions(+), 461 deletions(-)

diff --git a/src/include/funcapi.h b/src/include/funcapi.h
index ba927c2f33..b9f9e92d1a 100644
--- a/src/include/funcapi.h
+++ b/src/include/funcapi.h
@@ -229,6 +229,8 @@ extern TupleDesc BlessTupleDesc(TupleDesc tupdesc);
 extern AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc);
 extern HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values);
 extern Datum HeapTupleHeaderGetDatum(HeapTupleHeader tuple);
+extern Tuplestorestate *MakeFuncResultTuplestore(FunctionCallInfo fcinfo,
+		TupleDesc *result_tupdesc);
 
 
 /*--
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 12e2bf4135..2fc1ed023c 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -178,24 +178,10 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 	XLogRecPtr	stoppoint;
 	SessionBackupState status = get_backup_status();
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
 
-	tupstore = tuplestore_b