Re: Commitfest manager for 2022-03
On Sat, Feb 26, 2022 at 06:37:21PM -0600, Justin Pryzby wrote:
> Can I suggest to update the CF APP to allow:
> | Target version: 16
>
> I also suggest to update patches to indicate which are (not) being considered
> for v15.

I don't really understand what that field is supposed to mean. But now that we're in the final pg15 commit fest, wouldn't it be simpler to actually move the patches for which there's an agreement that they can't make it to pg15? Tagging them and letting them rot for a month isn't helpful for the authors or the CFMs, especially when there are already 250 patches that need to be handled. There's already an ongoing discussion for the PGTT patchset; maybe it should also happen for some of the threads you mentioned.
Re: Commitfest manager for 2022-03
On Sun, Feb 27, 2022 at 04:51:16PM +0800, Julien Rouhaud wrote:
> I don't really understand what that field is supposed to mean. But now that
> we're in the final pg15 commit fest, wouldn't it be simpler to actually move
> the patches for which there's an agreement that they can't make it to pg15?
> Tagging them and letting them rot for a month isn't helpful for the authors or
> the CFMs, especially when there are already 250 patches that need to be
> handled.

Yes, I don't see any point in having a new tag just to mark patches that will have to be moved to the next CF anyway. These should just be moved to the July one rather than stay in the March one.
--
Michael
Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
On Sat, Feb 26, 2022 at 01:39:42PM -0800, Andres Freund wrote:
> I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd
> require adding a new enum value to WaitEventTimeout in 14. Which probably is
> fine?

We've added wait events in back-branches in the past, so this does not strike me as a problem as long as you add the new entry at the end of the enum, while keeping things ordered on HEAD. In recent memory, I think that only some of the extensions published by PostgresPro rely on the enums in this area.
--
Michael
Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
On Sun, Feb 27, 2022 at 06:10:45PM +0900, Michael Paquier wrote:
> On Sat, Feb 26, 2022 at 01:39:42PM -0800, Andres Freund wrote:
> > I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd
> > require adding a new enum value to WaitEventTimeout in 14. Which probably is
> > fine?
>
> We've added wait events in back-branches in the past, so this does not
> strike me as a problem as long as you add the new entry at the end of
> the enum, while keeping things ordered on HEAD.

+1

> In recent memory, I
> think that only some of the extensions published by PostgresPro rely
> on the enums in this area.

Indeed, I only know of pg_wait_sampling which uses it. Note that it relies on the pgstat_get_wait_event* functions, so it should only return "???" / "unknown wait event" until you recompile it for a newer minor version, and not report errors or crash. All other extensions I know of simply use whatever pg_stat_activity returns, so no impact.
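[For context, the compatibility rule described above comes down to enum value stability. A sketch of what that looks like in src/include/utils/wait_event.h; the appended entry's name here is invented purely for illustration:

typedef enum
{
	WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
	WAIT_EVENT_PG_SLEEP,
	WAIT_EVENT_RECOVERY_APPLY_DELAY,
	WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL,
	WAIT_EVENT_VACUUM_DELAY,
	/*
	 * In a back branch, a new event is appended here so the numeric values
	 * of the entries above do not change, and an extension compiled against
	 * an older minor release still maps them to the right names.  On HEAD
	 * the new entry is inserted in alphabetical order instead.
	 */
	WAIT_EVENT_NEW_TIMEOUT		/* invented name, for illustration */
} WaitEventTimeout;]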
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
On Mon, Jan 10, 2022 at 6:50 AM Bharath Rupireddy wrote:
>
> On Wed, Jan 5, 2022 at 12:13 PM Kyotaro Horiguchi wrote:
> > > Here's the v2 patch. Please provide your thoughts.
> >
> > Thanks! I have three comments on this version.
>
> Thanks for your review.
>
> > - I still think "acquire/release" logs on creation/dropping should be
> >   silenced. Adding a third parameter to ReplicationSlotAcquire()
> >   that can mute the acquiring (as well as the corresponding
> >   releasing) log will work.
>
> Done.
>
> > can piggy-back on log_replication_commands for the purpose, changing
> >   its meaning slightly to "log replication commands and related
> >   activity".
> > - Need to mute the logs by log_replication_commands. (We could add
> >   another choice for the variable for this purpose, but I think we
> >   don't need it.)
>
> Done.
>
> > - The messages are not translatable as the variable parts are
> >   adjectives. They need to consist of static sentences. The
> >   combinations of the two properties are 6 (note that persistence is
> >   tristate), but I don't think we accept that complexity for this
> >   information. So I recommend just removing the variable parts from
> >   the messages.
>
> Done.
>
> Here's v3, please review it further.

Here's the rebased v4 patch.

Regards,
Bharath Rupireddy.

v4-0001-Add-LOG-messages-when-replication-slots-become-ac.patch Description: Binary data
Re: Some optimisations for numeric division
On Fri, 25 Feb 2022 at 18:30, Tom Lane wrote:
>
> Dean Rasheed writes:
> > And another update following feedback from the cfbot.
>
> This patchset LGTM. On my machine there doesn't seem to be any
> measurable performance change for the numeric regression test,
> but numeric_big gets about 15% faster.

Yes, that matches my observations. Thanks for reviewing and testing.

> The only additional thought I have, based on your comments about
> 0001, is that maybe in mul_var we should declare var1digit as
> being NumericDigit not int. I tried that here and didn't see
> any material change in the generated assembly code (using gcc
> 8.5.0), so you're right that modern gcc doesn't need any help
> there; but I wonder if it could help on other compiler versions.

Yes, that makes sense. Done that way.

Regards,
Dean
pg_stat_statements: remove redundant function call in pg_stat_statements_internal
Hi,

I found a redundant function call in pg_stat_statements.c's pg_stat_statements_internal(). There is no need to call GetUserId() again because the value was previously obtained, so I propose a patch to fix it.

--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1508,7 +1508,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 	pgssEntry  *entry;
 
 	/* Superusers or members of pg_read_all_stats members are allowed */
-	is_allowed_role = is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS);
+	is_allowed_role = is_member_of_role(userid, ROLE_PG_READ_ALL_STATS);
 
 	/* hash table must exist already */
 	if (!pgss || !pgss_hash)

Regards,
Lee Dong Wook.

v1_tiny_improvement_pg_stat_statements.patch Description: Binary data
Re: pg_stat_statements: remove redundant function call in pg_stat_statements_internal
Hi,

On Sun, Feb 27, 2022 at 08:45:13PM +0900, Dong Wook Lee wrote:
>
> I found a redundant function call in
> pg_stat_statements.c's pg_stat_statements_internal().
> There is no need to call GetUserId() again because the value was
> previously obtained.

Indeed. I doubt it will make any real difference, but it doesn't hurt to fix it. The patch looks good to me.
Re: PATCH: add "--config-file=" option to pg_rewind
On Sat, Feb 26, 2022 at 09:55:20AM +0100, Gunnar "Nick" Bluth wrote:
> Am 26.02.22 um 06:51 schrieb Michael Paquier:
>> Shouldn't this one use appendShellString() on config_file?
>
> It probably should, yes. I don't fancy this repetitive code myself.
> But then, pg_rewind as a whole could use an overhaul. I don't see any use of
> PQExpBuffer in it, but a lot of char* ...

Having a lot of char* does not necessarily mean that all of them have to be changed to accommodate PQExpBuffer. My guess is that this is case-by-case, where we should apply that in places where it makes the code cleaner to follow. In the case of your patch, though, the two areas changed would actually make the logic correct, because we have to apply correct quoting rules to any commands executed.

> GSOC project? ;-)

It does not seem so, though there are surely more areas that could gain in clarity.
--
Michael
Re: pg_stat_statements: remove redundant function call in pg_stat_statements_internal
On Sun, Feb 27, 2022 at 07:55:26PM +0800, Julien Rouhaud wrote:
> Indeed. I doubt it will make any real difference, but it doesn't hurt to fix it.
>
> The patch looks good to me.

Yes, let's clean that up on HEAD. No objections from here. I'll do that tomorrow or so.
--
Michael
Support for grabbing multiple consecutive values with nextval()
Hi,

First time PostgreSQL contributor here :)

I wanted to be able to allocate a bunch of numbers from a sequence at once. Multiple people seem to be struggling with this (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence, https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).

I propose to add an extra argument to nextval() that specifies how many numbers you want to allocate (default 1).

The attached patch (based on master) passes `./configure --enable-cassert --enable-debug && make && make check`, including the newly added regression tests.

It does change the signature of nextval_internal(); I'm not sure if that's considered a backwards-compatibility break (for extensions?).

--
Jille

From 403993dfea71068070185dd14fa3f5ff26d5f791 Mon Sep 17 00:00:00 2001
From: Jille Timmermans
Date: Sun, 27 Feb 2022 10:20:22 +0100
Subject: [PATCH] Add an argument to nextval() to grab multiple consecutive
 sequence numbers

---
 doc/src/sgml/func.sgml                 |  8 +++--
 src/backend/commands/sequence.c        | 46 +-
 src/backend/executor/execExprInterp.c  |  2 +-
 src/backend/optimizer/util/clauses.c   |  2 +-
 src/include/catalog/pg_proc.dat        |  3 ++
 src/include/commands/sequence.h        |  2 +-
 src/test/regress/expected/sequence.out | 41 +++
 src/test/regress/sql/sequence.sql      | 12 +++
 8 files changed, 102 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df3cd5987b..5923ecc38e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17605,7 +17605,7 @@ $.* ? (@ like_regex "^\\d+$")
         nextval
-        nextval ( regclass )
+        nextval ( regclass , bigint )
         bigint
@@ -17618,7 +17618,11 @@ $.* ? (@ like_regex "^\\d+$")
         values beginning with 1. Other behaviors can be obtained by using
         appropriate parameters in the  command.
-
+
+        To grab multiple values you can pass an integer to nextval.
+        It will allocate that many consecutive numbers from the sequence and return the last value.
+
         This function requires USAGE or UPDATE privilege on the sequence.

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index ab592ce2f1..79e2a1e7c0 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -570,7 +570,7 @@ nextval(PG_FUNCTION_ARGS)
 	 */
 	relid = RangeVarGetRelid(sequence, NoLock, false);
 
-	PG_RETURN_INT64(nextval_internal(relid, true));
+	PG_RETURN_INT64(nextval_internal(relid, true, 1));
 }
 
 Datum
@@ -578,11 +578,20 @@ nextval_oid(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
 
-	PG_RETURN_INT64(nextval_internal(relid, true));
+	PG_RETURN_INT64(nextval_internal(relid, true, 1));
+}
+
+Datum
+nextval_oid_num(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	int64		num = PG_GETARG_INT64(1);
+
+	PG_RETURN_INT64(nextval_internal(relid, true, num));
 }
 
 int64
-nextval_internal(Oid relid, bool check_permissions)
+nextval_internal(Oid relid, bool check_permissions, int64 request)
 {
 	SeqTable	elm;
 	Relation	seqrel;
@@ -605,6 +614,17 @@ nextval_internal(Oid relid, bool check_permissions)
 	bool		cycle;
 	bool		logit = false;
 
+	if (request < 1)
+	{
+		char		buf[100];
+
+		snprintf(buf, sizeof(buf), INT64_FORMAT, request);
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("nextval: can't request %s values from a sequence", buf)));
+	}
+
 	/* open and lock sequence */
 	init_sequence(relid, &elm, &seqrel);
@@ -627,11 +647,10 @@ nextval_internal(Oid relid, bool check_permissions)
 	 */
 	PreventCommandIfParallelMode("nextval()");
 
-	if (elm->last != elm->cached) /* some numbers were cached */
+	if (elm->increment != 0 && (elm->cached - elm->last) / elm->increment >= request) /* enough numbers were cached */
 	{
 		Assert(elm->last_valid);
-		Assert(elm->increment != 0);
-		elm->last += elm->increment;
+		elm->last += elm->increment * request;
 		relation_close(seqrel, NoLock);
 		last_used_seq = elm;
 		return elm->last;
@@ -652,8 +671,17 @@ nextval_internal(Oid relid, bool check_permissions)
 	seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
 	page = BufferGetPage(buf);
 
+	if (elm->cached != elm->last && elm->cached == seq->last_value) {
+		/*
+		 * There are some numbers in the cache, and we can grab the numbers directly following those.
+		 * We can fetch fewer new numbers and claim the numbers from the cache.
+		 */
+		request -= elm->cached - elm->last;
+	}
+
 	elm->increment = incby;
 	last = next = result = seq->last_value;
+	cache += request-1;
 	fetch = cache;
 	log = seq->log_cnt;
@@ -703,7 +731,7 @@ nextval_internal(Oid relid, bool check_permissions)
 		if ((maxv >= 0 && next > maxv - incby) ||
 			(maxv < 0 && next + incby > maxv))
 		{
-
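[Based on the patch's documentation change, usage would look like this — a sketch of the proposed behavior, not released syntax:

CREATE SEQUENCE s;
SELECT nextval('s', 1000);  -- reserves values 1..1000 and returns 1000
SELECT nextval('s');        -- returns 1001]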
Re: Support for grabbing multiple consecutive values with nextval()
Hi,

On Sun, Feb 27, 2022 at 10:42:25AM +0100, Jille Timmermans wrote:
>
> First time PostgreSQL contributor here :)

Welcome!

> I wanted to be able to allocate a bunch of numbers from a sequence at once.
> Multiple people seem to be struggling with this
> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence,
> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).
>
> I propose to add an extra argument to nextval() that specifies how many
> numbers you want to allocate (default 1).
>
> The attached patch (based on master) passes `./configure --enable-cassert
> --enable-debug && make && make check`, including the newly added regression
> tests.
>
> It does change the signature of nextval_internal(); I'm not sure if that's
> considered a backwards-compatibility break (for extensions?).

Please register this patch to the next commit fest (the last one for pg15 inclusion) at https://commitfest.postgresql.org/37/ if not done already.
Re: PATCH: add "--config-file=" option to pg_rewind
Am 27.02.22 um 13:06 schrieb Michael Paquier:
> On Sat, Feb 26, 2022 at 09:55:20AM +0100, Gunnar "Nick" Bluth wrote:
>> Am 26.02.22 um 06:51 schrieb Michael Paquier:
>>> Shouldn't this one use appendShellString() on config_file?
>>
>> It probably should, yes. I don't fancy this repetitive code myself.
>> But then, pg_rewind as a whole could use an overhaul. I don't see any use of
>> PQExpBuffer in it, but a lot of char* ...
>
> Having a lot of char* does not necessarily mean that all of them have to be
> changed to accommodate PQExpBuffer. My guess is that this is case-by-case,
> where we should apply that in places where it makes the code cleaner to
> follow. In the case of your patch, though, the two areas changed would
> actually make the logic correct, because we have to apply correct quoting
> rules to any commands executed.

Alas! v3 attached.

>> GSOC project? ;-)
>
> It does not seem so, though there are surely more areas that could gain in
> clarity.

That's universally true ;-)

Best regards,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 33e6bb64ad..31a1a1dedb 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -241,6 +241,17 @@ PostgreSQL documentation
+      --config-file=filepath
+
+        In case the postgresql.conf of your target cluster is not in the
+        target pgdata and you want to use the -c / --restore-target-wal option,
+        provide a (relative or absolute) path to the postgresql.conf using this option.
+
       --debug

diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index efb82a4034..b8c92c5590 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -23,6 +23,7 @@
 #include "common/restricted_token.h"
 #include "common/string.h"
 #include "fe_utils/recovery_gen.h"
+#include "fe_utils/string_utils.h"
 #include "file_ops.h"
 #include "filemap.h"
 #include "getopt_long.h"
@@ -60,6 +61,7 @@
 char	   *datadir_target = NULL;
 char	   *datadir_source = NULL;
 char	   *connstr_source = NULL;
 char	   *restore_command = NULL;
+char	   *config_file = NULL;
 
 static bool debug = false;
 bool		showprogress = false;
@@ -86,6 +88,7 @@ usage(const char *progname)
 	printf(_("Options:\n"));
 	printf(_("  -c, --restore-target-wal       use restore_command in target configuration to\n"
 			 "                                 retrieve WAL files from archives\n"));
+	printf(_("      --config-file=FILE         path to postgresql.conf if outside target-pgdata (for -c)\n"));
 	printf(_("  -D, --target-pgdata=DIRECTORY  existing data directory to modify\n"));
 	printf(_("      --source-pgdata=DIRECTORY  source data directory to synchronize with\n"));
 	printf(_("      --source-server=CONNSTR    source server to synchronize with\n"));
@@ -120,6 +123,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
 		{"debug", no_argument, NULL, 3},
+		{"config-file", required_argument, NULL, 5},
 		{NULL, 0, NULL, 0}
 	};
 	int			option_index;
@@ -204,6 +208,10 @@ main(int argc, char **argv)
 			case 4:
 				no_ensure_shutdown = true;
 				break;
+
+			case 5:
+				config_file = pg_strdup(optarg);
+				break;
 		}
 	}
@@ -1016,8 +1024,8 @@ getRestoreCommand(const char *argv0)
 {
 	int			rc;
 	char		postgres_exec_path[MAXPGPATH],
-				postgres_cmd[MAXPGPATH],
 				cmd_output[MAXPGPATH];
+	PQExpBuffer postgres_cmd;
 
 	if (!restore_wal)
 		return;
@@ -1051,11 +1059,21 @@ getRestoreCommand(const char *argv0)
 	 * Build a command able to retrieve the value of GUC parameter
 	 * restore_command, if set.
 	 */
-	snprintf(postgres_cmd, sizeof(postgres_cmd),
-			 "\"%s\" -D \"%s\" -C restore_command",
-			 postgres_exec_path, datadir_target);
+	postgres_cmd = createPQExpBuffer();
 
-	if (!pipe_read_line(postgres_cmd, cmd_output, sizeof(cmd_output)))
+	/* path to postgres, properly quoted */
+	appendShellString(postgres_cmd, postgres_exec_path);
+
+	appendPQExpBufferStr(postgres_cmd, " -D ");
+	appendShellString(postgres_cmd, datadir_target);
+	if (config_file != NULL)
+	{
+		appendPQExpBufferStr(postgres_cmd, " --config_file=");
+		appendShellString(postgres_cmd, config_file);
+	}
+	appendPQExpBufferStr(postgres_cmd, " -C restore_command");
+
+	if (!pipe_read_line(postgres_cmd->data, cmd_output, sizeof(cmd_output)))
 		exit(1);
 
 	(void) pg_strip_crlf(cmd_output);

diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index db9201f38e..3c395ece12 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -190,5 +190,6 @@ in primary, before promotion
 run_test('local');
 run_test('remote');
 run_te
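[With the patch, an invocation where postgresql.conf lives outside the data directory (e.g. a Debian-style layout) might look like this; the paths are illustrative:

pg_rewind --target-pgdata=/var/lib/postgresql/14/main \
          --source-server='host=primary dbname=postgres' \
          --restore-target-wal \
          --config-file=/etc/postgresql/14/main/postgresql.conf]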
CREATE DATABASE IF NOT EXISTS in PostgreSQL
Hi, hackers

When I try to use CREATE DATABASE IF NOT EXISTS in PostgreSQL, it complains that this syntax is not supported. We can use the following command to achieve this; however, it's not straightforward:

SELECT 'CREATE DATABASE mydb'
WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'mydb')\gexec

Why doesn't PostgreSQL support the CREATE DATABASE IF NOT EXISTS syntax? I've created a patch for this; any suggestions?

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

From 7971893868e6eedc7229d28442f07890f251c42b Mon Sep 17 00:00:00 2001
From: Japin Li
Date: Sun, 27 Feb 2022 22:02:59 +0800
Subject: [PATCH v1] Add CREATE DATABASE IF NOT EXISTS syntax

---
 doc/src/sgml/ref/create_database.sgml |  2 +-
 src/backend/commands/dbcommands.c     | 19 +++++++++++++------
 src/backend/nodes/copyfuncs.c         |  1 +
 src/backend/nodes/equalfuncs.c        |  1 +
 src/backend/parser/gram.y             |  9 +++++++++
 src/include/nodes/parsenodes.h        |  1 +
 6 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index f70d0c75b4..74af9c586e 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
-CREATE DATABASE name
+CREATE DATABASE name [ IF NOT EXISTS ]
 [ [ WITH ] [ OWNER [=] user_name ]
        [ TEMPLATE [=] template ]
        [ ENCODING [=] encoding ]

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index c37e3c9a9a..f3e5e930f9 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -557,10 +557,21 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 * message than "unique index violation". There's a race condition but
 	 * we're willing to accept the less friendly message in that case.
 	 */
-	if (OidIsValid(get_database_oid(dbname, true)))
-		ereport(ERROR,
-				(errcode(ERRCODE_DUPLICATE_DATABASE),
-				 errmsg("database \"%s\" already exists", dbname)));
+	{
+		Oid			check_dboid = get_database_oid(dbname, true);
+		if (OidIsValid(check_dboid))
+		{
+			if (!stmt->if_not_exists)
+				ereport(ERROR,
+						(errcode(ERRCODE_DUPLICATE_DATABASE),
+						 errmsg("database \"%s\" already exists", dbname)));
+
+			ereport(NOTICE,
+					(errcode(ERRCODE_DUPLICATE_DATABASE),
+					 errmsg("database \"%s\" already exists, skipping", dbname)));
+			return check_dboid;
+		}
+	}
 
 	/*
 	 * The source DB can't have any active backends, except this one

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d4f8455a2b..83ead22931 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -4045,6 +4045,7 @@ _copyCreatedbStmt(const CreatedbStmt *from)
 
 	COPY_STRING_FIELD(dbname);
 	COPY_NODE_FIELD(options);
+	COPY_SCALAR_FIELD(if_not_exists);
 
 	return newnode;
 }

diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index f1002afe7a..f9a89fa4a8 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1722,6 +1722,7 @@ _equalCreatedbStmt(const CreatedbStmt *a, const CreatedbStmt *b)
 {
 	COMPARE_STRING_FIELD(dbname);
 	COMPARE_NODE_FIELD(options);
+	COMPARE_SCALAR_FIELD(if_not_exists);
 
 	return true;
 }

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a03b33b53b..72e4f642d9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10395,6 +10395,15 @@ CreatedbStmt:
 					CreatedbStmt *n = makeNode(CreatedbStmt);
 					n->dbname = $3;
 					n->options = $5;
+					n->if_not_exists = false;
+					$$ = (Node *)n;
+				}
+			| CREATE DATABASE IF_P NOT EXISTS name opt_with createdb_opt_list
+				{
+					CreatedbStmt *n = makeNode(CreatedbStmt);
+					n->dbname = $6;
+					n->options = $8;
+					n->if_not_exists = true;
 					$$ = (Node *)n;
 				}
 		;

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 1617702d9d..71c828ac4d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3295,6 +3295,7 @@ typedef struct CreatedbStmt
 	NodeTag		type;
 	char	   *dbname;			/* name of database to create */
 	List	   *options;		/* List of DefElem nodes */
+	bool		if_not_exists;	/* just do nothing if it already exists? */
 } CreatedbStmt;
 
 /*
--
2.25.1
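[With the patch applied, the behavior would be as follows, based on the patch's error and NOTICE texts:

CREATE DATABASE mydb;                -- creates the database
CREATE DATABASE IF NOT EXISTS mydb;  -- NOTICE: database "mydb" already exists, skipping
CREATE DATABASE mydb;                -- ERROR: database "mydb" already exists]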
Re: Support for grabbing multiple consecutive values with nextval()
On 2022-02-27 14:22, Julien Rouhaud wrote:
> Hi,
>
> On Sun, Feb 27, 2022 at 10:42:25AM +0100, Jille Timmermans wrote:
>> First time PostgreSQL contributor here :)
>
> Welcome!

Thanks!

>> I wanted to be able to allocate a bunch of numbers from a sequence at once.
>> Multiple people seem to be struggling with this
>> (https://stackoverflow.com/questions/896274/select-multiple-ids-from-a-postgresql-sequence,
>> https://www.depesz.com/2008/03/20/getting-multiple-values-from-sequences/).
>>
>> I propose to add an extra argument to nextval() that specifies how many
>> numbers you want to allocate (default 1).
>>
>> The attached patch (based on master) passes `./configure --enable-cassert
>> --enable-debug && make && make check`, including the newly added regression
>> tests.
>>
>> It does change the signature of nextval_internal(); I'm not sure if that's
>> considered a backwards-compatibility break (for extensions?).
>
> Please register this patch to the next commit fest (the last one for pg15
> inclusion) at https://commitfest.postgresql.org/37/ if not done already.

Done: https://commitfest.postgresql.org/37/3577/

(I was waiting for mailman approval before I got the thread id.)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
On Fri, Feb 25, 2022 at 8:38 PM Nitin Jadhav wrote:

Had a quick look over the v3 patch. I'm not sure it's best to have pg_stat_get_progress_checkpoint_type, pg_stat_get_progress_checkpoint_kind and pg_stat_get_progress_checkpoint_start_time just for printing info in a readable format in pg_stat_progress_checkpoint. I don't think these functions will ever be useful for the users.

1) Can't we use pg_is_in_recovery to determine if it's a restartpoint or a checkpoint instead of having a new function pg_stat_get_progress_checkpoint_type?

2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks directly instead of the new function pg_stat_get_progress_checkpoint_kind?

+	snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s",
+			 (flags == 0) ? "unknown" : "",
+			 (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
+			 (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
+			 (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
+			 (flags & CHECKPOINT_FORCE) ? "force " : "",
+			 (flags & CHECKPOINT_WAIT) ? "wait " : "",
+			 (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
+			 (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
+			 (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");

3) Why do we need this extra calculation for start_lsn? Do you ever see a negative LSN or something here?

+    ('0/0'::pg_lsn + (
+        CASE
+            WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
+            ELSE (0)::numeric
+        END + (s.param3)::numeric)) AS start_lsn,

4) Can't you use timestamptz_in(to_char(s.param4)) instead of pg_stat_get_progress_checkpoint_start_time? I don't quite understand the reasoning for having this function, and why it's named *checkpoint* when it doesn't do anything specific to the checkpoint at all.

Having 3 unnecessary functions that aren't useful to the users at all in pg_proc.dat will simply eat up function OIDs, IMO. Hence, I suggest we try to do without the extra functions.

Regards,
Bharath Rupireddy.
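[For illustration of point 2, the flag decoding could live directly in the view definition, along these lines. This is only a sketch: treating s.param2 as the flags column is an assumption about the patch, the literal bit values must match the CHECKPOINT_* defines in xlog.h, and pg_stat_get_progress_info('CHECKPOINT') is only meaningful with the patch applied:

SELECT s.pid,
       CASE WHEN pg_is_in_recovery() THEN 'restartpoint' ELSE 'checkpoint' END AS type,
       concat_ws(' ',
                 CASE WHEN s.param2 = 0 THEN 'unknown' END,
                 CASE WHEN (s.param2 & 1) > 0 THEN 'shutdown' END,        -- CHECKPOINT_IS_SHUTDOWN
                 CASE WHEN (s.param2 & 2) > 0 THEN 'end-of-recovery' END, -- CHECKPOINT_END_OF_RECOVERY
                 CASE WHEN (s.param2 & 4) > 0 THEN 'immediate' END,       -- CHECKPOINT_IMMEDIATE
                 CASE WHEN (s.param2 & 8) > 0 THEN 'force' END            -- CHECKPOINT_FORCE
       ) AS kind
FROM pg_stat_get_progress_info('CHECKPOINT') AS s;]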
Re: Add index scan progress to pg_stat_progress_vacuum
> On Wed, Feb 23, 2022 at 10:02 AM Imseih (AWS), Sami wrote:
>> If the failsafe kicks in midway through a vacuum, the number indexes_total
>> will not be reset to 0. If INDEX_CLEANUP is turned off, then the value will
>> be 0 at the start of the vacuum.
>
> The way that this works with num_index_scans is that we "round up"
> when there has been non-zero work in lazy_vacuum_all_indexes(), but
> not if the precheck in lazy_vacuum_all_indexes() fails. That seems
> like a good model to generalize from here. Note that this makes
> INDEX_CLEANUP=off affect num_index_scans in much the same way as a
> VACUUM where the failsafe kicks in very early, during the initial heap
> pass. That is, if the failsafe kicks in before we reach lazy_vacuum()
> for the first time (which is not unlikely), or even in the
> lazy_vacuum_all_indexes() precheck, then num_index_scans will remain
> at 0, just like INDEX_CLEANUP=off.
>
> The actual failsafe WARNING shows num_index_scans, possibly before it
> gets incremented one last time (by "rounding up"). So it's reasonably
> clear how this all works from that context (assuming that the
> autovacuum logging stuff, which reports num_index_scans, outputs a
> report for a table where the failsafe kicked in).

> I am confused. If the failsafe kicks in during the middle of a vacuum, I
> (perhaps naively) would expect indexes_total and indexes_processed to go to
> zero, and I'd expect to no longer see the "vacuuming indexes" and "cleaning
> up indexes" phases. Otherwise, how would I know that we are now skipping
> indexes? Of course, you won't have any historical context about the index
> work done before the failsafe kicked in, but IMO it is misleading to still
> include it in the progress view.

The failsafe occurring in the middle of a vacuum and resetting "indexes_total" to 0 would be misleading. I am thinking that it is a better idea to expose only one column, "indexes_remaining". If index_cleanup is set to OFF, the value of indexes_remaining will be 0 at the start of the vacuum. If the failsafe kicks in during an in-progress vacuum, "indexes_remaining" will be calculated as 0. This approach will provide progress based on how many indexes remain, with no ambiguity.

--
Sami Imseih
Amazon Web Services
Re: support for MERGE
On 2022-Jan-28, Andres Freund wrote:

> Any chance you could split this into something more reviewable? The overall
> diff stats are: 102 files changed, 8589 insertions(+), 234 deletions(-).
> That's pretty hard to really review. Incremental commits don't really help
> with that.

I'll work on splitting this next week.

> One thing from skimming: there's not enough documentation about the approach
> imo - it's a complicated enough feature to deserve more than the one paragraph
> in src/backend/executor/README.

Sure, I'll have a look at that.

> I'm still [1],[2] uncomfortable with execMerge.c kinda-but-not-really being an
> executor node.

I think we should make a decision on code arrangement here. From my perspective, MERGE isn't its own executor node; rather it's just another "method" in ModifyTable. Which makes sense, given that all it does is call parts of INSERT, UPDATE, DELETE, which are the other ModifyTable methods. Having a separate file doesn't strike me as great, but on the other hand it's true that merely moving all the execMerge.c code into nodeModifyTable.c makes the latter too large. However, I don't want to create a .h file that means exposing all those internal functions to the outside world. My ideal would be to have each of INSERT, UPDATE, DELETE, MERGE as its own separate .c file, which would be #included from nodeModifyTable.c. We don't use that pattern anywhere, though. Any opposition to that? (The prototypes for each file would have to live in nodeModifyTable.c itself.)

> > diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> > index 1a9c1ac290..280ac40e63 100644
> > --- a/src/backend/commands/trigger.c
> > +++ b/src/backend/commands/trigger.c
>
> This stuff seems like one candidate for splitting out.

Yeah, I had done that. It's now posted as 0001.

> > +	/*
> > +	 * We maintain separate transition tables for UPDATE/INSERT/DELETE since
> > +	 * MERGE can run all three actions in a single statement. Note that UPDATE
> > +	 * needs both old and new transition tables whereas INSERT needs only new,
> > +	 * and DELETE needs only old.
> > +	 */
> > +
> > +	/* "old" transition table for UPDATE, if any */
> > +	Tuplestorestate *old_upd_tuplestore;
> > +	/* "new" transition table for UPDATE, if any */
> > +	Tuplestorestate *new_upd_tuplestore;
> > +	/* "old" transition table for DELETE, if any */
> > +	Tuplestorestate *old_del_tuplestore;
> > +	/* "new" transition table for INSERT, if any */
> > +	Tuplestorestate *new_ins_tuplestore;
> > +
> > 	TupleTableSlot *storeslot;	/* for converting to tuplestore's format */
> > };
>
> Do we need to do something slightly clever about the memory limits for the
> tuplestores? Each of them having a separate work_mem makes nodeModifyTable.c
> one memory-hungry node (the worst of all maybe?).

Not sure how that would work. The memory handling is inside tuplestore.c IIRC ... I think that would require some largish refactoring. Merely dividing by four doesn't seem like a great answer either. Perhaps we could divide by two and be optimistic about it.

> > +			/*
> > +			 * Project the tuple. In case of a partitioned table, the
> > +			 * projection was already built to use the root's descriptor,
> > +			 * so we don't need to map the tuple here.
> > +			 */
> > +			actionInfo.actionState = action;
> > +			insert_slot = ExecProject(action->mas_proj);
> > +
> > +			(void) ExecInsert(mtstate, rootRelInfo,
> > +							  insert_slot, slot,
> > +							  estate, &actionInfo,
> > +							  mtstate->canSetTag);
> > +			InstrCountFiltered1(&mtstate->ps, 1);
>
> What happens if somebody concurrently inserts a conflicting row?

An open question to which I don't have any good answer RN.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)
Re: support for MERGE
Alvaro Herrera writes:
> I think we should make a decision on code arrangement here. From my
> perspective, MERGE isn't its own executor node; rather it's just another
> "method" in ModifyTable. Which makes sense, given that all it does is
> call parts of INSERT, UPDATE, DELETE, which are the other ModifyTable
> methods. Having a separate file doesn't strike me as great, but on the
> other hand it's true that merely moving all the execMerge.c code into
> nodeModifyTable.c makes the latter too large. However, I don't want to
> create a .h file that means exposing all those internal functions to the
> outside world. My ideal would be to have each of INSERT, UPDATE, DELETE,
> MERGE as its own separate .c file, which would be #included from
> nodeModifyTable.c. We don't use that pattern anywhere, though. Any
> opposition to that? (The prototypes for each file would have to live in
> nodeModifyTable.c itself.)

Yeah, I don't like that. The point of having separate .c files is that you know there are no interactions of non-global symbols across files. This pattern breaks that assumption, making it harder to see what's connected to what; and what's it buying exactly? I'd rather keep all the ModifyTable code in one .c file, even if that one is bigger than our usual practice.

regards, tom lane
Re: CREATE DATABASE IF NOT EXISTS in PostgreSQL
Japin Li writes:
> Why doesn't PostgreSQL support the CREATE DATABASE IF NOT EXISTS syntax?

FWIW, I'm generally hostile to CREATE IF NOT EXISTS semantics across the board, because of its exceedingly squishy semantics: it ensures that an object by that name exists, but you have exactly no guarantees about its properties or contents. The more complex the object, the bigger that problem becomes ... and a whole database is the most complex sort of object we have. So IMV, the fact that we don't have this "feature" is a good thing.

We do have DROP DATABASE IF EXISTS, and I think using that followed by CREATE is a much better-defined approach.

regards, tom lane
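[Concretely, the well-defined alternative is:

DROP DATABASE IF EXISTS mydb;  -- no-op if absent; destroys mydb if present
CREATE DATABASE mydb;          -- afterwards, mydb's contents are fully known]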
Re: Commitfest manager for 2022-03
On Sat, Feb 26, 2022, at 9:37 PM, Justin Pryzby wrote:
> |https://commitfest.postgresql.org/37/2906/
> |Row filtering for logical replication
> If I'm not wrong, this is merged and should be closed?

I think Amit forgot to mark it as committed. Done.

--
Euler Taveira
EDB https://www.enterprisedb.com/
Re: support for MERGE
On Sun, Feb 27, 2022 at 9:25 AM Alvaro Herrera wrote:
> I attach v12 of MERGE. Significant effort went into splitting
> ExecUpdate and ExecDelete into parts that can be reused from the MERGE
> routines in a way that doesn't involve jumping out in the middle of
> TM_Result processing to merge-specific EvalPlanQual handling. So in
> terms of code flow, this is much cleaner. It does mean that MERGE now
> has to call three routines instead of just one, but that doesn't seem a
> big deal.
>
> So now the main ExecUpdate first has to call ExecUpdatePrologue, then
> ExecUpdateAct, and complete with ExecUpdateEpilogue. In the middle of
> all this, it does its own thing to deal with foreign tables, and result
> processing for regular tables including EvalPlanQual. ExecDelete has
> been split similarly.
>
> So MERGE now does these three things, and then its own result
> processing.
>
> Now, naively patching in that way results in absurd argument lists for
> these routines, so I introduced a new ModifyTableContext struct to carry
> the context for each operation. I think this is good, but perhaps it
> could stand some more improvement in terms of what goes in there and
> what doesn't. It took me a while to find an arrangement that really
> worked. (Initially I had resultRelInfo and the tuple slot and some
> flags for DELETE, but it turned out to be better to keep them out.)
>
> Regarding code arrangement: I decided that exporting all those
> ModifyTable internal functions was a bad idea, so I made it all static
> again. I think the MERGE routines are merely additional ModifyTable
> methods; trying to make it a separate executor node doesn't seem like
> it'd be an improvement. For now, I just made nodeModifyTable.c #include
> execMerge.c, though I'm not sure if moving all the code into
> nodeModifyTable.c would be a good idea, or whether we should create
> separate files for each of those methods. Generally speaking I'm not
> enthusiastic about creating an exported API of these methods. If we
> think nodeModifyTable.c is too large, maybe we can split it into smaller
> files and smash them together with #include "foo.c". (If we do expose
> it in some .h then the ModifyTableContext would have to be exported as
> well, which doesn't sound too exciting.)
>
> Sadly, because I've been spending a lot of time preparing for an
> international move, I didn't have time to produce a split patch that
> would first restructure nodeModifyTable.c, making this more reviewable,
> but I'll do that as soon as I'm able. There is also a bit of a bug in a
> corner case of an UPDATE that involves a cross-partition tuple migration
> with another concurrent update. I was unable to figure out how to fix
> that, so I commented out the affected line from merge-update.spec.
> Again, I'll get that fixed as soon as the dust has settled here.
>
> There are some review points that are still unaddressed, such as
> Andres' question of what happens in case of a concurrent INSERT.
>
> Thanks
>
> --
> Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
> "La fuerza no está en los medios físicos
> sino que reside en una voluntad indomable" (Gandhi)

Hi,

+	else if (node->operation == CMD_MERGE)
+	{
+		/* EXPLAIN ANALYZE display of actual outcome for each tuple proposed */

I know the comment was copied, but it seems `proposed` should be `processed`.

For ExecMergeNotMatched():

+	foreach(l, actionStates)
+	{
...
+		break;
+	}

If there is only one state in the List, maybe add an assertion on the length of the actionStates List.

+	ModifyTableContext sc;

It seems the variable name could be more intuitive. How about naming it mtc (or context, as the later code uses)?

Cheers
Re: support for MERGE
> On 27 Feb 2022, at 18:42, Tom Lane wrote:
> I'd rather keep all the ModifyTable code in one .c file, even if that one is
> bigger than our usual practice.

Agreed, I also prefer a (too) large file over a set of .c #include's.

--
Daniel Gustafsson
https://vmware.com/
Re: Separate the result of \watch for each query execution (psql)
Hi,

2022年2月25日(金) 13:42 Pavel Stehule :
>
> pá 25. 2. 2022 v 5:23 odesílatel Noboru Saito napsal:
>>
>> Hi,
>>
>> Pavel Stehule :
>> > > I strongly agree. It was a lot of work to find a workable solution for
>> > > pspg. Special chars that start the result, and maybe others that end the
>> > > result, can significantly increase robustness and can reduce code. I
>> > > think it would be better to use form feed at the end of the form - like
>> > > the semantics of a form feed. You know, at this moment, the result is
>> > > complete. https://en.wikipedia.org/wiki/Page_break
>> >
>> > It's easier to print a form feed before the result, but it's okay at the end.
>> >
>> > > I don't think using it by default would be best. Lots of people don't
>> > > use specialized pagers, but it can be set by \pset. Form feed should be
>> > > used at the end:
>> > >
>> > > \pset formfeed [on, off]
>> >
>> > I think it's a good idea to be able to switch with \pset.
>>
>> I have created a patch that allows you to turn it on and off in \pset.
>> The attached patch adds the following features.
>>
>> Formfeed can be turned on with the command line option or \pset.
>> Formfeed (\f\n) is output after the query execution result by \watch.
>>
>> I think the considerations are as follows.
>>
>> * Is the formfeed output after the result, not before?
>
> We are talking about the first iteration. In the second and later iterations
> this question doesn't arise. You know the starting point. You don't know the
> endpoint. So I think using the formfeed at the end is a good idea.

Yes. However, as you know, in the case of \watch there is a difference in the wait before and after the specified time.

>> * Is the formfeed output only "\f\n"?
>
> yes
>
>> * Is the formfeed output only by \watch?
>
> This is a good question. I think the implementation for \watch is a good
> start. But it can help with normal results too. In pspg it can be very useful
> for incremental load or for streaming mode. But if it will be used
> everywhere, then it should be used just for some specified pagers.
>
>> * Is the name "formfeed" appropriate?
>
> If it does the work of a form feed, then formfeed is a good name.
>
>> If the formfeed is output before the query result,
>> it will be better if the screen is reset when the formfeed is read.
>
> I think you are proposing another feature - a terminal reset sequence - and
> that can be a good idea. Not for pspg, but generally, why not.

Yes, it is. I was wondering whether I should just add formfeed or add more features.

Thank you. Following your advice, I will register this for the commitfest (I am registering for the commitfest for the first time; thank you for your cooperation).
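[A sketch of the proposed interface, which is under discussion and not released:

\pset formfeed on
SELECT now() \watch 1

Each \watch result would then be followed by "\f\n", so a pager such as pspg knows that the result is complete.]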
Re: Missed condition-variable wakeups on FreeBSD
Andres Freund writes:
> On 2022-02-26 14:07:05 -0500, Tom Lane wrote:
>> I have observed this three times in the REL_11 branch, once
>> in REL_12, and a couple of times last summer before it occurred
>> to me to start keeping notes. Over that time the machine has
>> been running various patchlevels of FreeBSD 13.0.

> It's certainly interesting that it appears to happen only in the branches
> using poll rather than kqueue to implement latches. That changed between 12
> and 13.

Yeah, and there was no PHJ in v10, so that's a pretty good theory as to why I've only seen it in those two branches.

> Have you tried running the core regression tests with force_parallel_mode =
> on, or with the parallel costs lowered, to see if that makes the problem
> appear more often?

> The next time this happens / if you still have this open, perhaps it could be
> worth checking if there's a byte in the self pipe?

> Besides trying to make the issue more likely as suggested above, it might be
> worth checking if signalling the stuck processes with SIGUSR1 gets them
> unstuck.

I've now wasted a bunch of kilowatt-hours fruitlessly trying to reproduce this outside the confines of the buildfarm script. I'm at a loss to figure out what the buildfarm is doing differently, but apparently there's something. I'm going to re-enable the machine's buildfarm job and just wait for it to hang up again. More info eventually ...

regards, tom lane
Re: Proposal: Support custom authentication methods using hooks
On Fri, 2022-02-25 at 14:10 -0500, Tom Lane wrote:
> I'm happy to add support for custom auth methods if they can use
> a protocol that's safer than cleartext-password. But if that's the
> only feasible option, then we're just encouraging people to use
> insecure methods.

FWIW, I'd like to be able to use a token in the password field, and then authenticate that token. So I didn't intend to send an actual password in plaintext. Maybe we should have a new "token" connection parameter so that libpq can allow sending a token in plaintext but refuse to send a password in plaintext?

> I also have in mind here that there has been discussion of giving
> libpq a feature to refuse, on the client side, to send cleartext
> passwords.

I am generally in favor of that idea, but I'm not sure it will completely resolve the issue. For instance, should it also refuse MD5?

Regards,
Jeff Davis
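[A minimal sketch of what is described above; today a token can only ride in the password field. The token value is hypothetical, and the server-side validation would come from the custom authentication hook under discussion:

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
	/* Hypothetical: an opaque bearer token passed where a password would go. */
	PGconn *conn = PQconnectdb("host=localhost dbname=app user=alice "
							   "password=my-opaque-token");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "%s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}]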
Re: Missed condition-variable wakeups on FreeBSD
On Sun, Feb 27, 2022 at 8:07 AM Tom Lane wrote:
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,

Yeah, at least FreeBSD and macOS interrupt system calls when you send SIGSTOP + SIGCONT directly, or when gdb and lldb do something similar via ptrace. The same applies on Linux, except that Linux restarts the system call automatically (even though the man page has this in the list of system calls that never restart), so you don't usually notice (though you can see it with strace). It's not really clear to me what should happen, given that the language around restarts is all about signal handlers, and stop + cont are system magic, not signal handlers. Anyway...

> suggesting that we lost an interrupt somewhere.

So it's happening on an i386 kernel, with WAIT_USE_POLL (not WAIT_USE_KQUEUE), but only under the buildfarm script (not when you ran it manually in a loop)... hmmm.
Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
On Sun, Feb 27, 2022 at 10:29 PM Julien Rouhaud wrote:
> On Sun, Feb 27, 2022 at 06:10:45PM +0900, Michael Paquier wrote:
> > On Sat, Feb 26, 2022 at 01:39:42PM -0800, Andres Freund wrote:
> > > I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd
> > > require adding a new enum value to WaitEventTimeout in 14. Which probably is
> > > fine?
> >
> > We've added wait events in back-branches in the past, so this does not
> > strike me as a problem as long as you add the new entry at the end of
> > the enum, while keeping things ordered on HEAD.
>
> +1

+1

Sleeps like these are also really bad for ProcSignalBarrier, which I was expecting to be the impetus for fixing any remaining loops like this.

With the attached, 027_stream_regress.pl drops from ~29.5s to ~19.6s on my FreeBSD workstation!

It seems a little strange to introduce a new wait event that will very often appear into a stable branch, but ... it is actually telling the truth, so there is that.

The sleep/poll loop in RegisterSyncRequest() may also have another problem. The comment explains that it was a deliberate choice not to do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't think there's an excuse to ignore postmaster death in a loop that presumably becomes infinite if the checkpointer exits. I guess we could do:

- pg_usleep(10000L);
+ WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10, WAIT_EVENT_SYNC_REQUEST);

But... really, this should be waiting on a condition variable that the checkpointer broadcasts on when the queue goes from full to not full, no? Perhaps for master only?

From 23abdf95dea74b914dc56a46739a63ff86035f51 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 28 Feb 2022 11:27:05 +1300
Subject: [PATCH] Wake up for latches in CheckpointWriteDelay().

The checkpointer shouldn't ignore its latch. Other backends may be
waiting for it to drain the request queue. Hopefully real systems don't
have a full queue often, but the condition is reached easily when
shared buffers is very small.

This involves defining a new wait event, which will appear in the
pg_stat_activity view often due to spread checkpoints.

Back-patch only to 14. Even though the problem exists in earlier
branches too, it's hard to hit there. In 14 we stopped using signal
handlers for latches on Linux, *BSD and macOS, which were previously
hiding this problem by interrupting the sleep (though not reliably, as
the signal could arrive before the sleep begins; precisely the problem
latches address).

Reported-by: Andres Freund
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml            | 4 ++++
 src/backend/postmaster/checkpointer.c   | 5 +++-
 src/backend/utils/activity/wait_event.c | 3 +++
 src/include/utils/wait_event.h          | 1 +
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf7625d988..9dc3978f35 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2236,6 +2236,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
       BaseBackupThrottle
       Waiting during base backup when throttling activity.
+
+      CheckpointerWriteDelay
+      Waiting between writes while performing a checkpoint.
+
       PgSleep
       Waiting due to a call to pg_sleep or

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4488e3a443..9d9aad166e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -726,7 +726,10 @@ CheckpointWriteDelay(int flags, double progress)
 		 * Checkpointer and bgwriter are no longer related so take the Big
 		 * Sleep.
 		 */
-		pg_usleep(100000L);
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT,
+				  100,
+				  WAIT_EVENT_CHECKPOINT_WRITE_DELAY);
+		ResetLatch(MyLatch);
 	}
 	else if (--absorb_counter <= 0)
 	{

diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 60972c3a75..0706e922b5 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -485,6 +485,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
 		case WAIT_EVENT_BASE_BACKUP_THROTTLE:
 			event_name = "BaseBackupThrottle";
 			break;
+		case WAIT_EVENT_CHECKPOINT_WRITE_DELAY:
+			event_name = "CheckpointWriteDelay";
+			break;
 		case WAIT_EVENT_PG_SLEEP:
 			event_name = "PgSleep";
 			break;

diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 395d325c5f..d0345c6b49 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -141,6 +141,7 @@ typedef enum
 typedef enum
 {
 	WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
+	WAIT_EVENT_CHECKPOINT_WRITE_DELAY,
 	WAIT_EVENT_PG_SLEEP,
 	WAIT_EVENT_RECOVERY_APPLY_D
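[For reference, the condition-variable shape suggested above might look roughly like this. This is a sketch only: the syncreq_cv field and the wait event name are invented for illustration, while ForwardSyncRequest() and the ConditionVariable API are real:

/* Added to CheckpointerShmemStruct and ConditionVariableInit()'d at startup: */
/*     ConditionVariable syncreq_cv; */

/* In RegisterSyncRequest(), replacing the sleep/poll loop: */
for (;;)
{
	if (ForwardSyncRequest(ftag, type))
		break;					/* request queued; we're done */

	/*
	 * The queue was full.  Sleep until the checkpointer signals that it
	 * has absorbed requests, instead of polling with pg_usleep().
	 */
	ConditionVariableSleep(&CheckpointerShmem->syncreq_cv,
						   WAIT_EVENT_SYNC_REQUEST);
}
ConditionVariableCancelSleep();

/* And in the checkpointer, after AbsorbSyncRequests() drains the queue: */
ConditionVariableBroadcast(&CheckpointerShmem->syncreq_cv);]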
Re: CREATE DATABASE IF NOT EXISTS in PostgreSQL
On Mon, 28 Feb 2022 at 01:53, Tom Lane wrote:
> Japin Li writes:
>> Why doesn't PostgreSQL support the CREATE DATABASE IF NOT EXISTS syntax?
>
> FWIW, I'm generally hostile to CREATE IF NOT EXISTS semantics across
> the board, because of its exceedingly squishy semantics: it ensures
> that an object by that name exists, but you have exactly no guarantees
> about its properties or contents.

Thanks for the explanation! I think it is the database user who should guarantee the properties and contents of a database. CREATE IF NOT EXISTS is just syntactic sugar.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
Re: Design of pg_stat_subscription_workers vs pgstats
On Sat, Feb 26, 2022 at 11:51 AM Amit Kapila wrote:
>
> On Thu, Feb 24, 2022 at 9:20 PM Masahiko Sawada wrote:
>
> I have reviewed the latest version and made a few changes along with
> fixing some of the pending comments by Peter Smith.

Thank you for updating the patch!

> The changes are as
> follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as
> that is not required now; (b) changed the struct name
> PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it
> similar to DropDb; (c) changed the view name to
> pg_stat_subscription_stats, we can reconsider it in future if there is
> a consensus on some other name, accordingly changed the reset function
> name to pg_stat_reset_subscription_stats; (d) moved some of the newly
> added subscription stats functions adjacent to slots to maintain
> consistency in the code; (e) changed comments at a few places;

Agreed.

> (f) added
> LATERAL back to the system_views query as we refer to pg_subscription's
> oid in the function call, previously that was not clear.

I think LATERAL is still unnecessary, as you pointed out before. The documentation[1] says:

LATERAL can also precede a function-call FROM item, but in this case it is a noise word, because the function expression can refer to earlier FROM items in any case.

The rest looks good to me.

Regards,

[1] https://www.postgresql.org/docs/devel/sql-select.html

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
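[For illustration, the cited rule means the two spellings below are equivalent whenever the FROM item is a function call, which is why dropping LATERAL from the view definition would be safe:

CREATE TEMP TABLE t (n int);

-- Both are accepted; LATERAL is implicit for a function-call FROM item.
SELECT t.n, g FROM t, LATERAL generate_series(1, t.n) AS g;
SELECT t.n, g FROM t, generate_series(1, t.n) AS g;]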
Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
Hi,

On February 27, 2022 4:19:21 PM PST, Thomas Munro wrote:
> With the attached, 027_stream_regress.pl drops from ~29.5s to ~19.6s
> on my FreeBSD workstation!

That's impressive - I wouldn't have guessed it would make that much of a difference. I assume that running the tests on FreeBSD for an older pg with a similar s_b & max_wal_size doesn't benefit as much? I wonder how much Windows will improve.

> It seems a little strange to introduce a new wait event that will very
> often appear into a stable branch, but ... it is actually telling the
> truth, so there is that.

In the back branches it needs to be at the end of the enum - I assume you intended that just to be for HEAD.

I wonder whether in HEAD we shouldn't make that sleep duration be computed from the calculation in IsOnSchedule...

> The sleep/poll loop in RegisterSyncRequest() may also have another
> problem. The comment explains that it was a deliberate choice not to
> do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't
> think there's an excuse to ignore postmaster death in a loop that
> presumably becomes infinite if the checkpointer exits. I guess we
> could do:
>
> - pg_usleep(10000L);
> + WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10, WAIT_EVENT_SYNC_REQUEST);
>
> But... really, this should be waiting on a condition variable that the
> checkpointer broadcasts on when the queue goes from full to not full,
> no? Perhaps for master only?

Looks worth improving, but yes, I'd not do it in the back branches. I do think it's worth giving that sleep a proper wait event though, even in the back branches.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
At Sat, 26 Feb 2022 12:11:15 +0900, Michael Paquier wrote in
> On Fri, Feb 25, 2022 at 01:09:53PM -0800, Nathan Bossart wrote:
> > This one has been quiet for a while. Should we mark it as
> > returned-with-feedback?
>
> Yes, that's my feeling, and I got cold feet about this change. This
> patch would bring some extra visibility for something that's not
> incorrect either on HEAD, as end-of-recovery checkpoints are the same
> thing as shutdown checkpoints. And there is an extra argument where
> back-patching would become a bit more tricky in an area that's already
> quite sensitive.

That sounds like we should reject the patch, as we don't agree with its objective. If end-of-recovery checkpoints someday functionally diverge from shutdown checkpoints but (somehow) leave the transition alone, we may revisit this, but that would be another proposal.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
On Mon, Feb 28, 2022 at 10:51:06AM +0900, Kyotaro Horiguchi wrote:
> That sounds like we should reject the patch, as we don't agree with its
> objective. If end-of-recovery checkpoints someday functionally diverge
> from shutdown checkpoints but (somehow) leave the transition alone, we
> may revisit this, but that would be another proposal.

The patch has already been withdrawn in the CF app.
--
Michael
Re: pg_stat_statements: remove redundant function call in pg_stat_statements_internal
On Sun, Feb 27, 2022 at 09:08:56PM +0900, Michael Paquier wrote:
> Yes, let's clean that up on HEAD. No objections from here. I'll do
> that tomorrow or so.

And done.
--
Michael
Re: Design of pg_stat_subscription_workers vs pgstats
On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com wrote:
>
> On Saturday, February 26, 2022 11:51 AM Amit Kapila wrote:
> > I have reviewed the latest version and made a few changes along with fixing
> > some of the pending comments by Peter Smith. The changes are as
> > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as that is
> > not required now; (b) changed the struct name PgStat_MsgSubscriptionPurge
> > to PgStat_MsgSubscriptionDrop to make it similar to DropDb; (c) changed the
> > view name to pg_stat_subscription_stats, we can reconsider it in future if there
> > is a consensus on some other name, accordingly changed the reset function
> > name to pg_stat_reset_subscription_stats; (d) moved some of the newly
> > added subscription stats functions adjacent to slots to maintain consistency
> > in the code; (e) changed comments at a few places; (f) added LATERAL back to
> > the system_views query as we refer to pg_subscription's oid in the function
> > call, previously that was not clear.
> >
> > Do let me know what you think of the attached?
>
> Hi, thank you for updating the patch!
>
> I have a couple of comments on v4.
>
> (1)
>
> I'm not sure if I'm correct, but I'd say sync_error_count
> could come right after subname in the column order.
> I felt there are cases where the column order is somewhat
> related to the time/processing order (I imagined
> pg_stat_replication's LSN-related columns).
> If that is right, the table sync related column could be the
> first counter within this patch.

I am not sure if there is such a correlation, but even if there is, it doesn't seem to fit here completely, as sync errors can happen after apply errors in multiple ways, like via Alter Subscription ... Refresh ...

So, I don't see the need to change the order here. What do you or others think?

> (2) doc/src/sgml/monitoring.sgml
>
> +        Resets statistics for a single subscription shown in the
> +        pg_stat_subscription_stats view to zero. If
> +        the argument is NULL, reset statistics for all
> +        subscriptions.
>
> I felt we could improve the first sentence.
>
> From:
> Resets statistics for a single subscription shown in the..
>
> To (idea 1):
> Resets statistics for a single subscription defined by the argument to zero.

Okay, I can use this one.

--
With Regards,
Amit Kapila.
Re: Design of pg_stat_subscription_workers vs pgstats
On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila wrote: > > On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com > wrote: > > > > On Saturday, February 26, 2022 11:51 AM Amit Kapila > > wrote: > > > I have reviewed the latest version and made a few changes along with > > > fixing > > > some of the pending comments by Peter Smith. The changes are as > > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as that > > > is > > > not required now; (b) changed the struct name PgStat_MsgSubscriptionPurge > > > to PgStat_MsgSubscriptionDrop to make it similar to DropDb; (c) changed > > > the > > > view name to pg_stat_subscription_stats, we can reconsider it in future > > > if there > > > is a consensus on some other name, accordingly changed the reset function > > > name to pg_stat_reset_subscription_stats; (d) moved some of the newly > > > added subscription stats functions adjacent to slots to main the > > > consistency in > > > code; (e) changed comments at few places; (f) added LATERAL back to > > > system_views query as we refer pg_subscription's oid in the function call, > > > previously that was not clear. > > > > > > Do let me know what you think of the attached? > > Hi, thank you for updating the patch ! > > > > > I have a couple of comments on v4. > > > > (1) > > > > I'm not sure if I'm correct, but I'd say the sync_error_count > > can come next to the subname as the order of columns. > > I felt there's case that the column order is somewhat > > related to the time/processing order (I imagined > > pg_stat_replication's LSN related columns). > > If this was right, table sync related column could be the > > first column as a counter within this patch. > > > > I am not sure if there is such a correlation but even if it is there > it doesn't seem to fit here completely as sync errors can happen after > apply errors in multiple ways like via Alter Subscription ... Refresh > ... > > So, I don't see the need to change the order here. What do you or others > think? I'm also not sure about it; both sound good to me. Probably we can change the order later. > > > > > (2) doc/src/sgml/monitoring.sgml > > > > +Resets statistics for a single subscription shown in the > > +pg_stat_subscription_stats view to > > zero. If > > +the argument is NULL, reset statistics for all > > +subscriptions. > > > > > > I felt we could improve the first sentence. > > > > From: > > Resets statistics for a single subscription shown in the.. > > > > To(idea1): > > Resets statistics for a single subscription defined by the argument to > > zero. > > > > Okay, I can use this one. Are you going to remove the part "shown in the pg_stat_subscription_stats view"? I think it's better to keep it in order to make it clear which statistics the function resets, as we have pg_stat_subscription and pg_stat_subscription_stats. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Design of pg_stat_subscription_workers vs pgstats
On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada wrote: > > On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila wrote: > > > > > > > > (2) doc/src/sgml/monitoring.sgml > > > > > > +Resets statistics for a single subscription shown in the > > > +pg_stat_subscription_stats view to > > > zero. If > > > +the argument is NULL, reset statistics for all > > > +subscriptions. > > > > > > > > > I felt we could improve the first sentence. > > > > > > From: > > > Resets statistics for a single subscription shown in the.. > > > > > > To(idea1): > > > Resets statistics for a single subscription defined by the argument to > > > zero. > > > > > > > Okay, I can use this one. > > Are you going to remove the part "shown in the > pg_stat_subscription_stats view"? I think it's better to keep it in > order to make it clear which statistics the function resets, as we have > pg_stat_subscription and pg_stat_subscription_stats. > How about the following: "Resets statistics for a single subscription defined by the argument shown in the pg_stat_subscription_stats view to zero. If the argument is NULL, reset statistics for all subscriptions." -- With Regards, Amit Kapila.
Re: Design of pg_stat_subscription_workers vs pgstats
On Mon, Feb 28, 2022 at 11:52 AM Amit Kapila wrote: > > On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada wrote: > > > > On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila > > wrote: > > > > > > > > > > > (2) doc/src/sgml/monitoring.sgml > > > > > > > > +Resets statistics for a single subscription shown in the > > > > +pg_stat_subscription_stats view to > > > > zero. If > > > > +the argument is NULL, reset statistics for > > > > all > > > > +subscriptions. > > > > > > > > > > > > I felt we could improve the first sentence. > > > > > > > > From: > > > > Resets statistics for a single subscription shown in the.. > > > > > > > > To(idea1): > > > > Resets statistics for a single subscription defined by the argument to > > > > zero. > > > > > > > > > > Okay, I can use this one. > > > > Are you going to remove the part "shown in the > > pg_stat_subscription_stats view"? I think it's better to keep it in > > order to make it clear which statistics the function resets, as we have > > pg_stat_subscription and pg_stat_subscription_stats. > > > > How about the following: > "Resets statistics for a single subscription defined by the argument > shown in the pg_stat_subscription_stats view > to zero. If the argument is NULL, reset statistics > for all subscriptions." Sounds good, but I'm not sure it's correct in terms of English grammar. Shouldn't it be something like "subscription that is defined by the argument and shown in the pg_stat_subscription_stats"? Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: Design of pg_stat_subscription_workers vs pgstats
On Monday, February 28, 2022 11:34 AM Amit Kapila wrote: > On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com > wrote: > > > > On Saturday, February 26, 2022 11:51 AM Amit Kapila > wrote: > > > I have reviewed the latest version and made a few changes along with > > > fixing some of the pending comments by Peter Smith. The changes are > > > as > > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as > > > that is not required now; (b) changed the struct name > > > PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it > > > similar to DropDb; (c) changed the view name to > > > pg_stat_subscription_stats, we can reconsider it in future if there > > > is a consensus on some other name, accordingly changed the reset > > > function name to pg_stat_reset_subscription_stats; (d) moved some of > > > the newly added subscription stats functions adjacent to slots to > > > main the consistency in code; (e) changed comments at few places; > > > (f) added LATERAL back to system_views query as we refer > pg_subscription's oid in the function call, previously that was not clear. > > > > > > Do let me know what you think of the attached? > > Hi, thank you for updating the patch ! > > I have a couple of comments on v4. > > > > (1) > > > > I'm not sure if I'm correct, but I'd say the sync_error_count can come > > next to the subname as the order of columns. > > I felt there's case that the column order is somewhat related to the > > time/processing order (I imagined pg_stat_replication's LSN related > > columns). > > If this was right, table sync related column could be the first column > > as a counter within this patch. > > > > I am not sure if there is such a correlation but even if it is there it > doesn't seem > to fit here completely as sync errors can happen after apply errors in > multiple > ways like via Alter Subscription ... Refresh ... > > So, I don't see the need to change the order here. What do you or others > think? In the alter subscription case, any errors after the table sync would increment apply_error_count. I mentioned this because this point of view would affect the docs read by users and the internal source code maintained by developers. I had a concern that as we extend and add a lot of statistics (not only for this view, but for other statistics in general), writing the docs for statistics needs some alignment for better readability. *But*, as you mentioned, in case there is no such correlation, I'm okay with the current patch. Thank you for replying. Best Regards, Takamichi Osumi
Re: Design of pg_stat_subscription_workers vs pgstats
Below are my comments for the v4 patch. These are only nitpicking comments now; otherwise, it LGTM. (Sorry, now I see there are some overlaps with comments posted in the last 20 mins, so take or leave these as you wish.)

==

1. doc/src/sgml/monitoring.sgml

- - OID of the relation that the worker was processing when the
- error occurred
+ Number of times an error occurred during the application of changes

BEFORE
Number of times an error occurred during the application of changes
SUGGESTED
Number of times an error occurred while applying changes

~~~

2. doc/src/sgml/monitoring.sgml

+Resets statistics for a single subscription shown in the
+pg_stat_subscription_stats view to zero. If
+the argument is NULL, reset statistics for all
+subscriptions.

SUGGESTED (simpler description, more similar to pg_stat_reset_replication_slot)
Reset statistics to zero for a single subscription. If the argument is
NULL, reset statistics for all subscriptions.

~~~

3. src/backend/replication/logical/worker.c - comment

+ /* */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

BEFORE
Report the worker failed during the application of the change
SUGGESTED
Report the worker failed while applying changes

~~~

4. src/include/pgstat.h - comment

+typedef struct PgStat_MsgResetsubcounter
+{
+	PgStat_MsgHdr m_hdr;
+	Oid m_subid;	/* InvalidOid for clearing all subscription
+					 * stats */
+} PgStat_MsgResetsubcounter;

BEFORE
InvalidOid for clearing all subscription stats
SUGGESTED
InvalidOid means reset all subscription stats

--
Kind Regards, Peter Smith.
Fujitsu Australia
Re: Commitfest manager for 2022-03
On Sun, Feb 27, 2022 at 11:37 PM Euler Taveira wrote: > On Sat, Feb 26, 2022, at 9:37 PM, Justin Pryzby wrote: > > |https://commitfest.postgresql.org/37/2906/ > > |Row filtering for logical replication > If I'm not wrong, this is merged and should be closed? > I think Amit forgot to mark it as committed. Done. I was waiting for some build farm cycles to complete, but it is fine to mark it now. -- With Regards, Amit Kapila.
Re: Design of pg_stat_subscription_workers vs pgstats
On Mon, Feb 28, 2022 at 8:59 AM Peter Smith wrote: > > 2. doc/src/sgml/monitoring.sgml > > +Resets statistics for a single subscription shown in the > +pg_stat_subscription_stats view to zero. If > +the argument is NULL, reset statistics for all > +subscriptions. > > SUGGESTED (simpler description, more similar to > pg_stat_reset_replication_slot) > Reset statistics to zero for a single subscription. If the argument is > NULL, reset statistics for all subscriptions. > As discussed, it is important to keep the view name in this description, as we have another view (pg_stat_subscription) as well. So, I am planning to retain the current wording. -- With Regards, Amit Kapila.
Re: Design of pg_stat_subscription_workers vs pgstats
On Mon, Feb 28, 2022 at 8:49 AM osumi.takami...@fujitsu.com wrote: > > On Monday, February 28, 2022 11:34 AM Amit Kapila > wrote: > > On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com > > wrote: > > > > > > On Saturday, February 26, 2022 11:51 AM Amit Kapila > > wrote: > > > > I have reviewed the latest version and made a few changes along with > > > > fixing some of the pending comments by Peter Smith. The changes are > > > > as > > > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as > > > > that is not required now; (b) changed the struct name > > > > PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it > > > > similar to DropDb; (c) changed the view name to > > > > pg_stat_subscription_stats, we can reconsider it in future if there > > > > is a consensus on some other name, accordingly changed the reset > > > > function name to pg_stat_reset_subscription_stats; (d) moved some of > > > > the newly added subscription stats functions adjacent to slots to > > > > main the consistency in code; (e) changed comments at few places; > > > > (f) added LATERAL back to system_views query as we refer > > pg_subscription's oid in the function call, previously that was not clear. > > > > > > > > Do let me know what you think of the attached? > > > Hi, thank you for updating the patch ! > > > I have a couple of comments on v4. > > > > > > (1) > > > > > > I'm not sure if I'm correct, but I'd say the sync_error_count can come > > > next to the subname as the order of columns. > > > I felt there's case that the column order is somewhat related to the > > > time/processing order (I imagined pg_stat_replication's LSN related > > > columns). > > > If this was right, table sync related column could be the first column > > > as a counter within this patch. > > > > > > > I am not sure if there is such a correlation but even if it is there it > > doesn't seem > > to fit here completely as sync errors can happen after apply errors in > > multiple > > ways like via Alter Subscription ... Refresh ... > > > > So, I don't see the need to change the order here. What do you or others > > think? > In the alter subscription case, any errors after the table sync would > increment > apply_error_count. > Sure, but the point I was trying to explain was that there is no certainty in the order of these errors. -- With Regards, Amit Kapila.
RE: Design of pg_stat_subscription_workers vs pgstats
On Monday, February 28, 2022 12:57 PM Amit Kapila > On Mon, Feb 28, 2022 at 8:49 AM osumi.takami...@fujitsu.com > wrote: > > > > On Monday, February 28, 2022 11:34 AM Amit Kapila > wrote: > > > On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com > > > wrote: > > > > > > > > On Saturday, February 26, 2022 11:51 AM Amit Kapila > > > wrote: > > > > > I have reviewed the latest version and made a few changes along > > > > > with fixing some of the pending comments by Peter Smith. The > > > > > changes are as > > > > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError > > > > > as that is not required now; (b) changed the struct name > > > > > PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to > > > > > make it similar to DropDb; (c) changed the view name to > > > > > pg_stat_subscription_stats, we can reconsider it in future if > > > > > there is a consensus on some other name, accordingly changed the > > > > > reset function name to pg_stat_reset_subscription_stats; (d) > > > > > moved some of the newly added subscription stats functions > > > > > adjacent to slots to main the consistency in code; (e) changed > > > > > comments at few places; > > > > > (f) added LATERAL back to system_views query as we refer > > > pg_subscription's oid in the function call, previously that was not clear. > > > > > > > > > > Do let me know what you think of the attached? > > > > Hi, thank you for updating the patch ! > > > > I have a couple of comments on v4. > > > > > > > > (1) > > > > > > > > I'm not sure if I'm correct, but I'd say the sync_error_count can > > > > come next to the subname as the order of columns. > > > > I felt there's case that the column order is somewhat related to > > > > the time/processing order (I imagined pg_stat_replication's LSN > > > > related columns). > > > > If this was right, table sync related column could be the first > > > > column as a counter within this patch. > > > > > > > > > > I am not sure if there is such a correlation but even if it is there > > > it doesn't seem to fit here completely as sync errors can happen > > > after apply errors in multiple ways like via Alter Subscription ... > > > Refresh ... > > > > > > So, I don't see the need to change the order here. What do you or others > think? > > In the alter subscription case, any errors after the table sync would > > increment apply_error_count. > > > > Sure, but the point I was trying to explain was that there is no certainty in > the > order of these errors. I got it. Thank you so much for your explanation. I don't have other new comments on this patch. It looks good to me as well. Best Regards, Takamichi Osumi
Re: Design of pg_stat_subscription_workers vs pgstats
On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada wrote: > > On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila wrote: > > > > > > > > (2) doc/src/sgml/monitoring.sgml > > > > > > +Resets statistics for a single subscription shown in the > > > +pg_stat_subscription_stats view to > > > zero. If > > > +the argument is NULL, reset statistics for all > > > +subscriptions. > > > > > > > > > I felt we could improve the first sentence. > > > > > > From: > > > Resets statistics for a single subscription shown in the.. > > > > > > To(idea1): > > > Resets statistics for a single subscription defined by the argument to > > > zero. > > > > > > > Okay, I can use this one. > > Are you going to remove the part "shown in the > pg_stat_subscription_stats view"? I think it's better to keep it in > order to make it clear which statistics the function resets, as we have > pg_stat_subscription and pg_stat_subscription_stats. > I decided to keep this part of the docs as it is and fixed a few other minor comments raised by you and Peter. Additionally, I have bumped the PGSTAT_FILE_FORMAT_ID. I'll push this patch tomorrow unless there are any other major comments. -- With Regards, Amit Kapila. v5-0001-Reconsider-pg_stat_subscription_workers-view.patch Description: Binary data
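For readers skimming this sub-thread: with the v5 naming settled above, the reset function's behavior amounts to the following sketch (SQL only illustrates the semantics quoted from the docs; 'mysub' is a placeholder subscription name):

    -- Reset the error counters for one subscription:
    SELECT pg_stat_reset_subscription_stats(oid)
      FROM pg_subscription WHERE subname = 'mysub';

    -- A NULL argument resets statistics for all subscriptions:
    SELECT pg_stat_reset_subscription_stats(NULL);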
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
On Sun, Feb 27, 2022 at 8:44 PM Bharath Rupireddy wrote:
>
> On Fri, Feb 25, 2022 at 8:38 PM Nitin Jadhav wrote:
>
> Had a quick look over the v3 patch. I'm not sure if it's the best way
> to have pg_stat_get_progress_checkpoint_type,
> pg_stat_get_progress_checkpoint_kind and
> pg_stat_get_progress_checkpoint_start_time just for printing info in
> readable format in pg_stat_progress_checkpoint. I don't think these
> functions will ever be useful for the users.
>
> 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> or checkpoint instead of having a new function
> pg_stat_get_progress_checkpoint_type?
>
> 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks
> directly instead of new function pg_stat_get_progress_checkpoint_kind?
> + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s",
> + (flags == 0) ? "unknown" : "",
> + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
> + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
> + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "",
> + (flags & CHECKPOINT_FORCE) ? "force " : "",
> + (flags & CHECKPOINT_WAIT) ? "wait " : "",
> + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "",
> + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "",
> + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : "");
>
> 3) Why do we need this extra calculation for start_lsn? Do you ever
> see a negative LSN or something here?
> +('0/0'::pg_lsn + (
> +CASE
> +WHEN (s.param3 < 0) THEN pow((2)::numeric, (64)::numeric)
> +ELSE (0)::numeric
> +END + (s.param3)::numeric)) AS start_lsn,
>
> 4) Can't you use timestamptz_in(to_char(s.param4)) instead of
> pg_stat_get_progress_checkpoint_start_time? I don't quite understand
> the reasoning for having this function and it's named as *checkpoint*
> when it doesn't do anything specific to the checkpoint at all?
>
> Having 3 unnecessary functions that aren't useful to the users at all
> in proc.dat will simply eat up the function oids IMO. Hence, I suggest
> let's try to do without extra functions.

Another thought for my review comment:
> 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint
> or checkpoint instead of having a new function
> pg_stat_get_progress_checkpoint_type?

I don't think using pg_is_in_recovery would work here, as it is taken after the checkpoint has started. So, I think the right way here is to send 1 in CreateCheckPoint and 2 in CreateRestartPoint and use CASE-WHEN-ELSE-END to show "1": "checkpoint" and "2": "restartpoint".

Continuing my review:

5) Do we need a special phase for this checkpoint operation? I'm not sure in which cases it will take a long time, but it looks like there's a wait loop here.

vxids = GetVirtualXIDsDelayingChkpt(&nvxids);
if (nvxids > 0)
{
	do
	{
		pg_usleep(10000L);		/* wait for 10 msec */
	} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids));
}

Also, how about special phases for SyncPostCheckpoint(), SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but that might be increased in future (?)), and TruncateSUBTRANS()?

6) SLRU (Simple LRU) isn't a phase here, you can just say PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES.

+
+	pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
+								 PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES);
	CheckPointPredicate();

And :s/checkpointing SLRU pages/checkpointing predicate lock pages
+ WHEN 9 THEN 'checkpointing SLRU pages'

7) :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS
And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN 'processing file sync requests'

8) :s/Finalizing/finalizing
+ WHEN 14 THEN 'Finalizing'

9) :s/checkpointing snapshots/checkpointing logical replication snapshot files
+ WHEN 3 THEN 'checkpointing snapshots'
:s/checkpointing logical rewrite mappings/checkpointing logical replication rewrite mapping files
+ WHEN 4 THEN 'checkpointing logical rewrite mappings'

10) I'm not sure if it's been discussed, but how about adding the number of snapshot/mapping files the checkpoint has processed so far in the file-processing while loops of CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes there can be many logical snapshot or mapping files, and users may be interested in knowing the so-far-processed file count.

11) I think it's been discussed, but are we going to add the pid of the checkpoint requestor?

Regards,
Bharath Rupireddy.
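To make point 1) above concrete: the suggestion is to report a small integer from the two code paths and decode it in the view definition, roughly like the following fragment (a sketch only; which progress parameter slot carries the kind is an assumption here, not something the patch has fixed):

    -- Hypothetical fragment of the pg_stat_progress_checkpoint view,
    -- assuming the kind is reported into param2 (1 from CreateCheckPoint(),
    -- 2 from CreateRestartPoint()):
    CASE s.param2
        WHEN 1 THEN 'checkpoint'
        WHEN 2 THEN 'restartpoint'
    END AS type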
Re: Synchronizing slots from primary to standby
On Thu, Feb 24, 2022 at 12:46 AM James Coleman wrote:
> I've been working on adding test coverage to prove this out, but I've
> encountered the problem reported in [1].
>
> My assumption, but Andres please correct me if I'm wrong, is that we
> should see issues with the following steps (given the primary,
> physical replica, and logical subscriber already created in the test):
>
> 1. Ensure both logical subscriber and physical replica are caught up
> 2. Disable logical subscription
> 3. Make a catalog change on the primary (currently renaming the
> primary key column)
> 4. Vacuum pg_class
> 5. Ensure physical replication is caught up
> 6. Stop primary and promote the replica
> 7. Write to the changed table
> 8. Update subscription to point to promoted replica
> 9. Re-enable logical subscription
>
> I'm attaching my test as an additional patch in the series for
> reference. Currently I have steps 3 and 4 commented out to show that
> the issues in [1] occur without any attempt to trigger the catalog
> xmin problem.
>
> Given this error seems pretty significant in terms of indicating
> fundamental lack of test coverage (the primary stated benefit of the
> patch is physical failover), and it currently is a blocker to testing
> more deeply.

A few of my initial concerns specified at [1] are these:

1) Instead of a new LIST_SLOT command, can't we use READ_REPLICATION_SLOT (slight modifications need to be made to make it support logical replication slots and to get more information from the subscriber)?

2) How frequently is the new bg worker going to sync the slot info? How can it ensure that the latest information exists, say, when the subscriber is down/crashed before it picks up the latest slot information?

4) IIUC, the proposal works only for logical replication slots, but do you also see the need for supporting some kind of synchronization of physical replication slots as well? IMO, we need a better and consistent way for both types of replication slots. If the walsender can somehow push the slot info from the primary (for physical replication slots)/publisher (for logical replication slots) to the standby/subscribers, this would be a more consistent and simpler design. However, I'm not sure if this design is doable at all.

Can anyone help clarify these?

[1] https://www.postgresql.org/message-id/CALj2ACUGNGfWRtwwZwT-Y6feEP8EtOMhVTE87rdeY14mBpsRUA%40mail.gmail.com

Regards,
Bharath Rupireddy.
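For reference on point 1): READ_REPLICATION_SLOT is issued over a replication connection and currently works only for physical slots, which is why modifications are mentioned above; a sketch of its current use (slot name is a placeholder):

    READ_REPLICATION_SLOT some_physical_slot;
    -- returns: slot_type | restart_lsn | restart_tli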
Re: why do hash index builds use smgrextend() for new splitpoint pages
On Sat, Feb 26, 2022 at 9:17 PM Melanie Plageman wrote: > > On Fri, Feb 25, 2022 at 11:17 PM Amit Kapila wrote: > > > > On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman > > wrote: > > > > > > Since _hash_alloc_buckets() WAL-logs the last page of the > > > splitpoint, is it safe to skip the smgrimmedsync()? What if the last > > > page of the splitpoint doesn't end up having any tuples added to it > > > during the index build and the redo pointer is moved past the WAL for > > > this page and then later there is a crash sometime before this page > > > makes it to permanent storage. Does it matter that this page is lost? If > > > not, then why bother WAL-logging it? > > > > > > > I think we don't care if the page is lost before we update the > > meta-page in the caller because we will try to reallocate in that > > case. But we do care after meta page update (having the updated > > information about this extension via different masks) in which case we > > won't lose this last page because it would have registered the sync > > request for it via sgmrextend before meta page update. > > and could it happen that during smgrextend() for the last page, a > checkpoint starts and finishes between FileWrite() and > register_dirty_segment(), then index build finishes, and then a crash > occurs before another checkpoint completes the pending fsync for that > last page? > Yeah, this seems possible, and then the problem could be that the index's idea of the EOF and smgr's idea of the EOF differ, which could lead to a problem when we try to get a new page via _hash_getnewbuf(). If this theory turns out to be true, then we can probably get an error, either because of a disk-full condition or because the index requests a block beyond the EOF as determined by RelationGetNumberOfBlocksInFork() in _hash_getnewbuf(). Can we try to reproduce this scenario with the help of a debugger to see if we are missing something? -- With Regards, Amit Kapila.
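To spell out the window Melanie describes, a sketch of the suspected interleaving in comment form (this is an illustration, not the actual smgr code):

    /*
     * Suspected interleaving inside smgrextend() for the last page:
     *
     *   FileWrite(...);              -- page handed to the OS, not yet durable
     *   [a checkpoint starts and completes here; the redo pointer moves
     *    past the WAL record for this page]
     *   register_dirty_segment(...); -- the fsync request now lands after
     *                                   that checkpoint has already synced
     *
     * A crash before the *next* checkpoint can then lose the page, leaving
     * smgr's idea of the EOF behind the index's idea of the EOF.
     */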
Re: BufferAlloc: don't take two simultaneous locks
On Fri, 25/02/2022 at 09:01 -0800, Andres Freund wrote: > Hi, > > On 2022-02-25 12:51:22 +0300, Yura Sokolov wrote: > > > > +* The usage_count starts out at 1 so that the buffer can > > > > survive one > > > > +* clock-sweep pass. > > > > +* > > > > +* We use direct atomic OR instead of Lock+Unlock since no > > > > other backend > > > > +* could be interested in the buffer. But StrategyGetBuffer, > > > > +* Flush*Buffers, Drop*Buffers are scanning all buffers and > > > > locks them to > > > > +* compare tag, and UnlockBufHdr does raw write to state. So we > > > > have to > > > > +* spin if we found buffer locked. > > > > > > So basically the first half of the paragraph is wrong, because no, we > > > can't? > > > > Logically, there are no backends that could be interested in the buffer. > > Physically they do LockBufHdr/UnlockBufHdr just to check they are not > > interested. > > Yea, but that's still being interested in the buffer... > > > > > > +* Note that we write tag unlocked. It is also safe since there > > > > is always > > > > +* check for BM_VALID when tag is compared. > > > > */ > > > > buf->tag = newTag; > > > > - buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED | > > > > - BM_CHECKPOINT_NEEDED | BM_IO_ERROR | > > > > BM_PERMANENT | > > > > - BUF_USAGECOUNT_MASK); > > > > if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == > > > > INIT_FORKNUM) > > > > - buf_state |= BM_TAG_VALID | BM_PERMANENT | > > > > BUF_USAGECOUNT_ONE; > > > > + new_bits = BM_TAG_VALID | BM_PERMANENT | > > > > BUF_USAGECOUNT_ONE; > > > > else > > > > - buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE; > > > > - > > > > - UnlockBufHdr(buf, buf_state); > > > > + new_bits = BM_TAG_VALID | BUF_USAGECOUNT_ONE; > > > > > > > > - if (oldPartitionLock != NULL) > > > > + buf_state = pg_atomic_fetch_or_u32(&buf->state, new_bits); > > > > + while (unlikely(buf_state & BM_LOCKED)) > > > > > > I don't think it's safe to atomically OR in arbitrary bits. If somebody else has > > > locked the buffer header in this moment, it'll lead to completely bogus > > > results, because unlocking overwrites concurrently written contents (which > > > there shouldn't be any, but here there are)... > > > > That is why there is a safety loop for the case where buf->state was locked just > > after the first optimistic atomic_fetch_or. 99.999% of the time this loop will not > > have a job. But in case another backend did lock buf->state, the loop waits > > until it releases the lock and retries atomic_fetch_or. > > > And OR'ing contents in also doesn't make sense because it doesn't work > > > to actually unset any contents? > > > > Sorry, I didn't understand that sentence :(( > > You're OR'ing multiple bits into buf->state. LockBufHdr() only ORs in > BM_LOCKED. ORing BM_LOCKED is fine: > Either the buffer is not already locked, in which case it just sets the > BM_LOCKED bit, acquiring the lock. Or it doesn't change anything, because > BM_LOCKED already was set. > > But OR'ing in multiple bits is *not* fine, because it'll actually change the > contents of ->state while the buffer header is locked. First, both states are valid: before the atomic_or and after. Second, there are no checks of buffer->state while the buffer header is locked. All LockBufHdr users use the result of LockBufHdr. (I just checked that.) > > > Why don't you just use LockBufHdr/UnlockBufHdr? > > > > This pair makes two atomic writes to memory. Two writes are heavier than > > one write in this version (if the optimistic case succeeds). > > UnlockBufHdr doesn't use a locked atomic op. It uses a write barrier and an unlocked write. A write barrier is not free on any platform. Well, while I don't see a problem with modifying buffer->state, there is a problem with modifying buffer->tag: I missed that Drop*Buffers doesn't check the BM_TAG_VALID flag. Therefore I either have to add this check to those places, or return to the LockBufHdr+UnlockBufHdr pair. For patch simplicity I'll return to the LockBufHdr+UnlockBufHdr pair. > > Greetings, > > Andres Freund
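For anyone following along, the conventional pattern being returned to looks roughly like this (a sketch based on the code quoted in the diff above, not the final patch):

    /* Take the buffer header spinlock; LockBufHdr() ORs in only BM_LOCKED. */
    uint32      buf_state = LockBufHdr(buf);

    /* Modify the state while holding the lock... */
    buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
                   BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
                   BUF_USAGECOUNT_MASK);
    buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;

    /* ...then publish it: a write barrier plus one plain store, no locked op. */
    UnlockBufHdr(buf, buf_state);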
Re: generic plans and "initial" pruning
Hi Andres, On Fri, Feb 11, 2022 at 10:29 AM Andres Freund wrote: > On 2022-02-10 17:13:52 +0900, Amit Langote wrote: > > The attached patch implements this idea. Sorry for the delay in > > getting this out and thanks to Robert for the off-list discussions on > > this. > > I did not follow this thread at all. And I only skimmed the patch. So I'm > probably wrong. Thanks for your interest in this and sorry about the delay in replying (have been away due to illness). > I'm a wary of this increasing executor overhead even in cases it won't > help. Without this patch, for simple queries, I see small allocations > noticeably in profiles. This adds a bunch more, even if > !context->stmt->usesPreExecPruning: Ah, if any new stuff added by the patch runs in !context->stmt->usesPreExecPruning paths, then it's just poor coding on my part, which I'm now looking to fix. Maybe not all of it is avoidable, but I think whatever isn't should be trivial... > - makeNode(ExecPrepContext) > - makeNode(ExecPrepOutput) > - palloc0(sizeof(PlanPrepOutput *) * result->numPlanNodes) > - stmt_execprep_list = lappend(stmt_execprep_list, execprep); > - AllocSetContextCreate(CurrentMemoryContext, > "CachedPlan execprep list", ... > - ... > > That's a lot of extra for something that's already a bottleneck. If all these allocations are limited to the usesPreExecPruning path, IMO, they would amount to trivial overhead compared to what is going to be avoided -- locking say 1000 partitions when only 1 will be scanned. Although, maybe there's a way to code this to have even less overhead than what's in the patch now. -- Amit Langote EDB: http://www.enterprisedb.com
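A sketch of the gating Amit describes (hypothetical code; ExecPrepContext, ExecPrepOutput, and usesPreExecPruning are the patch's own names, taken from the quoted profile list):

    /* Keep all pre-exec-pruning setup off the common, simple-query path. */
    if (context->stmt->usesPreExecPruning)
    {
        ExecPrepContext *prepctx = makeNode(ExecPrepContext);  /* patch-specific node */
        ExecPrepOutput  *prepout = makeNode(ExecPrepOutput);   /* patch-specific node */

        /* ... build the PlanPrepOutput array, lists, and the dedicated
         * memory context only on this path ... */
    }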
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Hi, On Mon, Feb 28, 2022 at 10:21:23AM +0530, Bharath Rupireddy wrote: > > Another thought for my review comment: > > 1) Can't we use pg_is_in_recovery to determine if it's a restartpoint > > or checkpoint instead of having a new function > > pg_stat_get_progress_checkpoint_type? > > I don't think using pg_is_in_recovery work here as it is taken after > the checkpoint has started. So, I think the right way here is to send > 1 in CreateCheckPoint and 2 in CreateRestartPoint and use > CASE-WHEN-ELSE-END to show "1": "checkpoint" "2":"restartpoint". I suggested upthread to store the starting timeline instead. This way you can deduce whether it's a restartpoint or a checkpoint, but you can also deduce other information, like what was the starting WAL. > 11) I think it's discussed, are we going to add the pid of the > checkpoint requestor? As mentioned upthread, there can be multiple backends that request a checkpoint, so unless we want to store an array of pid we should store a number of backend that are waiting for a new checkpoint.
Re: parse/analyze API refactoring
You set this commit fest entry to Waiting on Author, but there were no reviews posted and the patch still applies and builds AFAICT, so I don't know what you meant by that.

On 13.01.22 00:49, Bossart, Nathan wrote:
> On 12/28/21, 8:25 AM, "Peter Eisentraut" wrote:
> > (The "withcb" naming maybe isn't great; better ideas welcome.)
>
> FWIW I immediately understood that this meant "with callback," so it
> might be okay.
>
> > Not included in this patch set, but food for further thought: The
> > pg_analyze_and_rewrite_*() functions aren't all that useful (anymore).
> > One might as well write pg_rewrite_query(parse_analyze_xxx(...))
>
> I had a similar thought while reading through the patches. If further
> deduplication isn't too much trouble, I'd vote for that.
>
> Nathan
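For context, the composition mentioned in the quote would look roughly like this (a sketch; parse_analyze_fixedparams() follows the naming in this refactoring series, and pg_rewrite_query() would presumably need to be exported for callers outside postgres.c):

    /* What a caller of pg_analyze_and_rewrite_fixedparams() would write instead. */
    Query  *query = parse_analyze_fixedparams(parsetree, query_string,
                                              paramTypes, numParams, NULL);
    List   *rewritten = pg_rewrite_query(query);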
Re: support for MERGE
On Sun, Feb 27, 2022 at 09:17:13PM +0100, Daniel Gustafsson wrote: > > On 27 Feb 2022, at 18:42, Tom Lane wrote: > > > I'd rather keep all the ModifyTable code in one .c file, even if that one is > > bigger than our usual practice. > > Agreed, I also prefer a (too) large file over a set of .c #include's. +1. Also, if a split is really needed, isn't our usual approach to create some *_private.h files so that third-party code is aware that this is in no way an API meant for them?
RE: Logical replication timeout problem
On Wed, Feb 22, 2022 at 4:56 PM Amit Kapila wrote:

Thanks for your review.

> On Tue, Feb 22, 2022 at 9:17 AM wangw.f...@fujitsu.com wrote:
> >
> > On Fri, Feb 18, 2022 at 10:51 AM Ajin Cherian wrote:
> > > Some comments:
> > Thanks for your review.
> >
> > > I see you only track skipped Inserts/Updates and Deletes. What about
> > > DDL operations that are skipped, what about truncate.
> > > What about changes made to unpublished tables? I wonder if you could
> > > create a test script that only did DDL operations and truncates,
> > > would this timeout happen?
> > According to your suggestion, I tested with DDL and truncate.
> > While testing, I ran only 20,000 DDLs and 10,000 truncations in one
> > transaction.
> > If I set wal_sender_timeout and wal_receiver_timeout to 30s, it will
> > time out.
> > And if I use the default values, it will not time out.
> > IMHO there should not be long transactions that only contain DDL and
> > truncation. I'm not quite sure, do we need to handle this kind of use
> > case?
> >
> I think it is better to handle such cases as well and changes related
> to unpublished tables as well. BTW, it seems Kuroda-San has also given
> some comments [1] which I am not sure are addressed.

Added handling of the related use cases.

> I think instead of keeping the skipping threshold w.r.t
> wal_sender_timeout, we can use some conservative number like 10000 or
> so which we are sure won't impact performance and won't lead to
> timeouts.

Yes, it would be better. Set the threshold conservatively to 10000.

> *
> + /*
> + * skipped_changes_count is reset when processing changes that do not
> + * need to be skipped.
> + */
> + skipped_changes_count = 0
>
> When the skipped_changes_count is reset, the sendTime should also be
> reset. Can we reset it whenever the UpdateProgress function is called
> with send_keep_alive as false?

Fixed.

Attached a new patch that addresses the following improvements based on the comments I have received so far:
1. Consider other changes that need to be skipped (truncate, DDL, and the function call pg_logical_emit_message). [suggestion by Ajin, Amit]
(Added a new function SendKeepaliveIfNecessary for trying to send a keepalive message.)
2. Set the threshold conservatively to a static value of 10000. [suggestion by Amit, Kuroda-San]
3. Reset sendTime in function WalSndUpdateProgress when send_keep_alive is false. [suggestion by Amit]

Regards,
Wang wei

0001-Fix-the-timeout-of-subscriber-in-long-transactions.patch
Description: 0001-Fix-the-timeout-of-subscriber-in-long-transactions.patch
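To summarize the mechanism for anyone skimming: count consecutively skipped changes and force a progress update (which lets the walsender send a keepalive) once the fixed threshold is crossed. A simplified sketch of the idea, not the patch itself:

    static int  skipped_changes_count = 0;

    /* Called once per decoded change; 'skipped' says whether it was sent. */
    static void
    SendKeepaliveIfNecessary(LogicalDecodingContext *ctx, bool skipped)
    {
        if (!skipped)
        {
            skipped_changes_count = 0;  /* a real change went out; reset */
            return;
        }

        if (++skipped_changes_count >= 10000)   /* the static threshold above */
        {
            /* Gives WalSndUpdateProgress() a chance to send a keepalive. */
            OutputPluginUpdateProgress(ctx);
            skipped_changes_count = 0;
        }
    }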
RE: Logical replication timeout problem
On Thu, Feb 24, 2022 at 4:06 PM Kuroda, Hayato/黒田 隼人 wrote: > Dear Wang, Thanks for your review. > > According to our discussion, we need to send keepalive messages to > > subscriber when skipping changes. > > One approach is that **for each skipped change**, we try to send > > keepalive message by calculating whether a timeout will occur based on > > the current time and the last time the keepalive was sent. But this will > > brings > slight overhead. > > So I want to try another approach: after **constantly skipping some > > changes**, we try to send keepalive message by calculating whether a > > timeout will occur based on the current time and the last time the keepalive > was sent. > > You meant that calling system calls like GetCurrentTimestamp() should be > reduced, right? I'm not sure how it affects but it seems reasonable. Yes. There is no need to invoke it frequently, and doing so would bring overhead. > > IMO, we should send keepalive message after skipping a certain number > > of changes constantly. > > And I want to calculate the threshold dynamically by using a fixed > > value to avoid adding too much code. > > In addition, different users have different machine performance, and > > users can modify wal_sender_timeout, so the threshold should be > > dynamically calculated according to wal_sender_timeout. > > Your experiment seems not bad, but the background cannot be understand > from code comments. I prefer a static threshold because it's more simple, > which > as Amit said in the following too: > > https://www.postgresql.org/message-id/CAA4eK1%2B-p_K_j%3DNiGGD6tCYXiJH0ypT4REX5PBKJ4AcUoF3gZQ%40mail.gmail.com Yes, you are right. Fixed. And I set the threshold to 10000. > BTW, this patch cannot be applied to current master. Thanks for the reminder. Rebased it. Kindly have a look at the new patch shared in [1]. [1] https://www.postgresql.org/message-id/OS3PR01MB6275FEB9F83081F1C87539B99E019%40OS3PR01MB6275.jpnprd01.prod.outlook.com Regards, Wang wei
RE: Design of pg_stat_subscription_workers vs pgstats
On Mon, Feb 28, 2022 12:32 PM Amit Kapila wrote: > > On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada > wrote: > > > > On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila > wrote: > > > > > > > > > > > (2) doc/src/sgml/monitoring.sgml > > > > > > > > +Resets statistics for a single subscription shown in the > > > > +pg_stat_subscription_stats view to > > > > zero. If > > > > +the argument is NULL, reset statistics for > > > > all > > > > +subscriptions. > > > > > > > > > > > > I felt we could improve the first sentence. > > > > > > > > From: > > > > Resets statistics for a single subscription shown in the.. > > > > > > > > To(idea1): > > > > Resets statistics for a single subscription defined by the argument to > > > > zero. > > > > > > > > > > Okay, I can use this one. > > > > Are you going to remove the part "shown in the > > pg_stat_subsctiption_stats view"? I think it's better to keep it in > > order to make it clear which statistics the function resets as we have > > pg_stat_subscription and pg_stat_subscription_stats. > > > > I decided to keep this part of the docs as it is and fixed a few other > minor comments raised by you and Peter. Additionally, I have bumped > the PGSTAT_FILE_FORMAT_ID. I'll push this patch tomorrow unless there > are any other major comments. > Thanks for your patch. I have finished the review/test for this patch. The patch LGTM. Regards, Tang
Re: make tuplestore helper function
On Thu, Feb 24, 2022 at 08:25:06PM +0900, Michael Paquier wrote:
> This is the remaining piece, as attached, that I have not been able to
> poke much at yet.

So, I have finally poked at this last part of the patch set, and I found that we can be more aggressive with the refactoring, by moving into MakeFuncResultTuplestore() the parts where we save the tuplestore and the tupledesc in the per-query memory context. There are two pieces that matter once things are reshaped:
- The tuple descriptor may need some extra validation via BlessTupleDesc() when it comes from a transient record datatype, something that happens for most of the subroutines related to the JSON functions.
- expectedDesc is sometimes required by the caller, though most of the time it just needs to be built with the more expensive get_call_result_type().

In order to keep things pluggable at will, MakeFuncResultTuplestore() has been changed to accept a set of bits32 flags, able to control the two options above. With this facility in place, I have been able to cut much more code than the initial patch, roughly twice as much:
24 files changed, 157 insertions(+), 893 deletions(-)

This seems in rather good shape to me, the changes are straightforward, and the code cut is really good, so I'd like to move on with that. 0001 is the initial patch, and 0002 is the extra refactoring I have been working on. The plan would be to merge both, but I am sending a split to ease any checks on what I have changed.

Comments or objections?
-- Michael

From d45e9a7031017e13e5429ff985b36ddafdfdb443 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 24 Feb 2022 17:27:55 +0900
Subject: [PATCH v9 1/2] Introduce MakeFuncResultTuplestore()

This is the main patch from Melanie, that I have tweaked a bit. Note
that I am not completely done with it ;p
---
 src/include/funcapi.h                      |  2 +
 src/backend/access/transam/xlogfuncs.c     | 16 +---
 src/backend/commands/event_trigger.c       | 32 +---
 src/backend/commands/extension.c           | 48 +---
 src/backend/commands/prepare.c             | 17 +
 src/backend/foreign/foreign.c              | 14 +---
 src/backend/libpq/hba.c                    | 19 +
 src/backend/replication/logical/launcher.c | 16 +---
 .../replication/logical/logicalfuncs.c     | 16 +---
 src/backend/replication/logical/origin.c   | 19 +
 src/backend/replication/slotfuncs.c        | 16 +---
 src/backend/replication/walsender.c        | 16 +---
 src/backend/storage/ipc/shmem.c            | 16 +---
 src/backend/utils/adt/datetime.c           | 17 +
 src/backend/utils/adt/genfile.c            | 31 +---
 src/backend/utils/adt/jsonfuncs.c          | 73 +++
 src/backend/utils/adt/mcxtfuncs.c          | 16 +---
 src/backend/utils/adt/misc.c               | 14 +---
 src/backend/utils/adt/pgstatfuncs.c        | 48 +---
 src/backend/utils/adt/varlena.c            | 14 +---
 src/backend/utils/fmgr/funcapi.c           | 38 ++
 src/backend/utils/misc/guc.c               | 15 +---
 src/backend/utils/misc/pg_config.c         | 13 +---
 src/backend/utils/mmgr/portalmem.c         | 17 +
 24 files changed, 82 insertions(+), 461 deletions(-)

diff --git a/src/include/funcapi.h b/src/include/funcapi.h
index ba927c2f33..b9f9e92d1a 100644
--- a/src/include/funcapi.h
+++ b/src/include/funcapi.h
@@ -229,6 +229,8 @@ extern TupleDesc BlessTupleDesc(TupleDesc tupdesc);
 extern AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc);
 extern HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values);
 extern Datum HeapTupleHeaderGetDatum(HeapTupleHeader tuple);
+extern Tuplestorestate *MakeFuncResultTuplestore(FunctionCallInfo fcinfo,
+                                                 TupleDesc *result_tupdesc);

 /*--

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 12e2bf4135..2fc1ed023c 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -178,24 +178,10 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 	XLogRecPtr	stoppoint;
 	SessionBackupState status = get_backup_status();

-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
 	oldcontext = MemoryContextSwitchTo(per_query_ctx);

-	tupstore = tuplestore_b
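For a sense of what call sites look like after this refactoring, a minimal sketch of a one-column set-returning function using the helper (based on the v9 prototype quoted above; my_srf and its value are placeholders, and error handling is elided):

    Datum
    my_srf(PG_FUNCTION_ARGS)
    {
        TupleDesc        tupdesc;
        Tuplestorestate *tupstore;
        Datum            values[1];
        bool             nulls[1] = {false};

        /* Replaces the rsinfo checks and tuplestore boilerplate deleted above. */
        tupstore = MakeFuncResultTuplestore(fcinfo, &tupdesc);

        values[0] = Int32GetDatum(42);
        tuplestore_putvalues(tupstore, tupdesc, values, nulls);

        /* The result set is handed back via rsinfo, not the return value. */
        return (Datum) 0;
    }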