Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-01 Thread Kyotaro Horiguchi
At Thu, 20 Jan 2022 17:19:04 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Thu, 20 Jan 2022 15:07:22 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> CI now likes this version for all platforms.

An xlog.c refactoring that happened recently hit this.
Just rebased on top of the change.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 35958b17c62cd14f81efa26a097c32c273028f77 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 11 Nov 2021 20:42:00 +0900
Subject: [PATCH v16 1/3] Add tablespace support to TAP framework

The TAP framework doesn't support nodes that have tablespaces.  In
particular, backup and initialization from backups fail if the source
node has tablespaces.  This commit provides a simple way to create
tablespace directories and allows backup routines to handle tablespaces.
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 264 ++-
 src/test/perl/PostgreSQL/Test/Utils.pm   |  43 
 2 files changed, 305 insertions(+), 2 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index be05845248..15d57b9a71 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -298,6 +298,64 @@ sub archive_dir
 
 =pod
 
+=item $node->tablespace_storage([nocreate])
+
+Directory to store tablespace directories.
+If nocreate is true, returns undef if not yet created.
+
+=cut
+
+sub tablespace_storage
+{
+	my ($self, $nocreate) = @_;
+
+	if (!defined $self->{_tsproot})
+	{
+		# tablespace is not used, return undef if nocreate is specified.
+		return undef if ($nocreate);
+
+	# create and remember the tablespace root directory.
+		$self->{_tsproot} = PostgreSQL::Test::Utils::tempdir_short();
+	}
+
+	return $self->{_tsproot};
+}
+
+=pod
+
+=item $node->tablespaces()
+
+Returns a hash from tablespace OID to tablespace directory name.  For
+example, an OID 16384 pointing to /tmp/jWAhkT_fs0/ts1 is stored as
+$hash{16384} = "ts1".
+
+=cut
+
+sub tablespaces
+{
+	my ($self) = @_;
+	my $pg_tblspc = $self->data_dir . '/pg_tblspc';
+	my %ret;
+
+	# return undef if no tablespace is used
+	return undef if (!defined $self->tablespace_storage(1));
+
+	# collect tablespace entries in pg_tblspc directory
+	opendir(my $dir, $pg_tblspc);
+	while (my $oid = readdir($dir))
+	{
+		next if ($oid !~ /^([0-9]+)$/);
+		my $linkpath = "$pg_tblspc/$oid";
+		my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath);
+		$ret{$oid} = File::Basename::basename($tsppath);
+	}
+	closedir($dir);
+
+	return %ret;
+}
+
+=pod
+
 =item $node->backup_dir()
 
 The output path for backups taken with $node->backup()
@@ -313,6 +371,77 @@ sub backup_dir
 
 =pod
 
+=item $node->backup_tablespace_storage_path(backup_name)
+
+Returns the tablespace location path for backup_name.
+Returns the parent directory if backup_name is not given.
+
+=cut
+
+sub backup_tablespace_storage_path
+{
+	my ($self, $backup_name) = @_;
+	my $dir = $self->backup_dir . '/__tsps';
+
+	$dir .= "/$backup_name" if (defined $backup_name);
+
+	return $dir;
+}
+
+=pod
+
+=item $node->backup_create_tablespace_storage(backup_name)
+
+Create the tablespace location directory for backup_name if it does not exist yet.
+Create the parent tablespace storage that holds all location
+directories if backup_name is not supplied.
+
+=cut
+
+sub backup_create_tablespace_storage
+{
+	my ($self, $backup_name) = @_;
+	my $dir = $self->backup_tablespace_storage_path($backup_name);
+
+	File::Path::make_path $dir if (! -d $dir);
+}
+
+=pod
+
+=item $node->backup_tablespaces(backup_name)
+
+Returns a reference to a hash mapping tablespace OIDs to the names of
+the tablespace directories contained in the specified backup.  For
+example, an OID 16384 pointing to ../tsps/backup1/ts1 is stored as
+$hash->{16384} = "ts1".
+
+=cut
+
+sub backup_tablespaces
+{
+	my ($self, $backup_name) = @_;
+	my $pg_tblspc = $self->backup_dir . '/' . $backup_name . '/pg_tblspc';
+	my %ret;
+
+	#return undef if this backup holds no tablespaces
+	return undef if (! -d $self->backup_tablespace_storage_path($backup_name));
+
+	# scan pg_tblspc directory of the backup
+	opendir(my $dir, $pg_tblspc);
+	while (my $oid = readdir($dir))
+	{
+		next if ($oid !~ /^([0-9]+)$/);
+		my $linkpath = "$pg_tblspc/$oid";
+		my $tsppath = PostgreSQL::Test::Utils::dir_readlink($linkpath);
+		$ret{$oid} = File::Basename::basename($tsppath);
+	}
+	closedir($dir);
+
+	return \%ret;
+}
+
+=pod
+
 =item $node->install_path()
 
 The configured install path (if any) for the node.
@@ -370,6 +499,7 @@ sub info
 	print $fh "Data directory: " . $self->data_dir . "\n";
 	print $fh "Backup directory: " . $self->backup_dir . "\n";
 	print $fh "Archive directory: " . $self->archive_dir . "\n";
+	print $fh "Tablespace directory: " . $self->tablespace_storage . "\n";
 	print $fh "Connection string: " . $self->connstr . "\n";
 	print $fh "Log file: " . $self->logfile . "\n";
 	print $fh "Install Path: ", $self->{_install_path} . "\n"
@@ -600,6 +730,43 @@ sub 

Re: Design of pg_stat_subscription_workers vs pgstats

2022-03-01 Thread Andres Freund
Hi,

On 2022-03-02 12:39:57 +0530, Amit Kapila wrote:
> On Wed, Mar 2, 2022 at 10:39 AM Andres Freund  wrote:
> >
> > Working on rebasing shared memory stats over this. Feels *much* better so 
> > far.
> >
> 
> Good to hear that it helps. BTW, there is another patch [1] that
> extends this view. I think that patch is still not ready but once it
> is ready (which I expect to happen sometime in this CF), it might be
> good if you could check whether it has any major problems
> with your integration.

I skimmed it briefly, and I don't see an architectural conflict. I'm not
convinced it's worth adding that information, but that's a separate
discussion.


> > While rebasing, I was wondering why pgstat_reset_subscription_counter() has
> > "all subscription counters" support. We don't have a function to reset all
> > function stats or such either.
> >
> 
> We have a similar thing for SLRU (pg_stat_reset_slru) and slots
> (pg_stat_reset_replication_slot).

Neither should have been added, imo. We're already at 9 different reset
functions, yet without a unified function to reset all stats, which is pretty
much the only actually relevant operation. And having pg_stat_reset_shared()
with a variable set of 'reset' targets but then also pg_stat_reset_slru() and
pg_stat_reset_subscription_stats() is absurd.

This is just making something incomprehensible ever more incomprehensible.


> For functions and tables, one can use pg_stat_reset.

Not in isolation.

Greetings,

Andres Freund




pg_stop_backup() v2 incorrectly marked as proretset

2022-03-01 Thread Michael Paquier
Hi all,

In my hunt for incorrect SRFs, I have noticed a new case of a
system function marked as proretset while it builds and returns only
one record.  And this is a popular one: pg_stop_backup(), labelled
v2.

This leads to a lot of unnecessary work, as the function creates a
tuplestore it has no need for, with the usual set of checks related to
SRFs.  The logic can be simplified as in the attached.
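To make the change concrete, here is the single-record pattern the
attached patch switches to, condensed into one standalone sketch (the
function name and column layout are illustrative, distilled from the
diff below):

Datum
pg_single_row_func(PG_FUNCTION_ARGS)
{
	TupleDesc	tupdesc;
	Datum		values[2];
	bool		nulls[2];

	/* Build and bless a tuple descriptor for the single result row. */
	tupdesc = CreateTemplateTupleDesc(2);
	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "lsn", PG_LSNOID, -1, 0);
	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "labelfile", TEXTOID, -1, 0);
	BlessTupleDesc(tupdesc);

	MemSet(values, 0, sizeof(values));
	MemSet(nulls, 0, sizeof(nulls));
	/* ... compute values[]/nulls[] here ... */

	/* Form one tuple and return it directly; no tuplestore needed. */
	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}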

Thoughts?
--
Michael
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index bf88858171..d8e8715ed1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6275,9 +6275,9 @@
   proname => 'pg_stop_backup', provolatile => 'v', proparallel => 'r',
   prorettype => 'pg_lsn', proargtypes => '', prosrc => 'pg_stop_backup' },
 { oid => '2739', descr => 'finish taking an online backup',
-  proname => 'pg_stop_backup', prorows => '1', proretset => 't',
-  provolatile => 'v', proparallel => 'r', prorettype => 'record',
-  proargtypes => 'bool bool', proallargtypes => '{bool,bool,pg_lsn,text,text}',
+  proname => 'pg_stop_backup', provolatile => 'v', proparallel => 'r',
+  prorettype => 'record', proargtypes => 'bool bool',
+  proallargtypes => '{bool,bool,pg_lsn,text,text}',
   proargmodes => '{i,i,o,o,o}',
   proargnames => '{exclusive,wait_for_archive,lsn,labelfile,spcmapfile}',
   prosrc => 'pg_stop_backup_v2' },
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 12e2bf4135..2f292e2bd8 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -165,11 +165,9 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 Datum
 pg_stop_backup_v2(PG_FUNCTION_ARGS)
 {
+#define PG_STOP_BACKUP_V2_COLS 3
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	TupleDesc	tupdesc;
-	Tuplestorestate *tupstore;
-	MemoryContext per_query_ctx;
-	MemoryContext oldcontext;
 	Datum		values[3];
 	bool		nulls[3];
 
@@ -178,29 +176,15 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 	XLogRecPtr	stoppoint;
 	SessionBackupState status = get_backup_status();
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
-	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
-	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = tupstore;
-	rsinfo->setDesc = tupdesc;
-
-	MemoryContextSwitchTo(oldcontext);
+	/* Initialise attribute information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STOP_BACKUP_V2_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "lsn",
+	   PG_LSNOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "labelfile",
+	   TEXTOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "spcmapfile",
+	   TEXTOID, -1, 0);
+	BlessTupleDesc(tupdesc);
 
 	MemSet(values, 0, sizeof(values));
 	MemSet(nulls, 0, sizeof(nulls));
@@ -251,9 +235,8 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 	/* Stoppoint is included on both exclusive and nonexclusive backups */
 	values[0] = LSNGetDatum(stoppoint);
 
-	tuplestore_putvalues(tupstore, tupdesc, values, nulls);
-
-	return (Datum) 0;
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
 /*
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 758ab6e25a..81bac6f581 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -384,7 +384,7 @@ CREATE OR REPLACE FUNCTION
 CREATE OR REPLACE FUNCTION pg_stop_backup (
 exclusive boolean, wait_for_archive boolean DEFAULT true,
 OUT lsn pg_lsn, OUT labelfile text, OUT spcmapfile text)
-  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
+  RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
 CREATE OR REPLACE FUNCTION


signature.asc
Description: PGP signature


Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Masahiko Sawada
On Wed, Mar 2, 2022 at 4:14 PM Amit Kapila  wrote:
>
> On Wed, Mar 2, 2022 at 9:33 AM Masahiko Sawada  wrote:
> >
> > On Wed, Mar 2, 2022 at 12:21 PM Amit Kapila  wrote:
> > >
> > > On Wed, Mar 2, 2022 at 8:25 AM Peter Smith  wrote:
> > > >
> > > > On Mon, Feb 28, 2022 at 11:16 PM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > > > The errcontext message would become like follows:
> > > > >
> > > > > *Before
> > > > > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > > > > DETAIL:  Key (c)=(1) already exists.
> > > > > CONTEXT:  processing remote data during "INSERT" for replication
> > > > > target relation "public.test" in transaction 726 at 2022-02-28
> > > > > 20:59:56.005909+09
> > > > >
> > > > > * After
> > > > > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > > > > DETAIL:  Key (c)=(1) already exists.
> > > > > CONTEXT:  processing remote data during "INSERT" for replication
> > > > > target relation "public.test" in transaction 726 committed at LSN
> > > > > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from replication
> > > > > origin "pg_16395"
> > > > >
> > > > > I'm a bit concerned that the message may be too long.
> > > >
> > > > If you are willing to use abbreviations instead of full
> > > > words/sentences perhaps you can shorten the long CONTEXT part like
> > > > below?
> > > >
> > > > Before:
> > > > CONTEXT:  processing remote data during "INSERT" for replication
> > > > target relation "public.test" in transaction 726 committed at LSN
> > > > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from
> > > > replication origin "pg_16395"
> > > >
> > > > After:
> > > > CONTEXT: processing remote data during "INSERT" for replication target
> > > > relation "public.test" (txid 726, LSN 0/14BFA88, ts 2022-02-28
> > > > 20:58:27.964238+09, origin "pg_16395")
> > > >
> > >
> > > I am wondering whether we can avoid having a timestamp in the message?
> > > If one wants, it can be retrieved from the errors otherwise as well.
> > > For example, I see the below error in my machine:
> > > 2022-02-26 07:45:25.092 IST [17644] ERROR:  duplicate key value
> > > violates unique constraint "t1_pkey"
> > > 2022-02-26 07:45:25.092 IST [17644] DETAIL:  Key (c1)=(1) already exists.
> > > 2022-02-26 07:45:25.092 IST [17644] CONTEXT:  processing remote data
> > > during "INSERT" for replication target relation "public.t1" in
> > > transaction 724 at 2022-02-26 07:45:09.083848+05:30
> > >
> > > Now, here, won't the time at the start of CONTEXT serve our
> > > purpose?  If we can remove the timestamp, I think the message won't
> > > appear too long. What do you think?
> >
> > The time of the CONTEXT log message and the time in the message would
> > largely vary when the subscriber is much behind the publisher. But I
> > basically agree that the timestamp in the message might not be
> > important, at least for now. If we will support conflict resolution
> > that resolves based on the commit timestamp in the future, we might
> > want it again.
> >
>
> Possible, but let's remove it for now as it will simplify the message
> and avoid the need for additional branches. What do you think?

Agreed.

I've attached updated patches.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v2-0001-Use-complete-sentences-in-logical-replication-wor.patch
Description: Binary data


v2-0002-Add-the-origin-name-and-remote-commit-LSN-to-logi.patch
Description: Binary data


Re: CREATE INDEX CONCURRENTLY on partitioned index

2022-03-01 Thread Alexander Pyhalov

Hi.

I've added 0005-Mark-intermediate-partitioned-indexes-as-valid.patch,
which fixes the following issue: when a partitioned index is created,
indexes on intermediate partitioned tables were left in an invalid
state. Also added some more tests.

--
Best regards,
Alexander Pyhalov,
Postgres Professional
From 18fa3c27a3311294a7abfdc0674ef6143c65423b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 7 Feb 2022 10:28:42 +0300
Subject: [PATCH 1/5] Allow CREATE INDEX CONCURRENTLY on partitioned table

0001-Allow-CREATE-INDEX-CONCURRENTLY-on-partitioned-table.patch from
https://www.postgresql.org/message-id/20210226182019.gu20...@telsasoft.com

Fixes:
  - rel was used after table_close();
  - it seems childidxs shouldn't live in ind_context;
  - updated doc.
---
 doc/src/sgml/ref/create_index.sgml |  14 +--
 src/backend/commands/indexcmds.c   | 151 ++---
 src/test/regress/expected/indexing.out |  60 +-
 src/test/regress/sql/indexing.sql  |  18 ++-
 4 files changed, 186 insertions(+), 57 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 91eaaabc90f..ffa98692430 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -641,7 +641,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 If a problem arises while scanning the table, such as a deadlock or a
 uniqueness violation in a unique index, the CREATE INDEX
-command will fail but leave behind an invalid index. This index
+command will fail but leave behind an invalid index.
+If this happens while creating an index concurrently on a partitioned
+table, the command can also leave behind valid or
+invalid indexes on table partitions.  The invalid index
 will be ignored for querying purposes because it might be incomplete;
 however it will still consume update overhead. The psql
 \d command will report such an index as INVALID:
@@ -688,15 +691,6 @@ Indexes:
 cannot.

 
-   
-Concurrent builds for indexes on partitioned tables are currently not
-supported.  However, you may concurrently build the index on each
-partition individually and then finally create the partitioned index
-non-concurrently in order to reduce the time where writes to the
-partitioned table will be locked out.  In this case, building the
-partitioned index is a metadata only operation.
-   
-
   
  
 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index cd30f15eba6..a34a1b133a0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,6 +68,7 @@
 
 
 /* non-export function prototypes */
+static void reindex_invalid_child_indexes(Oid indexRelationId);
 static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
 static void ComputeIndexAttrs(IndexInfo *indexInfo,
@@ -670,17 +671,6 @@ DefineIndex(Oid relationId,
 	partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
 	if (partitioned)
 	{
-		/*
-		 * Note: we check 'stmt->concurrent' rather than 'concurrent', so that
-		 * the error is thrown also for temporary tables.  Seems better to be
-		 * consistent, even though we could do it on temporary table because
-		 * we're not actually doing it concurrently.
-		 */
-		if (stmt->concurrent)
-			ereport(ERROR,
-	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("cannot create index on partitioned table \"%s\" concurrently",
-			RelationGetRelationName(rel))));
 		if (stmt->excludeOpNames)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1119,6 +1109,11 @@ DefineIndex(Oid relationId,
 		if (pd->nparts != 0)
 			flags |= INDEX_CREATE_INVALID;
 	}
+	else if (concurrent && OidIsValid(parentIndexId))
+	{
+		/* If concurrent, initially build index partitions as "invalid" */
+		flags |= INDEX_CREATE_INVALID;
+	}
 
 	if (stmt->deferrable)
 		constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE;
@@ -1174,18 +1169,30 @@ DefineIndex(Oid relationId,
 		partdesc = RelationGetPartitionDesc(rel, true);
 		if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0)
 		{
+			/*
+			 * Need to close the relation before recursing into children, so
+			 * copy the needed data into a long-lived context.
+			 */
+
+			MemoryContext	ind_context = AllocSetContextCreate(PortalContext, "CREATE INDEX",
+	ALLOCSET_DEFAULT_SIZES);
+			MemoryContext	oldcontext = MemoryContextSwitchTo(ind_context);
 			int			nparts = partdesc->nparts;
 			Oid		   *part_oids = palloc(sizeof(Oid) * nparts);
 			bool		invalidate_parent = false;
 			TupleDesc	parentDesc;
 			Oid		   *opfamOids;
+			char		*relname;
 
 			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
 		 nparts);
 
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
+			parentDesc = CreateTupleDescCopy(RelationGetDescr(rel));
+			relname = 

Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-03-01 Thread Michael Paquier
On Wed, Mar 02, 2022 at 07:42:50AM +0530, Amit Kapila wrote:
> This is done as part of commit:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7a85073290856554416353a89799a4c04d09b74b

Thanks for taking care of it!
--
Michael


signature.asc
Description: PGP signature


Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Amit Kapila
On Wed, Mar 2, 2022 at 9:33 AM Masahiko Sawada  wrote:
>
> On Wed, Mar 2, 2022 at 12:21 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 2, 2022 at 8:25 AM Peter Smith  wrote:
> > >
> > > On Mon, Feb 28, 2022 at 11:16 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > The errcontext message would become like follows:
> > > >
> > > > *Before
> > > > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > > > DETAIL:  Key (c)=(1) already exists.
> > > > CONTEXT:  processing remote data during "INSERT" for replication
> > > > target relation "public.test" in transaction 726 at 2022-02-28
> > > > 20:59:56.005909+09
> > > >
> > > > * After
> > > > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > > > DETAIL:  Key (c)=(1) already exists.
> > > > CONTEXT:  processing remote data during "INSERT" for replication
> > > > target relation "public.test" in transaction 726 committed at LSN
> > > > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from replication
> > > > origin "pg_16395"
> > > >
> > > > I'm a bit concerned that the message may be too long.
> > >
> > > If you are willing to use abbreviations instead of full
> > > words/sentences perhaps you can shorten the long CONTEXT part like
> > > below?
> > >
> > > Before:
> > > CONTEXT:  processing remote data during "INSERT" for replication
> > > target relation "public.test" in transaction 726 committed at LSN
> > > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from
> > > replication origin "pg_16395"
> > >
> > > After:
> > > CONTEXT: processing remote data during "INSERT" for replication target
> > > relation "public.test" (txid 726, LSN 0/14BFA88, ts 2022-02-28
> > > 20:58:27.964238+09, origin "pg_16395")
> > >
> >
> > I am wondering whether we can avoid having a timestamp in the message?
> > If one wants, it can be retrieved from the errors otherwise as well.
> > For example, I see the below error in my machine:
> > 2022-02-26 07:45:25.092 IST [17644] ERROR:  duplicate key value
> > violates unique constraint "t1_pkey"
> > 2022-02-26 07:45:25.092 IST [17644] DETAIL:  Key (c1)=(1) already exists.
> > 2022-02-26 07:45:25.092 IST [17644] CONTEXT:  processing remote data
> > during "INSERT" for replication target relation "public.t1" in
> > transaction 724 at 2022-02-26 07:45:09.083848+05:30
> >
> > Now, here, won't the time at the start of CONTEXT serve our
> > purpose?  If we can remove the timestamp, I think the message won't
> > appear too long. What do you think?
>
> The time of the CONTEXT log message and the time in the message would
> largely vary when the subscriber is much behind the publisher. But I
> basically agree that the timestamp in the message might not be
> important, at least for now. If we will support conflict resolution
> that resolves based on the commit timestamp in the future, we might
> want it again.
>

Possible, but let's remove it for now as it will simplify the message
and avoid the need for additional branches. What do you think?

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-03-01 Thread Amit Kapila
On Wed, Mar 2, 2022 at 10:39 AM Andres Freund  wrote:
>
> Working on rebasing shared memory stats over this. Feels *much* better so far.
>

Good to hear that it helps. BTW, there is another patch [1] that
extends this view. I think that patch is still not ready but once it
is ready (which I expect to happen sometime in this CF), it might be
good if you could check whether it has any major problems
with your integration.

> While rebasing, I was wondering why pgstat_reset_subscription_counter() has
> "all subscription counters" support. We don't have a function to reset all
> function stats or such either.
>

We have a similar thing for SLRU (pg_stat_reset_slru) and slots
(pg_stat_reset_replication_slot). For functions and tables, one can
use pg_stat_reset. Similarly, we have pg_stat_reset_shared(), which
resets stats like WAL. This matches more with slru/slots, so we
provided it via pg_stat_reset_subscription_stats.

[1] - 
https://www.postgresql.org/message-id/TYWPR01MB8362B30A904274A911C0B1CCED039%40TYWPR01MB8362.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-01 Thread Michael Paquier
On Tue, Mar 01, 2022 at 10:03:20PM +, Jacob Champion wrote:
> Added a first draft in v5, alongside the perltidy fixups mentioned by
> Michael.

+The authenticated identity is an immutable identifier for the user
+presented during the connection handshake; the exact format depends on
+the authentication method in use. (For example, when using the
+scram-sha-256 auth method, the authenticated identity
+is simply the username. When using the cert auth
+method, the authenticated identity is the Distinguished Name of the
+client certificate.) Even for auth methods which use the username as
+the authenticated identity, this function differs from
+session_user in that its return value cannot be
+changed after login.

That looks good enough from here.  Thanks!

Nit: "auth method" would be a first in the documentation, so this had
better be "authentication method".  (No need to send an updated patch
just for that).
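
For readers following along, the function the quoted paragraph
documents would presumably have a shape like this (the name
pg_session_authn_id and the exact code are assumptions drawn from the
thread, not the final patch):

Datum
pg_session_authn_id(PG_FUNCTION_ARGS)
{
	Port	   *port = MyProcPort;

	/* NULL when no authentication happened, e.g. "trust" connections */
	if (port == NULL || port->authn_id == NULL)
		PG_RETURN_NULL();

	PG_RETURN_TEXT_P(cstring_to_text(port->authn_id));
}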

So, any comments and/or opinions from others?
--
Michael


signature.asc
Description: PGP signature


Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Kyotaro Horiguchi
At Wed, 2 Mar 2022 14:39:54 +0900, Masahiko Sawada  
wrote in 
> On Wed, Mar 2, 2022 at 11:55 AM Amit Kapila  wrote:
> >
> > On Mon, Feb 28, 2022 at 5:46 PM Masahiko Sawada  
> > wrote:
> > >
> > > I've attached two patches: the first one changes
> > > apply_error_callback() so that it uses complete sentences with if-else
> > > blocks in order to have a translation work,
> > >
> >
> > This is an improvement over what we have now but I think this is still
> > not completely correct as per message translation rules:
> > + else
> > + errcontext("processing remote data during \"%s\" in transaction %u at %s",
> > +logicalrep_message_type(errarg->command),
> > +errarg->remote_xid,
> > +(errarg->ts != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)");
> >
> > As per guidelines [1][2], we don't prefer to construct messages at
> > run-time, i.e. we can do better for the following part: +(errarg->ts
> > != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)". I think we need
> > to use if-else here to split it further. If you agree, then the same
> > needs to be dealt with in other parts of the patch as well.
> 
> I intended to use "(not-set)" as a value rather than a word in the
> sentence so I think it doesn't violate the guidelines. We can split it
> further as you suggested but we will end up having more if-else
> branches.

It seems to me this is exactly our case.  In the first place, I doubt
"(not-set)" fits the place of a timestamp even in English.

Moreover, if we (I?) translated the message into Japanese, it would
look like:

CONTEXT: (not-set)にトランザクション 2352314 内で "TRUNCATE" でのリモートデータの処理中

I don't think that looks fine.  Translating "(not-set)" makes things
even worse.

CONTEXT: (非設定)にトランザクション 2352314 内で "TRUNCATE" でのリモートデータの処理中

Yes, I can alleviate that strangeness a bit by rewording it, but it
still looks odd.

CONTEXT: 時刻(非設定)、トランザクション 2352314 内で "TRUNCATE" でのリモートデータの処理中

Rather, I'd prefer simply to list the attributes.

CONTEXT: processing remote data during "MESSAGE". Transaction (unknown). Time 
(unknown), replication target relation (unknown), column (unknown)
CONTEXT: "MESSAGE"でのリモートデータの処理中. トランザクション (不明). 時刻 (不明), レプリケーション対象リレーション (不明), 
column (不明)


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-03-01 Thread Michael Paquier
On Wed, Feb 16, 2022 at 01:58:10PM +0900, Michael Paquier wrote:
> I have been looking at how much simplicity this brings, and I have to
> admit that it is tempting to just support the loading of dumps when
> setting up the old instance to upgrade from.  We'd still need to do an
> extra effort in terms of cleaning up the diffs for the dump of the old
> instance with older versions once/if this is plugged into the
> buildfarm, but that could be addressed later depending on the versions
> that need to be covered.

The bug related to the detection of Windows and temporary paths for
pg_upgrade's server.c has been fixed as of dc57366, so attached is the
remaining rebased piece as perl2host has been recently removed.

Do others have an opinion about a backpatch of the bugfix?  Nobody has
complained about that since pg_upgrade has existed, so I have just done the
change on HEAD.
--
Michael
From 66b6961866d215344aa836963e5bdbfb237ed606 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 2 Mar 2022 15:55:32 +0900
Subject: [PATCH v10] Switch tests of pg_upgrade to use TAP

---
 src/bin/pg_upgrade/Makefile|  21 +-
 src/bin/pg_upgrade/TESTING |  85 ++--
 src/bin/pg_upgrade/t/001_basic.pl  |   9 +
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 251 ++
 src/bin/pg_upgrade/test.sh | 279 -
 src/tools/msvc/vcregress.pl|  92 +---
 6 files changed, 290 insertions(+), 447 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/001_basic.pl
 create mode 100644 src/bin/pg_upgrade/t/002_pg_upgrade.pl
 delete mode 100644 src/bin/pg_upgrade/test.sh

diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 49b94f0ac7..1f5d757548 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -28,6 +28,10 @@ OBJS = \
 override CPPFLAGS := -DDLSUFFIX=\"$(DLSUFFIX)\" -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 
+# required for 002_pg_upgrade.pl
+REGRESS_SHLIB=$(abs_top_builddir)/src/test/regress/regress$(DLSUFFIX)
+export REGRESS_SHLIB
+
 all: pg_upgrade
 
 pg_upgrade: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
@@ -47,17 +51,8 @@ clean distclean maintainer-clean:
 	rm -rf delete_old_cluster.sh log/ tmp_check/ \
 	   reindex_hash.sql
 
-# When $(MAKE) is present, make automatically infers that this is a
-# recursive make. which is not actually what we want here, as that
-# e.g. prevents output synchronization from working (as make thinks
-# that the subsidiary make knows how to deal with that itself, but
-# we're invoking a shell script that doesn't know). Referencing
-# $(MAKE) indirectly avoids that behaviour.
-# See https://www.gnu.org/software/make/manual/html_node/MAKE-Variable.html#MAKE-Variable
-NOTSUBMAKEMAKE=$(MAKE)
+check:
+	$(prove_check)
 
-check: test.sh all temp-install
-	MAKE=$(NOTSUBMAKEMAKE) $(with_temp_install) bindir=$(abs_top_builddir)/tmp_install/$(bindir) EXTRA_REGRESS_OPTS="$(EXTRA_REGRESS_OPTS)" $(SHELL) $<
-
-# installcheck is not supported because there's no meaningful way to test
-# pg_upgrade against a single already-running server
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_upgrade/TESTING b/src/bin/pg_upgrade/TESTING
index 78b9747908..3718483a1c 100644
--- a/src/bin/pg_upgrade/TESTING
+++ b/src/bin/pg_upgrade/TESTING
@@ -2,78 +2,23 @@ THE SHORT VERSION
 -
 
 On non-Windows machines, you can execute the testing process
-described below by running
+described below by running the following command in this directory:
 	make check
-in this directory.  This will run the shell script test.sh, performing
-an upgrade from the version in this source tree to a new instance of
-the same version.
 
-To test an upgrade from a different version, you must have a built
-source tree for the old version as well as this version, and you
-must have done "make install" for both versions.  Then do:
+This will run the TAP tests for pg_upgrade, performing an upgrade
+from the version in this source tree to a new instance of the same
+version.
 
-export oldsrc=...somewhere/postgresql	(old version's source tree)
-export oldbindir=...otherversion/bin	(old version's installed bin dir)
-export bindir=...thisversion/bin	(this version's installed bin dir)
-export libdir=...thisversion/lib	(this version's installed lib dir)
-sh test.sh
+Testing an upgrade from a different version requires a dump to set up
+the contents of the old instance, with its set of binaries.  The following
+variables are available to control the test:
+export olddump=...somewhere/dump.sql	(old version's dump)
+export oldinstall=...otherversion/	(old version's install base path)
 
-In this case, you will have to manually eyeball the resulting dump
-diff for version-specific differences, as explained below.
+Finally, the tests can be done by running
+	make check
 
-
-DETAILS

-
-The most 

Re: make tuplestore helper function

2022-03-01 Thread Michael Paquier
On Mon, Feb 28, 2022 at 04:49:41PM +0900, Michael Paquier wrote:
> In order to keep things pluggable at will, MakeFuncResultTuplestore()
> has been changed to accept a set of bits32 flags, able to control the
> two options above.  With this facility in place, I have been able to
> cut much more code than the initial patch, roughly twice as much:
>  24 files changed, 157 insertions(+), 893 deletions(-)

So, I have been thinking about this patch once again, and after
pondering more on it, MakeFuncResultTuplestore() is actually the wrong
name now that it does much more than just create a tuplestore, by
assigning the TupleDesc and the tuplestore into the function's
ReturnSetInfo.

This is actually setting up a function in the context of a single call
where we fill the tuplestore with all its values, so instead I have
settled on naming it SetSingleFuncCall(), to make a parallel with
the existing MultiFuncCall*().  funcapi.c is the right place for
that, and I have added more documentation in the fmgr's README as well
as in funcapi.h.
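
To sketch what a converted caller might look like with the renamed
helper (the signature and the flags argument are assumptions based on
the description above, not the committed API):

Datum
pg_some_srf(PG_FUNCTION_ARGS)
{
	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
	Datum		values[1];
	bool		nulls[1];

	/*
	 * Assumed: performs the SRF context checks, builds the TupleDesc,
	 * and registers a tuplestore in rsinfo, all in one call.
	 */
	SetSingleFuncCall(fcinfo, 0);

	values[0] = Int32GetDatum(42);	/* illustrative row */
	nulls[0] = false;
	tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);

	return (Datum) 0;
}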
--
Michael
From 5777ec80ee45fb2975a7e61224734892bbd3503e Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 24 Feb 2022 17:27:55 +0900
Subject: [PATCH v10 1/2] Introduce MakeFuncResultTuplestore()

This is the main patch from Melanie, that I have tweaked a bit.  Note
that I am not completely done with it ;p
---
 src/include/funcapi.h |  2 +
 src/backend/access/transam/xlogfuncs.c| 16 +---
 src/backend/commands/event_trigger.c  | 32 +---
 src/backend/commands/extension.c  | 48 +---
 src/backend/commands/prepare.c| 17 +
 src/backend/foreign/foreign.c | 14 +---
 src/backend/libpq/hba.c   | 19 +
 src/backend/replication/logical/launcher.c| 16 +---
 .../replication/logical/logicalfuncs.c| 16 +---
 src/backend/replication/logical/origin.c  | 19 +
 src/backend/replication/slotfuncs.c   | 16 +---
 src/backend/replication/walsender.c   | 16 +---
 src/backend/storage/ipc/shmem.c   | 16 +---
 src/backend/utils/adt/datetime.c  | 17 +
 src/backend/utils/adt/genfile.c   | 31 +---
 src/backend/utils/adt/jsonfuncs.c | 73 +++
 src/backend/utils/adt/mcxtfuncs.c | 16 +---
 src/backend/utils/adt/misc.c  | 14 +---
 src/backend/utils/adt/pgstatfuncs.c   | 48 +---
 src/backend/utils/adt/varlena.c   | 14 +---
 src/backend/utils/fmgr/funcapi.c  | 38 ++
 src/backend/utils/misc/guc.c  | 15 +---
 src/backend/utils/misc/pg_config.c| 13 +---
 src/backend/utils/mmgr/portalmem.c| 17 +
 24 files changed, 82 insertions(+), 461 deletions(-)

diff --git a/src/include/funcapi.h b/src/include/funcapi.h
index ba927c2f33..b9f9e92d1a 100644
--- a/src/include/funcapi.h
+++ b/src/include/funcapi.h
@@ -229,6 +229,8 @@ extern TupleDesc BlessTupleDesc(TupleDesc tupdesc);
 extern AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc);
 extern HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values);
 extern Datum HeapTupleHeaderGetDatum(HeapTupleHeader tuple);
+extern Tuplestorestate *MakeFuncResultTuplestore(FunctionCallInfo fcinfo,
+		TupleDesc *result_tupdesc);
 
 
 /*--
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 12e2bf4135..2fc1ed023c 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -178,24 +178,10 @@ pg_stop_backup_v2(PG_FUNCTION_ARGS)
 	XLogRecPtr	stoppoint;
 	SessionBackupState status = get_backup_status();
 
-	/* check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("materialize mode required, but it is not allowed in this context")));
-
-	/* Build a tuple descriptor for our result type */
-	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
-		elog(ERROR, "return type must be a row type");
-
 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
 
-	tupstore = tuplestore_begin_heap(true, false, work_mem);
+	tupstore = MakeFuncResultTuplestore(fcinfo, );
 	rsinfo->returnMode = SFRM_Materialize;
 	rsinfo->setResult = tupstore;
 	rsinfo->setDesc = tupdesc;
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 1e8587502e..68cad0a580 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -1306,25 +1306,11 @@ pg_event_trigger_dropped_objects(PG_FUNCTION_ARGS)
 		

Re: more descriptive message for process termination due to max_slot_wal_keep_size

2022-03-01 Thread Kyotaro Horiguchi
At Tue, 04 Jan 2022 10:29:31 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> So what do you say if I propose the following?
> 
> LOG:  terminating process %d to release replication slot \"%s\"
> because its restart_lsn %X/%X exceeds the limit %X/%X
> HINT: You might need to increase max_slot_wal_keep_size.

This version emits the following message.

[35785:checkpointer] LOG:  terminating process 36368 to release replication 
slot "s1" because its restart_lsn 0/1F000148 exceeds the limit 0/2100
[35785:checkpointer] HINT:  You might need to increase max_slot_wal_keep_size.
[36368:walsender] FATAL:  terminating connection due to administrator command
[36368:walsender] STATEMENT:  START_REPLICATION SLOT "s1" 0/1F00 TIMELINE 1
[35785:checkpointer] LOG:  invalidating slot "s1" because its restart_lsn 
0/1F000148 exceeds the limit 0/2100
[35785:checkpointer] HINT:  You might need to increase max_slot_wal_keep_size.

We can omit the HINT line from the termination log for non-persistent
slots, but I don't think we want to bother with that, considering its
low frequency.

The CI was confused by the mixed patches for multiple PG versions. In
this version the patches for master are attached as .patch and those
for PG13 as .txt.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 4909db2893f0b33ab6bca1ffc3ad802cd159c577 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 24 Dec 2021 12:52:07 +0900
Subject: [PATCH v4 1/2] Make a message on process termination more descriptive

The message at process termination due to the slot limit doesn't provide
the reason. In the major scenario the message is followed by another
message about slot invalidation, which shows the detail for the
termination.  However, the second message is missing if the slot is a
temporary one.

Augment the first message with the same reason as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie 
Author: Kyotaro Horiguchi 
Reviewed-by: Ashutosh Bapat 
Reviewed-by: Bharath Rupireddy 
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index caa6b29756..92f19bcb35 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1300,8 +1300,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 ereport(LOG,
-		(errmsg("terminating process %d to release replication slot \"%s\"",
-active_pid, NameStr(slotname;
+		(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+active_pid, NameStr(slotname),
+LSN_FORMAT_ARGS(restart_lsn;
 
 (void) kill(active_pid, SIGTERM);
 last_signaled_pid = active_pid;
-- 
2.27.0

From f31014f85c7dec160bd42e1c48f1c28a7dc22af2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 24 Dec 2021 12:58:25 +0900
Subject: [PATCH v4 2/2] Add detailed information to slot-invalidation messages

The message says just "exceeds max_slot_wal_keep_size" but doesn't
give the concrete LSN of the limit. That information helps operators
understand the issue.

Author: Kyotaro Horiguchi 
Reviewed-by: Ashutosh Bapat 
Reviewed-by: Masahiko Sawada 
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 92f19bcb35..be21b62add 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1300,9 +1300,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			if (last_signaled_pid != active_pid)
 			{
 ereport(LOG,
-		(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+		(errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
 active_pid, NameStr(slotname),
-LSN_FORMAT_ARGS(restart_lsn;
+LSN_FORMAT_ARGS(restart_lsn),
+LSN_FORMAT_ARGS(oldestLSN)),
+		 errhint("You might need to increase max_slot_wal_keep_size.")));
 
 (void) kill(active_pid, SIGTERM);
 last_signaled_pid = active_pid;
@@ -1345,9 +1347,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			ReplicationSlotRelease();
 
 			ereport(LOG,
-	(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
+	(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds the limit %X/%X",
 			NameStr(slotname),
-	

Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages

2022-03-01 Thread Bharath Rupireddy
On Tue, Mar 1, 2022 at 6:50 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 28 Feb 2022 21:03:07 +0530, Bharath Rupireddy 
>  wrote in
> > Hi,
> >
> > It looks like we use "log segment" in various user-facing messages.
> > The term "log" can mean server logs as well. "WAL segment" suits
> > better here, and it is consistently used across the other user-facing
> > messages [1].
> >
> > Here's a small patch attempting to consistently use the "WAL segment".
> >
> > Thoughts?
>
> I tend to agree with this.

Thanks for taking a look at it. Here's the CF entry -
https://commitfest.postgresql.org/38/3584/

> I also see "log record(s)" (without being prefixed by "write-ahead")
> in many places, especially in the documentation.  I'm not sure how we
> should treat "WAL log", though.

Yes, but the docs have a glossary term for "Log record" [1]. FWIW, I'm
attaching the docs change as the v2-0002 patch. I found another place
where "log records" is used, in pg_waldump.c; I changed that in the
attached v2-0001 patch.

Please review the v2 patch set.

[1]
    Log record
        Archaic term for a WAL record.

Regards,
Bharath Rupireddy.
From 6263c638b3b50b132cb16dd886ee4ab34bf0e9a5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 2 Mar 2022 05:25:00 +
Subject: [PATCH v2] Use "WAL segment" instead of "log segment"

It looks like we use "log segment" in various user-facing messages.
The term "log" can mean server logs as well. "WAL segment" suits
better here and is consistently used across the other
user-facing messages.
---
 src/backend/access/transam/xlogreader.c   | 10 +-
 src/backend/access/transam/xlogrecovery.c |  6 +++---
 src/backend/access/transam/xlogutils.c|  4 ++--
 src/backend/replication/walreceiver.c |  6 +++---
 src/bin/pg_resetwal/pg_resetwal.c |  2 +-
 src/bin/pg_upgrade/controldata.c  |  2 +-
 src/bin/pg_waldump/pg_waldump.c   |  4 ++--
 7 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..a79077c0c8 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -843,7 +843,7 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 		XLogFileName(fname, state->seg.ws_tli, segno, state->segcxt.ws_segsize);
 
 		report_invalid_record(state,
-			  "invalid magic number %04X in log segment %s, offset %u",
+			  "invalid magic number %04X in WAL segment %s, offset %u",
 			  hdr->xlp_magic,
 			  fname,
 			  offset);
@@ -857,7 +857,7 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 		XLogFileName(fname, state->seg.ws_tli, segno, state->segcxt.ws_segsize);
 
 		report_invalid_record(state,
-			  "invalid info bits %04X in log segment %s, offset %u",
+			  "invalid info bits %04X in WAL segment %s, offset %u",
 			  hdr->xlp_info,
 			  fname,
 			  offset);
@@ -898,7 +898,7 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 
 		/* hmm, first page of file doesn't have a long header? */
 		report_invalid_record(state,
-			  "invalid info bits %04X in log segment %s, offset %u",
+			  "invalid info bits %04X in WAL segment %s, offset %u",
 			  hdr->xlp_info,
 			  fname,
 			  offset);
@@ -917,7 +917,7 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 		XLogFileName(fname, state->seg.ws_tli, segno, state->segcxt.ws_segsize);
 
 		report_invalid_record(state,
-			  "unexpected pageaddr %X/%X in log segment %s, offset %u",
+			  "unexpected pageaddr %X/%X in WAL segment %s, offset %u",
 			  LSN_FORMAT_ARGS(hdr->xlp_pageaddr),
 			  fname,
 			  offset);
@@ -942,7 +942,7 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 			XLogFileName(fname, state->seg.ws_tli, segno, state->segcxt.ws_segsize);
 
 			report_invalid_record(state,
-  "out-of-sequence timeline ID %u (after %u) in log segment %s, offset %u",
+  "out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",
   hdr->xlp_tli,
   state->latestPageTLI,
   fname,
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index f9f212680b..feca14d625 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -2988,7 +2988,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			XLogFileName(fname, xlogreader->seg.ws_tli, segno,
 		 wal_segment_size);
 			ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
-	(errmsg("unexpected timeline ID %u in log segment %s, offset %u",
+	(errmsg("unexpected timeline ID %u in WAL segment %s, offset %u",
 			xlogreader->latestPageTLI,
 			fname,
 			offset)));
@@ -3179,13 +3179,13 @@ retry:
 			errno = save_errno;
 			

Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Masahiko Sawada
On Wed, Mar 2, 2022 at 11:55 AM Amit Kapila  wrote:
>
> On Mon, Feb 28, 2022 at 5:46 PM Masahiko Sawada  wrote:
> >
> > I've attached two patches: the first one changes
> > apply_error_callback() so that it uses complete sentences with if-else
> > blocks in order to have a translation work,
> >
>
> This is an improvement over what we have now but I think this is still
> not completely correct as per message translation rules:
> + else
> + errcontext("processing remote data during \"%s\" in transaction %u at %s",
> +logicalrep_message_type(errarg->command),
> +errarg->remote_xid,
> +(errarg->ts != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)");
>
> As per guidelines [1][2], we don't prefer to construct messages at
> run-time, i.e. we can do better for the following part: +(errarg->ts
> != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)". I think we need
> to use if-else here to split it further. If you agree, then the same
> needs to be dealt with in other parts of the patch as well.

I intended to use "(not-set)" as a value rather than a word in the
sentence so I think it doesn't violate the guidelines. We can split it
further as you suggested but we will end up having more if-else
branches.
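
For illustration, the further split being suggested would look
something like this (a sketch based on the snippet quoted above, with
one complete format string per branch):

if (errarg->ts != 0)
	errcontext("processing remote data during \"%s\" in transaction %u at %s",
			   logicalrep_message_type(errarg->command),
			   errarg->remote_xid,
			   timestamptz_to_str(errarg->ts));
else
	errcontext("processing remote data during \"%s\" in transaction %u",
			   logicalrep_message_type(errarg->command),
			   errarg->remote_xid);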

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Allow root ownership of client certificate key

2022-03-01 Thread Chris Bandy
On 3/1/22 3:15 AM, Tom Lane wrote:
> Stephen Frost  writes:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> I'd be more eager to do that if we had some field complaints
>>> about it.  Since we don't, my inclination is not to, but I'm
>>> only -0.1 or so; anybody else want to vote?
> 
>> This patch was specifically developed in response to field complaints
>> about it working differently, so there's that.
> 
> Anyway, I'd be happier about back-patching if we could document
> actual requests to make it work like the server side does.
> 

This patch is tidy and addresses an incompatibility with Kubernetes, so
+1 from me for a back-patch.


PGO runs PostgreSQL 10 through 14 in Kubernetes, and we have to work
around this issue when using certificates for system accounts.

For example, we use certificates to encrypt and authenticate streaming
replication connections. We store certificates in the Kubernetes API as
Secrets.[1] Kubernetes then hands those certificates/secrets to a
running container by mounting them as files on the filesystem.

Those files and their directories are managed by Kubernetes (as root)
from outside the container, and processes inside the container (as
not-root) cannot change them. They are mounted with these permissions:

  drwxrwsrwt  root postgres  /pgconf/tls
  -rw-r-----  root postgres  /pgconf/tls/ca.crt
  -rw-r-----  root postgres  /pgconf/tls/tls.crt
  -rw-r-----  root postgres  /pgconf/tls/tls.key

  drwxr-sr-x  root postgres  /pgconf/tls/replication
  -rw-r-----  root postgres  /pgconf/tls/replication/ca.crt
  -rw-r-----  root postgres  /pgconf/tls/replication/tls.crt
  -rw-r-----  root postgres  /pgconf/tls/replication/tls.key

Kubernetes treats the server certificate (top) with the same ownership
and permissions as the client certificate for the replication user
(bottom). The server is happy, but anything using libpq, including
walreceiver, rejects the latter files for not being "u=rw (0600) or less".
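
For context, the rule being requested is essentially the server-side
one. A minimal sketch of that check, assuming the usual POSIX stat
interface (this is not the patch's actual code):

#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sketch of the server-side rule the thread wants on the client too:
 * accept a key file owned by the current user with mode 0600 or less,
 * or owned by root with mode 0640 or less.
 */
static bool
key_file_permissions_ok(const struct stat *st)
{
	if (st->st_uid == geteuid())
		return (st->st_mode & (S_IRWXG | S_IRWXO)) == 0;	/* 0600 or less */
	if (st->st_uid == 0)
		return (st->st_mode & (S_IWGRP | S_IXGRP | S_IRWXO)) == 0;	/* 0640 or less */
	return false;
}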


There is an open request in the Kubernetes project to provide more
control over ownership and permissions of mounted secrets.[2] PostgreSQL
is mentioned repeatedly as motivation for the feature.


[1]: https://kubernetes.io/docs/concepts/configuration/secret/
[2]: https://issue.kubernetes.io/81089

-- Chris




Re: Failed transaction statistics to measure the logical replication progress

2022-03-01 Thread Masahiko Sawada
On Wed, Mar 2, 2022 at 10:21 AM osumi.takami...@fujitsu.com
 wrote:
>
>
> Also, I quickly checked the commit logs of other similar views
> (pg_stat_slru, pg_stat_wal_receiver), especially when they introduced
> columns.
> But I couldn't find any column name validations.
> So, I feel this is aligned.
>

I've looked at v26 patch and here are some random comments:

+	/* determine the subscription entry */
+	Oid			m_subid;
+
+	PgStat_Counter apply_commit_count;
+	PgStat_Counter apply_rollback_count;

I think it's better to add the prefix "m_" to
apply_commit/rollback_count for consistency.

---
+/*
+ * Increment the commit counter for subscription statistics.
+ */
+static void
+subscription_stats_incr_commit(void)
+{
+	Assert(OidIsValid(subStats.subid));
+
+	subStats.apply_commit_count++;
+}
+

I think we don't need the Assert() here since it should not be a
problem even if subStats.subid is InvalidOid at least in this
function.

If we remove it, we can remove both subscription_stats_incr_commit()
and +subscription_stats_incr_rollback() as well.

---
+void
+pgstat_report_subscription_xact(bool force)
+{
+	static TimestampTz last_report = 0;
+	PgStat_MsgSubscriptionXact msg;
+
+	/* Bailout early if nothing to do */
+	if (!OidIsValid(subStats.subid) ||
+		(subStats.apply_commit_count == 0 &&
+		 subStats.apply_rollback_count == 0))
+		return;
+

+LogicalRepSubscriptionStats subStats =
+{
+	.subid = InvalidOid,
+	.apply_commit_count = 0,
+	.apply_rollback_count = 0,
+};

Do we need subStats.subid? I think we can pass MySubscription->oid (or
MyLogicalRepWorker->subid) to pgstat_report_subscription_xact() along
with the pointer of the statistics (subStats). That way, we don't need
to expose subStats.

Also, I think it's better to add "Xact" or something to the struct
name. For example, SubscriptionXactStats.
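
Putting those two suggestions together, the interface might look
roughly like this (a sketch of the reviewed code reshaped per the
comments above; all names are tentative):

typedef struct SubscriptionXactStats
{
	int64		apply_commit_count;
	int64		apply_rollback_count;
} SubscriptionXactStats;

void
pgstat_report_subscription_xact(Oid subid, SubscriptionXactStats *stats,
								bool force)
{
	/* Bail out early if nothing to do ('force' handling elided). */
	if (!OidIsValid(subid) ||
		(stats->apply_commit_count == 0 &&
		 stats->apply_rollback_count == 0))
		return;

	/* ... build the stats message, send it, then reset the counters ... */
}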

---
+
+typedef struct LogicalRepSubscriptionStats
+{
+	Oid			subid;
+
+	int64		apply_commit_count;
+	int64		apply_rollback_count;
+} LogicalRepSubscriptionStats;

We need a description for this struct.

Probably it is better to declare it in logicalworker.h instead, so that
pgstat.c includes that rather than worker_internal.h. worker_internal.h
is the header file shared by logical replication workers such as the
apply worker, tablesync worker, and launcher, so it might not be
advisable to include it in pgstat.c.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Design of pg_stat_subscription_workers vs pgstats

2022-03-01 Thread Andres Freund
Hi,

On 2022-03-02 07:35:46 +0530, Amit Kapila wrote:
> Pushed.

Thanks!

Working on rebasing shared memory stats over this. Feels *much* better so far.

While rebasing, I was wondering why pgstat_reset_subscription_counter() has
"all subscription counters" support. We don't have a function to reset all
function stats or such either.

I'm asking because support for that is what currently prevents sub stats from
using the more general function for reset. It's an acceptable amount of code,
but if we don't really need it I'd rather not have it / add it in a more
general way if we want it.

Greetings,

Andres Freund




PG DOCS - logical replication filtering

2022-03-01 Thread Peter Smith
Hi.

PSA a PG docs patch associated with the logical replication
Row Filters feature, which was recently pushed [1].

This patch introduces a new "Filtering" page to give a common place
where all kinds of logical replication filtering can be described.
(For example, it is envisaged that a "Column Filters" section can be
added sometime in the future.)

The main new content for this page is the "Row Filters" section. This
gives a full overview of the new row filter feature, plus examples.

--
[1] 
https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-PG-docs-Logical-Replication-Filtering.patch
Description: Binary data


Re: Allow async standbys wait for sync replication

2022-03-01 Thread Bharath Rupireddy
On Wed, Mar 2, 2022 at 2:57 AM Nathan Bossart  wrote:
>
> On Tue, Mar 01, 2022 at 11:09:57PM +0530, Bharath Rupireddy wrote:
> > On Tue, Mar 1, 2022 at 10:35 PM Nathan Bossart  
> > wrote:
> >> Yes, perhaps the synchronous replication framework will need to alert WAL
> >> senders when the syncrep LSN advances so that the WAL is sent to the async
> >> standbys.  I'm glossing over the details, but I think that should be the
> >> general direction.
> >
> > It's doable. But we can't avoid async walsenders waiting for the flush
> > LSN even if we take the SyncRepReleaseWaiters() approach, right? I'm
> > not sure (at this moment) what the biggest advantage of this approach
> > is, i.e. (1) backends waking up walsenders after the flush LSN is
> > updated vs. (2) walsenders continually checking for the new flush LSN.
>
> I think there are a couple of advantages.  For one, spinning is probably
> not the best from a resource perspective.

Just to be on the same page: by spinning, do you mean the async
walsender waiting for the sync flushLSN in a for-loop with
WaitLatch()?

> There is no guarantee that the
> desired SendRqstPtr will ever be synchronously replicated, in which case
> the WAL sender would spin forever.

The async walsenders will not exactly wait for the SendRqstPtr LSN to
be the flush LSN. Say SendRqstPtr is 100 and the current sync FlushLSN
is 95; they will have to wait until FlushLSN catches up with
SendRqstPtr, i.e. SendRqstPtr <= FlushLSN. I can't think of a scenario
(right now) in which the sync FlushLSN doesn't move at all. If there's
such a scenario, shouldn't it be treated as a sync replication bug?

> Also, this approach might fit in better
> with the existing synchronous replication framework.  When a WAL sender
> realizes that it can't send up to the current "flush" LSN because it's not
> synchronously replicated, it will request to be alerted when it is.

I think you are referring to the way a backend calls SyncRepWaitForLSN
and waits until any one of the walsenders sets its syncRepState to
SYNC_REP_WAIT_COMPLETE in SyncRepWakeQueue. Firstly, SyncRepWaitForLSN
is blocking, i.e. the backend spins/waits in a for (;;) loop until its
syncRepState becomes SYNC_REP_WAIT_COMPLETE. The backend does no other
work but waits. So, spinning isn't avoided completely.

Unless I'm missing something, the existing sync replication queue
(SyncRepQueue) mechanism doesn't avoid spinning, either in the requestors
(backends, in SyncRepWaitForLSN) or in the walsenders (SyncRepWakeQueue).

> In the
> meantime, it can send up to the latest syncrep LSN so that the async
> standby is as up-to-date as possible.

Just to be clear, the following scenarios can exist.
Firstly, SendRqstPtr is the LSN up to which a walsender can send WAL;
it's not the sync FlushLSN.

scenario 1:
async SendRqstPtr is 100, sync FlushLSN is 95 - the async standbys will
wait until the FlushLSN moves ahead; once SendRqstPtr <= FlushLSN, the
WAL is sent out.

scenario 2:
async SendRqstPtr is 105, sync FlushLSN is 110 - the async standbys will
not wait; the WAL is just sent out up to SendRqstPtr, i.e. LSN 105.

scenario 3, same as scenario 2 but SendRqstPtr and FlushLSN are the same:
async SendRqstPtr is 105, sync FlushLSN is 105 - the async standbys will
not wait; the WAL is just sent out up to SendRqstPtr, i.e. LSN 105.

This way, the async standbys are always as up-to-date as possible with
the sync FlushLSN.
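
Put as (rough) code, the three scenarios reduce to a single wait loop like
this (a minimal sketch only; GetSyncFlushLSN() is a hypothetical accessor
for the sync FlushLSN, not an existing function):

for (;;)
{
	XLogRecPtr	flushLSN = GetSyncFlushLSN();	/* hypothetical */

	if (SendRqstPtr <= flushLSN)
		break;				/* scenarios 2 and 3: send immediately */

	/* scenario 1: wait for the sync FlushLSN to move ahead */
	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 1000L, WAIT_EVENT_WAL_SENDER_MAIN);
	ResetLatch(MyLatch);
}
/* now send WAL up to SendRqstPtr */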

Are you referring to any other scenarios?

Regards,
Bharath Rupireddy.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Masahiko Sawada
On Wed, Mar 2, 2022 at 12:21 PM Amit Kapila  wrote:
>
> On Wed, Mar 2, 2022 at 8:25 AM Peter Smith  wrote:
> >
> > On Mon, Feb 28, 2022 at 11:16 PM Masahiko Sawada  
> > wrote:
> > >
> > > The errcontext message would become like follows:
> > >
> > > *Before
> > > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > > DETAIL:  Key (c)=(1) already exists.
> > > CONTEXT:  processing remote data during "INSERT" for replication
> > > target relation "public.test" in transaction 726 at 2022-02-28
> > > 20:59:56.005909+09
> > >
> > > * After
> > > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > > DETAIL:  Key (c)=(1) already exists.
> > > CONTEXT:  processing remote data during "INSERT" for replication
> > > target relation "public.test" in transaction 726 committed at LSN
> > > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from replication
> > > origin "pg_16395"
> > >
> > > I'm a bit concerned that the message may be too long.
> >
> > If you are willing to use abbreviations instead of full
> > words/sentences perhaps you can shorten the long CONTEXT part like
> > below?
> >
> > Before:
> > CONTEXT:  processing remote data during "INSERT" for replication
> > target relation "public.test" in transaction 726 committed at LSN
> > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from
> > replication origin "pg_16395"
> >
> > After:
> > CONTEXT: processing remote data during "INSERT" for replication target
> > relation "public.test" (txid 726, LSN 0/14BFA88, ts 2022-02-28
> > 20:58:27.964238+09, origin "pg_16395")
> >
>
> I am wondering whether we can avoid having a timestamp in the message?
> If one wants it, it can be retrieved from the error log otherwise as well.
> For example, I see the below error in my machine:
> 2022-02-26 07:45:25.092 IST [17644] ERROR:  duplicate key value
> violates unique constraint "t1_pkey"
> 2022-02-26 07:45:25.092 IST [17644] DETAIL:  Key (c1)=(1) already exists.
> 2022-02-26 07:45:25.092 IST [17644] CONTEXT:  processing remote data
> during "INSERT" for replication target relation "public.t1" in
> transaction 724 at 2022-02-26 07:45:09.083848+05:30
>
> Now, here, won't the time at the start of the CONTEXT line serve our
> purpose? If we can remove the timestamp, I think the message won't
> appear too long. What do you think?

The time of the CONTEXT log message and the time in the message could
differ greatly when the subscriber is far behind the publisher. But I
basically agree that the timestamp in the message might not be
important, at least for now. If we support conflict resolution based
on the commit timestamp in the future, we might want it again.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-01 Thread Masahiko Sawada
On Wed, Mar 2, 2022 at 9:34 AM Peter Smith  wrote:
>
> Please see below my review comments for v24.
>
> ==
>
> 1. src/backend/replication/logical/worker.c - start_table_sync
>
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid, 
> !am_tablesync_worker());
>
> (This review comment is just FYI in case you did not do this deliberately)
>
> FYI, you didn't really need to call am_tablesync_worker() here because
> it is already asserted for the sync phase that it MUST be a tablesync
> worker.
>
> OTOH, IMO it documents the purpose of the parm so if it was deliberate
> then that is OK too.
>
> ~~~
>
> 2. src/backend/replication/logical/worker.c - start_table_sync
>
> + PG_CATCH();
> + {
> + /*
> + * Abort the current transaction so that we send the stats message in
> + * an idle state.
> + */
> + AbortOutOfAnyTransaction();
> +
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid, 
> !am_tablesync_worker());
> +
>
> [Maybe you will say that this review comment is unrelated to
> disable_on_err, but since this is a totally new/refactored function
> then I think maybe there is no problem to make this change at the same
> time. Anyway there is no function change, it is just rearranging some
> comments.]
>
> I felt the separation of those 2 statements and comments makes that
> code less clean than it could/should be. IMO they should be grouped
> together.
>
> SUGGESTED
> /*
> * Report the worker failed during table synchronization. Abort the
> * current transaction so that the stats message is sent in an idle
> * state.
> */
> AbortOutOfAnyTransaction();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

After more thought, should we do both AbortOutOfAnyTransaction() and
the error message handling while holding interrupts? That is:

HOLD_INTERRUPTS();
EmitErrorReport();
FlushErrorState();
AbortOutOfAnyTransaction();
RESUME_INTERRUPTS();
pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

I think it's better that we do the cleanup first and then do other work
such as sending the message to the stats collector and updating the
catalog.

Here are some comments on v24 patch:

+/* Look up our subscription in the catalogs */
+tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
+
CStringGetDatum(MySubscription->name));

s/catalogs/catalog/

Why don't we use SUBSCRIPTIONOID with MySubscription->oid?

---
+if (!HeapTupleIsValid(tup))
+ereport(ERROR,
+errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("subscription \"%s\" does not exist",
+   MySubscription->name));

I think we should use elog() here rather than ereport() since it's a
should-not-happen error.

---
+/* Notify the subscription will be no longer valid */

I'd suggest rephrasing it to something like "Notify that the subscription
will be disabled" (the subscription is actually still valid, just
disabled).

---
+/* Notify the subscription will be no longer valid */
+ereport(LOG,
+errmsg("logical replication subscription
\"%s\" will be disabled due to an error",
+   MySubscription->name));
+

I think we can emit the log message at the end of this function rather than
during the transaction.

---
+my $cmd = qq(
+CREATE TABLE tbl (i INT);
+ALTER TABLE tbl REPLICA IDENTITY FULL;
+CREATE INDEX tbl_idx ON tbl(i));

I think we don't need to set REPLICA IDENTITY FULL on this table since
there is no update/delete.

Do we need tbl_idx?

---
+$cmd = qq(
+SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr
+WHERE sr.srsubstate IN ('s', 'r'));
+$node_subscriber->poll_query_until('postgres', $cmd);

It seems better to add a condition on srrelid, just in case.

---
+$cmd = qq(
+SELECT count(1) = 1 FROM pg_catalog.pg_subscription s
+WHERE s.subname = 'sub' AND s.subenabled IS FALSE);
+$node_subscriber->poll_query_until('postgres', $cmd)
+  or die "Timed out while waiting for subscriber to be disabled";

I think that it's more natural to directly check the subscription's
subenabled. For example:

SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub';

---
+$cmd = q(ALTER SUBSCRIPTION sub ENABLE);
+$node_subscriber->safe_psql('postgres', $cmd);
+$cmd = q(SELECT COUNT(1) = 3 FROM tbl WHERE i = 3);
+$node_subscriber->poll_query_until('postgres', $cmd)
+  or die "Timed out while waiting for applying";

I think it's better to wait for the subscriber to catch up and then check
the query result, instead of using poll_query_until(), so that we can
check the query result in the case where the test fails.

---
+$cmd = qq(DROP INDEX tbl_unique);
+$node_subscriber->safe_psql('postgres', $cmd);

In the newly added tap tests, all queries are consistently assigned to
$cmd and executed even 

Re: Handle infinite recursion in logical replication setup

2022-03-01 Thread vignesh C
On Tue, Mar 1, 2022 at 4:12 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Hi Vignesh,
>
> > In logical replication, currently Walsender sends the data that is
> > generated locally and the data that are replicated from other
> > instances. This results in infinite recursion in circular logical
> > replication setup.
>
> Thank you for the good explanation. I understand that this fix can be used
> for bidirectional replication.

Once these issues are resolved, it can be used for bi-directional
logical replication.

> > Here there are two problems for the user: a) incremental
> > synchronization of table sending both local data and replicated data
> > by walsender b) Table synchronization of table using copy command
> > sending both local data and replicated data
>
> So you wanted to solve these two problems and are currently focused on
> the first one, right? We can check them one by one.

Yes.

> > For the first problem "Incremental synchronization of table by
> > Walsender" can be solved by:
> > Currently the locally generated data does not have replication origin
> > associated and the data that has originated from another instance will
> > have a replication origin associated. We could use this information to
> > differentiate locally generated data and replicated data and send only
> > the locally generated data. This "only_local" could be provided as an
> > option while subscription is created:
> > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > PUBLICATION pub1 with (only_local = on);
>
> Sounds good, but I cannot tell whether the assumption will always hold.

The replication origin is created by the apply worker and will be used
for all the transactions received from the walsender, so I think the
replication origin will always be present.

> I played with your patch, but it could not be applied to the current master.
> I tested from bd74c40 and confirmed that the infinite loop did not appear.

I will post an updated version for this soon.

> local_only cannot be set from the ALTER SUBSCRIPTION command.
> Is that expected?

I wanted to get opinions from others too, just to make sure the
approach is right. I will fix this, including the documentation, tests,
etc., in later versions.

Regards,
Vignesh




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Amit Kapila
On Wed, Mar 2, 2022 at 8:25 AM Peter Smith  wrote:
>
> On Mon, Feb 28, 2022 at 11:16 PM Masahiko Sawada  
> wrote:
> >
> > The errcontext message would become like follows:
> >
> > *Before
> > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > DETAIL:  Key (c)=(1) already exists.
> > CONTEXT:  processing remote data during "INSERT" for replication
> > target relation "public.test" in transaction 726 at 2022-02-28
> > 20:59:56.005909+09
> >
> > * After
> > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > DETAIL:  Key (c)=(1) already exists.
> > CONTEXT:  processing remote data during "INSERT" for replication
> > target relation "public.test" in transaction 726 committed at LSN
> > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from replication
> > origin "pg_16395"
> >
> > I'm a bit concerned that the message may be too long.
>
> If you are willing to use abbreviations instead of full
> words/sentences perhaps you can shorten the long CONTEXT part like
> below?
>
> Before:
> CONTEXT:  processing remote data during "INSERT" for replication
> target relation "public.test" in transaction 726 committed at LSN
> 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from
> replication origin "pg_16395"
>
> After:
> CONTEXT: processing remote data during "INSERT" for replication target
> relation "public.test" (txid 726, LSN 0/14BFA88, ts 2022-02-28
> 20:58:27.964238+09, origin "pg_16395")
>

I am wondering whether we can avoid having a timestamp in the message?
If one wants it, it can be retrieved from the error log otherwise as well.
For example, I see the below error in my machine:
2022-02-26 07:45:25.092 IST [17644] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2022-02-26 07:45:25.092 IST [17644] DETAIL:  Key (c1)=(1) already exists.
2022-02-26 07:45:25.092 IST [17644] CONTEXT:  processing remote data
during "INSERT" for replication target relation "public.t1" in
transaction 724 at 2022-02-26 07:45:09.083848+05:30

Now, here, won't the time at the start of the CONTEXT line serve our
purpose? If we can remove the timestamp, I think the message won't
appear too long. What do you think?

-- 
With Regards,
Amit Kapila.




RE: Optionally automatically disable logical replication subscriptions on error

2022-03-01 Thread osumi.takami...@fujitsu.com
On Wednesday, March 2, 2022 9:34 AM Peter Smith  wrote:
> Please see below my review comments for v24.
Thank you for checking my patch!

 
> ==
> 
> 1. src/backend/replication/logical/worker.c - start_table_sync
> 
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> 
> (This review comment is just FYI in case you did not do this deliberately)
> 
> FYI, you didn't really need to call am_tablesync_worker() here because it is
> already asserted for the sync phase that it MUST be a tablesync worker.
> 
> OTOH, IMO it documents the purpose of the parm so if it was deliberate then
> that is OK too.
Fixed.


> ~~~
> 
> 2. src/backend/replication/logical/worker.c - start_table_sync
> 
> + PG_CATCH();
> + {
> + /*
> + * Abort the current transaction so that we send the stats message in
> + * an idle state.
> + */
> + AbortOutOfAnyTransaction();
> +
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> +
> 
> [Maybe you will say that this review comment is unrelated to disable_on_err,
> but since this is a totally new/refactored function then I think maybe there 
> is no
> problem to make this change at the same time. Anyway there is no function
> change, it is just rearranging some comments.]
> 
> I felt the separation of those 2 statements and comments makes that code less
> clean than it could/should be. IMO they should be grouped together.
> 
> SUGGESTED
> /*
> * Report the worker failed during table synchronization. Abort the
> * current transaction so that the stats message is sent in an idle
> * state.
> */
> AbortOutOfAnyTransaction();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
I think this is OK. Thank you for suggestion. Fixed.



> ~~~
> 
> 3. src/backend/replication/logical/worker.c - start_apply
> 
> + /*
> + * Abort the current transaction so that we send the stats message in
> + * an idle state.
> + */
> + AbortOutOfAnyTransaction();
> +
> + /* Report the worker failed during the application of the change */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> 
> Same comment as #2 above, but this code fragment is in start_apply function.
Fixed.


> ~~~
> 
> 4. src/test/subscription/t/029_disable_on_error.pl - comment
> 
> +# Drop the unique index on the sub and re-enabled the subscription.
> +# Then, confirm that we have finished the apply.
> 
> SUGGESTED (tweak the comment wording)
> # Drop the unique index on the sub and re-enable the subscription.
> # Then, confirm that the previously failing insert was applied OK.
Fixed.


Best Regards,
Takamichi Osumi



v25-0001-Optionally-disable-subscriptions-on-error.patch
Description: v25-0001-Optionally-disable-subscriptions-on-error.patch


Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-01 Thread Andy Fan
>
>
> I don't think 0001 is right either, although maybe for somewhat
> different reasons. First, I think it only considers VAR OP CONST style
> clauses, but that is leaving money on the table, because given a.x =
> b.x AND mumble(a.x), we can decide to instead test mumble(b.x) if the
> equality operator in question has is-binary-identical semantics. It
> does not seem necessary for a first patch to deal with both that and
> the somewhat more pleasing case where we're making deductions based on
> operator families ... but we shouldn't commit to a design for the VAR
> OP CONST case without understanding how it could be generalized.
>

I can follow up on this, and +1 to the statement.


> Second, it looks to me like the patch takes the rather naive strategy
> of enforcing the derived clauses everywhere that they can legally be
> put, which seems certain not to be optimal.


If we can reach some agreement (after more discussion) that the EC filter
is acceptable at the semantics level, I think we may have some chances to
improve things at the execution level.

-- 
Best Regards
Andy Fan


Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Amit Kapila
On Mon, Feb 28, 2022 at 5:46 PM Masahiko Sawada  wrote:
>
> I've attached two patches: the first one changes
> apply_error_callback() so that it uses complete sentences with if-else
> blocks in order to have a translation work,
>

This is an improvement over what we have now but I think this is still
not completely correct as per message translation rules:
+ else
+ errcontext("processing remote data during \"%s\" in transaction %u at %s",
+logicalrep_message_type(errarg->command),
+errarg->remote_xid,
+(errarg->ts != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)");

As per guidelines [1][2], we prefer not to construct messages at
run-time, i.e. we can do better for the following part: (errarg->ts
!= 0) ? timestamptz_to_str(errarg->ts) : "(not-set)". I think we need
to use if-else here to split it further. If you agree, then the same
needs to be dealt with in other parts of the patch as well.
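
For example, the ternary could be split into two complete format strings,
something like this (a sketch of the idea only, not proposed final wording):

if (errarg->ts != 0)
	errcontext("processing remote data during \"%s\" in transaction %u at %s",
			   logicalrep_message_type(errarg->command),
			   errarg->remote_xid,
			   timestamptz_to_str(errarg->ts));
else
	errcontext("processing remote data during \"%s\" in transaction %u",
			   logicalrep_message_type(errarg->command),
			   errarg->remote_xid);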

[1] - https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES
[2] - Do not construct sentences at run-time, like:
printf("Files were %s.\n", flag ? "copied" : "removed");
The word order within the sentence might be different in other languages.

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Peter Smith
On Mon, Feb 28, 2022 at 11:16 PM Masahiko Sawada  wrote:
>
> Hi,
>
> We've added some information such as the command and the timestamp to
> the error context message by commit abc0910e2. This patch adds further
> information to it: replication origin name and commit-LSN.
>
> This will be helpful for users to set the origin name and LSN to
> pg_replication_origin_advance().
>
> The errcontext message would become like follows:
>
> *Before
> ERROR:  duplicate key value violates unique constraint "test_pkey"
> DETAIL:  Key (c)=(1) already exists.
> CONTEXT:  processing remote data during "INSERT" for replication
> target relation "public.test" in transaction 726 at 2022-02-28
> 20:59:56.005909+09
>
> * After
> ERROR:  duplicate key value violates unique constraint "test_pkey"
> DETAIL:  Key (c)=(1) already exists.
> CONTEXT:  processing remote data during "INSERT" for replication
> target relation "public.test" in transaction 726 committed at LSN
> 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from replication
> origin "pg_16395"
>
> I'm a bit concerned that the message may be too long.

If you are willing to use abbreviations instead of full
words/sentences perhaps you can shorten the long CONTEXT part like
below?

Before:
CONTEXT:  processing remote data during "INSERT" for replication
target relation "public.test" in transaction 726 committed at LSN
0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from
replication origin "pg_16395"

After:
CONTEXT: processing remote data during "INSERT" for replication target
relation "public.test" (txid 726, LSN 0/14BFA88, ts 2022-02-28
20:58:27.964238+09, origin "pg_16395")

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: [Proposal] Global temporary tables

2022-03-01 Thread Wenjing Zeng



> On 2022-02-27 12:13, Andres Freund wrote:
> 
> Hi,
> 
> On 2022-02-27 04:17:52 +0100, Pavel Stehule wrote:
>>> You redirect stats from pg_class and pg_statistics to a local hash table.
>>> This is pretty hairy :(
> 
> As is I think the patch is architecturally completely unacceptable. Having
> code everywhere to redirect to manually written in-memory catalog table code
> isn't maintainable.
> 
> 
>>> I guess you'd also need to handle pg_statistic_ext and ext_data.
>>> pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to
>>> look
>>> at pg_get_gtt_statistics.
>> 
>> Without this, the GTT will be terribly slow like current temporary tables
>> with a lot of problems with bloating of pg_class, pg_attribute and
>> pg_depend tables.
> 
> I think it's not a great idea to solve multiple complicated problems at
> once...

I'm trying to break down the entire implementation into multiple sub-patches.


Regards, Wenjing.


> 
> Greetings,
> 
> Andres Freund
> 
> 





Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-01 Thread Andy Fan
Thanks Tom for joining.


> I'm not in favor of complicating the EquivalenceClass
> mechanism for this, because ... (b) what it definitely will do
> is make ECs harder to understand and reason about.

I don't mean to oppose this on purpose, and I'm not insisting on the
current strategy, but I can't understand the comment here (and I'm not
sure others can either), so I just want to point it out. IMO, the
semantics of ec_filter are that every EM in this EC can have this
filter. I do like this method very much. If we need to improve
something, it may be that the content of ec_filter is not generic
enough. For example:

select * from t1, t2 where t1.a = t2.a and t2.a > 3;

Then the EC filter is "t2.a > 3". Why is it "t2.a" rather than a more
generic representation meaning "any EM" in this EC? I can double-check
the patch to see whether that would help.

Maybe I'm focusing on the current situation too much; could you
describe more about what is bad at this semantics level?



> If we develop a
> separate mechanism that can infer things from inequalities, and it
> _only_ kicks in when there are some inequalities, that might work out okay.

I will try to make this part clearer. The current mechanism includes 3 steps:

1). Gather the inequalities_qual_list during deconstruct_jointree.
2). After root->eq_classes is built, scan each of the above quals to find
out if there is an EC match; if yes, add it to the EC. There are some
fast paths here.
3). Compose the qual in ec_filter with the members in ec_members, then
distribute it to the relations.

Step 1 makes sure only inequalities are checked. Are you unhappy with the
cost of step 2 here? For a case like

SELECT * FROM t1, t2 WHERE t1.a = t2.a AND t1.b > 3;

we have to go through step 2 and get nothing in the end. As for a case like
"FROM t1, t2, t3 WHERE t1.a = t2.a AND t3.c > 3", t3.c > 3 can be discarded
quickly with an EC->relids check.

> But because of that, I don't even like the 0001 patch in this series.
> I've not looked at the subsequent ones.

I agree that the 0001 patch should be the first one to reach agreement.

-- 
Best Regards
Andy Fan


Re: Make mesage at end-of-recovery less scary.

2022-03-01 Thread Kyotaro Horiguchi
At Sat, 19 Feb 2022 09:31:33 +0530, Ashutosh Sharma  
wrote in 
> The changes looks good. thanks.!

Thanks!

A recent core change altered the WAL insertion speed during the TAP
test and revealed one forgotten case of EndOfWAL.  When a record
header flows into the next page, XLogReadRecord does a separate check
by itself, apart from ValidXLogRecordHeader.

>* If the whole record header is on this page, validate it immediately.
>* Otherwise do just a basic sanity check on xl_tot_len, and validate 
> the
>* rest of the header after reading it from the next page.  The 
> xl_tot_len
>* check is necessary here to ensure that we enter the "Need to 
> reassemble
>* record" code path below; otherwise we might fail to apply
>* ValidXLogRecordHeader at all.
>*/
>   if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord)
>   {
...
>}
>   else
>   {
>   /* XXX: more validation should be done here */
>   if (total_len < SizeOfXLogRecord)
>   {

I could simply copy in a part of ValidXLogRecordHeader there, but that
would result in rather a lot of duplicated code. I could instead have
ValidXLogRecordHeader handle the partial-header case, but that seems
too complex to me.

So in this version I split the xl_tot_len part of
ValidXLogRecordHeader into ValidXLogRecordLength.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 01cce076d2b3ad536398cc2b716ef64ed9b2c409 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH v15] Make End-Of-Recovery error less scary

When recovery of any type ends, we see a somewhat scary error message like
"invalid record length" that suggests something serious is
happening. Actually, if recovery meets a record with length = 0, that
usually means it has finished applying all available WAL records.

Make this message less scary, as "reached end of WAL". Instead, raise
the error level for other kinds of WAL failure to WARNING.
---
 src/backend/access/transam/xlogreader.c   | 125 ++
 src/backend/access/transam/xlogrecovery.c |  92 
 src/backend/replication/walreceiver.c |   7 +-
 src/bin/pg_waldump/pg_waldump.c   |  13 ++-
 src/include/access/xlogreader.h   |   1 +
 src/test/recovery/t/011_crash_recovery.pl | 106 ++
 6 files changed, 297 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..ba1c1ece87 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -42,6 +42,8 @@ static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
 static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
 			 int reqLen);
 static void XLogReaderInvalReadState(XLogReaderState *state);
+static bool ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+  XLogRecord *record);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
   XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -121,6 +123,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
 		pfree(state);
 		return NULL;
 	}
+	state->EndOfWAL = false;
 	state->errormsg_buf[0] = '\0';
 
 	/*
@@ -292,6 +295,7 @@ XLogReadRecord(XLogReaderState *state, char **errormsg)
 	/* reset error state */
 	*errormsg = NULL;
 	state->errormsg_buf[0] = '\0';
+	state->EndOfWAL = false;
 
 	ResetDecoder(state);
 	state->abortedRecPtr = InvalidXLogRecPtr;
@@ -380,12 +384,11 @@ restart:
 	 * whole header.
 	 */
 	record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
-	total_len = record->xl_tot_len;
 
 	/*
 	 * If the whole record header is on this page, validate it immediately.
-	 * Otherwise do just a basic sanity check on xl_tot_len, and validate the
-	 * rest of the header after reading it from the next page.  The xl_tot_len
+	 * Otherwise do just a basic sanity check on record length, and validate
+	 * the rest of the header after reading it from the next page.  The length
 	 * check is necessary here to ensure that we enter the "Need to reassemble
 	 * record" code path below; otherwise we might fail to apply
 	 * ValidXLogRecordHeader at all.
@@ -399,18 +402,13 @@ restart:
 	}
 	else
 	{
-		/* XXX: more validation should be done here */
-		if (total_len < SizeOfXLogRecord)
-		{
-			report_invalid_record(state,
-  "invalid record length at %X/%X: wanted %u, got %u",
-  LSN_FORMAT_ARGS(RecPtr),
-  (uint32) SizeOfXLogRecord, total_len);
+		if (!ValidXLogRecordLength(state, RecPtr, record))
 			goto err;
-		}
+
 		gotheader = false;
 	}
 
+	total_len = record->xl_tot_len;
 	len = XLOG_BLCKSZ - RecPtr % XLOG_BLCKSZ;
 	if (total_len > len)
 	{
@@ -588,6 +586,15 @@ err:
 		 */
 		state->abortedRecPtr = RecPtr;
 		

Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-03-01 Thread Amit Kapila
On Fri, Feb 25, 2022 at 8:57 AM Amit Kapila  wrote:
>
> On Thu, Feb 24, 2022 at 12:03 PM Michael Paquier  wrote:
> >
> > On Wed, Feb 23, 2022 at 09:52:02AM +0530, Amit Kapila wrote:
> > > Thanks, so you are okay with me pushing that patch just to HEAD.
> >
> > Yes, I am fine with that.  I am wondering about patching the second
> > function though, to avoid any risk of forgetting it, but I am fine to
> > leave that to your judgement.
> >
>
> The corresponding patch with other changes is not very far from being
> ready to commit. So, will do it along with that.
>

This is done as part of commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7a85073290856554416353a89799a4c04d09b74b


-- 
With Regards,
Amit Kapila.




RE: Logical replication timeout problem

2022-03-01 Thread wangw.f...@fujitsu.com
On Mon, Feb 28, 2022 at 6:58 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> Dear Wang,
> 
> > Attached a new patch that addresses following improvements I have got
> > so far as
> > comments:
> > 1. Consider other changes that need to be skipped(truncate, DDL and
> > function calls pg_logical_emit_message). [suggestion by Ajin, Amit]
> > (Add a new function SendKeepaliveIfNecessary for trying to send
> > keepalive
> > message.)
> > 2. Set the threshold conservatively to a static value of
> > 1.[suggestion by Amit, Kuroda-San] 3. Reset sendTime in function
> > WalSndUpdateProgress when send_keep_alive is false. [suggestion by
> > Amit]
> 
> Thank you for giving a good patch! I'll check more detail later, but it can be
> applied my codes and passed check world.
> I put some minor comments:
Thanks for your comments.

> ```
> + * Try to send keepalive message
> ```
> 
> Maybe missing "a"?
Fixed. Add missing "a".

> ```
> +   /*
> +   * After continuously skipping SKIPPED_CHANGES_THRESHOLD changes, try
> to send a
> +   * keepalive message.
> +   */
> ```
> 
> This comment does not follow the preferred style:
> https://www.postgresql.org/docs/devel/source-format.html
Fixed. Correct wrong comment style.

> ```
> @@ -683,12 +683,12 @@ OutputPluginWrite(struct LogicalDecodingContext *ctx,
> bool last_write)
>   * Update progress tracking (if supported).
>   */
>  void
> -OutputPluginUpdateProgress(struct LogicalDecodingContext *ctx)
> +OutputPluginUpdateProgress(struct LogicalDecodingContext *ctx, bool
> +send_keep_alive)
> ```
> 
> This function is no longer doing just tracking.
> Could you update the code comment above?
Fixed. Update the comment above function OutputPluginUpdateProgress.

> ```
>   if (!is_publishable_relation(relation))
>   return;
> ```
> 
> I'm not sure, but it seems that the function exits immediately if the relation
> is a sequence, view, temporary table, and so on. Is that OK? Does it never happen?
I did some checks to confirm this. After my confirmation, there are several
situations that can cause a timeout. For example, if I insert many rows into
the table sql_features in a long transaction, the subscriber side will time
out. Although I think users should not modify these tables arbitrarily, it
could happen. To be conservative, I think this use case should be addressed
as well. Fixed. SendKeepaliveIfNecessary is now invoked before returning.

> ```
> +   SendKeepaliveIfNecessary(ctx, false);
> ```
> 
> I think a comment is needed above which clarifies sending a keepalive message.
Fixed. Added the corresponding comment before the SendKeepaliveIfNecessary
call.

Attached the new patch. [suggestions by Kuroda-San]
1. Fix the typo.
2. Improve the comment style.
3. Fix the missing consideration.
4. Add comments to clarify the above functions and function calls.
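
For readers following the thread, the mechanism under discussion amounts to
roughly the following (a minimal sketch under assumptions: the name
SendKeepaliveIfNecessary, SKIPPED_CHANGES_THRESHOLD, and the extended
OutputPluginUpdateProgress() signature are from the patch, but this body is
only an illustration, not the patch's actual code):

/* sketch only: count skipped changes, request a keepalive past the threshold */
static void
SendKeepaliveIfNecessary(LogicalDecodingContext *ctx, bool skipped)
{
	static int	skipped_changes_count = 0;

	if (!skipped)
	{
		/* a change was actually sent, so restart the counting */
		skipped_changes_count = 0;
		return;
	}

	/* after skipping enough consecutive changes, request a keepalive */
	if (++skipped_changes_count >= SKIPPED_CHANGES_THRESHOLD)
	{
		OutputPluginUpdateProgress(ctx, true);
		skipped_changes_count = 0;
	}
}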

Regards,
Wang wei


0001-Fix-the-timeout-of-subscriber-in-long-transactions.patch
Description:  0001-Fix-the-timeout-of-subscriber-in-long-transactions.patch


Re: Design of pg_stat_subscription_workers vs pgstats

2022-03-01 Thread Amit Kapila
On Mon, Feb 28, 2022 at 1:15 PM tanghy.f...@fujitsu.com
 wrote:
>
> On Mon, Feb 28, 2022 12:32 PM Amit Kapila  wrote:
> >
> > I decided to keep this part of the docs as it is and fixed a few other
> > minor comments raised by you and Peter. Additionally, I have bumped
> > the PGSTAT_FILE_FORMAT_ID. I'll push this patch tomorrow unless there
> > are any other major comments.
> >
>
> Thanks for your patch. I have finished the review/test for this patch.
> The patch LGTM.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-01 Thread Tom Lane
Robert Haas  writes:
> I agree. My question is: why shouldn't every case where we can deduce
> an implied inequality be reasonably likely to show a benefit?

Maybe it will be, if we can deal with the issue you already mentioned
about not misestimating the resulting partially-redundant conditions.

> Second, it looks to me like the patch takes the rather naive strategy
> of enforcing the derived clauses everywhere that they can legally be
> put, which seems certain not to be optimal.

I'm not sure about that ... it's basically what we do with derived
equalities.  However, there's enough structure in the equivalence-class
case that we don't end up enforcing redundant quals.  It's not clear
to me whether the same can be said here.

> I don't know whether attaching something to the equivalence class data
> structure is the right idea or not. Presumably, we don't want to make
> an extra pass over the query tree to gather the information needed for
> this kind of optimization, and it feels like we need to know which
> vars are EMs before we try to derive alternate/additional quals.

Yeah, we don't want to make an additional pass over the tree, and
we also would rather not add an additional set of per-operator
catalog lookups.  We might be able to generalize the code that looks
for equality operators so that it looks for "any btree operator"
with the same number of lookups, and then have it feed the results
down either the EquivalenceClass path or the inequality path
as appropriate.  At the end, after we've formed all the ECs, we
could have a go at matching up the inequality structures with the
ECs.  But I don't agree that ECs are a necessary prerequisite.
Here are a couple of other patterns that might be worth looking for:

* "a > b AND b > c" allows deducing "a > c", whether or not any
of those values appears in an EC.

* "a > const1 AND a > const2" can be simplified to either "a > const1"
or "a > const2" depending on which constant is larger.  (The predicate
proof mechanism already has a form of this, but we don't typically
apply it in a way that would result in dropping the redundant qual.)

It's entirely possible that one or both of these patterns is not
worth looking for.  But I would say that it's equally unproven
that deriving "a > c" from "a = b AND b > c" is worth the cycles.
I'll grant that it's most likely going to be a win if we can use
any of these patterns to generate a restriction clause from what
had been join clauses.  Beyond that it's much less clear.

regards, tom lane




RE: logical replication empty transactions

2022-03-01 Thread shiy.f...@fujitsu.com
Hi,

Here are some comments on the v21 patch.

1.
+   WalSndKeepalive(false, 0);

Maybe we can use InvalidXLogRecPtr here, instead of 0.

2.
+   pq_sendint64(&output_message, writePtr ? writePtr : sentPtr);

Similarly, should we use XLogRecPtrIsInvalid()?
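
That is, something like (a sketch of the suggestion, assuming the
surrounding keepalive code):

+   pq_sendint64(&output_message, XLogRecPtrIsInvalid(writePtr) ? sentPtr : writePtr);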

3.
@@ -1183,6 +1269,20 @@ pgoutput_change(LogicalDecodingContext *ctx, 
ReorderBufferTXN *txn,
Assert(false);
}
 
+   if (in_streaming)
+   {
+   /* If streaming, send STREAM START if we haven't yet */
+   if (txndata && !txndata->sent_stream_start)
+   pgoutput_send_stream_start(ctx, txn);
+   }
+   else
+   {
+   /* If not streaming, send BEGIN if we haven't yet */
+   if (txndata && !txndata->sent_begin_txn)
+   pgoutput_send_begin(ctx, txn);
+   }
+
+
/* Avoid leaking memory by using and resetting our own context */
old = MemoryContextSwitchTo(data->context);


I am not sure if it is suitable to send BEGIN or STREAM START here, because the
row filter has not been checked yet. That means empty transactions caused by
the row filter are not skipped.

4.
@@ -1617,9 +1829,21 @@ pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
ReorderBufferTXN *txn,
XLogRecPtr prepare_lsn)
 {
+   PGOutputTxnData *txndata = txn->output_plugin_private;
+   bool		sent_begin_txn = txndata->sent_begin_txn;
+
Assert(rbtxn_is_streamed(txn));
 
-   OutputPluginUpdateProgress(ctx);
+   pfree(txndata);
+   txn->output_plugin_private = NULL;
+
+   if (!sent_begin_txn)
+   {
+   elog(DEBUG1, "Skipping replication of an empty transaction in stream prepare");
+   return;
+   }
+
+   OutputPluginUpdateProgress(ctx, false);
OutputPluginPrepareWrite(ctx, true);
logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn);
OutputPluginWrite(ctx, true);

I notice that the patch skips a streamed prepared transaction; this would cause
an error on the subscriber side when this transaction is committed on the
publisher side, so I think we'd better not do that.

For example:
(set logical_decoding_work_mem = 64kB, max_prepared_transactions = 10 in
postgresql.conf)

-- publisher
create table test (a int, b text, primary key(a));
create table test2 (a int, b text, primary key(a));
create publication pub for table test;

-- subscriber 
create table test (a int, b text, primary key(a));
create table test2 (a int, b text, primary key(a));
create subscription sub connection 'dbname=postgres port=5432' publication pub 
with(two_phase=on, streaming=on);

-- publisher
begin;
INSERT INTO test2 SELECT i, md5(i::text) FROM generate_series(1, 1000) s(i);
prepare transaction 't';
commit prepared 't';

The error message in subscriber log:
ERROR:  prepared transaction with identifier "pg_gid_16391_722" does not exist


Regards,
Shi yu


RE: logical replication restrictions

2022-03-01 Thread osumi.takami...@fujitsu.com
On Wednesday, March 2, 2022 8:54 AM Euler Taveira  wrote:
> On Tue, Mar 1, 2022, at 3:27 AM, osumi.takami...@fujitsu.com
>   wrote:
> 
> 
>   $ git am v1-0001-Time-delayed-logical-replication-subscriber.patch
> 
> 
> I generally use -3 to fall back on 3-way merge. Doesn't it work for you?
It did. Excuse me for the noise.


Best Regards,
Takamichi Osumi





RE: Failed transaction statistics to measure the logical replication progress

2022-03-01 Thread osumi.takami...@fujitsu.com
On Tuesday, March 1, 2022 4:12 PM Peter Smith  wrote:
> Please see below my review comments for v25.
> 
> ==
> 
> 1. Commit message
> 
> Introduce cumulative columns of transactions of logical replication subscriber
> to the pg_stat_subscription_stats view.
> 
> "cumulative columns of transactions" sounds a bit strange to me.
> 
> SUGGESTED
> Introduce 2 new subscription statistics columns (apply_commit_count, and
> apply_rollback_count) to the pg_stat_subscription_stats view for counting
> cumulative transaction commits/rollbacks.
Fixed.



> ~~~
> 
> 2. doc/src/sgml/monitoring.sgml - bug
> 
> The new SGML elements have been added in the wrong place!
> 
> I don't think this renders the way you expect it to. Please regenerate the
> help to see for yourself.
Fixed.


> ~~~
> 
> 3. doc/src/sgml/monitoring.sgml - wording
> 
> +  
> +   Number of transactions rollbacked in this subscription. Both
> +   ROLLBACK of transaction streamed as
> in-progress
> +   transaction and ROLLBACK PREPARED
> increment this
> +   counter.
> +  
> 
> BEFORE
> Number of transactions rollbacked in this subscription.
> 
> SUGGESTED
> Number of transaction rollbacks in this subscription.
Fixed.


> ~~~
> 
> 4. doc/src/sgml/monitoring.sgml - wording
> 
> +  
> +   Number of transactions rollbacked in this subscription. Both
> +   ROLLBACK of transaction streamed as
> in-progress
> +   transaction and ROLLBACK PREPARED
> increment this
> +   counter.
> +  
> 
> Trying to distinguish between the ROLLBACK of a transaction and of a
> streamed in-progress transaction seems to have made this description too
> complicated. I don't think the user even cares/knows about this
> (in-progress) distinction. So, I think this should just be written more simply
> (like the COMMIT part was)
> 
> BEFORE
> Both  ROLLBACK of transaction streamed as
> in-progress  transaction and ROLLBACK
> PREPARED increment this counter.
> 
> SUGGESTED
> Both  ROLLBACK and ROLLBACK
> PREPARED increment this counter.
Fixed.


> ~~~
> 
> 5. Question - column names.
> 
> Just curious why the columns are called "apply_commit_count" and
> "apply_rollback_count"? Specifically, what extra meaning do those names have
> versus just calling them "commit_count" and "rollback_count"?
I think there's a possibility that we'll have counters
for tablesync commits, for example. So the name prefix avoids
overlap between the possible names.



> ~~~
> 
> 6. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact
> 
> @@ -3421,6 +3425,60 @@ pgstat_send_slru(void)  }
> 
>  /* --
> + * pgstat_report_subscription_xact() -
> + *
> + * Send a subscription transaction stats to the collector.
> + * The statistics are cleared upon sending.
> + *
> + * 'force' is true only when the subscription worker process exits.
> + * --
> + */
> +void
> +pgstat_report_subscription_xact(bool force)
> 
> 6a.
> I think this comment should be worded more like the other
> pgstat_report_subscription_XXX comments
> 
> BEFORE
> Send a subscription transaction stats to the collector.
> 
> SUGGESTED
> Tell the collector about subscriptions transaction stats.
Fixed.



> 6b.
> + * 'force' is true only when the subscription worker process exits.
> 
> I thought this comment should just describe what the 'force' param actually
> does in this function; not the scenario about who calls it...
Fixed.



> ~~~
> 
> 7. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact
> 
> I think the entire function maybe should be relocated to be nearby the other
> pgstat_report_subscription_XXX functions in the source.
I placed the pgstat_report_subscription_xact below 
pgstat_report_subscription_drop.
Meanwhile, pgstat_recv_subscription_xact, another new function in pgstat.c,
is already placed below pgstat_recv_subscription_error, so I kept it as it is.



> ~~~
> 
> 8. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact
> 
> + /*
> + * This function can be called even if nothing at all has happened. In
> + * this case, there's no need to go forward.
> + */
> 
> Too much information. Clearly, it is possible for this function to be called 
> for this
> case otherwise this code would not exist in the first place :) IMO the comment
> can be much simpler but still say all it needs to.
> 
> BEFORE
> This function can be called even if nothing at all has happened. In this case,
> there's no need to go forward.
> SUGGESTED
> Bailout early if nothing to do.
Fixed.


> ~~~
> 
> 9. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact
> 
> + if (subStats.subid == InvalidOid ||
> + (subStats.apply_commit_count == 0 && subStats.apply_rollback_count ==
> + 0)) return;
> 
Maybe using !OidIsValid(subStats.subid) is better?
Fixed.


> ~~~
> 
> 10. src/backend/postmaster/pgstat.c - pgstat_report_subscription_xact
> 
> + /*
> + * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
> + * msec since we last sent one to avoid 

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-01 Thread Robert Haas
On Tue, Mar 1, 2022 at 5:53 PM Tom Lane  wrote:
> Robert Haas  writes:
> > This topic has been discussed a number of times, and Tom has basically
> > always said that he thinks this would be expensive to plan (which I
> > think is true) and that we wouldn't get much benefit (which I think is
> > false).
>
> I think the trick here, as in so many other places, is to not impose
> significant extra planning cost on queries that don't end up getting
> any benefit.

I agree. My question is: why shouldn't every case where we can deduce
an implied inequality be reasonably likely to show a benefit? If the
query specifies that a.x = b.x and also that a.x < 42, the only reason
to suppose that evaluating a.x < 42 rather than b.x < 42 or in
addition to b.x < 42 is likely to be better is if we assume the user
knows how the query optimizer works and has employed that knowledge in
crafting the query. And admittedly, sophisticated users are probably
likely to do that, and even unsophisticated users may do it more
often than chance would dictate. But it still feels like we have a
good chance of coming out ahead pretty often unless the
user really knows what they are doing. And even then, any mechanism we
add here can have an off switch.
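
For example (hypothetical tables, just to make the case concrete):

SELECT * FROM a JOIN b ON a.x = b.x WHERE a.x < 42;
-- derivable: b.x < 42, enforceable as a restriction clause on b

The derived b.x < 42 can filter b's rows before the join, instead of the
condition taking effect only through the join clause.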

> I'm not in favor of complicating the EquivalenceClass
> mechanism for this, because (a) I don't think that such an approach
> will lead to success on that metric, and (b) what it definitely will do
> is make ECs harder to understand and reason about.  If we develop a
> separate mechanism that can infer things from inequalities, and it only
> kicks in when there are some inequalities, that might work out okay.
> But because of that, I don't even like the 0001 patch in this series.
> I've not looked at the subsequent ones.

I don't think 0001 is right either, although maybe for somewhat
different reasons. First, I think it only considers VAR OP CONST style
clauses, but that is leaving money on the table, because given a.x =
b.x AND mumble(a.x), we can decide to instead test mumble(b.x) if the
equality operator in question has is-binary-identical semantics. It
does not seem necessary for a first patch to deal with both that and
the somewhat more pleasing case where we're making deductions based on
operator families ... but we shouldn't commit to a design for the VAR
OP CONST case without understanding how it could be generalized.
Second, it looks to me like the patch takes the rather naive strategy
of enforcing the derived clauses everywhere that they can legally be
put, which seems certain not to be optimal.

I don't know whether attaching something to the equivalence class data
structure is the right idea or not. Presumably, we don't want to make
an extra pass over the query tree to gather the information needed for
this kind of optimization, and it feels like we need to know which
vars are EMs before we try to derive alternate/additional quals. So I
guess we'd want to study clauses for possible use by this kind of
mechanism after we've derived ECs but before we do any costing stuff,
yet without introducing a whole new pass. Once we do derive that
information, where are we going to put it? We have to be able to tell
efficiently when looking at a baserel whether there are any implied
inequalities that we should be thinking about ... and there's nothing
obvious tying all of the relevant places together other than the EM.
But I'm kind of blathering here: I feel like there are a lot of
complexities I haven't thought hard enough about to have an
intelligent opinion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Nathan Bossart
Here is a new version of the patch with the following changes:

1. Addressed Chap's feedback from upthread.
2. Renamed pg_start/stop_backup() to pg_backup_start/stop() as
   suggested by David.
3. A couple of other small documentation adjustments.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 7119f9063f22652fca1e2a44fdf6b4b6b3fbf679 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 1 Dec 2021 23:50:49 +
Subject: [PATCH v4 1/1] remove exclusive backup mode

---
 doc/src/sgml/backup.sgml  | 188 +--
 doc/src/sgml/func.sgml|  99 +---
 doc/src/sgml/high-availability.sgml   |   6 +-
 doc/src/sgml/monitoring.sgml  |   4 +-
 doc/src/sgml/ref/pgupgrade.sgml   |   2 +-
 src/backend/access/transam/xlog.c | 493 ++
 src/backend/access/transam/xlogfuncs.c| 253 ++---
 src/backend/access/transam/xlogrecovery.c |   2 +-
 src/backend/catalog/system_functions.sql  |  18 +-
 src/backend/postmaster/postmaster.c   |  46 +-
 src/backend/replication/basebackup.c  |  20 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |   4 +
 src/bin/pg_ctl/pg_ctl.c   |   4 +-
 src/bin/pg_rewind/filemap.c   |   6 +-
 src/include/access/xlog.h |   7 +-
 src/include/catalog/pg_control.h  |   2 +-
 src/include/catalog/pg_proc.dat   |  26 +-
 src/include/miscadmin.h   |   4 -
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  56 +-
 .../t/010_logical_decoding_timelines.pl   |   4 +-
 20 files changed, 190 insertions(+), 1054 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0d69851bb1..acffee4688 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -857,16 +857,8 @@ test ! -f /mnt/server/archivedir/000100A90065  cp pg_wal/0
 sequence, and that the success of a step is verified before
 proceeding to the next step.

-   
-Low level base backups can be made in a non-exclusive or an exclusive
-way. The non-exclusive method is recommended and the exclusive one is
-deprecated and will eventually be removed.
-   
-
-   
-Making a Non-Exclusive Low-Level Backup
 
- A non-exclusive low level backup is one that allows other
+ A low level backup allows other
  concurrent backups to be running (both those started using
  the same backup API and those started using
  ).
@@ -881,19 +873,19 @@ test ! -f /mnt/server/archivedir/000100A90065  cp pg_wal/0

 
  Connect to the server (it does not matter which database) as a user with
- rights to run pg_start_backup (superuser, or a user who has been granted
+ rights to run pg_backup_start (superuser, or a user who has been granted
  EXECUTE on the function) and issue the command:
 
-SELECT pg_start_backup('label', false, false);
+SELECT pg_backup_start('label', false);
 
  where label is any string you want to use to uniquely
  identify this backup operation. The connection
- calling pg_start_backup must be maintained until the end of
+ calling pg_backup_start must be maintained until the end of
  the backup, or the backup will be automatically aborted.
 
 
 
- By default, pg_start_backup can take a long time to finish.
+ By default, pg_backup_start can take a long time to finish.
  This is because it performs a checkpoint, and the I/O
  required for the checkpoint will be spread out over a significant
  period of time, by default half your inter-checkpoint interval
@@ -905,10 +897,6 @@ SELECT pg_start_backup('label', false, false);
  issue an immediate checkpoint using as much I/O as available.
 
 
-
- The third parameter being false tells
- pg_start_backup to initiate a non-exclusive base backup.
-


 
@@ -926,7 +914,7 @@ SELECT pg_start_backup('label', false, false);
 
  In the same connection as before, issue the command:
 
-SELECT * FROM pg_stop_backup(false, true);
+SELECT * FROM pg_backup_stop(true);
 
  This terminates backup mode. On a primary, it also performs an automatic
  switch to the next WAL segment.  On a standby, it is not possible to
@@ -937,7 +925,7 @@ SELECT * FROM pg_stop_backup(false, true);
  ready to archive.
 
 
- The pg_stop_backup will return one row with three
+ The pg_backup_stop will return one row with three
  values. The second of these fields should be written to a file named
  backup_label in the root directory of the backup. The
  third field should be written to a file named
@@ -949,14 +937,14 @@ SELECT * FROM pg_stop_backup(false, true);

 
  Once the WAL segment files active during the backup are archived, you are
- done.  The file identified by pg_stop_backup's first 

Re: Separate the result of \watch for each query execution (psql)

2022-03-01 Thread Noboru Saito
Tom Lane :
> Noboru Saito  writes:
> > I have created a patch that allows you to turn it on and off in \pset.
> > The attached patch adds the following features.
> > Formfeed can be turned on with the command line option or \pset.
> > Formfeed (\f\n) is output after the query execution result by \watch.
>
> Hmm ... I grant your use-case for this, but I think the patch
> is too narrow-minded, because it supposes that the only string
> anybody could wish to output between \watch commands is "\f\n".
> Once you open the floodgates of inserting formatting there,
> ISTM that people might want other things.
>
> Also, I'm not that thrilled with treating this as a \pset option,
> because it has nothing to do with formatting of normal query
> results.  (IMV anyway, perhaps others will disagree.)
>
> How about instead of defining fixed semantics, we invent a psql
> special variable that can contain a string to be output between
> \watch commands?  It looks like you could then set it through
> a command like

I understand, because it's a problem I was worried about.
However, we need a machine-readable string for PAGER.
It didn't exist before, so new tools will match against a fixed string.
A de facto standard is needed even if the string can be changed.

> \set WATCH_SEPARATOR '\f\n'
>
> (not wedded to that variable name, it's just the first idea
> that came to mind)

I think it's better not to include "WATCH" in the name,
as it may be used by things other than \watch in the future.

> Personally I'd not bother with inventing a specialized command-line
> option to set it, either.  There's already -v and friends.
>
> > * Is formfeed output after the result, not before?
>
> Or we could invent WATCH_BEFORE and WATCH_AFTER ...
>
> regards, tom lane

It may be good to add it in the future.
For now, I'm fine with either one.

Thank you.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-01 Thread Peter Smith
Please see below my review comments for v24.

==

1. src/backend/replication/logical/worker.c - start_table_sync

+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

(This review comment is just FYI in case you did not do this deliberately)

FYI, you didn't really need to call am_tablesync_worker() here because
it is already asserted for the sync phase that it MUST be a tablesync
worker.

OTOH, IMO it documents the purpose of the parm so if it was deliberate
then that is OK too.

~~~

2. src/backend/replication/logical/worker.c - start_table_sync

+ PG_CATCH();
+ {
+ /*
+ * Abort the current transaction so that we send the stats message in
+ * an idle state.
+ */
+ AbortOutOfAnyTransaction();
+
+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
+

[Maybe you will say that this review comment is unrelated to
disable_on_err, but since this is a totally new/refactored function
then I think maybe there is no problem to make this change at the same
time. Anyway there is no function change, it is just rearranging some
comments.]

I felt the separation of those 2 statements and comments makes that
code less clean than it could/should be. IMO they should be grouped
together.

SUGGESTED
/*
* Report the worker failed during table synchronization. Abort the
* current transaction so that the stats message is sent in an idle
* state.
*/
AbortOutOfAnyTransaction();
pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

~~~

3. src/backend/replication/logical/worker.c - start_apply

+ /*
+ * Abort the current transaction so that we send the stats message in
+ * an idle state.
+ */
+ AbortOutOfAnyTransaction();
+
+ /* Report the worker failed during the application of the change */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

Same comment as #2 above, but this code fragment is in start_apply function.

~~~

4. src/test/subscription/t/029_disable_on_error.pl - comment

+# Drop the unique index on the sub and re-enabled the subscription.
+# Then, confirm that we have finished the apply.

SUGGESTED (tweak the comment wording)
# Drop the unique index on the sub and re-enable the subscription.
# Then, confirm that the previously failing insert was applied OK.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
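
(For context, the scenario that 029_disable_on_error.pl exercises is
roughly the following sketch; the subscription, publication, and index
names here are hypothetical:)

CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub
    WITH (disable_on_error = true);
-- an insert violating a subscriber-side unique index arrives; the
-- apply worker errors out and the subscription is disabled automatically
DROP INDEX sub_unique_idx;
ALTER SUBSCRIPTION sub ENABLE;
-- the previously failing insert is now applied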




Re: speed up a logical replica setup

2022-03-01 Thread Andreas Karlsson

On 2/21/22 13:09, Euler Taveira wrote:

DESIGN

The conversion requires 8 steps.

1. Check if the target data directory has the same system identifier as the
   source data directory.
2. Stop the target server if it is running as a standby server. (Modifying
   recovery parameters requires a restart.)
3. Create one replication slot per specified database on the source server.
   One additional replication slot is created at the end to get the consistent
   LSN. (This consistent LSN will be used as (a) a stopping point for the
   recovery process and (b) a starting point for the subscriptions.)
4. Write recovery parameters into the target data directory and start the
   target server. (Wait until the target server is promoted.)
5. Create one publication (FOR ALL TABLES) per specified database on the
   source server.
6. Create one subscription per specified database on the target server. (Use
   the replication slot and publication created in a previous step; don't
   enable the subscriptions yet.)
7. Set the replication progress to the consistent LSN obtained in a previous
   step.
8. Enable the subscription for each specified database on the target server.
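
(To make the steps concrete, a rough SQL sketch of steps 3-8 for a
single database; the slot, publication, and subscription names and the
LSN value are hypothetical:)

-- on the source: one slot per database, plus one extra slot whose
-- creation yields the consistent LSN
SELECT pg_create_logical_replication_slot('sub_db1', 'pgoutput');
SELECT lsn FROM pg_create_logical_replication_slot('consistent_lsn', 'pgoutput');

-- on the source
CREATE PUBLICATION pub_db1 FOR ALL TABLES;

-- on the target: use the pre-created slot, leave the subscription disabled
CREATE SUBSCRIPTION sub_db1
    CONNECTION 'host=source dbname=db1' PUBLICATION pub_db1
    WITH (connect = false, slot_name = 'sub_db1');

-- on the target: set the replication progress to the consistent LSN
SELECT pg_replication_origin_advance('pg_' || oid, '0/1234568')
  FROM pg_subscription WHERE subname = 'sub_db1';

-- on the target
ALTER SUBSCRIPTION sub_db1 ENABLE;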


Very interesting!

Just a couple of weeks ago I proposed a similar design for upgrading a
database of a customer of mine. We have not tried it yet, so it is not
decided whether we will go ahead with it.


In our case the goal is a bit different so my idea is that we will use 
pg_dump/pg_restore (or pg_upgrade and then some manual cleanup if 
pg_dump/pg_restore is too slow) on the target server. The goal of this 
design is to get a nice clean logical replica at the new version of 
PostgreSQL with indexes with the correct collations, all old invalid 
constraints validated, minimal bloat, etc. And all of this without 
creating bloat or putting too much load on the old master during the 
process. We have plenty of disk space and plenty of time so those are 
not limitations in our case. I can go into more detail if there is interest.


It is nice to see that our approach is not entirely unique. :) And I 
will take a look at this patch when I find the time.


Andreas




Re: logical replication restrictions

2022-03-01 Thread Euler Taveira
On Tue, Mar 1, 2022, at 3:27 AM, osumi.takami...@fujitsu.com wrote:
> $ git am v1-0001-Time-delayed-logical-replication-subscriber.patch
I generally use -3 to fall back on 3-way merge. Doesn't it work for you?


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Typo in pgbench messages.

2022-03-01 Thread Tatsuo Ishii
>> So are you fine with Kawamoto-san's patch?
> 
> Yes.
> 
> Patch applies cleanly (hmmm, it would have been better to have it as
> an attachement). Make & make check ok. Fine with me.

Thank you for the review. Fix pushed to master branch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-01 Thread Peter Geoghegan
On Tue, Mar 1, 2022 at 1:46 PM Robert Haas  wrote:
> I think that this is not really a description of an algorithm -- and I
> think that it is far from clear that the third "in-between" category
> does not need to exist.

But I already described the algorithm. It is very simple
mechanistically -- though that in itself means very little. As I have
said multiple times now, the hard part is assessing what the
implications are. And the even harder part is making a judgement about
whether or not those implications are what we generally want.

> I think findings like this are very unconvincing.

TPC-C may be unrealistic in certain ways, but it is nevertheless
vastly more realistic than pgbench. pgbench is really more of a stress
test than a benchmark.

The main reasons why TPC-C is interesting here are *very* simple, and
would likely be equally true with TPC-E (just for example) -- even
though TPC-E is a very different kind of OLTP benchmark
overall. TPC-C (like TPC-E) features a diversity of transaction types,
some of which are more complicated than others -- which is strictly
more realistic than having only one highly synthetic OLTP transaction
type. Each transaction type doesn't necessarily modify the same tables
in the same way. This leads to natural diversity among tables and
among transactions, including:

* The typical or average number of distinct XIDs per heap page varies
significantly among each table. There are way fewer distinct XIDs per
"order line" table heap page than there are per "order" table heap
page, for the obvious reason.

* Roughly speaking, there are various different ways that free space
management ought to work in a system like Postgres. For example, it is
necessary to make a "fragmentation vs. space utilization" trade-off
with the new orders table.

* There are joins in some of the transactions!

Maybe TPC-C is a crude approximation of reality, but it nevertheless
exercises relevant parts of the system to a significant degree. What
else would you expect me to use, for a project like this? To a
significant degree the relfrozenxid tracking stuff is interesting
because tables tend to have natural differences like the ones I have
highlighted on this thread. How could that not be the case? Why
wouldn't we want to take advantage of that?

There might be some danger in over-optimizing for this particular
benchmark, but right now that is so far from being the main problem
that the idea seems strange to me. pgbench doesn't need the FSM, at
all. In fact pgbench doesn't even really need VACUUM (except for
antiwraparound), once heap fillfactor is lowered to 95 or so. pgbench
simply isn't relevant, *at all*, except perhaps as a way of measuring
regressions in certain synthetic cases that don't benefit.

> TPC-C (or any
> benchmark really) is so simple as to be a terrible proxy for what
> vacuuming is going to look like on real-world systems.

Doesn't that amount to "no amount of any kind of testing or
benchmarking will convince me of anything, ever"?

There is more than one type of real-world system. I think that TPC-C
is representative of some real world systems in some regards. But even
that's not the important point for me. I find TPC-C generally
interesting for one reason: I can clearly see that Postgres does
things in a way that just doesn't make much sense, which isn't
particularly fundamental to how VACUUM works.

My only long term goal is to teach Postgres to *avoid* various
pathological cases exhibited by TPC-C (e.g., the B-Tree "split after
new tuple" mechanism from commit f21668f328 *avoids* a pathological
case from TPC-C). We don't necessarily have to agree on how important
each individual case is "in the real world" (which is impossible to
know anyway). We only have to agree that what we see is a pathological
case (because some reasonable expectation is dramatically violated),
and then work out a fix.

I don't want to teach Postgres to be clever -- I want to teach it to
avoid being stupid in cases where it exhibits behavior that really
cannot be described any other way. You seem to talk about some of this
work as if it was just as likely to have a detrimental effect
elsewhere, for some equally plausible workload, which will have a
downside that is roughly as bad as the advertised upside. I consider
that very unlikely, though. Sure, regressions are quite possible, and
a real concern -- but regressions *like that* are unlikely. Avoiding
doing what is clearly the wrong thing just seems to work out that way,
in general.

-- 
Peter Geoghegan




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-01 Thread Tom Lane
Robert Haas  writes:
> This topic has been discussed a number of times, and Tom has basically
> always said that he thinks this would be expensive to plan (which I
> think is true) and that we wouldn't get much benefit (which I think is
> false).

I think the trick here, as in so many other places, is to not impose
significant extra planning cost on queries that don't end up getting
any benefit.  I'm not in favor of complicating the EquivalenceClass
mechanism for this, because (a) I don't think that such an approach
will lead to success on that metric, and (b) what it definitely will do
is make ECs harder to understand and reason about.  If we develop a
separate mechanism that can infer things from inequalities, and it only
kicks in when there are some inequalities, that might work out okay.
But because of that, I don't even like the 0001 patch in this series.
I've not looked at the subsequent ones.

regards, tom lane
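
(For readers new to the thread, the derivation under discussion is
roughly the following; table and column names are hypothetical:)

-- Given a.x = b.x, the planner already derives b.x = 1 via
-- EquivalenceClasses:
SELECT * FROM a JOIN b ON a.x = b.x WHERE a.x = 1;

-- It does not currently derive b.x >= 1 here; inferring quals from
-- inequalities like this is what the patch series proposes:
SELECT * FROM a JOIN b ON a.x = b.x WHERE a.x >= 1;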




Re: Consider parallel for lateral subqueries with limit

2022-03-01 Thread Tom Lane
James Coleman  writes:
> On Tue, Jan 4, 2022 at 9:59 PM James Coleman  wrote:
>> On Tue, Jan 4, 2022 at 5:31 PM Tom Lane  wrote:
>>> I don't really see why this patch is even a little bit safe.

> Suppose lateral is not in play. Then if we have a plan like:

> Gather
>   ->  NestLoop
>         ->  Scan X
>         ->  Limit
>               ->  Scan Y

> Because we have the result "X join Limit(Y)"  we need "Limit(Y)" to be
> consistent across all of the possible executions of "Limit(Y)" (i.e.,
> in each worker it executes in). That means (absent infrastructure for
> guaranteeing a unique ordering) we obviously can't parallelize the
> inner side of the join as the limit may be applied in different ways
> in each worker's execution.

> Now suppose lateral is in play. Then (given the same plan) instead of
> our result being "X join Limit(Y)" the result is "X join Limit(Y sub
> X)", that is, each row in X is joined to a unique invocation of
> "Limit(Y)".

This argument seems to be assuming that Y is laterally dependent on X,
but the patch as written will take *any* lateral dependency as a
get-out-of-jail-free card.  If what we have is "Limit(Y sub Z)" where
Z is somewhere else in the query tree, it's not apparent to me that
your argument holds.

But more generally, I don't think you've addressed the fundamental
concern, which is that a query involving Limit is potentially
nondeterministic (if it lacks a fully-deterministic ORDER BY),
so that different workers could get different answers from it if
they're using a plan type that permits that to happen.  (See the
original discussion that led to 75f9c4ca5, at [1].)  I do not see
how a lateral dependency removes that hazard.  The bug report that
started the original discussion hit the problem because it
generated a plan like

Gather
  ->  Hash Semi Join
        ->  Parallel Seq Scan
        ->  Hash
              ->  Limit
                    ->  Seq Scan

We didn't make the submitter drill down far enough to verify
exactly why he got nondeterministic results from the Limit, but
I suppose the reason was that the table was big enough to trigger
"synchronize_seqscans" behavior, allowing different workers to
read different parts of that table.  Now, that particular case
didn't have any lateral dependency, but if there was one it'd
just have resulted in changing the hash join to a nestloop join,
and the nondeterminism hazard would be exactly the same AFAICS.

> In this case we are already conceivably getting different
> results for each execution of the subquery "Limit(Y)" even if we're
> not running those executions across multiple workers.

That seems to be about the same argument Andres made initially
in the old thread, but we soon shot that down as not being the
level of guarantee we want to provide.  There's nothing in the
SQL standard that says that
select * from events where account in
(select account from events
 where data->>'page' = 'success.html' limit 3);
(the original problem query) shall execute the sub-query
only once, but people expect it to act that way.

If you want to improve this area, my feeling is that it'd be
better to look into what was speculated about in the old
thread: LIMIT doesn't create nondeterminism if the query has
an ORDER BY that imposes a unique row ordering, ie
order-by-primary-key.  We didn't have planner infrastructure
that would allow checking that cheaply in 2018, but maybe
there is some now?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/153417684333.10284.11356259990921828616%40wrigleys.postgresql.org
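
(To illustrate the ORDER BY idea with the query above: if the ordering
is unique, every evaluation of the limited subquery must return the
same rows, so the nondeterminism hazard disappears.  The id column is a
hypothetical primary key of events:)

SELECT * FROM events WHERE account IN
  (SELECT account FROM events
    WHERE data->>'page' = 'success.html'
    ORDER BY id
    LIMIT 3);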




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-01 Thread Jacob Champion
On Tue, 2022-03-01 at 19:56 +0100, Peter Eisentraut wrote:
> This patch contains no documentation.  I'm having a hard time 
> understanding what the name "session_authn_id" is supposed to convey. 
> The comment for the Port.authn_id field says this is the "system 
> username", which sounds like a clearer terminology.

"System username" may help from an internal development perspective,
especially as it relates to pg_ident.conf, but I don't think that's
likely to be a useful descriptor to an end user. (I don't think of a
client certificate's Subject Distinguished Name as a "system
username".) Does my attempt in v5 help?

--Jacob


Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-01 Thread Jacob Champion
On Tue, 2022-03-01 at 08:35 -0500, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > 
> > Ayway, this function needs to be documented.  I think that you should
> > just add that in "Session Information Functions" in func.sgml, same
> > area as current_user().  The last time we talked about the authn ID,
> > one thing we discussed about was how to describe that in a good way to
> > the user, which is why the section of log_connections was reworked a
> > bit.  And we don't have yet any references to what an authenticated
> > identity is in the docs.
> 
> Agreed that it should be documented and that location seems reasonable
> to me.

Added a first draft in v5, alongside the perltidy fixups mentioned by
Michael.

> > There is no need to update catversion.h in the patch, committers
> > usually take care of that and that's an area of the code that
> > conflicts a lot.
> 
> Yeah, best to let committers handle catversion bumps.

Heh, that was added for my benefit -- I was tired of forgetting to
initdb after switching dev branches -- but I've dropped it from the
patch and will just carry that diff locally.

Thanks,
--Jacob
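
(As a quick illustration of the function added by the patch below; the
return values are only examples of what different auth methods would
yield:)

SELECT pg_session_authn_id();
-- 'alice'                    under scram-sha-256 (the username)
-- 'CN=alice,OU=engineering'  under cert (the certificate's DN)
-- NULL                       if the session was not authenticated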
From fd6e8a5b09b7facbecc8e38ef1f8a3d2cef866d4 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 14 Feb 2022 08:10:53 -0800
Subject: [PATCH v5] Add API to retrieve authn_id from SQL

The authn_id field in MyProcPort is currently only accessible to the
backend itself.  Add a SQL function, pg_session_authn_id(), to expose
the field to triggers that may want to make use of it.
---
 doc/src/sgml/func.sgml| 26 +++
 src/backend/utils/adt/name.c  | 12 ++-
 src/include/catalog/pg_proc.dat   |  3 +++
 src/test/authentication/t/001_password.pl | 11 ++
 src/test/ssl/t/001_ssltests.pl|  7 ++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df3cd5987b..654b96e677 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22280,6 +22280,32 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);

   
 
+  
+   
+
+ pg_session_authn_id
+
+pg_session_authn_id ()
+text
+   
+   
+Returns the authenticated identity for the current connection, or
+NULL if the user has not been authenticated.
+   
+   
+The authenticated identity is an immutable identifier for the user
+presented during the connection handshake; the exact format depends on
+the authentication method in use. (For example, when using the
+scram-sha-256 auth method, the authenticated identity
+is simply the username. When using the cert auth
+method, the authenticated identity is the Distinguished Name of the
+client certificate.) Even for auth methods which use the username as
+the authenticated identity, this function differs from
+session_user in that its return value cannot be
+changed after login.
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index e8bba3670c..662a7943ed 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -23,6 +23,7 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "libpq/libpq-be.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -257,7 +258,7 @@ namestrcmp(Name name, const char *str)
 
 
 /*
- * SQL-functions CURRENT_USER, SESSION_USER
+ * SQL-functions CURRENT_USER, SESSION_USER, PG_SESSION_AUTHN_ID
  */
 Datum
 current_user(PG_FUNCTION_ARGS)
@@ -271,6 +272,15 @@ session_user(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(DirectFunctionCall1(namein, CStringGetDatum(GetUserNameFromId(GetSessionUserId(), false;
 }
 
+Datum
+pg_session_authn_id(PG_FUNCTION_ARGS)
+{
+	if (!MyProcPort || !MyProcPort->authn_id)
+		PG_RETURN_NULL();
+
+	PG_RETURN_TEXT_P(cstring_to_text(MyProcPort->authn_id));
+}
+
 
 /*
  * SQL-functions CURRENT_SCHEMA, CURRENT_SCHEMAS
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index bf88858171..3afd171224 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1508,6 +1508,9 @@
 { oid => '746', descr => 'session user name',
   proname => 'session_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'session_user' },
+{ oid => '9774', descr => 'session authenticated identity',
+  proname => 'pg_session_authn_id', provolatile => 's', proparallel => 'r',
+  prorettype => 'text', proargtypes => '', prosrc => 'pg_session_authn_id' },
 
 { oid => '744',
   proname => 'array_eq', prorettype => 'bool',
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 3e3079c824..f0bdeda52d 100644
--- 

Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad

2022-03-01 Thread Andres Freund
Hi,

On 2022-03-02 06:46:23 +1300, Thomas Munro wrote:
> > I do think it's worth giving that sleep a proper wait event though, even in 
> > the back branches.
> 
> I'm thinking that 0002 should be back-patched all the way, but 0001
> could be limited to 14.

No strong opinion on back to where to backpatch. It'd be nice to have a proper
wait event everywhere, but especially < 12 it looks different enough to be
some effort.


> From a9344bb2fb2a363bec4be526f87560cb212ca10b Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Mon, 28 Feb 2022 11:27:05 +1300
> Subject: [PATCH v2 1/3] Wake up for latches in CheckpointWriteDelay().

LGTM. Would be nice to have this fixed soon, even if it's just to reduce test
times :)



> From 1eb0266fed7ccb63a2430e4fbbaef2300f2aa0d0 Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Tue, 1 Mar 2022 11:38:27 +1300
> Subject: [PATCH v2 2/3] Fix waiting in RegisterSyncRequest().

LGTM.


> From 50060e5a0ed66762680ddee9e30acbad905c6e98 Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Tue, 1 Mar 2022 17:34:43 +1300
> Subject: [PATCH v2 3/3] Use condition variable to wait when sync request queue
>  is full.
> 
> Previously, in the (hopefully) rare case that we need to wait for the
> checkpointer to create space in the sync request queue, we'd enter a
> sleep/retry loop.  Instead, create a condition variable so the
> checkpointer can wake us up whenever there is a transition from 'full'
> to 'not full'.


> @@ -1076,10 +1078,11 @@ RequestCheckpoint(int flags)
>   * to perform its own fsync, which is far more expensive in practice.  It
>   * is theoretically possible a backend fsync might still be necessary, if
>   * the queue is full and contains no duplicate entries.  In that case, we
> - * let the backend know by returning false.
> + * let the backend know by returning false, or if 'wait' is true, then we
> + * wait for space to become available.
>   */
>  bool
> -ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
> +ForwardSyncRequest(const FileTag *ftag, SyncRequestType type, bool wait)
>  {
>   CheckpointerRequest *request;
>   booltoo_full;
> @@ -1101,9 +1104,9 @@ ForwardSyncRequest(const FileTag *ftag, SyncRequestType type)
>* backend will have to perform its own fsync request.  But before forcing
>* that to happen, we can try to compact the request queue.
>*/
> - if (CheckpointerShmem->checkpointer_pid == 0 ||
> - (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
> -  !CompactCheckpointerRequestQueue()))
> + if (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
> + !CompactCheckpointerRequestQueue() &&
> + !wait)

Bit confused about the addition of the wait parameter / removal of the
CheckpointerShmem->checkpointer_pid check. What's that about?


> + /*
> +  * If we still don't have enough space and the caller asked us to wait,
> +  * wait for the checkpointer to advertise that there is space.
> +  */
> + if (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests)
> + {
> + ConditionVariablePrepareToSleep(&CheckpointerShmem->requests_not_full_cv);
> + while (CheckpointerShmem->num_requests >=
> +CheckpointerShmem->max_requests)
> + {
> + LWLockRelease(CheckpointerCommLock);
> + ConditionVariableSleep(&CheckpointerShmem->requests_not_full_cv,
> +WAIT_EVENT_FORWARD_SYNC_REQUEST);
> + LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
> + }
> + ConditionVariableCancelSleep();
> + }

Could there be a problem with a lot of backends trying to acquire
CheckpointerCommLock in exclusive mode? I'm inclined to think it's rare enough
to not worry.

Greetings,

Andres Freund




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-03-01 Thread Robert Haas
On Mon, Feb 21, 2022 at 2:31 AM Andy Fan  wrote:
> +1.  Just to be more precise,  are you also confused about why this
> should not be done at all.  IIUC, I get 3 reasons from Tom's reply.
> a). Planning cost. b). estimation error.  c)  extra qual execution is bad.

This topic has been discussed a number of times, and Tom has basically
always said that he thinks this would be expensive to plan (which I
think is true) and that we wouldn't get much benefit (which I think is
false).

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-01 Thread Robert Haas
On Sun, Feb 20, 2022 at 3:27 PM Peter Geoghegan  wrote:
> > I think that the idea has potential, but I don't think that I
> > understand yet what the *exact* algorithm is.
>
> The algorithm seems to exploit a natural tendency that Andres once
> described in a blog post about his snapshot scalability work [1]. To a
> surprising extent, we can usefully bucket all tuples/pages into two
> simple categories:
>
> 1. Very, very old ("infinitely old" for all practical purposes).
>
> 2. Very very new.
>
> There doesn't seem to be much need for a third "in-between" category
> in practice. This seems to be at least approximately true all of the
> time.
>
> Perhaps Andres wouldn't agree with this very general statement -- he
> actually said something more specific. I for one believe that the
> point he made generalizes surprisingly well, though. I have my own
> theories about why this appears to be true. (Executive summary: power
> laws are weird, and it seems as if the sparsity-of-effects principle
> makes it easy to bucket things at the highest level, in a way that
> generalizes well across disparate workloads.)

I think that this is not really a description of an algorithm -- and I
think that it is far from clear that the third "in-between" category
does not need to exist.

> Remember when I got excited about how my big TPC-C benchmark run
> showed a predictable, tick/tock style pattern across VACUUM operations
> against the order and order lines table [2]? It seemed very
> significant to me that the OldestXmin of VACUUM operation n
> consistently went on to become the new relfrozenxid for the same table
> in VACUUM operation n + 1. It wasn't exactly the same XID, but very
> close to it (within the range of noise). This pattern was clearly
> present, even though VACUUM operation n + 1 might happen as long as 4
> or 5 hours after VACUUM operation n (this was a big table).

I think findings like this are very unconvincing. TPC-C (or any
benchmark really) is so simple as to be a terrible proxy for what
vacuuming is going to look like on real-world systems. Like, it's nice
that it works, and it shows that something's working, but it doesn't
demonstrate that the patch is making the right trade-offs overall.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SQL/JSON: functions

2022-03-01 Thread Andrew Dunstan


On 2/1/22 14:11, I wrote:
>
> 2. The new GUC "sql_json" is a bit of a worry. I understand what it's
> trying to do, but I'm trying to convince myself it's not going to be a
> fruitful source of error reports, especially if people switch it in the
> middle of a session. Maybe it should be an initdb option instead of a GUC?
>
>


So far my efforts have not borne fruit. Here's why:


andrew=# set sql_json = jsonb;
SET
andrew=# create table abc (x text, y json);
CREATE TABLE
andrew=# \d abc
   Table "public.abc"
 Column | Type | Collation | Nullable | Default
+--+---+--+-
 x  | text |   |  |
 y  | json |   |  |

andrew=# insert into abc values ('a','{"q":1}');
INSERT 0 1
andrew=# select json_each(y) from abc;
ERROR:  function json_each(json) does not exist
LINE 1: select json_each(y) from abc;
   ^
HINT:  No function matches the given name and argument types. You might
need to add explicit type casts.
andrew=# select jsonb_each(y) from abc;
 jsonb_each
------------
 (q,1)
(1 row)


The description tells them the column is json, but the json_* functions
don't work on the column and you need to use the jsonb functions. That
seems to me a recipe for major confusion. It might be better if we set
it at initdb time so it couldn't be changed, but even so it could be
horribly confusing.

This is certainly severable from the rest of these patches. I'm not sure
how severable it is from the SQL/JSON Table patches.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Proposal: Support custom authentication methods using hooks

2022-03-01 Thread Andres Freund
Hi,

On 2022-02-25 13:40:54 -0500, Jonathan S. Katz wrote:
> On 2/25/22 12:39 PM, Tom Lane wrote:
> > My point is that sending cleartext passwords over the wire is an
> > insecure-by-definition protocol that we shouldn't be encouraging
> > more use of.
> 
> This is my general feeling as well. We just spent a bunch of effort adding,
> refining, and making SCRAM the default method. I think doing anything that
> would drive more use of sending plaintext passwords, even over TLS, is
> counter to that.

I want to again emphasize that, as proposed, a custom auth method can use
SCRAM if relevant for it, with a small amount of code. So the whole plaintext
discussion seems independent.

Samay, what do you think about updating the test plugin to do SCRAM instead of
plaintext, just to highlight that fact?

Greetings,

Andres Freund




Re: Proposal: Support custom authentication methods using hooks

2022-03-01 Thread Andres Freund
Hi,

On 2022-02-28 11:26:06 +0100, Peter Eisentraut wrote:
> We already have a variety of authentication mechanisms that support central
> management: LDAP, PAM, Kerberos, Radius.

LDAP, PAM and Radius all require cleartext passwords, so aren't a great
solution based on the concerns voiced in this thread. IME Kerberos is
operationally too complicated to really be used, unless it's already part of
the operating environment.


> What other mechanisms are people thinking about implementing using these
> hooks?

The cases I've heard about are about centralizing auth across multiple cloud
services. Including secret management in some form. E.g. allowing an
application to auth to postgres, redis and having the secret provided by
infrastructure, rather than having to hardcode it in config or such.

I can't see application developers configuring kerberos and I don't think
LDAP, PAM, Radius are a great answer either, due to the plaintext requirement
alone? LDAP is pretty clearly dying technology, PAM is fragile complicated C
stuff that's not portable across OSs. Radius is probably the most realistic,
but at least as implemented doesn't seem flexible enough (e.g. no access to
group memberships etc).

Nor does baking stuff like that in seem realistic to me, it'll presumably be
too cloud provider specific.

Greetings,

Andres Freund




Re: range_agg with multirange inputs

2022-03-01 Thread Chapman Flack
On 02/28/22 23:31, Paul Jungwirth wrote:
> On 2/26/22 17:13, Chapman Flack wrote:
>> (I think generating
>> the patch with 4 lines of context would be enough to keep that from being
>> a recurring issue.)
> 
> Thank you for the review and the tip re 4 lines of context! Rebase attached.

I think the 4 lines should suffice, but it looks like this patch was
generated from a rebase of the old one (with three lines) that ended up
putting the new 'range_agg' entry ahead of 'max' in func.sgml, which
position is now baked into the 4 lines of context. :)

So I think it needs a bit of manual attention to get the additions back
in the right places, and then a 4-context-lines patch generated from that.

> I changed the message to "range_agg must be called
> with a range or multirange". How does that seem?

That works for me.

>> I kind of wonder whether either message is really reachable, at least
>> through the aggregate machinery in the expected way. Won't that machinery
>> ensure that it is calling the right transfn with the right type of
>> argument? If that's the case, maybe the messages could only be seen
>> by someone calling the transfn directly ... which also seems ruled out
>> for these transfns because the state type is internal. Is this whole test
>> more of the nature of an assertion?
> 
> I don't think they are reachable, so perhaps they are more like asserts. Do
> you think I should change it? It seems like a worthwhile check in any case.

I would not change them to actual Assert, which would blow up the whole
process on failure. If it's a genuine "not expected to happen" case,
maybe changing it to elog (or ereport with errmsg_internal) would save
a little workload for translators. But as you were copying an existing
ereport with a translatable message, there's also an argument for sticking
to that style, and maybe mentioning the question to an eventual committer
who might have a stronger opinion.

I did a small double-take seeing the C range_agg_finalfn being shared
by the SQL range_agg_finalfn and multirange_agg_finalfn. I infer that
the reason it works is that get_fn_expr_rettype works equally well with
either parameter type.

Do you think it would be worth adding a comment at the C function
explaining that? In a quick query I just did, I found no other aggregate
final functions sharing a C function that way, so this could be the first.

Regards,
-Chap
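
(For reference, with the patch applied the aggregate accepts multirange
inputs directly, along these lines; the result is the union of the
inputs:)

SELECT range_agg(r)
  FROM (VALUES ('{[1,3),[5,6)}'::int4multirange),
               ('{[2,4)}'::int4multirange)) AS t(r);
-- => {[1,4),[5,6)}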




Re: Commitfest 2022-03 Patch Triage Part 1b

2022-03-01 Thread Tom Lane
Greg Stark  writes:
>> 2433: Erase the distinctClause if the result is unique by definition
>> 
>> (parts of) The approach taken in this patch has been objected against in
>> favor of work that Tom has proposed.  Until that work materializes this
>> patch is blocked, and thus I think we are better off closing it and
>> re-opening it when it gets unstuck.  Unless Tom has plans to hack on this
>> shortly?

> Ugh. This is a problematic dynamic. Tom has a different idea of what
> direction to take this but hasn't had a chance to work on it. So
> what's Andy Fan supposed to do here? He can't read Tom's mind and
> nobody else can really help him. Ultimately we all have limited time
> so this is a thing that will happen but is there anything we can do to
> resolve it in this case?

> We definitely shouldn't spend lots of time on this patch unless we're
> going to be ok going ahead without Tom's version of it. Is this
> something we can do using the Andy's data structure for now and change
> in the future?

> It looks like the Skip Scan patch was related to this work in some
> way? Is it blocked on it?

I did promise some time ago to get involved in the skip scan work.
I've so far failed to make good on that promise, but I will make
it a high priority to look at the area during this CF.

regards, tom lane




Re: Allow async standbys wait for sync replication

2022-03-01 Thread Nathan Bossart
On Tue, Mar 01, 2022 at 11:09:57PM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 1, 2022 at 10:35 PM Nathan Bossart  
> wrote:
>> Yes, perhaps the synchronous replication framework will need to alert WAL
>> senders when the syncrep LSN advances so that the WAL is sent to the async
>> standbys.  I'm glossing over the details, but I think that should be the
>> general direction.
> 
> It's doable. But we can't avoid async walsenders waiting for the flush
> LSN even if we take the SyncRepReleaseWaiters() approach right? I'm
> not sure (at this moment) what's the biggest advantage of this
> approach i.e. (1) backends waking up walsenders after flush lsn is
> updated vs (2) walsenders keep looking for the new flush lsn.

I think there are a couple of advantages.  For one, spinning is probably
not the best from a resource perspective.  There is no guarantee that the
desired SendRqstPtr will ever be synchronously replicated, in which case
the WAL sender would spin forever.  Also, this approach might fit in better
with the existing synchronous replication framework.  When a WAL sender
realizes that it can't send up to the current "flush" LSN because it's not
synchronously replicated, it will request to be alerted when it is.  In the
meantime, it can send up to the latest syncrep LSN so that the async
standby is as up-to-date as possible.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Proposal: Support custom authentication methods using hooks

2022-03-01 Thread Jonathan S. Katz

On 2/28/22 5:26 AM, Peter Eisentraut wrote:

On 17.02.22 20:25, samay sharma wrote:
A use case where this is useful are environments where you want 
authentication to be centrally managed across different services. This 
is a common deployment model for cloud providers where customers like 
to use single sign on and authenticate across different services 
including Postgres. Implementing this now is tricky as it requires 
syncing that authentication method's credentials with Postgres (and 
that gets trickier with TTL/expiry etc.). With these hooks, you can 
implement an extension to check credentials directly using the 
authentication provider's APIs.


We already have a variety of authentication mechanisms that support 
central management: LDAP, PAM, Kerberos, Radius.  What other mechanisms 
are people thinking about implementing using these hooks?  Maybe there 
are a bunch of them, in which case a hook system might be sensible, but 
if there are only one or two plausible ones, we could also just make 
them built in.


OIDC is the big one that comes to mind.

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-01 Thread Jonathan S. Katz

On 3/1/22 8:31 AM, Stephen Frost wrote:

Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:

On Mon, Feb 28, 2022 at 04:42:55PM -0500, Stephen Frost wrote:

Keeping it around will just push out the point at which everyone will
finally be done with it, as there's really only two groups: those who
have already moved to scram, and those who won't move until they want to
upgrade to a release that doesn't have md5.


FWIW, I am not sure if we are at this point yet.  An extra reason to
remove it would be that it is a support burden, but I haven't seen in
recent memory any problems related to it that required any deep changes
in the way it is used, and its code paths are independent.


Ongoing reports that there's a known vulnerability aren't great to have
to deal with.  We can at least point people to scram but that's not
great.


The last time I played with this area is the recent error handling
improvement with cryptohashes but MD5 has actually helped here in
detecting the problem as a patched OpenSSL would complain if trying to
use MD5 as hash function when FIPS is enabled.


Having to continue to deal with md5 as an algorithm when it's known to
be notably less secure and so much so that organizations essentially ban
its use for exactly what we're using it for, in fact, another reason to
remove it, not a reason to keep it.  Better code coverage testing of
error paths is the answer to making sure that our error handling behaves
properly.


So, I generally agree with Stephen and his arguments for dropping the 
md5 mechanism.


If you're moving to a newer version of PostgreSQL, you likely have to 
update your connection drivers anyway (rebuilt against new libpq, 
supporting any changes in the protocol, etc). I would prefer more data 
to support that argument, but this is generally what you need to do.


However, we may need to step towards it. We took one step last release 
with defaulting to SCRAM. Perhaps this release we add a warning for 
anything using md5 auth that "this will be removed in a future release." 
(or specifically v16). We should also indicate in the docs that md5 is 
deprecated and will be removed.


We also need to think about upgrading: what happens if I try to upgrade 
and I still have user passwords stored as an md5 hash? What if all my 
password-based users have an md5 hash, does pg_upgrade fail?
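
(One concrete piece of groundwork: md5 password hashes are stored with
a recognizable prefix, so they can be inventoried before an upgrade
with a query like this:)

SELECT rolname FROM pg_authid WHERE rolpassword LIKE 'md5%';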


As much as I want md5 gone, I think v16 is the earliest we can do it, 
but we can lay the groundwork for it now.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Proposal: Support custom authentication methods using hooks

2022-03-01 Thread Nathan Bossart
On Mon, Feb 28, 2022 at 03:46:34PM -0500, Stephen Frost wrote:
> md5 should be removed.

+1

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Commitfest 2022-03 Patch Triage Part 1b

2022-03-01 Thread Greg Stark
> 2096: psql - add SHOW_ALL_RESULTS option
> 
> Peter posted an updated version of Fabien's patch about a month ago (which at
> this point no longer applies) fixing a few issues, but also pointing at old
> review comments still unaddressed.  Since this was pushed, but had to be
> reverted, I assume there is a desire for the feature but it seems to need
> more work still.

It looks like Peter and Fabien were debating the merits of a libpq
change and probably that won't happen this release cycle. Is there a
kernel of psql functionality that can be extracted from this without
the libpq change in this release cycle or should it wait until we add
the functionality to libpq?

If it's the latter then perhaps we should move this to 16?


> 1651: GROUP BY optimization
> ===
> This is IMO a desired optimization, and after all the heavy lifting by Tomas
> the patch seems to be in pretty good shape.

This is two patches and it sounds like the first one is ready for
committer whereas the second one is less clear. Or is the second one
meant to be an alternative for the first one?

>
> 2377: pg_ls_* functions for showing metadata and recurse (pg_ls_tmpdir to show
> shared filesets)
> ==
> The question of what to do with lstat() on Windows is still left unanswered,
> but the patchset has been split up to be able to avoid it.  Stephen and Tom,
> having done prior reviews, do you have any thoughts on this?

Is this still blocked on lstat for windows? I couldn't tell, is there
consensus on a behaviour for windows even if that just means failing
or returning partial results on windows?

Other than that it seems like there's a lot of this patch that has
positive reviews and is ready for committing.


> 2349: global temporary table
> 
> GTT has been up for discussion numerous times in the past, and I can't judge
> whether this proposal has a better chance than previous ones.  I do note the
> patch has a number of crashes reported lately, and no reviews from senior
> contributors in a while, making it seem unlikely to be committed in this CF.
> Since the patch is very big, can it be teased apart and committed separately
> for easier review?

I think Andres's review decisively makes it clear this in an
uncommittable state.

https://www.postgresql.org/message-id/20220225074500.sfizxbmlrj2s6hp5%40alap3.anarazel.de
https://www.postgresql.org/message-id/20220227041304.mnimeqkhwktrjyht%40alap3.anarazel.de

It's definitely not going to make it this release and will probably
need a significant amount of time next release cycle. IMHO dividing it
up into smaller features does seem like it would be more effective at
getting things committed.

Should we mark this returned with feedback or just move it to the next
commitfest as waiting on author?


> 2433: Erase the distinctClause if the result is unique by definition
> 
> (parts of) The approach taken in this patch has been objected against in favor
> of work that Tom has proposed.  Until that work materializes this patch is
> blocked, and thus I think we are better off closing it and re-opening it
> when it gets unstuck.  Unless Tom has plans to hack on this shortly?

Ugh. This is a problematic dynamic. Tom has a different idea of what
direction to take this but hasn't had a chance to work on it. So
what's Andy Fan supposed to do here? He can't read Tom's mind and
nobody else can really help him. Ultimately we all have limited time
so this is a thing that will happen but is there anything we can do to
resolve it in this case?

We definitely shouldn't spend lots of time on this patch unless we're
going to be ok going ahead without Tom's version of it. Is this
something we can do using the Andy's data structure for now and change
in the future?

It looks like the Skip Scan patch was related to this work in some
way? Is it blocked on it?

-- 
greg




Re: Add id's to various elements in protocol.sgml

2022-03-01 Thread Chapman Flack
On 03/01/22 14:50, Brar Piening wrote:
> TBH I don't like the visual representation of the unicode link symbol
> (U+1F517) in my browser. It's a bold black fat thing that doesn't
> inherit colors. I've tried to soften it by decreasing the size but that
> doesn't really solve it for me. Font support also doesn't seem
> overwhelming.

That sounds like it's probably in less wide use than I thought, and if the
font support is spotty, that seems like a good enough reason not to go
there. I've no objection to the # symbol. Maybe this should really get
a comment from someone more actively involved in styling the web site.

>> As long as we stick to manually assigned ids in the same way my patch
>> does it, it shouldn't be too hard.
> 
> Patch is attached. I don't think it should get applied this way, though.
> The fact that you only get links for section headers that have manually
> assigned ids would be pretty surprising for users of the docs and in
> some files (e.g. protocol-flow.html) only every other section has a
> manually assigned id. It would be easy to emit a message (or even fail)
> whenever the template fails to find an id and then manually assign ids
> until they are everywhere (currently that means all varlistentries and
> sections) but that would a) be quite some work and b) make the patch
> quite heavy, so I wouldn't even start this before there is really
> consensus that this is the right direction.

This sounds like a bigger deal, and I wonder if it is big enough to merit
splitting the patch, so the added ids can go into protocol.sgml promptly
(and not be any harder to find than any of our fragment ids currently are),
and "improve html docs to expose fragment ids" can get more thought.

As long as we haven't assigned ids to all sections, I could almost think
of the surprising behavior as a feature, distinguishing the links you can
reasonably bet on being stable from the ones you can't. (Maybe the latter
should have their own symbol! 1F3B2?) But you're probably right that it
would seem surprising and arbitrary. And I don't know how much enthusiasm
there will be for assigning ids everywhere.

Regards,
-Chap




Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-01 Thread Dmitry Dolgov
> On Tue, Mar 01, 2022 at 11:16:36AM -0500, Greg Stark wrote:
> Last November Daniel Gustafsson  did a patch triage. It took him three
> emails to get through the patches in the commitfest back then. Since
> then we've had the November and the January commitfests so I was
> interested to see how many of these patches had advanced
>
> I'm only part way through the first email but so far only two patches
> have changed status -- and both to "Returned with feedback" :(
>
> So I'm going to post updates but I'm going to break it up into smaller
> batches because otherwise it'll take me a month before I post
> anything.

Thanks for being proactive!

> > 1741: Index Skip Scan
> > =
> > An often requested feature which has proven hard to reach consensus on an
> > implementation for.  The thread(s) have stalled since May, is there any 
> > hope of
> > taking this further?  Where do we go from here with this patch?
>
> "Often requested indeed! I would love to be able to stop explaining to
> people that Postgres can't handle this case well.
>
> It seems there are multiple patch variants around and suggestions from
> Heikki and Peter G about fundamental interface choices. It would be
> nice to have a good summary from someone who is involved about what's
> actually left unresolved.

I'm going to leave a summary for this one here, if you don't mind.

I believe the design commentary from Heikki about using index_rescan was
more or less answered by Thomas, and with no follow-up on that I'm
assuming it was convincing enough.

Peter G's most recent suggestion about the MDAM approach was interesting,
but very general; I'm not sure what to make of it in the absence of any
feedback on the follow-up questions and proposed experimental changes.

On top of that, a related patch [1] that was supposed to bring some
planner-side improvements for this feature didn't get much feedback
either. The idea is that the feature could be done in a much better way,
but the alternative proposal is still not there and, I think, doesn't
even have a CF item.

The current state of things is that I've managed to prepare a much
smaller and less invasive standalone version of the patch for review,
leaving the most questionable parts aside as optional.

Overall it seems that the common assessment of the patch is "the design
could be better", but no one has yet articulated in which way, or spelled
out what the current issues are. Having been through 19 CFs, the common
ground for the folks who were involved is that with no further feedback
the CF item could be closed. Sad but true :(

[1]: https://commitfest.postgresql.org/37/2433/
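
(For anyone new to the topic, the canonical query an index skip scan
would serve looks roughly like this; the table and columns are
hypothetical:)

CREATE INDEX ON events (account, occurred_at);
-- Today this reads every index entry; a skip scan would jump directly
-- between distinct values of the leading column:
SELECT DISTINCT account FROM events;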




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Stephen Frost
Greetings,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 03/01/22 14:14, Stephen Frost wrote:
> >> There can't really be many teams out there thinking "we'll just ignore
> >> these scripts forever, and nothing bad will happen." They all know they'll
> >> have to do stuff sometimes. But it matters how we allow them to schedule 
> >> it.
> > 
> > We only make these changes between major versions.  That's as much as we
> > should be required to provide.
> 
> It's an OSS project, so I guess we're not required to provide anything.
> 
> But in the course of this multi-release exclusive to non-exclusive
> transition, we already demonstrated, in 7117685, that we can avoid
> inflicting immediate breakage when there's nothing in our objective
> that inherently requires it, and avoiding it is relatively easy.
> 
> I can't bring myself to think that was a bad precedent.

It's actively bad because we are ridiculously inconsistent when it comes
to these things and we're terrible about ever removing anything once
it's gotten into the tree as 'deprecated'.  Witness that it's 8 years
since 7117685 and we still have these old and clearly broken APIs
around.  We absolutely need to move *away* from this approach, exactly
how 2dedf4d9, much more recently than 7117685, for all of its other
flaws, did.

> So if I'm outvoted here and the reason is "look, a lighter burden is
> involved this time than that time", then ok. I would rather bow to that
> argument on the specific facts of one case than abandon the precedent
> from 7117685 generally.

It's far from precedent- if anything, it's quite the opposite from how
most changes around here are made, and much more recent commits in the
same area clearly tossed out entirely the idea of trying to maintain
some kind of backwards compatibility with existing scripts.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: CREATEROLE and role ownership hierarchies

2022-03-01 Thread Robert Haas
On Mon, Feb 28, 2022 at 2:09 PM Stephen Frost  wrote:
> I'm generally in support of changing CREATEROLE to only allow roles that
> the role with CREATEROLE is an admin of to be allowed as part of the
> command (throwing an error in other cases).  That doesn't solve other
> use-cases which would certainly be nice to solve but it would at least
> reduce the shock people have when they discover how CREATEROLE actually
> works (that is, the way we document it to work, but that's ultimately
> not what people expect).

So I'm 100% good with that because it does exactly what I want, but my
understanding of the situation is that it breaks the userbot case that
Joshua is talking about. Right now, with stock PostgreSQL in any
released version, his userbot can have CREATEROLE and give out roles
that it doesn't itself possess. If we restrict CREATEROLE in the way
described here, that's no longer possible.

Now it's possible that you and/or he would take the position that
we're still coming out ahead despite that functional regression,
because as I now understand after reading Joshua's latest email, he
doesn't want the userbot to be able to grant ANY role, just the
'employees' role - and today he can't get that. So in a modified
universe where we restrict the privileges of CREATEROLE, then on the
one hand he GAINS the ability to have a userbot that can grant some
roles but not others, but on the other hand, he's forced to give the
userbot the roles he wants it to be able to hand out. Is that better
overall or worse?

To really give him EXACTLY what he wants, we need a way of specifying
administration without membership. See my last reply to the thread for
my concerns about that.

> I don't think it's a great plan to limit who is allowed to DROP roles to
> be just those that a given role created.  I also don't like the idea of
> introducing a single field for owner/manager/tenant/whatever to the role
> system- instead we should add other ways that roles can be associated to
> each other by extending the existing system that we have for that, which
> is role membership.  Role membership today is pretty limited but I don't
> see any reason why we couldn't improve on that in a way that's flexible
> and allows us to define new associations in the future.  The biggest
> difference between a 'tenant' or such as proposed vs. a role association
> is in where the information is tracked and what exactly it means.
> Saying "I want a owner" or such is easy because it's basically punting
> on the complciated bit of asking the question: what does that *mean*
> when it comes to what rights that includes vs. doesn't?  What if I only
> want some of those rights to be given away but not all of them?  We have
> that system for tables/schemas/etc, and it hasn't been great as we've
> seen through the various requests to add things like GRANT TRUNCATE.

Well, there's no accounting for taste, but I guess I see this pretty
much opposite to the way you do. I think GRANT TRUNCATE is nice and
simple and clear. It does one thing and it's easy to understand what
that thing is, and it has very few surprising or undocumented side
effects. On the other hand, role membership is a mess, and it's not at
all clear how to sort that mess out. I guess I agree with you that it
would be nice if it could be done, but the list of problems is pretty
substantial. Like, membership implies the right to SET ROLE, and also
the right to implicitly exercise the privileges of the role, and
you've complained about that fuzziness. And ADMIN OPTION implies
membership, and you don't like that either. And elsewhere it's been
raised that nobody would expect to have a table end up owned by
'pg_execute_server_programs', or a user logged in directly as
'employees' rather than as some particular employee, but all that
stuff can happen, and some of it can't even be effectively prevented
with good configuration. 'toe' can be a member of 'foot', which
makes sense to everybody, and at the same time, 'foot' can be a member
of 'toe', which doesn't make any sense at all. And because both
directions are possible even experienced PostgreSQL users and hackers
get confused, as demonstrated by Joshua's having just got the
revoke-from-role case backwards.

Of those four problems, the last two are clearly the result of
conflating users with groups - and really also with capabilities - and
having a unified role concept that encompasses all of those things. I
think we would be better off if we had not done that, both in the
sense that I think the system would be less confusing to understand,
and also in the sense that we would likely have fewer security bugs.
And similarly I agree with you that it would be better if the right to
administer a role were clearly separated from membership in a role,
and if the right to use the privileges of a role were separated from
the ability to SET ROLE to it. However, unlike you, I see the whole
'role membership' concept as the problem, not the solution. 
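
(A minimal illustration of the "both directions" point: GRANT accepts
either direction, though not both at once, since circular membership is
rejected:)

CREATE ROLE foot;
CREATE ROLE toe;
GRANT foot TO toe;    -- toe is a member of foot: makes sense
-- or, equally accepted by the grammar:
-- GRANT toe TO foot; -- foot is a member of toe: makes no sense at all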

Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-01 Thread Greg Stark
On Tue, 1 Mar 2022 at 14:59, Greg Stark  wrote:
>
> As Justin suggested I CC the authors from these patches I'm adding
> them here. Some of the patches have multiple "Authors" listed in the
> commitfest which may just be people who posted updated patches so I
> may have added more people than necessary.

> [If you received two copies of this do not reply to the first one or
> you'll get bounces. Hopefully I've cleaned the list now but we'll
> see...]

Nope. One more address to remove from the CC.


-- 
greg




Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-01 Thread Greg Stark
As Justin suggested I CC the authors from these patches I'm adding
them here. Some of the patches have multiple "Authors" listed in the
commitfest which may just be people who posted updated patches so I
may have added more people than necessary.
[If you received two copies of this do not reply to the first one or
you'll get bounces. Hopefully I've cleaned the list now but we'll
see...]




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Chapman Flack
On 03/01/22 14:14, Stephen Frost wrote:
>> There can't really be many teams out there thinking "we'll just ignore
>> these scripts forever, and nothing bad will happen." They all know they'll
>> have to do stuff sometimes. But it matters how we allow them to schedule it.
> 
> We only make these changes between major versions.  That's as much as we
> should be required to provide.

It's an OSS project, so I guess we're not required to provide anything.

But in the course of this multi-release exclusive to non-exclusive
transition, we already demonstrated, in 7117685, that we can avoid
inflicting immediate breakage when there's nothing in our objective
that inherently requires it, and avoiding it is relatively easy.

I can't bring myself to think that was a bad precedent.

Now, granted, the difference between the adaptations required then and
the ones required now is that the former involved both changes to some
function calls and corresponding changes to how the scripts handled
label and tablespace files. Here, it's only a clerical update to some
function calls.

So if I'm outvoted here and the reason is "look, a lighter burden is
involved this time than that time", then ok. I would rather bow to that
argument on the specific facts of one case than abandon the precedent
from 7117685 generally.

Regards,
-Chap




Re: Add id's to various elements in protocol.sgml

2022-03-01 Thread Brar Piening

On Mar 01, 2022 at 18:27, Brar Piening wrote:

On Feb 28, 2022 at 21:06, Chapman Flack wrote:

I think that in other recent examples I've seen, there might be
(something like a) consensus forming around the Unicode LINK SYMBOL
(U+1F517) rather than # as the symbol for such things.


I intentionally opted for an ASCII character as that definitely won't
cause any display/font/portability issues but changing that is no
problem.


TBH I don't like the visual representation of the unicode link symbol
(U+1F517) in my browser. It's a bold black fat thing that doesn't
inherit colors. I've tried to soften it by decreasing the size but that
doesn't really solve it for me. Font support also doesn't seem
overwhelming. Anyway, I've changed my patch to use it so that you can
judge it yourself.


... and now that the concept is proven, how hard would it be to broaden
that template's pattern to apply to all the other DocBook constructs
(such as section headings) that emit anchors?


As long as we stick to manually assigned ids in the same way my patch
does it, it shouldn't be too hard.


Patch is attached. I don't think it should get applied this way, though.
The fact that you only get links for section headers that have manually
assigned ids would be pretty surprising for users of the docs and in
some files (e.g. protocol-flow.html) only every other section has a
manually assigned id. It would be easy to emit a message (or even fail)
whenever the template fails to find an id and then manually assign ids
until they are everywhere (currently that means all varlistentries and
sections) but that would a) be quite some work and b) make the patch
quite heavy, so I wouldn't even start this before there is really
consensus that this is the right direction.
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 1c5ab00879..cb138b53ad 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1810,7 +1810,7 @@ Replication commands are logged in the server log when
 
 The commands accepted in replication mode are:
 
-  
+  
 IDENTIFY_SYSTEM
  IDENTIFY_SYSTEM
 
@@ -1875,7 +1875,7 @@ The commands accepted in replication mode are:
 
   
 
-  
+  
 SHOW name
  SHOW
 
@@ -1899,7 +1899,7 @@ The commands accepted in replication mode are:
 
   
 
-  
+  
 TIMELINE_HISTORY tli
  TIMELINE_HISTORY
 
@@ -2084,7 +2084,7 @@ The commands accepted in replication mode are:
 
   
 
-  
+  
 CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { 
PHYSICAL [ RESERVE_WAL ] | 
LOGICAL output_plugin [ 
EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT | 
USE_SNAPSHOT | TWO_PHASE ] }
 
 
@@ -2095,7 +2095,7 @@ The commands accepted in replication mode are:
 
   
 
-  
+  
 READ_REPLICATION_SLOT slot_name
   READ_REPLICATION_SLOT
 
@@ -2143,7 +2143,7 @@ The commands accepted in replication mode are:
 
   
 
-  
+  
 START_REPLICATION [ SLOT 
slot_name ] [ 
PHYSICAL ] XXX/XXX [ TIMELINE 
tli ]
  START_REPLICATION
 
@@ -2201,7 +2201,7 @@ The commands accepted in replication mode are:
 
  
   
-  
+  
   
   XLogData (B)
   
@@ -2270,7 +2270,7 @@ The commands accepted in replication mode are:
   
   
   
-  
+  
   
   Primary keepalive message (B)
   
@@ -2334,7 +2334,7 @@ The commands accepted in replication mode are:
 
  
   
-  
+  
   
   Standby status update (F)
   
@@ -2415,7 +2415,7 @@ The commands accepted in replication mode are:
 
  
   
-  
+  
   
   Hot Standby feedback message (F)
   
@@ -2497,7 +2497,7 @@ The commands accepted in replication mode are:
  
 
   
-  
+  
 START_REPLICATION SLOT 
slot_name 
LOGICAL XXX/XXX 
[ ( option_name [ 
option_value ] [, ...] ) ]
 
  
@@ -2572,7 +2572,7 @@ The commands accepted in replication mode are:
 
   
 
-  
+  
 
  DROP_REPLICATION_SLOT slot_name  WAIT 

  DROP_REPLICATION_SLOT
@@ -3266,7 +3266,7 @@ of any individual CopyData message cannot be 
interpretable on their own.)
 
 
 
-
+
 
 AuthenticationOk (B)
 
@@ -3311,7 +3311,7 @@ AuthenticationOk (B)
 
 
 
-
+
 
 AuthenticationKerberosV5 (B)
 
@@ -3355,7 +3355,7 @@ AuthenticationKerberosV5 (B)
 
 
 
-
+
 
 AuthenticationCleartextPassword (B)
 
@@ -3399,7 +3399,7 @@ AuthenticationCleartextPassword (B)
 
 
 
-
+
 
 AuthenticationMD5Password (B)
 
@@ -3454,7 +3454,7 @@ AuthenticationMD5Password (B)
 
 
 
-
+
 
 AuthenticationSCMCredential (B)
 
@@ -3499,7 +3499,7 @@ AuthenticationSCMCredential (B)
 
 
 
-
+
 
 AuthenticationGSS (B)
 
@@ -3544,7 +3544,7 @@ AuthenticationGSS (B)
 
 
 
-
+
 
 AuthenticationGSSContinue (B)
 
@@ -3599,7 +3599,7 @@ AuthenticationGSSContinue (B)
 
 
 
-
+
 
 AuthenticationSSPI (B)
 
@@ -3644,7 +3644,7 @@ AuthenticationSSPI (B)
 
 
 
-
+
 
 AuthenticationSASL (B)
 
@@ -3705,7 +3705,7 @@ following:
 
 
 
-
+
 
 AuthenticationSASLContinue (B)
 
@@ -3760,7 +3760,7 

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Stephen Frost
Greetings,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 03/01/22 13:22, David Steele wrote:
> > I think people are going to complain no matter what. If scripts are being
> > maintained changing the name is not a big deal (though moving from exclusive
> > to non-exclusive may be). If they aren't being maintained then they'll just
> > blow up a few versions down the road when we remove the compatibility
> > functions.
> 
> I might have already said enough in the message that crossed with this,
> but I think what I'm saying is there's a less-binary distinction between
> scripts that are/aren't "being maintained".
> 
> There can't really be many teams out there thinking "we'll just ignore
> these scripts forever, and nothing bad will happen." They all know they'll
> have to do stuff sometimes. But it matters how we allow them to schedule it.

We only make these changes between major versions.  That's as much as we
should be required to provide.

Further, we seriously changed around how restores work a few versions
back and there was rather little complaining.

Thanks,

Stephen




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 3/1/22 11:32, Nathan Bossart wrote:
> >On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote:
> >>On 03/01/22 09:44, David Steele wrote:
> >>>Personally, I am in favor of removing it. We change/rename
> >>>functions/tables/views when we need to, and this happens in almost every
> >>>release.
> >>
> >>For clarification, is that a suggestion to remove the 'exclusive' parameter
> >>in some later release, after using this release to default it to false and
> >>reject calls with true?
> >
> >My suggestion was to remove it in v15.  My impression is that David and
> >Stephen agree, but I could be misinterpreting their responses.
> 
> I agree and I'm pretty sure Stephen does as well.

Yes, +1 to removing it.

> >>That way, at least, there would be a period of time where procedures
> >>that currently work (by passing exclusive => false) would continue to work,
> >>and could be adapted as time permits by removing that argument, with no
> >>behavioral change.
> >
> >I'm not sure if there's any advantage to kicking the can down the road.  At
> >some point, we'll need to break existing backup scripts.  Will we be more
> >prepared to do that in v17 than we are now?  We could maintain two sets of
> >functions for a few releases and make it really clear in the documentation
> >that pg_start/stop_backup() are going to be removed soon (and always emit a
> >WARNING when they are used).  Would that address your concerns?
> 
> I think people are going to complain no matter what. If scripts are being
> maintained changing the name is not a big deal (though moving from exclusive
> to non-exclusive may be). If they aren't being maintained then they'll just
> blow up a few versions down the road when we remove the compatibility
> functions.

I don't consider that "maintained" and "still using the exclusive backup
method" can both be true at the same time.

Thanks,

Stephen




Re: Implementing Incremental View Maintenance

2022-03-01 Thread Yugo NAGATA
On Wed, 16 Feb 2022 22:34:18 +0800
huyajun  wrote:

> Hi, Nagata-san
> I am very interested in IMMV and read your patch but have some comments in 
> v25-0007-Add-Incremental-View-Maintenance-support.patch and want to discuss 
> with you.

Thank you for your review!

> 
> + /* For IMMV, we need to rewrite matview query */
> + query = rewriteQueryForIMMV(query, into->colNames);
> + query_immv = copyObject(query);
> 
> /* Create triggers on incremental maintainable materialized view */
> + Assert(query_immv != NULL);
> + CreateIvmTriggersOnBaseTables(query_immv, 
> matviewOid, true);
> 1. Do we need copy query?Is it okay  that CreateIvmTriggersOnBaseTables 
> directly use (Query *) into->viewQuery instead of query_immv like 
> CreateIndexOnIMMV?  It seems only planner may change query, but it shouldn't 
> affect us finding the correct base table in CreateIvmTriggersOnBaseTables .

The copy to query_immv was necessary for supporting sub-queries in the view
definition. However, we excluded that feature from the current patch to reduce
the patch size, so it is now unnecessary. I'll fix it.
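If the copy really is unneeded, the fix could presumably be as simple as
passing the statement's view query directly (a sketch using the
identifiers from the quoted patch, not the actual change):

    /* Sketch only: reuse into->viewQuery directly, as suggested above,
     * instead of keeping a copyObject()'d query_immv around. */
    CreateIvmTriggersOnBaseTables((Query *) into->viewQuery, matviewOid, true);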

> 
> +void
> +CreateIndexOnIMMV(Query *query, Relation matviewRel)
> +{
> + Query *qry = (Query *) copyObject(query);
> 2. Also, is it okay to not copy query in CreateIndexOnIMMV? It seems we only 
> read query in CreateIndexOnIMMV.

This was also necessary for supporting CTEs, but unnecessary in the current
patch, so I'll fix it, too.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-01 Thread Peter Eisentraut

On 25.02.22 21:19, Jacob Champion wrote:

On Fri, 2022-02-25 at 16:28 +, Jacob Champion wrote:

Ha, opr_sanity caught my use of cstring. I'll work on a fix later
today.


Fixed in v3.


This patch contains no documentation.  I'm having a hard time 
understanding what the name "session_authn_id" is supposed to convey. 
The comment for the Port.authn_id field says this is the "system 
username", which sounds like a clearer terminology.





Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Chapman Flack
On 03/01/22 13:22, David Steele wrote:
> I think people are going to complain no matter what. If scripts are being
> maintained changing the name is not a big deal (though moving from exclusive
> to non-exclusive may be). If they aren't being maintained then they'll just
> blow up a few versions down the road when we remove the compatibility
> functions.

I might have already said enough in the message that crossed with this,
but I think what I'm saying is there's a less-binary distinction between
scripts that are/aren't "being maintained".

There can't really be many teams out there thinking "we'll just ignore
these scripts forever, and nothing bad will happen." They all know they'll
have to do stuff sometimes. But it matters how we allow them to schedule it.

In the on-ramp, at first there was only exclusive. Then there were both
modes, with exclusive being deprecated, so teams knew they'd need to do
stuff, and most by now probably have. They were able to do separation of
hazards and schedule that work; they did not have to pile it onto the
whole plate of "upgrade PG from 9.5 to 9.6 and make sure everything works".

So now we're dropping the other shoe: first there was one mode, then both,
now there's only the other one. Again there's some work for teams to do;
let's again allow them to separate hazards and schedule that work apart
from the whole 14 to 15 upgrade project.

We can't help getting complaints in the off-ramp from anybody who ignored
the on-ramp. But we can avoid clobbering the teams who dutifully played
along before, and only want the same space to do so now.

Regards,
-Chap




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Chapman Flack
On 03/01/22 12:32, Nathan Bossart wrote:
> On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote:
>> That way, at least, there would be a period of time where procedures
>> that currently work (by passing exclusive => false) would continue to work,
>> and could be adapted as time permits by removing that argument, with no
>> behavioral change.
> 
> I'm not sure if there's any advantage to kicking the can down the road.  At
> some point, we'll need to break existing backup scripts.  Will we be more
> prepared to do that in v17 than we are now?

Yes, if we have provided a transition period the way we did in 7117685.
The way the on-ramp to that transition worked:

- Initially, your procedures used exclusive mode.
- If you changed nothing, they continued to work, with no behavior change.
- You then had ample time to adapt them to non-exclusive mode so now
  they work that way.
- You knew a day would come (here it comes) where, if you've never
  gotten around to doing that, your unchanged exclusive-mode procedures
  are going to break.

So here we are, arrived at that day. If you're still using exclusive mode,
your stuff's going to break; serves you right. So now limit the cases we
care about to people who made use of the time they were given to change
their procedures to use exclusive => false. So the off-ramp can look like:

- Initially, your procedures pass exclusive => false.
- If you change nothing, they should continue to work, with no
  behavior change.
- You should then have ample time to change to a new spelling without
  exclusive => false, and have them work that way.
- You know some later day is coming where, if you've never
  gotten around to doing that, they're going to break.

Then, when that day comes, if you're still passing exclusive at all,
your stuff's going to break; serves you right. If you have made use of
the time you were given for the changes, you'll be fine. So yes, at that
point, I think we can do it with clear conscience. We'll have made the
off-ramp as smooth and navigable as the on-ramp was.

> We could maintain two sets of
> functions for a few releases and make it really clear in the documentation
> that pg_start/stop_backup() are going to be removed soon (and always emit a
> WARNING when they are used).  Would that address your concerns?

That would. I had been thinking of not changing the names, and just making
the parameter go away. But I wasn't thinking of your concern here:

> What we need to do is make sure that an older installation won't silently
> work in a broken way, i.e. if we remove the exclusive flag somebody
> expecting the pre-9.6 behavior might not receive an error and think
> everything is OK. That would not be good.

So I'm ok with changing the names. Then step 3 of the off-ramp
would just be to call the functions by the new names, as well as to drop
the exclusive => false.
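To make that clerical update concrete, the before/after might look
roughly like this (a sketch; the pg_backup_start/stop names are only a
proposal in this thread, and the new signatures are assumptions):

    -- today, non-exclusive mode:
    SELECT pg_start_backup('label', fast => false, exclusive => false);
    SELECT * FROM pg_stop_backup(exclusive => false);

    -- after the rename, once scripts are updated:
    SELECT pg_backup_start('label');
    SELECT * FROM pg_backup_stop();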

The thing I'd want to avoid is just, after the trouble that was taken
to make the on-ramp navigable, making the off-ramp be a cliff.

Regards,
-Chap




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread David Steele

On 3/1/22 11:32, Nathan Bossart wrote:

On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote:

On 03/01/22 09:44, David Steele wrote:

Personally, I am in favor of removing it. We change/rename
functions/tables/views when we need to, and this happens in almost every
release.


For clarification, is that a suggestion to remove the 'exclusive' parameter
in some later release, after using this release to default it to false and
reject calls with true?


My suggestion was to remove it in v15.  My impression is that David and
Stephen agree, but I could be misinterpreting their responses.


I agree and I'm pretty sure Stephen does as well.


That way, at least, there would be a period of time where procedures
that currently work (by passing exclusive => false) would continue to work,
and could be adapted as time permits by removing that argument, with no
behavioral change.


I'm not sure if there's any advantage to kicking the can down the road.  At
some point, we'll need to break existing backup scripts.  Will we be more
prepared to do that in v17 than we are now?  We could maintain two sets of
functions for a few releases and make it really clear in the documentation
that pg_start/stop_backup() are going to be removed soon (and always emit a
WARNING when they are used).  Would that address your concerns?


I think people are going to complain no matter what. If scripts are 
being maintained changing the name is not a big deal (though moving from 
exclusive to non-exclusive may be). If they aren't being maintained then 
they'll just blow up a few versions down the road when we remove the 
compatibility functions.


Regards,
-David




Re: CREATEROLE and role ownership hierarchies

2022-03-01 Thread Robert Haas
[ Been away, catching up on email. ]

On Tue, Feb 22, 2022 at 10:54 AM Joshua Brindle
 wrote:
> Yes, absolutely. It is my understanding that generally a community
> consensus is attempted, I was throwing my (and Crunchy's) use case out
> there as a possible goal, and I have spent time reviewing and testing
> the patch, so I think that is fair. Obviously I am not in the position
> to stipulate hard requirements.

I agree with all of that -- and thanks for writing back.

> if 1A worked for admins, or members I think it may work (i.e., Bot is
> admin of employees but not a member of employees and therefore can
> manage employees but not become them or read their tables)
>
> For example, today this works (in master):
>
> postgres=# CREATE USER creator password 'a';
> CREATE ROLE
> postgres=# CREATE ROLE employees ADMIN creator NOLOGIN;
> CREATE ROLE
>
> as creator:
> postgres=> CREATE USER joshua IN ROLE employees PASSWORD 'a';
> ERROR:  permission denied to create role
>
> as superuser:
> postgres=# CREATE USER joshua LOGIN PASSWORD 'a';
> CREATE ROLE
>
> as creator:
> postgres=> GRANT employees TO joshua;
> GRANT ROLE
> postgres=> SET ROLE joshua;
> ERROR:  permission denied to set role "joshua"
> postgres=> SET ROLE employees;
> SET
>
> So ADMIN of a role can add membership, but not create, and
> unfortunately can SET ROLE to employees.
>
> Can ADMIN mean "can create and drop roles with membership of this role
> but not implicitly be a member of the role"?

I foresee big problems trying to go in this direction. According to
the documentation, "the ADMIN clause is like ROLE, but the named roles
are added to the new role WITH ADMIN OPTION, giving them the right to
grant membership in this role to others." And for me, the name "WITH
ADMIN OPTION" is a huge red flag. You grant membership in a role, and
you may grant that membership with the admin option, or without the
admin option, but either way you are granting membership. And to me
that is just built into the phraseology. You may be able to buy the
car that you want with or without the all-wheel drive option, and you
may even be able to upgrade a car purchased without that option to
have it later, but you can't buy all-wheel drive in the abstract
without an association to some particular car. That's what it means
for it to be an option.

Now, I think there is a good argument to be made that in this case the
fact that the administration privileges are an option associated with
membership is artificial. I expect we can all agree that it is
conceptually easy to understand the idea of being able to administer a
role and the idea of having that role's privileges as two separate
concepts, neither dependent upon the other, and certainly the SQL
syntax could be written in a way that makes that very natural. But as
it is, what is the equivalent of GRANT employees TO bot WITH ADMIN
OPTION when you want to convey only administration rights and not
membership? GRANT employees TO bot WITH ADMIN OPTION BUT WITHOUT THE
UNDERLYING MEMBERSHIP TO WHICH ADMIN IS AN OPTION? Maybe that sounds
sarcastic, but to me it seems like a genuinely serious problem. People
construct a mental model of how stuff works based to a significant
degree on the structure of the syntax, and I really don't see an
obvious way of extending the grammar in a way that is actually going
to make sense to people.

> The current (v8) patch conflates membership and admin:
>
> postgres=# CREATE USER user_creator CREATEROLE WITHOUT ADMIN OPTION
> PASSWORD 'a';
> CREATE ROLE
> postgres=# CREATE ROLE employees ADMIN user_creator NOLOGIN;
> CREATE ROLE
>
> (Note I never GRANTED employees to user_creator):

I think you did, because even right now without the patch "ADMIN
whatever" is documented to mean membership with admin option.
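In other words, per that documentation, the two spellings below are
equivalent (a sketch):

    CREATE ROLE employees ADMIN user_creator;
    -- behaves like:
    CREATE ROLE employees;
    GRANT employees TO user_creator WITH ADMIN OPTION;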

> > So that leads to these questions: (2A) Do you care about restricting
> > which roles the userbot can drop? (2B) If yes, do you endorse
> > restricting the ability of roles to revoke themselves from other
> > roles?
>
> 2A, yes
> 2B, yes, and IIUC this already exists:
> postgres=> select current_user;
>  current_user
> --
>  joshua
> (1 row)
>
> postgres=> REVOKE employees FROM joshua;
> ERROR:  must have admin option on role "employees"

No, because as Stephen correctly points out, you've got that REVOKE
command backwards.

> > I think that we don't have any great problems here, at least as far as
> > this very specific issue is concerned, if either the answer to (2A) is
> > no or the answer to (2B) is yes. However, if the answer to (2A) is yes
> > and the answer to (2B) is no, there are difficulties. Evidently in
> > that case we need some new kind of thing that behaves mostly like a
> > group of roles but isn't actually a group of roles -- and that thing
> > needs to prohibit self-revocation. Given what I've written above, you
> > may be able to guess my preferred solution: let's call it a TENANT.
> > Then, my pseudo-super-user can have permission to (i) create roles 

Refactor statistics collector, backend status reporting and command progress reporting

2022-03-01 Thread Nitin Jadhav
Hi,

The following are the files related to the PostgreSQL statistics collector,
backend status reporting and command progress reporting.

pgstat.[ch] - Definitions for the PostgreSQL statistics collector daemon.
backend_status.[ch] - Definitions related to backend status reporting.
backend_progress.[ch] - Definitions related to command progress reporting.
progress.h - Constants used with the progress reporting facilities
defined in backend_status.h
pgstatfuncs.c - Functions for accessing the statistics collector data

There is scope for some refactoring here even though the backend
status and command progress related code is separated from pgstat.c.

1) There is a lot of confusion between progress.h and
backend_progress.h. Why are 2 header files required for the same
functionality? I feel progress.h can be merged with
backend_progress.h.
2) The access functions related to statistics collector are included
in pgstatfuncs.c file. This also contains the access functions related
to backend status and backend progress. I feel the access functions
related to backend status should go in backend_status.c and the access
functions related to backend progress should go in backend_progress.c
file. If the size grows in future then we can create new files for
access functions (like backend_status_funcs.c and
backend_progress_funcs.c).
3) There is a dependency between backend status and command progress
reporting but the corresponding functions are separated into 2
different files. It is better to define a new structure named
'PgBackendProgress' in backend_progress.h which consists of
'st_progress_command', 'st_progress_command_target' and
'st_progress_param'. Include a variable of type 'PgBackendProgress' as
a member of 'PgBackendStatus' structure.
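A rough sketch of that grouping, reusing the existing field types from
backend_status.h (the struct and member names here are only
suggestions, not settled naming):

    typedef struct PgBackendProgress
    {
        ProgressCommandType st_progress_command;
        Oid                 st_progress_command_target;
        int64               st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
    } PgBackendProgress;

    typedef struct PgBackendStatus
    {
        /* ... existing fields ... */
        PgBackendProgress   st_progress;    /* replaces the three loose fields */
    } PgBackendStatus;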

Please share your thoughts.
If required, I would like to work on the patch.

Thanks & Regards,
Nitin Jadhav




Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad

2022-03-01 Thread Thomas Munro
On Mon, Feb 28, 2022 at 2:36 PM Andres Freund  wrote:
> On February 27, 2022 4:19:21 PM PST, Thomas Munro  
> wrote:
> >It seems a little strange to introduce a new wait event that will very
> >often appear into a stable branch, but ... it is actually telling the
> >truth, so there is that.
>
> In the back branches it needs to be at the end of the enum - I assume you 
> intended that just to be for HEAD.

Yeah.

> I wonder whether in HEAD we shouldn't make that sleep duration be computed 
> from the calculation in IsOnSchedule...

I might look into this.

> >The sleep/poll loop in RegisterSyncRequest() may also have another
> >problem.  The comment explains that it was a deliberate choice not to
> >do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't
> >think there's an excuse to ignore postmaster death in a loop that
> >presumably becomes infinite if the checkpointer exits.  I guess we
> >could do:
> >
> >-   pg_usleep(1L);
> >+   WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10,
> >WAIT_EVENT_SYNC_REQUEST);
> >
> >But... really, this should be waiting on a condition variable that the
> >checkpointer broadcasts on when the queue goes from full to not full,
> >no?  Perhaps for master only?
>
> Looks worth improving, but yes, I'd not do it in the back branches.

0003 is a first attempt at that, for master only (on top of 0002 which
is the minimal fix).  This shaves another second off
027_stream_regress.pl on my workstation.  The main thing I realised is
that I needed to hold interrupts while waiting, which seems like it
should go away with 'tombstone' files as discussed in other threads.
That's not a new problem in this patch, it just looks more offensive
to the eye when you spell it out, instead of hiding it with an
unreported sleep/poll loop...
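For reference, the condition-variable shape would be roughly the
following (a sketch, not the patch itself; the CV name is made up):

    /* requester side, in RegisterSyncRequest(), replacing the sleep/poll */
    ConditionVariablePrepareToSleep(&CheckpointerShmem->sync_queue_cv);
    while (!ForwardSyncRequest(ftag, type))
        ConditionVariableSleep(&CheckpointerShmem->sync_queue_cv,
                               WAIT_EVENT_SYNC_REQUEST);
    ConditionVariableCancelSleep();

    /* checkpointer side, after absorbing requests frees queue space */
    ConditionVariableBroadcast(&CheckpointerShmem->sync_queue_cv);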

> I do think it's worth giving that sleep a proper wait event though, even in 
> the back branches.

I'm thinking that 0002 should be back-patched all the way, but 0001
could be limited to 14.
From a9344bb2fb2a363bec4be526f87560cb212ca10b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 28 Feb 2022 11:27:05 +1300
Subject: [PATCH v2 1/3] Wake up for latches in CheckpointWriteDelay().

The checkpointer shouldn't ignore its latch.  Other backends may be
waiting for it to drain the request queue.  Hopefully real systems don't
have a full queue often, but the condition is reached easily when shared
buffers is very small.

This involves defining a new wait event, which will appear in the
pg_stat_activity view often due to spread checkpoints.

Back-patch only to 14.  Even though the problem exists in earlier
branches too, it's hard to hit there.  In 14 we stopped using signal
handlers for latches on Linux, *BSD and macOS, which were previously
hiding this problem by interrupting the sleep (though not reliably, as
the signal could arrive before the sleep begins; precisely the problem
latches address).

Reported-by: Andres Freund 
Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
---
 doc/src/sgml/monitoring.sgml| 4 
 src/backend/postmaster/checkpointer.c   | 8 +++-
 src/backend/utils/activity/wait_event.c | 3 +++
 src/include/utils/wait_event.h  | 1 +
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9fb62fec8e..8620aaddc7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2235,6 +2235,10 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   BaseBackupThrottle
   Waiting during base backup when throttling activity.
  
+ 
+  CheckpointerWriteDelay
+  Waiting between writes while performing a checkpoint.
+ 
  
   PgSleep
   Waiting due to a call to pg_sleep or
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4488e3a443..a59c3cf020 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -484,6 +484,9 @@ CheckpointerMain(void)
 			}
 
 			ckpt_active = false;
+
+			/* We may have received an interrupt during the checkpoint. */
+			HandleCheckpointerInterrupts();
 		}
 
 		/* Check for archive_timeout and switch xlog files if necessary. */
@@ -726,7 +729,10 @@ CheckpointWriteDelay(int flags, double progress)
 		 * Checkpointer and bgwriter are no longer related so take the Big
 		 * Sleep.
 		 */
-		pg_usleep(10L);
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT,
+  100,
+  WAIT_EVENT_CHECKPOINT_WRITE_DELAY);
+		ResetLatch(MyLatch);
 	}
 	else if (--absorb_counter <= 0)
 	{
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index 60972c3a75..0706e922b5 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -485,6 +485,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
 		

Re: Allow async standbys wait for sync replication

2022-03-01 Thread Bharath Rupireddy
On Tue, Mar 1, 2022 at 10:35 PM Nathan Bossart  wrote:
>
> On Tue, Mar 01, 2022 at 04:34:31PM +0900, Kyotaro Horiguchi wrote:
> > At Mon, 28 Feb 2022 22:05:28 -0800, Nathan Bossart 
> >  wrote in
> >> replicated LSN.  TBH there are probably other things that need to be
> >> considered (e.g., how do we ensure that the WAL sender sends the rest once
> >> it is replicated?), but I still think we should avoid spinning in the WAL
> >> sender waiting for WAL to be replicated.
> >
> > It seems to me it would be something similar to
> > SyncRepReleaseWaiters().  Or it could be possible to consolidate this
> > feature into the function, I'm not sure, though.
>
> Yes, perhaps the synchronous replication framework will need to alert WAL
> senders when the syncrep LSN advances so that the WAL is sent to the async
> standbys.  I'm glossing over the details, but I think that should be the
> general direction.

It's doable. But we can't avoid async walsenders waiting for the flush
LSN even if we take the SyncRepReleaseWaiters() approach, right? I'm
not sure (at this moment) what the biggest advantage of this approach
is, i.e. (1) backends waking up walsenders after the flush LSN is
updated vs. (2) walsenders continually polling for the new flush LSN.

> >> My feedback is specifically about this behavior.  I don't think we should
> >> spin in XLogSend*() waiting for an LSN to be synchronously replicated.  I
> >> think we should just choose the SendRqstPtr based on what is currently
> >> synchronously replicated.
> >
> > Do you mean something like the following?
> >
> > /* Main loop of walsender process that streams the WAL over Copy messages. 
> > */
> > static void
> > WalSndLoop(WalSndSendDataCallback send_data)
> > {
> > /*
> >  * Loop until we reach the end of this timeline or the client requests 
> > to
> >  * stop streaming.
> >  */
> > for (;;)
> > {
> > if (am_async_walsender && there_are_sync_standbys)
> > {
> >  XLogRecPtr SendRqstLSN;
> >  XLogRecPtr SyncFlushLSN;
> >
> > SendRqstLSN = GetFlushRecPtr(NULL);
> > LWLockAcquire(SyncRepLock, LW_SHARED);
> > SyncFlushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
> > LWLockRelease(SyncRepLock);
> >
> > if (SendRqstLSN > SyncFlushLSN)
> >continue;
> > }
>
> Not quite.  Instead of "continue", I would set SendRqstLSN to SyncFlushLSN
> so that the WAL sender only sends up to the current synchronously
> replicated LSN.  TBH there are probably other things that need to be
> considered (e.g., how do we ensure that the WAL sender sends the rest once
> it is replicated?), but I still think we should avoid spinning in the WAL
> sender waiting for WAL to be replicated.

I did some more analysis on the above point: we can let
XLogSendPhysical know up to which point it can send WAL (SendRqstLSN).
But XLogSendLogical reads the WAL using the XLogReadRecord mechanism
with the read_local_xlog_page page_read callback, to which we can't
really pass SendRqstLSN. Maybe we have to do something like below:

XLogSendPhysical:
/* Figure out how far we can safely send the WAL. */
if (am_async_walsender && there_are_sync_standbys)
{
 LWLockAcquire(SyncRepLock, LW_SHARED);
 SendRqstPtr = WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH];
 LWLockRelease(SyncRepLock);
}
/* Existing code path to determine SendRqstPtr */
else if (sendTimeLineIsHistoric)
{
}
else if (am_cascading_walsender)
{
}
else
{
    /*
     * Streaming the current timeline on a primary.
     */
}

XLogSendLogical:
if (am_async_walsender && there_are_sync_standbys)
{
 XLogRecPtr SendRqstLSN;
 XLogRecPtr SyncFlushLSN;

 SendRqstLSN = GetFlushRecPtr(NULL);
 LWLockAcquire(SyncRepLock, LW_SHARED);
 SyncFlushLSN = WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH];
 LWLockRelease(SyncRepLock);

 if (SendRqstLSN > SyncFlushLSN)
  return;
 }

On Tue, Mar 1, 2022 at 7:35 AM Hsu, John  wrote:
> > I too observed this once or twice. It looks like the walsender isn't
> > detecting postmaster death in for (;;) with WalSndWait. Not sure if
> > this is expected or true with other wait-loops in walsender code. Any
> > more thoughts here?
>
> Unfortunately I haven't had a chance to dig into it more although iirc I hit 
> it fairly often.

I think I got what the issue is. Below does the trick:

if (got_STOPPING)
    proc_exit(0);

As the comment in the existing code explains: "If the server is shut
down, checkpointer sends us PROCSIG_WALSND_INIT_STOPPING after all
regular backends have exited."

I will take care of this in the next patch once the approach we take
for this feature gets finalized.

Regards,
Bharath Rupireddy.




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Nathan Bossart
On Tue, Mar 01, 2022 at 11:09:13AM -0500, Chapman Flack wrote:
> On 03/01/22 09:44, David Steele wrote:
>> Personally, I am in favor of removing it. We change/rename
>> functions/tables/views when we need to, and this happens in almost every
>> release.
> 
> For clarification, is that a suggestion to remove the 'exclusive' parameter
> in some later release, after using this release to default it to false and
> reject calls with true?

My suggestion was to remove it in v15.  My impression is that David and
Stephen agree, but I could be misinterpreting their responses.

> That way, at least, there would be a period of time where procedures
> that currently work (by passing exclusive => false) would continue to work,
> and could be adapted as time permits by removing that argument, with no
> behavioral change.

I'm not sure if there's any advantage to kicking the can down the road.  At
some point, we'll need to break existing backup scripts.  Will we be more
prepared to do that in v17 than we are now?  We could maintain two sets of
functions for a few releases and make it really clear in the documentation
that pg_start/stop_backup() are going to be removed soon (and always emit a
WARNING when they are used).  Would that address your concerns?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add id's to various elements in protocol.sgml

2022-03-01 Thread Brar Piening

On Feb 28, 2022 at 21:06, Chapman Flack wrote:

I think that in other recent examples I've seen, there might be
(something like a) consensus forming around the Unicode LINK SYMBOL
(U+1F517) rather than # as the symbol for such things.


I intentionally opted for an ASCII character as that definitely won't
cause any display/font/portability issues but changing that is no problem.


... and now that the concept is proven, how hard would it be to broaden
that template's pattern to apply to all the other DocBook constructs
(such as section headings) that emit anchors?


As long as we stick to manually assigned ids in the same way my patch
does it, it shouldn't be too hard. Speaking of autogenerated ids, I
failed to make use of them since I wasn't able to reproduce the same
autogenerated id twice in order to use it for the link.

Also I'm not sure how well the autogenerated ids are reproducible over
time/versions/builds, and if it is wise to use them as targets to link
to from somewhere else.






Re: Allow async standbys wait for sync replication

2022-03-01 Thread Nathan Bossart
On Tue, Mar 01, 2022 at 04:34:31PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 28 Feb 2022 22:05:28 -0800, Nathan Bossart  
> wrote in 
>> replicated LSN.  TBH there are probably other things that need to be
>> considered (e.g., how do we ensure that the WAL sender sends the rest once
>> it is replicated?), but I still think we should avoid spinning in the WAL
>> sender waiting for WAL to be replicated.
> 
> It seems to me it would be something similar to
> SyncRepReleaseWaiters().  Or it could be possible to consolidate this
> feature into the function, I'm not sure, though.

Yes, perhaps the synchronous replication framework will need to alert WAL
senders when the syncrep LSN advances so that the WAL is sent to the async
standbys.  I'm glossing over the details, but I think that should be the
general direction.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add reloption for views to enable RLS

2022-03-01 Thread Christoph Heiss

Thanks for reviewing!

On 2/25/22 19:22, Dean Rasheed wrote:

Re-reading this thread, I think I preferred the name
"security_invoker". The main objection seemed to come from the
potential confusion with SECURITY INVOKER/DEFINER functions, but I
think that's really a different thing. As long as the documentation
for the default behaviour is clear (which I think it was), then it
should be easy to explain how a security invoker view behaves
differently. Also, there's value in using the same terminology as
other databases, because many users will already be familiar with the
feature from those databases.


That is also the main reason I preferred naming it "security_invoker" - 
it is consistent with other databases and eases transition from such 
systems.


I kept "check_permissions_owner" for now. Constantly changing it around 
with each iteration doesn't really bring any value IMHO; I'd rather have 
a final consensus on how to name the option and *then* change it for good.




Some other review comments:

1). This new comment:

+   
+Be aware that USAGE privileges on schemas containing
+the underlying base relations are not checked.
+   

is not entirely accurate. It's more accurate to say that a user
creating or replacing a view must have CREATE privileges on the schema
containing the view and USAGE privileges on any schemas referred to in
the view query, whereas a user using the view only needs USAGE
privileges on the schema containing the view.

(Note that, for the view creator, USAGE is required on any schema
referred to in the query -- e.g., schemas containing functions as well
as base relations.)


Improved in the attached v9.



2). The patch is adding a new field to RangeTblEntry which seems to be
unnecessary -- it's set, and copied around, but never read, so it
should just be removed.


I removed that field in v9 since it is indeed completely unused. I 
initially added it to be consistent with the "security_barrier" 
implementation and then somewhat forgot about it.




3). Looking at this change:

[..]

I think it should call setRuleCheckAsUser() in all cases. It might be
true that the rule fetched has checkAsUser set to InvalidOid
throughout its action and quals, but it seems unwise to rely on that
-- better to code defensively and explicitly set it in all cases.


It probably doesn't really matter, but I agree that coding defensively 
is always a good thing.
Changed that in v9 to call setRuleCheckAsUser() either with ->relowner 
or InvalidOid.




4). In the same code block, I think the new behaviour should be
applied to SELECT rules only. The view may have other non-SELECT rules
(just as a table may have non-SELECT rules), created using CREATE
RULE, but their actions are independent of the view definition.
Currently their permissions are checked as the view/table owner, and
if anyone wanted to change that, it should be an option on the rule,
not the view (just as triggers can be made security definer or
invoker, depending on how the trigger function is defined).



Good catch, I added an additional check for rule->event and a test for 
that in v9.
[ I also had to add a missing DROP statement to some previous test, just 
a heads up. ]


It makes sense to mimic the behavior of triggers; furthermore,
user-created rules might otherwise behave differently for tables and
views, depending on the view definition.

[ But I'm not _that_ familiar with CREATE RULE, FWIW. ]



5). In the same function, the block of code that fetches rules and
triggers has been moved. I think it would be worth adding a comment to
explain why it's now important to extract the reloptions *before*
fetching the relation's rules and triggers.


Added a small comment explaining that in v9.



6). The second set of tests added to rowsecurity.sql seem to have
nothing to do with RLS, and probably belong in updatable_views.sql,
and I think it would be worth adding a few more tests for things like
views on top of views.


Seems reasonable to move them into updatable_views.sql, done that for 
v9. Further I added two (simple) tests for chained views as you 
mentioned, hope they reflect what you had in mind.
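For anyone following along, usage with the v9 naming would look roughly
like this (sketch only; the option name is still under discussion):

    CREATE TABLE accounts (id int, owner text);
    CREATE VIEW my_accounts WITH (check_permissions_owner = false) AS
        SELECT * FROM accounts WHERE owner = current_user;
    -- Privileges on "accounts" are now checked against the invoking
    -- user rather than the view owner.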


Thanks,
Christoph
From a7e84c92761881419bf8d63ebfcc528417dc8d24 Mon Sep 17 00:00:00 2001
From: Christoph Heiss 
Date: Tue, 1 Mar 2022 17:36:42 +0100
Subject: [PATCH v9 1/1] Add new boolean reloption "check_permissions_owner" to
 views

When this reloption is set to "false", all permissions on the underlying
relations will be checked against the invoking user rather than the view
owner.  The latter remains the default behavior.

Author: Christoph Heiss 
Co-Author: Laurenz Albe 
Reviewed-By: Laurenz Albe, Wolfgang Walther
Discussion: https://postgr.es/m/b66dd6d6-ad3e-c6f2-8b90-47be773da240%40cybertec.at
Signed-off-by: Christoph Heiss 
---
 doc/src/sgml/ref/alter_view.sgml  | 13 ++-
 doc/src/sgml/ref/create_view.sgml | 72 ++---
 src/backend/access/common/reloptions.c| 11 +++
 

Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Stephen Frost
Greetings,

* Nathan Bossart (nathandboss...@gmail.com) wrote:
> On Tue, Mar 01, 2022 at 08:44:51AM -0600, David Steele wrote:
> > Personally, I am in favor of removing it. We change/rename
> > functions/tables/views when we need to, and this happens in almost every
> > release.
> > 
> > What we need to do is make sure that an older installation won't silently
> > work in a broken way, i.e. if we remove the exclusive flag somebody
> > expecting the pre-9.6 behavior might not receive an error and think
> > everything is OK. That would not be good.
> > 
> > One option might be to rename the functions. Something like
> > pg_backup_start/stop.
> 
> I'm fine with this approach.

+1.

Thanks,

Stephen




Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-01 Thread Justin Pryzby
Can I suggest copying the patch authors on bulk emails like these?

(Obviously, an extended discussion about a particular patch should happen on
its original thread, but that's not what this is about).

-- 
Justin




Re: Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-01 Thread Julien Rouhaud
Hi,

On Tue, Mar 01, 2022 at 11:16:36AM -0500, Greg Stark wrote:
>
> > 1608: schema variables, LET command
> > ===
> > After 18 CF's and two very long threads it seems to be nearing completion
> > judging by Tomas' review.  There is an ask in that review for a second pass
> > over the docs by a native speaker, any takers?
>
> Patch has a new name, "session variables, LET command"
>
> There's been a *lot* of work on this patch so I'm loath to bump it.
> The last review was from Julien Rouhaud which had mostly code style
> comments but it had one particular concern about using xact callback
> in core and about EOX cleanup.
>
> Pavel, do you have a plan to improve this or are you looking for
> suggestions from someone about how you should solve this problem?

There has indeed been a lot of work done on the patch during the last commit
fest, and Pavel always fixed all the reported issues promptly, which is why,
apart from the EOX cleanup issue, the last review consisted mostly of minor
problems.

Pavel sent a new version today that addresses the EOX problem (and everything
else), so I'm now the one that needs to do my reviewer job (which I already
started).  I didn't get through all the changes yet, but as far as I can see
the patch is in very good shape.  I'm quite optimistic about this patch being
ready for committer very soon, so I think it would be good to keep it and see
if we can get it committed in pg 15.  Note that some committers already
showed interest in the patch.




Commitfest 2022-03 Starts Now

2022-03-01 Thread Greg Stark
The final commitfest of this release begins now.

Whereas most commitfests are about getting feedback to authors so they
can advance the patch -- this one is about actually committing patches
to wrap up the release.

Please, when reviewing patches, try to push yourself to make the
difficult call about whether a patch will actually be feasible to
commit in this commitfest. If not, be up front about it and say so,
because we need to focus energy in this commitfest on patches that are
committable in Postgres 15.

There are a lot of great patches in the commitfest that would really
be great to get committed!

--
greg




Commitfest 2022-03 Patch Triage Part 1a.i

2022-03-01 Thread Greg Stark
Last November Daniel Gustafsson  did a patch triage. It took him three
emails to get through the patches in the commitfest back then. Since
then we've had the November and the January commitfests so I was
interested to see how many of these patches had advanced

I'm only part way through the first email but so far only two patches
have changed status -- and both to "Returned with feedback" :(

So I'm going to post updates but I'm going to break it up into smaller
batches because otherwise it'll take me a month before I post
anything.



> 1608: schema variables, LET command
> ===
> After 18 CF's and two very long threads it seems to be nearing completion
> judging by Tomas' review.  There is an ask in that review for a second pass
> over the docs by a native speaker, any takers?

Patch has a new name, "session variables, LET command"

There's been a *lot* of work on this patch so I'm loath to bump it.
The last review was from Julien Rouhaud which had mostly code style
comments but it had one particular concern about using xact callback
in core and about EOX cleanup.

Pavel, do you have a plan to improve this or are you looking for
suggestions from someone about how you should solve this problem?

> 1741: Index Skip Scan
> =
> An often requested feature which has proven hard to reach consensus on an
> implementation for.  The thread(s) have stalled since May, is there any hope 
> of
> taking this further?  Where do we go from here with this patch?

Often requested indeed! I would love to be able to stop explaining to
people that Postgres can't handle this case well.

It seems there are multiple patch variants around and suggestions from
Heikki and Peter G about fundamental interface choices. It would be
nice to have a good summary from someone who is involved about what's
actually left unresolved.


> 1712: Remove self join on a unique column
> =
> This has moved from "don't write bad queries" to "maybe we should do something
> about this".  It seems like there is consensus that it can be worth paying the
> added planner cost for this, especially if gated by a GUC to keep it out of
> installations where it doesn't apply.  The regression on large join queries
> hasn't been benchmarked it seems (or I missed it), but the patch seems to have
> matured and be quite ready.  Any takers on getting it across the finish line?

There hasn't been any review since the v29 patch was posted in July.
That said, to my eye it seemed like pretty basic functionality errors
were being corrected quite late. All the bugs got patch updates
quickly but does this have enough tests to be sure it's working right?

The only real objection was about whether the planning time justified
the gains since the gains are small. But I think they were resolved by
making the optimization optional. Do we have consensus that that
resolved the issue or do we still need benchmarks showing the planning
time hit is reasonable?

> 2161: standby recovery fails when re-replaying due to missing directory which
> was removed in previous replay.
> =
> Tom and Robert seem to be in agreement that parts of this patchset are good,
> and that some parts are not.  The thread has stalled and the patch no longer
> apply, so unless someone feels like picking up the good pieces this seems like
> a contender to close for now.

Tom's feedback seems to have been acted on last November. And the last
update in January was that it was passing CI now. Is this ready to
commit now?


> 2138: Incremental Materialized View Maintenance
> ===
> The size of the
> patchset and the length of the thread make it hard to gauge just how far away it
> is, maybe the author or a reviewer can summarize the current state and outline
> what is left for it to be committable.

There is an updated patch set as of February but I have the same
difficulty wrapping my head around the amount of info here.

Is this one really likely to be committable in 15? If not I think we
should move this to 16 now and concentrate on patches that will be
committable in this release.


> 2218: Implement INSERT SET syntax
> =
> The author has kept this patch updated, and has seemingly updated according to
> the review comments.  Tom: do you have any opinions on whether the updated
> version addresses your concerns wrt the SELECT rewrite?

I don't see any discussion implying that Tom's concerns were met. I'm
not exactly clear why Tom's concerns are real problems though --
wouldn't it be a *good* thing if we have a more expressive syntax? But
that's definitely what needs to be resolved before it can move ahead.

So unless there are objections I'm going to update this to "waiting on author".

-- 
greg




Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Chapman Flack
On 03/01/22 09:44, David Steele wrote:
> Personally, I am in favor of removing it. We change/rename
> functions/tables/views when we need to, and this happens in almost every
> release.

For clarification, is that a suggestion to remove the 'exclusive' parameter
in some later release, after using this release to default it to false and
reject calls with true?

I can get behind that proposal, even if we don't have a practical way
to add the warning I suggested. I'd be happier with the warning, but can
live without it. Release notes can be the warning.

That way, at least, there would be a period of time where procedures
that currently work (by passing exclusive => false) would continue to work,
and could be adapted as time permits by removing that argument, with no
behavioral change.

The later release removing the argument would then break only procedures
that had never done so. That's comparable to what's proposed for this
release, which will only break procedures that have never migrated away
from exclusive mode despite the time and notice to do so.

That seems ok to me.

Regards,
-Chap




Re: convert libpq uri-regress tests to tap test

2022-03-01 Thread Andres Freund
Hi, 

On March 1, 2022 7:44:27 AM PST, Andrew Dunstan  wrote:
>
>On 2/24/22 07:19, Andrew Dunstan wrote:
>> On 2/23/22 20:52, Tom Lane wrote:
>>> Peter Eisentraut  writes:
 On 23.02.22 23:58, Tom Lane wrote:
> Peter Eisentraut  writes:
>> libpq TAP tests should be in src/interfaces/libpq/t/.
> That's failing to account for the fact that a libpq test can't
> really be a pure-perl TAP test; you need some C code to drive the
> library.
 Such things could be put under src/interfaces/libpq/test, or some other 
 subdirectory.  We already have src/interfaces/ecpg/test.
>>> OK, but then the TAP scripts are under src/interfaces/libpq/test/t,
>>> which isn't what you said.  I have no great objection to moving
>>> src/test/modules/libpq_pipeline/ to src/interfaces/libpq/test/,
>>> though, as long as the buildfarm will cope.
>>>
>>> 
>>
>> It won't without some adjustment.
>
>
>
>See
>
>and
>

Thanks! 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: convert libpq uri-regress tests to tap test

2022-03-01 Thread Andrew Dunstan


On 2/24/22 07:19, Andrew Dunstan wrote:
> On 2/23/22 20:52, Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> On 23.02.22 23:58, Tom Lane wrote:
 Peter Eisentraut  writes:
> libpq TAP tests should be in src/interfaces/libpq/t/.
 That's failing to account for the fact that a libpq test can't
 really be a pure-perl TAP test; you need some C code to drive the
 library.
>>> Such things could be put under src/interfaces/libpq/test, or some other 
>>> subdirectory.  We already have src/interfaces/ecpg/test.
>> OK, but then the TAP scripts are under src/interfaces/libpq/test/t,
>> which isn't what you said.  I have no great objection to moving
>> src/test/modules/libpq_pipeline/ to src/interfaces/libpq/test/,
>> though, as long as the buildfarm will cope.
>>
>>  
>
> It won't without some adjustment.



See

and



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread Nathan Bossart
On Tue, Mar 01, 2022 at 08:44:51AM -0600, David Steele wrote:
> Personally, I am in favor of removing it. We change/rename
> functions/tables/views when we need to, and this happens in almost every
> release.
> 
> What we need to do is make sure that an older installation won't silently
> work in a broken way, i.e. if we remove the exclusive flag somebody
> expecting the pre-9.6 behavior might not receive an error and think
> everything is OK. That would not be good.
> 
> One option might be to rename the functions. Something like
> pg_backup_start/stop.

I'm fine with this approach.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Pre-allocating WAL files

2022-03-01 Thread Nathan Bossart
On Tue, Mar 01, 2022 at 08:40:44AM -0600, Justin Pryzby wrote:
> FYI: this is currently failing in cfbot on linux.
> 
> https://cirrus-ci.com/task/4934371210690560
> https://api.cirrus-ci.com/v1/artifact/task/4934371210690560/log/src/test/regress/regression.diffs
> 
>  DROP TABLESPACE regress_tblspace_renamed;
> +ERROR:  tablespace "regress_tblspace_renamed" is not empty

I believe this is due to an existing bug.  This patch set seems to
influence the timing to make it more likely.  I'm tracking the fix here:

https://commitfest.postgresql.org/37/3544/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Skipping logical replication transactions on subscriber side

2022-03-01 Thread Masahiko Sawada
On Tue, Feb 15, 2022 at 7:35 PM Peter Eisentraut
 wrote:
>
> On 14.02.22 10:16, Amit Kapila wrote:
> > I think exposing LSN is a better approach as it doesn't have the
> > dangers of wraparound. And, I think users can use it with the existing
> > function pg_replication_origin_advance() which will save us from
> > adding additional code for this feature. We can explain/expand in docs
> > how users can use the error information from view/error_logs and use
> > the existing function to skip conflicting transactions. We might want
> > to even expose error_origin to make it a bit easier for users but not
> > sure. I feel the need for the new syntax (and then added code
> > complexity due to that) isn't warranted if we expose error_LSN and let
> > users use it with the existing functions.
>
> Well, the whole point of this feature is to provide a higher-level
> interface instead of pg_replication_origin_advance().  Replication
> origins are currently not something the users have to deal with
> directly.  We already document that you can use
> pg_replication_origin_advance() to skip erroring transactions.  But that
> seems unsatisfactory.  It'd be like using pg_surgery to fix unique
> constraint violations.

+1

I’ve considered a plan for the transaction-skipping feature for
logical replication, targeting PG15. Several ideas and patches have
been proposed, both here and in related threads[1][2], as follows:

A. Change pg_stat_subscription_workers (committed 7a8507329085)
B. Add origin name and commit-LSN to logical replication worker
errcontext (proposed[2])
C. Store error information (e.g., the error message and commit-LSN) to
the system catalog
D. Introduce ALTER SUBSCRIPTION SKIP
E. Record the skipped data somewhere: server logs or a table

Given the remaining time, it is unlikely that all of them can be
completed for PG15 by the feature freeze. The most realistic plan for
PG15, in my mind, is to complete B and D. With these two items, the
LSN of the failed transaction is shown in the server log, and we can
ask users to find it there and use it with the ALTER SUBSCRIPTION
SKIP command. If the community agrees with B+D, we will have a
user-visible feature for PG15 which can be further extended/improved
in PG16 by adding C and E. I started a new thread[2] for B yesterday.
In this thread, I'd like to discuss D.

I've attached an updated patch for D and here is the summary:

* Introduce a new command ALTER SUBSCRIPTION ... SKIP (lsn =
'0/1234'). The user can get the commit-LSN of the transaction in
question from the server logs thanks to B[2].
* The user-specified LSN (say skip-LSN) is stored in the
pg_subscription catalog.
* The apply worker skips the whole transaction if the transaction's
commit-LSN exactly matches the skip-LSN.
* The skip-LSN affects only the first non-empty transaction applied
after the worker started. IOW, it is cleared after either skipping
the whole transaction or successfully committing a non-empty
transaction, which prevents a stale skip-LSN from remaining in the
catalog. Since the latter case means that the user set the wrong
skip-LSN, we clear it with a warning.
* ALTER SUBSCRIPTION SKIP doesn't support tablesync workers, but this
should not be a problem in practice, since errors during table
synchronization are uncommon and can be resolved by truncating the
table and restarting the synchronization.

For the above reasons, the ALTER SUBSCRIPTION SKIP command is safer
than the existing approach of using pg_replication_origin_advance().
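
To make the workflow concrete, a minimal sketch (the subscription
name, origin name, and LSNs are hypothetical examples; the SKIP
syntax follows the attached 0003 patch and may still change):

    -- With patch B applied, the worker's errcontext reports the
    -- remote commit-LSN of the failing transaction, e.g. 0/14C0378.
    -- Skip exactly that transaction:
    ALTER SUBSCRIPTION mysub SKIP (lsn = '0/14C0378');

    -- The existing, lower-level alternative: advance the
    -- subscription's replication origin past the failed commit.
    SELECT pg_replication_origin_advance('pg_16389', '0/14C0379');

Unlike pg_replication_origin_advance(), the SKIP form matches exactly
one commit-LSN and is cleared afterwards, so it cannot accidentally
skip more than the intended transaction.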

I've attached an updated patch along with two patches for cfbot tests
since the main patch (0003) depends on the other two patches. Both
0001 and 0002 patches are the same ones I attached on another
thread[2].

Regards,

[1] 
https://www.postgresql.org/message-id/20220125063131.4cmvsxbz2tdg6g65%40alap3.anarazel.de
[2] 
https://www.postgresql.org/message-id/CAD21AoBarBf2oTF71ig2g_o%3D3Z_Dt6_sOpMQma1kFgbnA5OZ_w%40mail.gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch
Description: Binary data


v12-0001-Use-complete-sentences-in-logical-replication-wo.patch
Description: Binary data


v12-0002-Add-the-origin-name-and-remote-commit-LSN-to-log.patch
Description: Binary data


Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file

2022-03-01 Thread David Steele

On 2/28/22 23:51, Nathan Bossart wrote:
> On Sat, Feb 26, 2022 at 02:06:14PM -0800, Nathan Bossart wrote:
>> On Sat, Feb 26, 2022 at 04:48:52PM +, Chapman Flack wrote:
>>> Assuming the value is false, so no error is thrown, is it practical
>>> to determine from flinfo->fn_expr whether the value was defaulted or
>>> supplied? If so, I would further suggest reporting a deprecation
>>> WARNING if it was explicitly supplied, with a HINT that the argument
>>> can simply be removed at the call site, and will become unrecognized
>>> in some future release.
>>
>> This is a good point.  I think I agree with your proposed changes.  I
>> believe it is possible to add a deprecation warning only when 'exclusive'
>> is specified.  If anything, we can create a separate function that accepts
>> the 'exclusive' parameter and that always emits a NOTICE or WARNING.
>
> I've spent some time looking into this, and I haven't found a clean way to
> emit a WARNING only if the "exclusive" parameter is supplied (and set to
> false).  AFAICT flinfo->fn_expr doesn't tell us whether the parameter was
> supplied or the default value was used.  I was able to get it working by
> splitting pg_start_backup() into 3 separate internal functions (i.e.,
> pg_start_backup_1arg(), pg_start_backup_2arg(), and
> pg_start_backup_3arg()), but this breaks calls such as
> pg_start_backup('mylabel', exclusive => false), and it might complicate
> privilege management for users.
>
> Without a WARNING, I think it will be difficult to justify removing the
> "exclusive" parameter in the future.  We would either need to leave it
> around forever, or we would have to risk unnecessarily breaking some
> working backup scripts.  I wonder if we should just remove it now and make
> sure that this change is well-documented in the release notes.

Personally, I am in favor of removing it. We change/rename
functions/tables/views when we need to, and this happens in almost
every release.

What we need to do is make sure that an older installation won't
silently work in a broken way, i.e. if we remove the exclusive flag
somebody expecting the pre-9.6 behavior might not receive an error and
think everything is OK. That would not be good.

One option might be to rename the functions. Something like
pg_backup_start/stop.


Regards,
-David




Re: Pre-allocating WAL files

2022-03-01 Thread Justin Pryzby
On Thu, Dec 30, 2021 at 02:51:10PM +0300, Maxim Orlov wrote:
> I did check the patch too and found it to be ok. Check and check-world are
> passed.

FYI: this is currently failing in cfbot on linux.

https://cirrus-ci.com/task/4934371210690560
https://api.cirrus-ci.com/v1/artifact/task/4934371210690560/log/src/test/regress/regression.diffs

 DROP TABLESPACE regress_tblspace_renamed;
+ERROR:  tablespace "regress_tblspace_renamed" is not empty

-- 
Justin




Re: Temporary file access API

2022-03-01 Thread Stephen Frost
Greetings,

* Antonin Houska (a...@cybertec.at) wrote:
> Here I'm starting a new thread to discuss a topic that's related to the
> Transparent Data Encryption (TDE), but could be useful even without that.  The
> problem has been addressed somehow in the Cybertec TDE fork, and I can post
> the code here if it helps. However, after reading [1] (and the posts
> upthread), I've got another idea, so let's try to discuss it first.

Yeah, I tend to agree that it makes sense to discuss the general idea of
doing buffered I/O for the temporary files that are being created today
with things like fwrite and have that be independent of encryption.  As
mentioned on that thread, doing our own buffering and having file
accesses done the same way throughout the system is just generally a
good thing and focusing on that will certainly help with any TDE effort
that may or may not happen later.

> It makes sense to me if we first implement the buffering (i.e. writing/reading
> certain amount of data at a time) and make the related functions aware of
> encryption later: as long as we use a block cipher, we also need to read/write
> (suitably sized) chunks rather than individual bytes (or arbitrary amounts of
> data). (In theory, someone might need encryption but reject buffering, but I'm
> not sure if this is a realistic use case.)

Agreed.

> For the buffering, I imagine a "file stream" object that user creates on the
> top of a file descriptor, such as
> 
> FileStream  *FileStreamCreate(File file, int buffer_size)
> 
> or
> 
> FileStream  *FileStreamCreateFD(int fd, int buffer_size)
> 
> and uses functions like
> 
> int FileStreamWrite(FileStream *stream, char *buffer, int amount)
> 
> and
> 
> int FileStreamRead(FileStream *stream, char *buffer, int amount)
> 
> to write and read data respectively.

Yeah, something along these lines was also what I had in mind, in
particular as it would mean fewer changes to places like reorderbuffer.c
(or, at least, very clear changes).
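
For concreteness, here is a minimal sketch of what the buffered write
path could look like (illustrative only, not code from the fork;
Min() is spelled out so the fragment stands alone, and error handling
is elided):

    #include <string.h>
    #include <unistd.h>

    #define Min(a, b) ((a) < (b) ? (a) : (b))  /* as in PostgreSQL's c.h */

    typedef struct FileStream
    {
        int     fd;             /* underlying file descriptor */
        char   *buffer;         /* staging buffer of buffer_size bytes */
        int     buffer_size;
        int     used;           /* bytes currently buffered */
    } FileStream;

    /* Buffer the caller's bytes, flushing whole buffers to the fd. */
    int
    FileStreamWrite(FileStream *stream, char *buffer, int amount)
    {
        int     written = 0;

        while (written < amount)
        {
            int     n = Min(stream->buffer_size - stream->used,
                            amount - written);

            memcpy(stream->buffer + stream->used, buffer + written, n);
            stream->used += n;
            written += n;

            if (stream->used == stream->buffer_size)
            {
                /* an encrypting variant would cipher the block here */
                if (write(stream->fd, stream->buffer, stream->used)
                    != stream->used)
                    return -1;
                stream->used = 0;
            }
        }
        return written;
    }

Reads would mirror this with a fill-on-demand buffer, and
FileStreamClose() would flush any partial final block.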

> Besides functions to close the streams explicitly (e.g. FileStreamClose() /
> FileStreamFDClose()), we'd need to ensure automatic closing where that happens
> to the file. For example, if OpenTemporaryFile() was used to obtain the file
> descriptor, the user expects that the file will be closed and deleted on
> transaction boundary, so the corresponding stream should be freed
> automatically as well.

Sure.  We do have some places that want to write using offsets and we'll
want to support that also, I'd think.

> To avoid code duplication, buffile.c should use these streams internally as
> well, as it also performs buffering. (Here we'd also need functions to change
> reading/writing position.)

Yeah... though we should really go through each of the places that
rolled their own code rather than using what already existed, figure
out why (to the extent that we can), and make sure we are solving
those cases rather than writing a bunch of new code that won't end up
getting used.  We want to be sure that all writes go through these
code paths and that there is no remaining reason to bypass them.

> Once we implement the encryption, we might need add an argument to the
> FileStreamCreate...() functions that helps to generate an unique IV, but the
> ...Read() / ...Write() functions would stay intact. And possibly one more
> argument to specify the kind of cipher, in case we support more than one.

As long as we're only needing to pass these into the Create function,
that seems reasonable to me.  I wouldn't want to go through the effort
of this and then still have to change every single place we read/write
using this system.

Thanks,

Stephen



