Re: Materialized view rewrite is broken when there is an event trigger

2022-08-16 Thread Michael Paquier
On Tue, Aug 09, 2022 at 10:35:01AM -0400, Tom Lane wrote:
> Agreed this is a bug, but I do not think we should add the proposed
> regression test, regardless of where exactly.  It looks like expending
> a lot of cycles forevermore to watch for an extremely unlikely thing,
> ie that we break this for ALTER MATERIALIZED VIEW and not anything
> else.

Hmm.  I see a second point in keeping a test in this area, because we
have nothing that directly checks for AT_REWRITE_ACCESS_METHOD as
introduced by b048326.  It makes me wonder whether we should also have
a second test for a plain table with SET ACCESS METHOD, but we already
have cases covering rewrites there, so..

> I think the real problem here is that we don't have any mechanism
> for verifying that table_rewrite_ok is correct.  The "cross-check"
> in EventTriggerCommonSetup is utterly worthless, as this failure
> shows.  Does it give any confidence at all that there are no other
> mislabelings?  I sure have none now.  What can we do to verify that
> more rigorously?  Or maybe we should find a way to get rid of the
> table_rewrite_ok flag altogether?

This comes down to the dependency between the event trigger paths in
utility.c and tablecmds.c, which gets rather trickier with the ALTERs
on various relkinds.  I don't really know if we could cut this
specific flag; perhaps we could manage a list of command tags
supported for it, as that list is rather short.  I can also see that
something could be done for the firing matrix in the docs as well
(the proposed patch forgot to update the docs).  That's not something
that should be done for v15 anyway, so I have fixed the issue at hand
to take care of this open item.
--
Michael




Re: XLogBeginRead's header comment lies

2022-08-16 Thread Dilip Kumar
On Tue, Aug 16, 2022 at 11:28 PM Robert Haas  wrote:
>
> It claims that:
>
>  * 'RecPtr' should point to the beginning of a valid WAL record.  Pointing at
>  * the beginning of a page is also OK, if there is a new record right after
>  * the page header, i.e. not a continuation.
>
> But this actually doesn't seem to work. This function doesn't itself
> have any problem with any LSNs you want to pass it, so if you call
> this function with an LSN that is at the beginning of a page, you'll
> end up with EndRecPtr set to the LSN you specify and DecodeRecPtr set
> to NULL. When you then call XLogReadRecord, you'll reach
> XLogDecodeNextRecord, which will do this:
>
> if (state->DecodeRecPtr != InvalidXLogRecPtr)
> {
> /* read the record after the one we just read */
>
> /*
>  * NextRecPtr is pointing to end+1 of the previous WAL record.  If
>  * we're at a page boundary, no more records can fit on the current
>  * page. We must skip over the page header, but we can't do that until
>  * we've read in the page, since the header size is variable.
>  */
> }
> else
> {
> /*
>  * Caller supplied a position to start at.
>  *
>  * In this case, NextRecPtr should already be pointing to a valid
>  * record starting position.
>  */
> Assert(XRecOffIsValid(RecPtr));
> randAccess = true;
> }
>
> Since DecodeRecPtr is NULL, you take the else branch, and then you
> fail an assertion.
>
> I tried adding a --beginread argument to pg_waldump (patch attached)
> to further verify this:
>
> [rhaas pgsql]$ pg_waldump -n1
> /Users/rhaas/pgstandby/pg_wal/0002000500A0
> rmgr: Heap  len (rec/tot): 72/72, tx: 5778572, lsn:
> 5/A028, prev 5/9FB8, desc: HOT_UPDATE off 39 xmax 5778572
> flags 0x20 ; new off 62 xmax 0, blkref #0: rel 1663/16388/16402 blk 1
> [rhaas pgsql]$ pg_waldump -n1 --beginread
> /Users/rhaas/pgstandby/pg_wal/0002000500A0
> Assertion failed: (((RecPtr) % 8192 >= (((uintptr_t)
> ((sizeof(XLogPageHeaderData))) + ((8) - 1)) & ~((uintptr_t) ((8) -
> 1))))), function XLogDecodeNextRecord, file xlogreader.c, line 582.
> Abort trap: 6 (core dumped)
>
> The WAL record begins at offset 0x28 in the block, which I believe is
> the length of a long page header, so this is indeed a WAL segment that
> begins with a brand new record, not a continuation record.
>
> There are two ways we could fix this, I believe. One is to correct the
> comment at the start of XLogBeginRead() to reflect the way things
> actually work at present. The other is to correct the code to do what
> the header comment claims. I would prefer the latter, because I'd like
> to be able to use the EndRecPtr of the last record read by one
> xlogreader as the starting point for a new xlogreader created at a
> later time. I've found that, when there's no record spanning the block
> boundary, the EndRecPtr points to the start of the next block, not the
> start of the first record in the next block. I could dodge the problem
> here by just always using XLogFindNextRecord() rather than
> XLogBeginRead(), but I'd actually like it to go boom if I somehow
> end up trying to start from an LSN that's in the middle of a record
> somewhere (or the middle of the page header) because those cases
> shouldn't happen. But if I just have an LSN that happens to be the
> start of the block header rather than the start of the record that
> follows the block header, I'd like that case to be tolerated, because
> the LSN I'm using came from the xlogreader machinery.
>
> Thoughts?

Yeah, I think it makes sense to make it work as per the comment atop
XLogBeginRead().  If we modify the Assert to match that comment, the
remaining code in XLogDecodeNextRecord() is already capable of
skipping the page header when we are pointing at the beginning of a
block.

See attached patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From aa3a39d8bf0e10f3664bdb8c9ca009f17c46d30d Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 17 Aug 2022 11:15:58 +0530
Subject: [PATCH] Adjust XLogDecodeNextRecord assert to match with
 XLogBeginRead

---
 src/backend/access/transam/xlogreader.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 06e9154..4c901ad 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -577,9 +577,10 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 		 * Caller supplied a position to start at.
 		 *
 		 * In this case, NextRecPtr should already be pointing to a valid
-		 * record starting position.
+		 * record starting position or to the beginning of a page; refer to
+		 * the comments atop XLogBeginRead.
 		 */
-		Assert(XRecOffIsValid(RecPtr));
+		Assert(RecPtr % 
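
A minimal sketch of the relaxed check described above, using the
existing XLOG_BLCKSZ and XRecOffIsValid() macros (the attached patch
may differ in detail):

	/*
	 * Accept either a valid record starting position or an LSN sitting
	 * exactly on a page boundary; the surrounding code already knows how
	 * to skip the page header in the latter case.
	 */
	Assert(RecPtr % XLOG_BLCKSZ == 0 || XRecOffIsValid(RecPtr));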

Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-08-16 Thread Bharath Rupireddy
On Wed, Aug 17, 2022 at 2:52 AM Nathan Bossart  wrote:
>
> On Tue, Jul 19, 2022 at 05:29:10PM +0530, Bharath Rupireddy wrote:
> > I've also added the total number of WAL files a checkpoint has
> > processed (scanned the pg_wal directory) while removing old WAL files.
> > This helps to estimate the pg_wal disk space at the time of a
> > particular checkpoint, especially useful for debugging issues.
>
> I don't think it's clear what "processed" means here.  In any case, I think
> this part belongs in a separate patch or maybe even a new thread.

Agreed. PSA v11 patch.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
From f5e0761835e863dd5b308756b262f897c99938b2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 17 Aug 2022 05:45:00 +
Subject: [PATCH v11] Add time taken for processing logical decoding files to
 checkpoint log

At times, there can be many snapshot and mapping files under the
pg_logical dir that the checkpoint might have to delete/fsync
based on the cutoff LSN, which can increase the checkpoint time.
Add stats related to these files to better understand the delays
or time spent by the checkpointer processing them.
---
 src/backend/access/transam/xlog.c | 22 +-
 src/include/access/xlog.h |  4 
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9cedd6876f..251b73a4f0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6090,7 +6090,8 @@ LogCheckpointEnd(bool restartpoint)
 sync_msecs,
 total_msecs,
 longest_msecs,
-average_msecs;
+average_msecs,
+l_dec_ops_msecs;
 	uint64		average_sync_time;
 
 	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
@@ -6127,6 +6128,9 @@ LogCheckpointEnd(bool restartpoint)
 			CheckpointStats.ckpt_sync_rels;
 	average_msecs = (long) ((average_sync_time + 999) / 1000);
 
+	l_dec_ops_msecs = TimestampDifferenceMilliseconds(CheckpointStats.l_dec_ops_start_t,
+	  CheckpointStats.l_dec_ops_end_t);
+
 	/*
 	 * ControlFileLock is not required to see ControlFile->checkPoint and
 	 * ->checkPointCopy here as we are the only updator of those variables at
@@ -6139,7 +6143,8 @@ LogCheckpointEnd(bool restartpoint)
 		"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
 		"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		"distance=%d kB, estimate=%d kB; "
-		"lsn=%X/%X, redo lsn=%X/%X",
+		"lsn=%X/%X, redo lsn=%X/%X; "
+		"logical decoding file(s) processing time=%ld.%03d s",
 		CheckpointStats.ckpt_bufs_written,
 		(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 		CheckpointStats.ckpt_segs_added,
@@ -6154,7 +6159,8 @@ LogCheckpointEnd(bool restartpoint)
 		(int) (PrevCheckPointDistance / 1024.0),
 		(int) (CheckPointDistanceEstimate / 1024.0),
 		LSN_FORMAT_ARGS(ControlFile->checkPoint),
-		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo)));
+		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo),
+		l_dec_ops_msecs / 1000, (int) (l_dec_ops_msecs % 1000)));
 	else
 		ereport(LOG,
 (errmsg("checkpoint complete: wrote %d buffers (%.1f%%); "
@@ -6162,7 +6168,8 @@ LogCheckpointEnd(bool restartpoint)
 		"write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
 		"sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
 		"distance=%d kB, estimate=%d kB; "
-		"lsn=%X/%X, redo lsn=%X/%X",
+		"lsn=%X/%X, redo lsn=%X/%X; "
+		"logical decoding file(s) processing time=%ld.%03d s",
 		CheckpointStats.ckpt_bufs_written,
 		(double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
 		CheckpointStats.ckpt_segs_added,
@@ -6177,7 +6184,8 @@ LogCheckpointEnd(bool restartpoint)
 		(int) (PrevCheckPointDistance / 1024.0),
 		(int) (CheckPointDistanceEstimate / 1024.0),
 		LSN_FORMAT_ARGS(ControlFile->checkPoint),
-		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo)));
+		LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo),
+		l_dec_ops_msecs / 1000, (int) (l_dec_ops_msecs % 1000)));
 }
 
 /*
@@ -6846,8 +6854,12 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
 	CheckPointReplicationSlots();
+
+	CheckpointStats.l_dec_ops_start_t = GetCurrentTimestamp();
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
+	CheckpointStats.l_dec_ops_end_t = GetCurrentTimestamp();
+
 	CheckPointReplicationOrigin();
 
 	/* Write out all dirty data in SLRUs and the main buffer pool */
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index cd674c3c23..c71aca8534 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -173,6 +173,10 @@ typedef struct CheckpointStatsData
 	 * times, which is not necessarily the
 	 * same as the total elapsed time for the
 	 * entire sync phase. */
+
+	/* start and end timestamps of logical 

Assertion failure on PG15 with modified test_shm_mq test

2022-08-16 Thread Pavan Deolasee
Hello,

I have a slightly modified version of test_shm_mq that I changed to include
a shared fileset.  The motivation to do that came from hitting an assertion
failure with PG15 while doing some development work on BDR, and I suspected
it to be a PG15 bug.

The stack trace looks as below:

(lldb) bt
* thread #1
  * frame #0: 0x7ff8187b100e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7ff8187e71ff libsystem_pthread.dylib`pthread_kill + 263
frame #2: 0x7ff818732d24 libsystem_c.dylib`abort + 123
frame #3: 0x00010fce1bab
postgres`ExceptionalCondition(conditionName="pgstat_is_initialized &&
!pgstat_is_shutdown", errorType="FailedAssertion", fileName="pgstat.c",
lineNumber=1227) at assert.c:69:2
frame #4: 0x00010fb06412 postgres`pgstat_assert_is_up at
pgstat.c:1227:2
frame #5: 0x00010fb0d2c7
postgres`pgstat_get_entry_ref(kind=PGSTAT_KIND_DATABASE, dboid=0, objoid=0,
create=true, created_entry=0x) at pgstat_shmem.c:406:2
frame #6: 0x00010fb07579
postgres`pgstat_prep_pending_entry(kind=PGSTAT_KIND_DATABASE, dboid=0,
objoid=0, created_entry=0x) at pgstat.c:1068:14
frame #7: 0x00010fb09cce
postgres`pgstat_prep_database_pending(dboid=0) at pgstat_database.c:327:14
frame #8: 0x00010fb09dff
postgres`pgstat_report_tempfile(filesize=0) at pgstat_database.c:179:10
frame #9: 0x00010fa8dbe9
postgres`ReportTemporaryFileUsage(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0",
size=0) at fd.c:1521:2
frame #10: 0x00010fa8db3c
postgres`PathNameDeleteTemporaryFile(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0",
error_on_failure=false) at fd.c:1945:3
frame #11: 0x00010fa8d3a8
postgres`unlink_if_exists_fname(fname="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0",
isdir=false, elevel=15) at fd.c:3674:3
frame #12: 0x00010fa8d270
postgres`walkdir(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset",
action=(postgres`unlink_if_exists_fname at fd.c:3663),
process_symlinks=false, elevel=15) at fd.c:3573:5
frame #13: 0x00010fa8d0e2
postgres`PathNameDeleteTemporaryDir(dirname="base/pgsql_tmp/pgsql_tmp17312.0.fileset")
at fd.c:1689:2
frame #14: 0x00010fa91ac1
postgres`FileSetDeleteAll(fileset=0x000119240870) at fileset.c:165:3
frame #15: 0x00010fa92b08
postgres`SharedFileSetOnDetach(segment=0x7f93ff00a7c0,
datum=4716759152) at sharedfileset.c:119:3
frame #16: 0x00010fa96b00
postgres`dsm_detach(seg=0x7f93ff00a7c0) at dsm.c:801:3
frame #17: 0x00010fa96f51 postgres`dsm_backend_shutdown at
dsm.c:738:3
frame #18: 0x00010fa99402 postgres`shmem_exit(code=1) at ipc.c:259:2
frame #19: 0x00010fa99227 postgres`proc_exit_prepare(code=1) at
ipc.c:194:2
frame #20: 0x00010fa99133 postgres`proc_exit(code=1) at ipc.c:107:2
frame #21: 0x00010fce318c postgres`errfinish(filename="postgres.c",
lineno=3204, funcname="ProcessInterrupts") at elog.c:661:3
frame #22: 0x00010fad7c51 postgres`ProcessInterrupts at
postgres.c:3201:4
frame #23: 0x00011924d85b
test_shm_mq.so`test_shm_mq_main(main_arg=1155036180) at worker.c:159:2
frame #24: 0x00010f9da11e postgres`StartBackgroundWorker at
bgworker.c:858:2
frame #25: 0x00010f9e80b4
postgres`do_start_bgworker(rw=0x7f93ef904080) at postmaster.c:5823:4
frame #26: 0x00010f9e2524 postgres`maybe_start_bgworkers at
postmaster.c:6047:9
frame #27: 0x00010f9e0e63
postgres`sigusr1_handler(postgres_signal_arg=30) at postmaster.c:5204:3
frame #28: 0x7ff8187fcdfd libsystem_platform.dylib`_sigtramp + 29
frame #29: 0x7ff8187b2d5b libsystem_kernel.dylib`__select + 11
frame #30: 0x00010f9e268d postgres`ServerLoop at
postmaster.c:1770:13
frame #31: 0x00010f9e0157 postgres`PostmasterMain(argc=8,
argv=0x62f30190) at postmaster.c:1478:11
frame #32: 0x00010f8bc930 postgres`main(argc=8,
argv=0x62f30190) at main.c:202:3
frame #33: 0x00011f7d651e dyld`start + 462

I notice that pgstat_shutdown_hook() is registered as a before-shmem-exit
callback.  The callback is responsible for detaching from the pgstat shared
memory segment.  But it looks like other parts of the system still expect
it to be available during later stages of proc exit.

It's not clear to me whether pgstat shutdown should happen later, or whether
code that gets executed later in the exit cycle should not try to use
pgstat.  It's also entirely possible that my usage of SharedFileSet is
completely wrong.  If that's the case, please let me know the mistake in
the usage.

Patch modifying the test case is attached.  In order to reproduce the
problem quickly, I added a CHECK_FOR_INTERRUPTS() in the test, but I don't
see why that would be a bad coding practice.

Thanks,
Pavan


crash_pgstat.patch
Description: Binary data


Remove dummyret definition

2022-08-16 Thread Peter Eisentraut


dummyret hasn't been used in a while (last use removed by 50d22de932,
and before that 84b6d5f359), and since we are now preferring inline
functions over complex macros, it's unlikely to be needed again.

From 513c41d4a79cad768d5ead3366a43def2b3e272b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 17 Aug 2022 07:22:34 +0200
Subject: [PATCH] Remove dummyret definition

This hasn't been used in a while (last use removed by 50d22de932, and
before that 84b6d5f359), and since we are now preferring inline
functions over complex macros, it's unlikely to be needed again.
---
 src/include/c.h | 10 --
 1 file changed, 10 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index 65e91a6b89..dfc366b026 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -333,16 +333,6 @@
_61,_62,_63,  N, ...) \
(N)
 
-/*
- * dummyret is used to set return values in macros that use ?: to make
- * assignments.  gcc wants these to be void, other compilers like char
- */
-#ifdef __GNUC__/* GNU cc */
-#define dummyret   void
-#else
-#define dummyret   char
-#endif
-
 /*
  * Generic function pointer.  This can be used in the rare cases where it's
  * necessary to cast a function pointer to a seemingly incompatible function
-- 
2.37.1



Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Amit Kapila
On Wed, Aug 17, 2022 at 6:27 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-08-16 19:55:18 -0400, Tom Lane wrote:
> > Robert Haas  writes:
> > > What sort of random things would someone do with pageinspect functions
> > > that would hold buffer pins on one buffer while locking another one?
> > > The functions in hashfuncs.c don't even seem like they would access
> > > multiple buffers in total, let alone at overlapping times. And I don't
> > > think that a query using pageinspect could realistically be suspended while
> > > holding a buffer pin either. If you wrapped it in a cursor it'd be
> > > suspended before or after accessing any given buffer, not right in the
> > > middle of that operation.
> >
> > pin != access.  Unless things have changed really drastically since
> > I last looked, a seqscan will sit on a buffer pin throughout the
> > series of fetches from a single page.
>
> That's still the case. But for heap that shouldn't be a problem, because we'll
> never try to take a cleanup lock while holding another page locked (nor even
> another heap page pinned, I think).
>
>
> I find it *highly* suspect that hash needs to acquire a cleanup lock while
> holding another buffer locked. The recovery aspect alone makes that seem quite
> unwise. Even if there's possibly no deadlock here for some reason or another.
>
>
> Looking at the non-recovery code makes me even more suspicious:
>
> /*
>  * Physically allocate the new bucket's primary page.  We want to do this
>  * before changing the metapage's mapping info, in case we can't get the
>  * disk space.  Ideally, we don't need to check for cleanup lock on new
>  * bucket as no other backend could find this bucket unless meta page is
>  * updated.  However, it is good to be consistent with old bucket locking.
>  */
> buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
> if (!IsBufferCleanupOK(buf_nblkno))
> {
> _hash_relbuf(rel, buf_oblkno);
> _hash_relbuf(rel, buf_nblkno);
> goto fail;
> }
>
>
> _hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
> memset(0)s the whole page. What does it even mean to check whether you
> effectively have a cleanup lock after you zeroed out the page?
>
> Reading the README and the comment above makes me wonder if this whole cleanup
> lock business here is just cargo culting and could be dropped?
>

I think it is okay to not acquire a cleanup lock on the new bucket
page, both in recovery and non-recovery paths.  It is primarily required
on the old bucket page to avoid concurrent scans/inserts.  As mentioned
in the comments, and as far as my memory serves, it is mainly for keeping
it consistent with the old bucket locking.
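
Concretely, the simplification being discussed would look something like
this (a sketch only, not a committed change):

	/*
	 * Physically allocate the new bucket's primary page.  The new bucket
	 * is not reachable from the metapage yet, so no other backend can
	 * have it pinned; the IsBufferCleanupOK() check on buf_nblkno can
	 * simply be dropped.
	 */
	buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);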

-- 
With Regards,
Amit Kapila.




Re: pg_walinspect: ReadNextXLogRecord's first_record argument

2022-08-16 Thread Bharath Rupireddy
On Tue, Aug 16, 2022 at 10:04 PM Robert Haas  wrote:
>
> Hi,
>
> I was looking at the code for pg_walinspect today and I think I may
> have found a bug (or else I'm confused about how this all works, which
> is also possible). ReadNextXLogRecord() takes an argument first_record
> of type XLogRecPtr which is used only for error reporting purposes: if
> we fail to read the next record for a reason other than end-of-WAL, we
> complain that we couldn't read the WAL at the LSN specified by
> first_record.
>
> ReadNextXLogRecord() has three callers. In pg_get_wal_record_info(),
> we're just reading a single record, and the LSN passed as first_record
> is the LSN at which that record starts. Cool. But in the other two
> callers, GetWALRecordsInfo() and GetWalStats(), we're reading multiple
> records, and first_record is always passed as the LSN of the first
> record. That's logical enough given the name of the argument, but the
> effect of it seems to be that an error while reading any of the
> records will be reported using the LSN of the first record, which does
> not seem right.

Indeed. Thanks a lot for finding it.

> By contrast, pg_rewind's extractPageMap() reports the error using
> xlogreader->EndRecPtr. I think that's correct. The toplevel xlogreader
> function that we're calling here is XLogReadRecord(), which in turn
> calls XLogNextRecord(), which has this comment:
>
> /*
>  * state->EndRecPtr is expected to have been set by the last call to
>  * XLogBeginRead() or XLogNextRecord(), and is the location of the
>  * error.
>  */
>
> Thoughts?

Agreed.

Here's a patch (for V15 as well) fixing this bug, please review.
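
The shape of the fix is to report the error at the reader's current
position instead of at the caller-supplied first_record, roughly like this
(a sketch; the attached patches may differ in detail):

	record = XLogReadRecord(xlogreader, &errormsg);
	if (record == NULL)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not read WAL at %X/%X: %s",
						LSN_FORMAT_ARGS(xlogreader->EndRecPtr),
						errormsg ? errormsg : "")));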

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/


v1-master-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch
Description: Binary data


v1-PG15-Use-correct-LSN-for-error-reporting-in-pg_walinsp.patch
Description: Binary data


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-16 Thread Dilip Kumar
On Fri, Aug 12, 2022 at 6:33 PM Robert Haas  wrote:
>
> On Thu, Aug 11, 2022 at 2:15 PM Robert Haas  wrote:
> > As far as I know, this 0001 addresses all outstanding comments and
> > fixes the reported bug.
> >
> > Does anyone think otherwise?
>
> If they do, they're keeping quiet, so I committed this and
> back-patched it to v15.
>
> Regarding 0002 -- should it, perhaps, use PGAlignedBlock?

Yes, we can do that.  Here we are not using this buffer directly as a
"Page", so we do not have any real alignment issue, but I do not see any
problem in using PGAlignedBlock, so I have changed that.

> Although 0002 is formally a performance optimization, I'm inclined to
> think that if we're going to commit it, it should also be back-patched
> into v15, because letting the code diverge when we're not even out of
> beta yet seems painful.

Yeah that makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 59fadefe04f8f2eeb6bc5e2e02efde56d5ace8aa Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Fri, 5 Aug 2022 11:25:23 +0530
Subject: [PATCH v4] Optimize copy storage from source to destination

Instead of extending the destination relation one block at a time,
bulk-extend it and then just perform the block-level copy.
---
 src/backend/storage/buffer/bufmgr.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 9c1bd50..7a1202c 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3712,6 +3712,7 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	bool		use_wal;
 	BlockNumber nblocks;
 	BlockNumber blkno;
+	PGAlignedBlock buf;
 	BufferAccessStrategy bstrategy_src;
 	BufferAccessStrategy bstrategy_dst;
 
@@ -3730,6 +3731,14 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 	if (nblocks == 0)
 		return;
 
+	/*
+	 * Bulk extend the destination relation of the same size as the source
+	 * relation before starting to copy block by block.
+	 */
+	memset(buf.data, 0, BLCKSZ);
+	smgrextend(smgropen(dstlocator, InvalidBackendId), forkNum, nblocks - 1,
+			   buf.data, true);
+
 	/* This is a bulk operation, so use buffer access strategies. */
 	bstrategy_src = GetAccessStrategy(BAS_BULKREAD);
 	bstrategy_dst = GetAccessStrategy(BAS_BULKWRITE);
@@ -3747,7 +3756,7 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
 		srcPage = BufferGetPage(srcBuf);
 
-		/* Use P_NEW to extend the destination relation. */
+		/* The destination relation was already bulk-extended above. */
-		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, P_NEW,
+		dstBuf = ReadBufferWithoutRelcache(dstlocator, forkNum, blkno,
 		   RBM_NORMAL, bstrategy_dst,
 		   permanent);
 		LockBuffer(dstBuf, BUFFER_LOCK_EXCLUSIVE);
-- 
1.8.3.1



Regarding availability of 32bit client drivers for postgresql 13/14

2022-08-16 Thread Aravind Phaneendra
Hi,



My name is Aravind and I am part of IBM CICS TX product development and
support. We have a requirement from one of our customers to use IBM CICS TX
with PostgreSQL 13/14. IBM CICS TX is a Transaction Manager middleware product
that is deployed as a container on Kubernetes platforms. IBM CICS TX can
interact with database products such as DB2, Oracle, MSSQL, and PostgreSQL
through XA/Open standards.

CICS TX is a 32-bit C runtime product and uses the databases' 32-bit client
libraries to perform embedded SQL operations. The customer applications are
embedded SQL C or COBOL programs deployed on CICS TX, and the CICS TX runtime
executes them as transactions, ensuring data integrity.

We observed there are no 32-bit client binaries/libraries available with
PostgreSQL 13/14, and CICS TX requires them to interact with the PostgreSQL
server. Currently we have tested with PostgreSQL 10.12.1 and our customer
wants to upgrade to PostgreSQL 13 or 14.

Based on the above requirements and details, we have a few questions which
require your support.

  1.  Can we get 32-bit client binaries/libraries for PostgreSQL 14?
  2.  We also found that the libraries can be built by using the PostgreSQL 14
source. Is it possible to build the 32-bit client binaries/libraries from the
source available?
  3.  Is there official support for 32-bit client libraries/binaries built
from source for customers?
  4.  Can the PostgreSQL 10.12.1 client work with a PostgreSQL 14 server? Do
you still support the PostgreSQL 10.12.1 client?


Thanks & Regards,
Aravind Phaneendra
CICS TX and TXSeries Development & L3 Support
India Systems Development Labs
IBM Systems



RE: Handle infinite recursion in logical replication setup

2022-08-16 Thread houzj.f...@fujitsu.com
On Tuesday, August 2, 2022 8:00 PM Amit Kapila  wrote:
> On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  wrote:
> >
> > On Tue, Jul 26, 2022 at 7:13 AM Jonathan S. Katz 
> wrote:
> > >
> > > Thanks for the example. I agree that it is fairly simple to reproduce.
> > >
> > > I understand that "copy_data = force" is meant to protect a user
> > > from hurting themself. I'm not convinced that this is the best way to do 
> > > so.
> > >
> > > For example today I can subscribe to multiple publications that
> > > write to the same table. If I have a primary key on that table, and
> > > two of the subscriptions try to write an identical ID, we conflict.
> > > We don't have any special flags or modes to guard against that from
> > > happening, though we do have documentation on conflicts and managing
> them.
> > >
> > > AFAICT the same issue with "copy_data" also exists in the above
> > > scenario too, even without the "origin" attribute.
> > >
> >
> > That's true but there is no parameter like origin = NONE which
> > indicates that constraint violations or duplicate data problems won't
> > occur due to replication. In the current case, I think the situation
> > is different because a user has specifically asked not to replicate
> > any remote data by specifying origin = NONE, which should be dealt
> > differently. Note that current users or their setup won't see any
> > difference/change unless they specify the new parameter origin as
> > NONE.
> >
> 
> Let me try to summarize the discussion so that it is easier for others to 
> follow.
> The work in this thread is to avoid loops, and duplicate data in logical
> replication when the operations happened on the same table in multiple nodes.
> It has been explained in email [1] with an example of how a logical 
> replication
> setup can lead to duplicate or inconsistent data.
> 
> The idea to solve this problem is that we don't replicate data that is not
> generated locally, which we can normally identify based on the origin of
> data in WAL. The commit 366283961a achieves that for replication, but the
> problem can still happen during initial sync, which is performed internally
> via copy.
> We can't differentiate the data in heap based on origin. So, we decided to
> prohibit the subscription operations that can perform initial sync (ex. Create
> Subscription, Alter Subscription ... Refresh) by detecting that the publisher 
> has
> subscribed to the same table from some other publisher.
> 
> To prohibit the subscription operations, the currently proposed patch throws
> an error.  Then, it also provides a new copy_data option 'force' under which
> the user will still be able to perform the operation. This could be useful 
> when
> the user intentionally wants to replicate the initial data even if it 
> contains data
> from multiple nodes (for example, when in a multi-node setup, one decides to
> get the initial data from just one node and then allow replication of data to
> proceed from each of the respective nodes).
> 
> The other alternative discussed was to just give a warning for subscription
> operations and probably document the steps for users to avoid it. But the
> problem with that is once the user sees this warning, they won't be able to do
> anything except recreate the setup, so why not give an error in the first 
> place?
> 
> Thoughts?

Thanks for the summary.

I think it's fine to report an ERROR so that the user applies the copy_data
option more carefully and avoids duplicate copies.

But I also share Sawada-san's concern that it's possible for users to
receive an ERROR in some unexpected cases.

For example I want to build bi-directional setup between two nodes:

Node A: TABLE test (has actual data)
Node B: TABLE test (empty)

Step 1:
CREATE PUBLICATION on both Node A and B.

Step 2:
CREATE SUBSCRIPTION on Node A with (copy_data = on)
-- this is fine as there is no data on Node B

Step 3:
CREATE SUBSCRIPTION on Node B with (copy_data = on)
-- this should be fine as the user needs to copy data from Node A to Node B,
-- but we still report an error for this case.

It looks a bit strict to report an ERROR in this case, and it seems not easy
to avoid it.  So, personally, I think it might be better to document the
correct steps to build bi-directional replication, and probably also
document the steps to recover if the user accidentally did a duplicate
initial copy, if not documented yet.

In addition, we could also LOG some additional information about the ORIGIN
and the initial copy, which might help users to analyze the situation if
needed.


- Some other thoughts about the duplicate initial copy problem.

Actually, I feel a better way to address the possible duplicate copy problem
is to provide a command like "bi_setup" which can help users build the
bi-directional replication on all nodes and can handle the initial copy
automatically. But that might be too far.

Another naive idea I once had is that maybe we can add a publication option
like: data_source_in_bi_group. If 

Re: build remaining Flex files standalone

2022-08-16 Thread John Naylor
On Wed, Aug 17, 2022 at 8:14 AM Andres Freund  wrote:

> > > > +/* functions shared between guc.c and guc-file.l */
> > > > [...]
> > > I think I prefer your suggestion of a guc_internal.h upthread.
> >
> > Started in 0002, but left open the headerscheck failure.
> >
> > Also, if such a thing is meant to be #include'd only by two generated
> > files, maybe it should just live in the directory where they live, and
> > not in the src/include dir?
>
> It's not something we've done for the backend afaics, but I don't see a reason
> not to start at some point.

BTW, I forgot to mention I did this for the json path parser, which
makes the makefile code simpler than what was there before
550b9d26f80fa30. AFAICS, we could also do the same for gramparse.h,
which is internal to parser.c. If I'm not mistaken, the only reason we
symlink gram.h to src/include/* is so that gramparse.h can include it.
So keeping gramparse.h in the backend could allow removing some gram.h
makefile incantations.

> > > Why does this need to be defined in a semi-public header? If we do this in
> > > multiple files we'll end up with the danger of macro redefinition 
> > > warnings.
> >
> > I tried to put all the Flex/Bison stuff in another *_internal header,
> > but that breaks the build. Putting just this one symbol in a header is
> > silly, but done that way for now. Maybe two copies of the symbol?
>
> The problem is that if it's in a header you can't include another header with
> such a define. That's fine if it's a .h that's just intended to be included by
> a limited set of files, but for something like a header for a datatype that
> might need to be included to e.g. define a PL transform or a new operator or
> ...  This would be solved by the %code requires thing, right?

I believe it would.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Bug: When user-defined AM is used, the index path cannot be selected correctly

2022-08-16 Thread Tom Lane
Quan Zongliang  writes:
> 1. When using extended PGroonga
> ...
> 3. Neither ID = 'f' nor id= 't' can use the index correctly.

This works fine for btree indexes.  I think the actual problem
is that IsBooleanOpfamily only accepts the btree and hash
opclasses, and that's what needs to be improved.  Your proposed
patch fails to do that, which makes it just a crude hack that
solves some aspects of the issue (and probably breaks other
things).

It might work to change IsBooleanOpfamily so that it checks to
see whether BooleanEqualOperator is a member of the opclass.
That's basically what we need to know before we dare generate
substitute index clauses.  It's kind of an expensive test
though, and the existing coding assumes that IsBooleanOpfamily
is cheap ...
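
A sketch of that kind of test, using the existing lsyscache helper
op_in_opfamily() (hypothetical; a real patch would need to memoize the
result, since this performs a syscache lookup):

	#include "postgres.h"
	#include "catalog/pg_operator.h"	/* BooleanEqualOperator */
	#include "utils/lsyscache.h"		/* op_in_opfamily() */

	/*
	 * Possible replacement for the IsBooleanOpfamily() macro: accept any
	 * opfamily that indexes the boolean equality operator, rather than
	 * hardcoding the btree and hash opfamily OIDs.
	 */
	static bool
	is_boolean_opfamily(Oid opfamily)
	{
		return op_in_opfamily(BooleanEqualOperator, opfamily);
	}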

regards, tom lane




Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2022-08-16 Thread Justin Pryzby
On Tue, Aug 02, 2022 at 11:21:04PM +1200, David Rowley wrote:
> I've now pushed the patch.

I've not studied the patch at all.

But in a few places, it removes the locally-computed group_pathkeys:

-   List   *group_pathkeys = root->group_pathkeys;

However it doesn't do that here:

/*
 * Instead of operating directly on the input relation, we can
 * consider finalizing a partially aggregated path.
 */
if (partially_grouped_rel != NULL)
{
foreach(lc, partially_grouped_rel->pathlist)
{
ListCell   *lc2;
Path   *path = (Path *) lfirst(lc);
Path   *path_original = path;
 
List   *pathkey_orderings = NIL;
 
                List       *group_pathkeys = root->group_pathkeys;

I noticed because that creates a new shadow variable, which seems accidental.

make src/backend/optimizer/plan/planner.o COPT=-Wshadow=compatible-local

src/backend/optimizer/plan/planner.c:6642:14: warning: declaration of ‘group_pathkeys’ shadows a previous local [-Wshadow=compatible-local]
 6642 | List*group_pathkeys = root->group_pathkeys;
  |  ^~
src/backend/optimizer/plan/planner.c:6438:12: note: shadowed declaration is here
 6438 |   List*group_pathkeys;

-- 
Justin




Re: Inconsistent error message for varchar(n)

2022-08-16 Thread Bruce Momjian
On Sun, Nov 14, 2021 at 10:33:19AM +0800, Japin Li wrote:
> 
> On Sat, 13 Nov 2021 at 23:42, Tom Lane  wrote:
> > Japin Li  writes:
> >> postgres=# CREATE TABLE tbl (s varchar(2147483647));
> >> ERROR:  length for type varchar cannot exceed 10485760
> >> LINE 1: CREATE TABLE tbl (s varchar(2147483647));
> >> ^
> >
> >> postgres=# CREATE TABLE tbl (s varchar(2147483648));
> >> ERROR:  syntax error at or near "2147483648"
> >> LINE 1: CREATE TABLE tbl (s varchar(2147483648));
> >> ^
> >
> > I'm having a very hard time getting excited about that.  We could maybe
> > switch the grammar production to use generic expr_list syntax for the
> > typmod, like GenericType does.  But that would just result in this:
> >
> > regression=# CREATE TABLE tbl (s "varchar"(2147483648));
> > ERROR:  value "2147483648" is out of range for type integer
> > LINE 1: CREATE TABLE tbl (s "varchar"(2147483648));
> > ^
> >
> > which doesn't seem any less confusing for a novice who doesn't know
> > that typmods are constrained to be integers.
> >
> > There might be something to be said for switching all the hard-wired
> > type productions to use opt_type_modifiers and pushing the knowledge
> > that's in, eg, opt_float out to per-type typmodin routines.  But any
> > benefit would be in reduction of the grammar size, and I'm dubious
> > that it'd be worth the trouble.  I suspect that overall, the resulting
> > error messages would be slightly worse not better --- note for example
> > the poorer placement of the error cursor above.  A related example is
> >
> > regression=# CREATE TABLE tbl (s varchar(2,3));
> > ERROR:  syntax error at or near ","
> > LINE 1: CREATE TABLE tbl (s varchar(2,3));
> >  ^
> > regression=# CREATE TABLE tbl (s "varchar"(2,3));
> > ERROR:  invalid type modifier
> > LINE 1: CREATE TABLE tbl (s "varchar"(2,3));
> > ^
> >
> > That's explained by the comment in anychar_typmodin:
> >
> >  * we're not too tense about good error message here because grammar
> >  * shouldn't allow wrong number of modifiers for CHAR
> >
> > and we could surely improve that message, but anychar_typmodin can't give
> > a really on-point error cursor.
> >
> 
> Oh! I didn't consider this situation.  The max size of varchar cannot
> exceed 10485760; however, I cannot find this in the documentation [1]. Is
> there something I missed? Should we mention this in the documentation?
> 
> [1] https://www.postgresql.org/docs/devel/datatype-character.html

Sorry for my long delay in reviewing this issue.  You are correct this
should be documented --- patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 4cc9e59270..07c3654b21 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1217,6 +1217,8 @@ SELECT '52093.89'::money::numeric::float8;
 char(n) are aliases for character
 varying(n) and
 character(n), respectively.
+The length specification must be greater than zero and cannot
+exceed 10485760.
 character without length specifier is equivalent to
 character(1). If character varying is used
 without length specifier, the type accepts strings of any size. The


Re: build remaining Flex files standalone

2022-08-16 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-16 17:41:43 +0700, John Naylor wrote:
>> That directive has been supported in Bison since 2.4.2.

> 2.4.2 is from 2010. So I think we could just start relying on it?

Apple is still shipping 2.3.  Is this worth enough to make Mac
users install a non-default Bison?  I seriously doubt it.

I don't say that there won't be a reason that justifies that
at some point, but letting headerscheck test autogenerated
files seems of only microscopic benefit :-(

regards, tom lane




Re: SQL/JSON features for v15

2022-08-16 Thread Jonathan S. Katz

Hi,

On 8/15/22 10:14 PM, Andres Freund wrote:


I pushed a few cleanups to https://github.com/anarazel/postgres/commits/json
while I was hacking on this (ignore that it's based on the meson tree, that's
just faster for me). Some of them might not be applicable anymore, but it
might still make sense for you to look at.


With RMT hat on, this appears to be making progress. A few questions / 
comments for the group:


1. Nikita: Did you have a chance to review Andres's changes as well?

2. There seems to be some, though limited, progress on design docs. 
Andres keeps making a point about adding additional comments to the code 
to make it easier to follow. Please do not lose sight of this.


3. Robert raised a point about the use of subtransactions and the 
increased risk of wraparound on busy systems using the SQL/JSON 
features. Do these patches help reduce this risk? I see some clarification 
of the use of subtransactions within the patchset, but want to better 
understand whether the risks pointed out are a concern.


Thanks everyone for your work on this so far!

Jonathan





Bug: When user-defined AM is used, the index path cannot be selected correctly

2022-08-16 Thread Quan Zongliang


Hi,

1. When using extended PGroonga

CREATE EXTENSION pgroonga;

CREATE TABLE memos (
  id boolean,
  content varchar
);

CREATE INDEX idxA ON memos USING pgroonga (id);

2. Disable bitmapscan and seqscan:

SET enable_seqscan=off;
SET enable_indexscan=on;
SET enable_bitmapscan=off;

3. Neither id = 'f' nor id = 't' can use the index correctly.

postgres=# explain select * from memos where id='f';
QUERY PLAN
--
 Seq Scan on memos  (cost=100.00..101.06 rows=3 width=33)
   Filter: (NOT id)
(2 rows)

postgres=# explain select * from memos where id='t';
QUERY PLAN
--
 Seq Scan on memos  (cost=100.00..101.06 rows=3 width=33)
   Filter: id
(2 rows)

postgres=# explain select * from memos where id>='t';
QUERY PLAN
---
 Index Scan using idxa on memos  (cost=0.00..4.01 rows=2 width=33)
   Index Cond: (id >= true)
(2 rows)



The reason is that these expressions are converted to BoolExpr and Var
nodes, and match_clause_to_indexcol() does not consider them when checking
for the boolean-index case.

patch attached.

--
Quan Zongliang
Beijing Vastdata

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 7d176e7b00..31229ce16c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2295,7 +2295,7 @@ match_clause_to_indexcol(PlannerInfo *root,
 
 	/* First check for boolean-index cases. */
 	opfamily = index->opfamily[indexcol];
-	if (IsBooleanOpfamily(opfamily))
+	if (IsBooleanOpfamily(opfamily) || IsA(clause, BoolExpr) || IsA(clause, Var))
 	{
 		iclause = match_boolean_index_clause(root, rinfo, indexcol, index);
 		if (iclause)


Re: build remaining Flex files standalone

2022-08-16 Thread Andres Freund
Hi,

On 2022-08-16 17:41:43 +0700, John Naylor wrote:
> For v3, I addressed some comments and added .h files to the
> headerscheck exceptions.

Thanks!


> /*
>  * NB: include bootparse.h only AFTER including bootstrap.h, because 
> bootstrap.h
>  * includes node definitions needed for YYSTYPE.
>  */
> 
> Future cleanup: I see this in headerscheck:
> 
> # We can't make these Bison output files compilable standalone
> # without using "%code require", which old Bison versions lack.
> # parser/gram.h will be included by parser/gramparse.h anyway.
> 
> That directive has been supported in Bison since 2.4.2.

2.4.2 is from 2010. So I think we could just start relying on it?


> > > +/* functions shared between guc.c and guc-file.l */
> > > [...]
> > I think I prefer your suggestion of a guc_internal.h upthread.
> 
> Started in 0002, but left open the headerscheck failure.
> 
> Also, if such a thing is meant to be #include'd only by two generated
> files, maybe it should just live in the directory where they live, and
> not in the src/include dir?

It's not something we've done for the backend afaics, but I don't see a reason
not to start at some point.


> > > From 7d4ecfcb3e91f3b45e94b9e64c7c40f1bbd22aa8 Mon Sep 17 00:00:00 2001
> > > From: John Naylor 
> > > Date: Fri, 12 Aug 2022 15:45:24 +0700
> > > Subject: [PATCH v201 2/9] Build booscanner.c standalone
> >
> > > -# bootscanner is compiled as part of bootparse
> > > -bootparse.o: bootscanner.c
> > > +# See notes in src/backend/parser/Makefile about the following two rules
> > > +bootparse.h: bootparse.c
> > > + touch $@
> > > +
> > > +bootparse.c: BISONFLAGS += -d
> > > +
> > > +# Force these dependencies to be known even without dependency info built:
> > > +bootparse.o bootscan.o: bootparse.h
> >
> > Wonder if we could / should wrap this is something common. It's somewhat
> > annoying to repeat this stuff everywhere.
> 
> I haven't looked at the Meson effort recently, but if the build rule
> is less annoying there, I'm inclined to leave this as a wart until
> autotools are retired.

The only complicating thing in the rules there is the dependencies from one .c
file to another .c file.


> > > diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
> > > index dbe7d4f742..0b373048b5 100644
> > > --- a/contrib/cube/cubedata.h
> > > +++ b/contrib/cube/cubedata.h
> > > @@ -67,3 +67,7 @@ extern void cube_scanner_finish(void);
> > >
> > >  /* in cubeparse.y */
> > >  extern int   cube_yyparse(NDBOX **result);
> > > +
> > > +/* All grammar constructs return strings */
> > > +#define YYSTYPE char *
> >
> > Why does this need to be defined in a semi-public header? If we do this in
> > multiple files we'll end up with the danger of macro redefinition warnings.
> 
> I tried to put all the Flex/Bison stuff in another *_internal header,
> but that breaks the build. Putting just this one symbol in a header is
> silly, but done that way for now. Maybe two copies of the symbol?

The problem is that if it's in a header you can't include another header with
such a define. That's fine if it's a .h that's just intended to be included by
a limited set of files, but for something like a header for a datatype that
might need to be included to e.g. define a PL transform or a new operator or
...  This would be solved by the %code requires thing, right?


Greetings,

Andres Freund




Re: Propose a new function - list_is_empty

2022-08-16 Thread Peter Smith
On Wed, Aug 17, 2022 at 6:34 AM Daniel Gustafsson  wrote:
>
> > On 16 Aug 2022, at 16:03, Tom Lane  wrote:
>
> > I agree though that while simplifying list_length() calls, I'd lean to using
> > explicit comparisons to NIL.
>
>
> Agreed, I prefer that too.
>

Thanks for the feedback.

PSA patch v3 which now uses explicit comparisons to NIL everywhere,
and also addresses the other review comments.
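
For illustration, the preferred spelling after this patch (the variable
name here is hypothetical):

	/* before: if (list_length(parsetree_list) == 0) */
	if (parsetree_list == NIL)
		return;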

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v3-0001-use-NIL-test-for-empty-List-checks.patch
Description: Binary data


Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Andres Freund
Hi,

On 2022-08-16 19:55:18 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > What sort of random things would someone do with pageinspect functions
> > that would hold buffer pins on one buffer while locking another one?
> > The functions in hashfuncs.c don't even seem like they would access
> > multiple buffers in total, let alone at overlapping times. And I don't
> > think that a query pageinspect could realistically be suspended while
> > holding a buffer pin either. If you wrapped it in a cursor it'd be
> > suspended before or after accessing any given buffer, not right in the
> > middle of that operation.
> 
> pin != access.  Unless things have changed really drastically since
> I last looked, a seqscan will sit on a buffer pin throughout the
> series of fetches from a single page.

That's still the case. But for heap that shouldn't be a problem, because we'll
never try to take a cleanup lock while holding another page locked (nor even
another heap page pinned, I think).


I find it *highly* suspect that hash needs to acquire a cleanup lock while
holding another buffer locked. The recovery aspect alone makes that seem quite
unwise. Even if there's possibly no deadlock here for some reason or another.


Looking at the non-recovery code makes me even more suspicious:

/*
 * Physically allocate the new bucket's primary page.  We want to do this
 * before changing the metapage's mapping info, in case we can't get the
 * disk space.  Ideally, we don't need to check for cleanup lock on new
 * bucket as no other backend could find this bucket unless meta page is
 * updated.  However, it is good to be consistent with old bucket locking.
 */
buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
if (!IsBufferCleanupOK(buf_nblkno))
{
_hash_relbuf(rel, buf_oblkno);
_hash_relbuf(rel, buf_nblkno);
goto fail;
}


_hash_getnewbuf() calls _hash_pageinit() which calls PageInit(), which
memset(0)s the whole page. What does it even mean to check whether you
effectively have a cleanup lock after you zeroed out the page?

Reading the README and the comment above makes me wonder if this whole cleanup
lock business here is just cargo culting and could be dropped?



> Admittedly, that's about *heap* page pins while indexscans have
> different rules.  But I recall that btrees at least use persistent
> pins as well.

I think that's been changed, although not in an unproblematic way.


> Having said that, you're right that this is qualitatively different
> from the other bug, in that this is a deadlock not apparent data
> corruption.  However, IIUC it's an LWLock deadlock, which we don't
> handle all that nicely.

Theoretically the startup side could be interrupted. Except that we don't
accept the startup process dying...

Greetings,

Andres Freund




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Andres Freund
Hi,

On 2022-08-16 17:02:27 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > I had that thought too, but I don't *think* it's the case. This
> > function acquires a lock on the oldest bucket page, then on the new
> > bucket page. We could deadlock if someone who holds a pin on the new
> > bucket page tries to take a content lock on the old bucket page. But
> > who would do that? The new bucket page isn't yet linked from the
> > metapage at this point, so no scan should do that. There can be no
> > concurrent writers during replay. I think that if someone else has the
> > new page pinned they probably should not be taking content locks on
> > other buffers at the same time.
> 
> Agreed, the core code shouldn't do that, but somebody doing random stuff
> with pageinspect functions could probably make a query do this.
> See [1]; unless we're going to reject that bug with "don't do that",
> I'm not too comfortable with this line of reasoning.

I don't think we can defend against lwlock deadlocks where somebody doesn't
follow the AM's deadlock avoidance strategy. I.e. it's fine to pin and lock
pages from some AM without knowing that AM's rules, as long as you only block
while holding a pin/lock of a single page. But it is *not* ok to block waiting
for an lwlock / pin while already holding an lwlock / pin on some other
buffer.  If we were concerned about this we'd have to basically throw many of
our multi-page operations that rely on lock order logic out.

Greetings,

Andres Freund




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Tom Lane
Robert Haas  writes:
> What sort of random things would someone do with pageinspect functions
> that would hold buffer pins on one buffer while locking another one?
> The functions in hashfuncs.c don't even seem like they would access
> multiple buffers in total, let alone at overlapping times. And I don't
> think that a query using pageinspect could realistically be suspended while
> holding a buffer pin either. If you wrapped it in a cursor it'd be
> suspended before or after accessing any given buffer, not right in the
> middle of that operation.

pin != access.  Unless things have changed really drastically since
I last looked, a seqscan will sit on a buffer pin throughout the
series of fetches from a single page.

Admittedly, that's about *heap* page pins while indexscans have
different rules.  But I recall that btrees at least use persistent
pins as well.

It may be that there is indeed no way to make this happen with
available SQL tools.  But I wouldn't put a lot of money on that,
and even less that it'll stay true in the future.

Having said that, you're right that this is qualitatively different
from the other bug, in that this is a deadlock not apparent data
corruption.  However, IIUC it's an LWLock deadlock, which we don't
handle all that nicely.

regards, tom lane




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Robert Haas
On Tue, Aug 16, 2022 at 5:02 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I had that thought too, but I don't *think* it's the case. This
> > function acquires a lock on the oldest bucket page, then on the new
> > bucket page. We could deadlock if someone who holds a pin on the new
> > bucket page tries to take a content lock on the old bucket page. But
> > who would do that? The new bucket page isn't yet linked from the
> > metapage at this point, so no scan should do that. There can be no
> > concurrent writers during replay. I think that if someone else has the
> > new page pinned they probably should not be taking content locks on
> > other buffers at the same time.
>
> Agreed, the core code shouldn't do that, but somebody doing random stuff
> with pageinspect functions could probably make a query do this.
> See [1]; unless we're going to reject that bug with "don't do that",
> I'm not too comfortable with this line of reasoning.

I don't see the connection. The problem there has to do with bypassing
shared buffers, but this operation isn't bypassing shared buffers.

What sort of random things would someone do with pageinspect functions
that would hold buffer pins on one buffer while locking another one?
The functions in hashfuncs.c don't even seem like they would access
multiple buffers in total, let alone at overlapping times. And I don't
think that a query using pageinspect could realistically be suspended while
holding a buffer pin either. If you wrapped it in a cursor it'd be
suspended before or after accessing any given buffer, not right in the
middle of that operation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: identifying the backend that owns a temporary schema

2022-08-16 Thread Nathan Bossart
On Mon, Aug 15, 2022 at 02:47:25PM -0700, Jeremy Schneider wrote:
> I'll take a look at the patch if I can... and I'm hopeful that we're
> able to move this idea forward and get this little gap in PG filled once
> and for all!

Thanks!

I noticed that the "result" variable in pg_stat_get_backend_idset() is kind
of pointless after my patch is applied, so here is a v2 with it removed.
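
As context for the patch below, a sketch of how the new cmp_lbestatus()
comparator is meant to be used, i.e. looking up a backend's local status
entry by its PGPROC backend ID (hypothetical shape, shown only to
illustrate the comparator's purpose):

	LocalPgBackendStatus key;
	LocalPgBackendStatus *ret;

	key.backend_id = beid;
	ret = (LocalPgBackendStatus *) bsearch(&key, localBackendStatusTable,
										   localNumBackends,
										   sizeof(LocalPgBackendStatus),
										   cmp_lbestatus);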

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 0f2af75e8b1be3dc75979f4902cc4634636a3fe2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 12 Aug 2022 23:07:37 -0700
Subject: [PATCH v2 1/1] Adjust pg_stat_get_backend_*() to use backends' PGPROC
 backend IDs.

Presently, these functions use the index of the backend's entry in
localBackendStatusTable as the backend ID.  While this might bear
some resemblance to the backend ID of the backend's PGPROC struct,
it can quickly diverge as sessions connect and disconnect.  This
change modifies the pg_stat_get_backend_*() suite of functions to
use the PGPROC backend IDs instead.  While we could instead
introduce a new function for retrieving PGPROC backend IDs,
presenting two sets of backend IDs to users seems likely to cause
even more confusion than what already exists.

This is primarily useful for discovering the session that owns a
resource named with its PGPROC backend ID.  For example, temporary
schema names include the PGPROC backend ID, and it might be
necessary to identify the session that owns a temporary table that
is putting the cluster in danger of transaction ID wraparound.

Author: Nathan Bossart
---
 doc/src/sgml/monitoring.sgml|  8 ++---
 src/backend/utils/activity/backend_status.c | 40 +++--
 src/backend/utils/adt/pgstatfuncs.c | 11 +++---
 src/include/utils/backend_status.h  |  8 +
 4 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..ecd0410229 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5488,10 +5488,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
information.  In such cases, an older set of per-backend statistics
access functions can be used; these are shown in .
-   These access functions use a backend ID number, which ranges from one
-   to the number of currently active backends.
+   These access functions use the process's backend ID number.
The function pg_stat_get_backend_idset provides a
-   convenient way to generate one row for each active backend for
+   convenient way to list all the active backends' ID numbers for
invoking these functions.  For example, to show the PIDs and
current queries of all backends:
 
@@ -5526,8 +5525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 setof integer


-Returns the set of currently active backend ID numbers (from 1 to the
-number of active backends).
+Returns the set of currently active backend ID numbers.

   
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..159d022070 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -849,6 +849,7 @@ pgstat_read_current_status(void)
 			BackendIdGetTransactionIds(i,
 									   &localentry->backend_xid,
 									   &localentry->backend_xmin);
+			localentry->backend_id = i;
 
 			localentry++;
 			localappname += NAMEDATALEN;
@@ -1045,6 +1046,22 @@ pgstat_get_my_query_id(void)
 	return MyBEEntry->st_query_id;
 }
 
+/* --
+ * cmp_lbestatus
+ *
+ *	Comparison function for bsearch() on an array of LocalPgBackendStatus.  The
+ *	backend_id field is used to compare the arguments.
+ *
+ * --
+ */
+static int
+cmp_lbestatus(const void *a, const void *b)
+{
+	LocalPgBackendStatus *lbestatus1 = (LocalPgBackendStatus *) a;
+	LocalPgBackendStatus *lbestatus2 = (LocalPgBackendStatus *) b;
+
+	return lbestatus1->backend_id - lbestatus2->backend_id;
+}
 
 /* --
  * pgstat_fetch_stat_beentry() -
@@ -1052,6 +1069,10 @@ pgstat_get_my_query_id(void)
  *	Support function for the SQL-callable pgstat* functions. Returns
  *	our local copy of the current-activity entry for one backend.
  *
+ *	Unlike pgstat_fetch_stat_local_beentry(), the beid argument refers to the
+ *	backend ID stored in the backend's PGPROC struct instead of its index in
+ *	the localBackendStatusTable.
+ *
  *	NB: caller is responsible for a check if the user is permitted to see
  *	this info (especially the querystring).
  * --
@@ -1059,12 +1080,23 @@ pgstat_get_my_query_id(void)
 PgBackendStatus *
 pgstat_fetch_stat_beentry(int beid)
 {
+	LocalPgBackendStatus key;
+	LocalPgBackendStatus *ret;
+
 	pgstat_read_current_status();
 
-	if (beid < 1 || beid > localNumBackends)
+	if (beid < 1 || beid > NumBackendStatSlots)
 		return NULL;
 
-	return [beid - 

Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-08-16 Thread Nathan Bossart
On Tue, Jul 19, 2022 at 05:29:10PM +0530, Bharath Rupireddy wrote:
> I've also added the total number of WAL files a checkpoint has
> processed (scanned the pg_wal directory) while removing old WAL files.
> This helps to estimate the pg_wal disk space at the time of a
> particular checkpoint, especially useful for debugging issues.

I don't think it's clear what "processed" means here.  In any case, I think
this part belongs in a separate patch or maybe even a new thread.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Generalize ereport_startup_progress infrastructure

2022-08-16 Thread Nathan Bossart
On Wed, Aug 10, 2022 at 11:00:20AM -0400, Robert Haas wrote:
> Maybe the checkpointer is a better candidate, but somehow I feel that
> we can't consider this sort of thing separate from the existing
> progress reporting that checkpointer already does. Perhaps we need to
> think of changing or improving that in some way rather than adding
> something wholly new alongside the existing system.

I agree that the checkpointer has a good chance of being a better
candidate.  Are you thinking of integrating this into log_checkpoints
somehow?  Perhaps this parameter could optionally accept an interval for
logging the progress of ongoing checkpoints.
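
To sketch what I mean (the spelling is purely hypothetical, not a
concrete proposal):

    # log_checkpoints today accepts on/off; hypothetically it could also
    # accept an interval and emit a progress line that often while a
    # checkpoint is running:
    log_checkpoints = '30s'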

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Tom Lane
Robert Haas  writes:
> I had that thought too, but I don't *think* it's the case. This
> function acquires a lock on the oldest bucket page, then on the new
> bucket page. We could deadlock if someone who holds a pin on the new
> bucket page tries to take a content lock on the old bucket page. But
> who would do that? The new bucket page isn't yet linked from the
> metapage at this point, so no scan should do that. There can be no
> concurrent writers during replay. I think that if someone else has the
> new page pinned they probably should not be taking content locks on
> other buffers at the same time.

Agreed, the core code shouldn't do that, but somebody doing random stuff
with pageinspect functions could probably make a query do this.
See [1]; unless we're going to reject that bug with "don't do that",
I'm not too comfortable with this line of reasoning.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/17568-ef121b956ec1559c%40postgresql.org




Re: Making Vars outer-join aware

2022-08-16 Thread Tom Lane
I wrote:
> ... We can fix
> that by adding an index array to go straight from phid to the
> PlaceHolderInfo.  While thinking about where to construct such
> an index array, I decided it'd be a good idea to have an explicit
> step to "freeze" the set of PlaceHolderInfos, at the start of
> deconstruct_jointree.

On further thought, it seems better to maintain the index array
from the start, allowing complete replacement of the linear list
searches.  We can add a separate bool flag to denote frozen-ness.
This does have minor downsides:

* Some fiddly code is needed to enlarge the index array at need.
But it's not different from that for, say, simple_rel_array.

* If we ever have a need to mutate the placeholder_list as a whole,
we'd have to reconstruct the index array to point to the new
objects.  We don't do that at present, except in one place in
analyzejoins.c, which is easily fixed.  While the same argument
could be raised against the v1 patch, it's not very likely that
we'd be doing such mutation beyond the start of deconstruct_jointree.

Hence, see v2 attached.

regards, tom lane

diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c
index 3b0f0584f0..f0e8cd9965 100644
--- a/src/backend/optimizer/util/placeholder.c
+++ b/src/backend/optimizer/util/placeholder.c
@@ -400,14 +400,16 @@ add_placeholders_to_base_rels(PlannerInfo *root)
 
 /*
  * add_placeholders_to_joinrel
- *		Add any required PlaceHolderVars to a join rel's targetlist;
- *		and if they contain lateral references, add those references to the
- *		joinrel's direct_lateral_relids.
+ *		Add any newly-computable PlaceHolderVars to a join rel's targetlist;
+ *		and if computable PHVs contain lateral references, add those
+ *		references to the joinrel's direct_lateral_relids.
  *
  * A join rel should emit a PlaceHolderVar if (a) the PHV can be computed
  * at or below this join level and (b) the PHV is needed above this level.
- * However, condition (a) is sufficient to add to direct_lateral_relids,
- * as explained below.
+ * Our caller build_join_rel() has already added any PHVs that were computed
+ * in either join input rel, so we need add only newly-computable ones to
+ * the targetlist.  However, direct_lateral_relids must be updated for every
+ * PHV computable at or below this join, as explained below.
  */
 void
 add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
@@ -426,13 +428,10 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 			/* Is it still needed above this joinrel? */
 			if (bms_nonempty_difference(phinfo->ph_needed, relids))
 			{
-/* Yup, add it to the output */
-joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs,
-	phinfo->ph_var);
-joinrel->reltarget->width += phinfo->ph_width;
-
 /*
- * Charge the cost of evaluating the contained expression if
+ * Yes, but only add to tlist if it wasn't computed in either
+ * input; otherwise it should be there already.  Also, we
+ * charge the cost of evaluating the contained expression if
  * the PHV can be computed here but not in either input.  This
  * is a bit bogus because we make the decision based on the
  * first pair of possible input relations considered for the
@@ -445,12 +444,15 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 if (!bms_is_subset(phinfo->ph_eval_at, outer_rel->relids) &&
 	!bms_is_subset(phinfo->ph_eval_at, inner_rel->relids))
 {
+	PlaceHolderVar *phv = phinfo->ph_var;
 	QualCost	cost;
 
-	cost_qual_eval_node(&cost, (Node *) phinfo->ph_var->phexpr,
-		root);
+	joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs,
+		phv);
+	cost_qual_eval_node(&cost, (Node *) phv->phexpr, root);
 	joinrel->reltarget->cost.startup += cost.startup;
 	joinrel->reltarget->cost.per_tuple += cost.per_tuple;
+	joinrel->reltarget->width += phinfo->ph_width;
 }
 			}
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..442c12acef 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -688,6 +688,8 @@ build_join_rel(PlannerInfo *root,
 	 */
 	build_joinrel_tlist(root, joinrel, outer_rel);
 	build_joinrel_tlist(root, joinrel, inner_rel);
+
+	/* Add any newly-computable PlaceHolderVars, too */
 	add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
 
 	/*
@@ -966,6 +968,7 @@ min_join_parameterization(PlannerInfo *root,
  * The join's targetlist includes all Vars of its member relations that
  * will still be needed above the join.  This subroutine adds all such
  * Vars from the specified input rel's tlist to the join rel's tlist.
+ * Likewise for any PlaceHolderVars emitted by the input rel.
  *
  * We also compute the expected width of the join's output, making use
  * of data that was cached at the baserel level by 

Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock

2022-08-16 Thread Robert Haas
On Wed, Aug 10, 2022 at 1:28 AM Andres Freund  wrote:
> I assume this is trying to defend against some sort of deadlock by not
> actually getting a cleanup lock (by passing get_cleanup_lock = true to
> XLogReadBufferForRedoExtended()).

I had that thought too, but I don't *think* it's the case. This
function acquires a lock on the oldest bucket page, then on the new
bucket page. We could deadlock if someone who holds a pin on the new
bucket page tries to take a content lock on the old bucket page. But
who would do that? The new bucket page isn't yet linked from the
metapage at this point, so no scan should do that. There can be no
concurrent writers during replay. I think that if someone else has the
new page pinned they probably should not be taking content locks on
other buffers at the same time.

So maybe we can just apply something like the attached.
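
For anyone reading along without the attachment, the direction is
roughly as follows (a sketch of the idea, not the patch text): take the
cleanup lock up front while initializing the new bucket page, instead of
taking a plain lock and PANICing afterwards when IsBufferCleanupOK()
fails:

    /* replay the record for the new bucket */
    XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_AND_CLEANUP_LOCK,
                                  true, &newbuf);
    _hash_initbuf(newbuf, xlrec->new_bucket, xlrec->new_bucket,
                  xlrec->new_bucket_flag, true);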

-- 
Robert Haas
EDB: http://www.enterprisedb.com


hash-xlog-split-allocate-page-v1.patch
Description: Binary data


Re: Propose a new function - list_is_empty

2022-08-16 Thread Daniel Gustafsson
> On 16 Aug 2022, at 16:03, Tom Lane  wrote:

> I agree though that while simplifying list_length() calls, I'd lean to using
> explicit comparisons to NIL.


Agreed, I prefer that too.

--
Daniel Gustafsson   https://vmware.com/





Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-08-16 Thread Nathan Bossart
On Wed, Aug 10, 2022 at 03:28:25PM +0530, Bharath Rupireddy wrote:
>   snprintf(path, sizeof(path), "pg_logical/mappings/%s", 
> mapping_de->d_name);
> - if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
> + if (get_dirent_type(path, mapping_de, false, LOG) != 
> PGFILETYPE_REG)
>   continue;

Previously, failure to lstat() wouldn't lead to skipping the entry.  With
this patch, a failure to determine the file type will cause the entry to be
skipped.  This might be okay in some places (e.g., CheckPointSnapBuild())
but not in others.  For example, in CheckPointLogicalRewriteHeap(), this
could cause us to skip fsync-ing a file due to a get_dirent_type() failure,
which seems bad.
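
One way to keep the old behavior where it matters (an untested sketch)
would be to skip only entries positively identified as non-regular
files, and still process entries whose type could not be determined:

    PGFileType	ftype = get_dirent_type(path, mapping_de, false, LOG);

    /* skip only what we know is not a regular file */
    if (ftype != PGFILETYPE_REG && ftype != PGFILETYPE_ERROR)
        continue;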

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add support for DEFAULT specification in COPY FROM

2022-08-16 Thread Andrew Dunstan


On 2022-08-16 Tu 14:12, Israel Barth Rubio wrote:
> Hello all,
>
> With the current implementation of COPY FROM in PostgreSQL we are able to
> load the DEFAULT value/expression of a column if the column is absent
> in the
> list of specified columns. We are not able to explicitly ask that
> PostgreSQL uses
> the DEFAULT value/expression in a column that is being fetched from
> the input
> file, though.
>
> This patch adds support for handling DEFAULT values in COPY FROM. It
> works
> similarly to NULL in COPY FROM: whenever the marker that was set for
> DEFAULT
> value/expression is read from the input stream, it will evaluate the
> DEFAULT
> value/expression of the corresponding column.
>
> I'm currently working as a support engineer, and both I and some
> customers have already faced a situation where we missed an
> implementation like this in COPY FROM, and had to work around it by
> using an input file where the column which has a DEFAULT
> value/expression was removed.
>
> That does not solve all issues though, as it might be the case that we
> just want a DEFAULT value to take effect if no other value was set for
> the column in the input file, meaning we would like to have a column in
> the input file that sometimes assumes the DEFAULT value/expression, and
> sometimes assumes an actual given value.
>
> The implementation was performed about one month ago and included all
> regression
> tests regarding the changes that were introduced. It was just rebased
> on top of the
> master branch before submitting this patch, and all tests are still
> succeeding.
>
> The implementation takes advantage of the logic that was already
> implemented to handle DEFAULT values for missing columns in COPY FROM.
> I just modified it to make the DEFAULT values/expressions available for
> all columns instead of only the ones that were missing in the
> specification. I had to change the variables accordingly, so they would
> index the correct positions in the new array of DEFAULT
> values/expressions.
>
> Besides that, I also copied and pasted most of the checks that are
> performed for the NULL feature of COPY FROM, as the DEFAULT behaves
> somewhat similarly.
>
>


Interesting, and probably useful. I've only had a brief look, but it's
important that the default marker not be quoted in CSV mode (cf. NULL);
if it is quoted, it should be taken as a literal rather than a special
value. Maybe that's taken care of, but there should at least be a test
for it, which I didn't see.
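
Something like the following is the kind of test I mean (assuming the
patch's option is spelled as DEFAULT 'marker'):

    COPY t (id, val) FROM stdin WITH (FORMAT csv, DEFAULT '\D');
    1,\D
    2,"\D"
    \.
    -- row 1: val gets its DEFAULT
    -- row 2: val is the literal string '\D'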


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: EINTR in ftruncate()

2022-08-16 Thread Thomas Munro
On Sat, Jul 16, 2022 at 5:18 PM Thomas Munro  wrote:
> On Sat, Jul 16, 2022 at 1:28 AM Tom Lane  wrote:
> > Thomas Munro  writes:
> > > On Fri, Jul 15, 2022 at 9:34 AM Tom Lane  wrote:
> > >> (Someday we oughta go ahead and make our Windows signal API look more
> > >> like POSIX, as I suggested back in 2015.  I'm still not taking
> > >> point on that, though.)
> >
> > > For the sigprocmask() part, here's a patch that passes CI.  Only the
> > > SIG_SETMASK case is actually exercised by our current code, though.
> >
> > Passes an eyeball check, but I can't actually test it.
>
> Thanks.  Pushed.
>
> I'm not brave enough to try to write a replacement sigaction() yet,
> but it does appear that we could rip out more ugliness and
> inconsistencies that way, eg sa_mask.

Here's a draft patch that adds a minimal sigaction() implementation
for Windows.  It doesn't implement stuff we're not using, for example
sa_sigaction functions, but it has the sa_mask feature so we can
harmonize the stuff that I believe you were talking about.
From f00f169e10b884dab337b73b31de89577fb662de Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 17 Aug 2022 07:36:53 +1200
Subject: [PATCH] Provide sigaction for Windows.

Harmonize Unix and Windows code by modernizing our signal emulation to
use sigaction() instead of signal().

Discussion: https://postgr.es/m/CA%2BhUKGK5ax1AoFMvXksLROrxwB1%3D-EsT%3D7%2B6oUtCn47GbC9zjA%40mail.gmail.com
---
 src/backend/libpq/pqsignal.c|  9 -
 src/backend/port/win32/signal.c | 41 +++--
 src/backend/postmaster/postmaster.c | 57 ++---
 src/include/libpq/pqsignal.h| 14 +++
 src/port/pqsignal.c | 22 +--
 5 files changed, 56 insertions(+), 87 deletions(-)

diff --git a/src/backend/libpq/pqsignal.c b/src/backend/libpq/pqsignal.c
index 26ed671d1a..1ab34c5214 100644
--- a/src/backend/libpq/pqsignal.c
+++ b/src/backend/libpq/pqsignal.c
@@ -110,16 +110,10 @@ pqinitmask(void)
  * postmaster ever unblocks signals.
  *
  * pqinitmask() must have been invoked previously.
- *
- * On Windows, this function is just an alias for pqsignal()
- * (and note that it's calling the code in src/backend/port/win32/signal.c,
- * not src/port/pqsignal.c).  On that platform, the postmaster's signal
- * handlers still have to block signals for themselves.
  */
 pqsigfunc
 pqsignal_pm(int signo, pqsigfunc func)
 {
-#ifndef WIN32
 	struct sigaction act,
 oact;
 
@@ -142,7 +136,4 @@ pqsignal_pm(int signo, pqsigfunc func)
 	if (sigaction(signo, &act, &oact) < 0)
 		return SIG_ERR;
 	return oact.sa_handler;
-#else			/* WIN32 */
-	return pqsignal(signo, func);
-#endif
 }
diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index 53b93a50b2..d533de1bc6 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -34,7 +34,7 @@ HANDLE		pgwin32_initial_signal_pipe = INVALID_HANDLE_VALUE;
 static CRITICAL_SECTION pg_signal_crit_sec;
 
 /* Note that array elements 0 are unused since they correspond to signal 0 */
-static pqsigfunc pg_signal_array[PG_SIGNAL_COUNT];
+static struct sigaction pg_signal_array[PG_SIGNAL_COUNT];
 static pqsigfunc pg_signal_defaults[PG_SIGNAL_COUNT];
 
 
@@ -85,7 +85,9 @@ pgwin32_signal_initialize(void)
 
 	for (i = 0; i < PG_SIGNAL_COUNT; i++)
 	{
-		pg_signal_array[i] = SIG_DFL;
+		pg_signal_array[i].sa_handler = SIG_DFL;
+		pg_signal_array[i].sa_mask = 0;
+		pg_signal_array[i].sa_flags = 0;
 		pg_signal_defaults[i] = SIG_IGN;
 	}
 	pg_signal_mask = 0;
@@ -131,15 +133,27 @@ pgwin32_dispatch_queued_signals(void)
 			if (exec_mask & sigmask(i))
 			{
 /* Execute this signal */
-pqsigfunc	sig = pg_signal_array[i];
+struct sigaction *act = &pg_signal_array[i];
+pqsigfunc	sig = act->sa_handler;
 
 if (sig == SIG_DFL)
 	sig = pg_signal_defaults[i];
 pg_signal_queue &= ~sigmask(i);
 if (sig != SIG_ERR && sig != SIG_IGN && sig != SIG_DFL)
 {
+	sigset_t	block_mask;
+	sigset_t	save_mask;
+
 	LeaveCriticalSection(&pg_signal_crit_sec);
+
+	block_mask = act->sa_mask;
+	if ((act->sa_flags & SA_NODEFER) == 0)
+		block_mask |= sigmask(i);
+
+	sigprocmask(SIG_BLOCK, &block_mask, &save_mask);
 	sig(i);
+	sigprocmask(SIG_SETMASK, &save_mask, NULL);
+
 	EnterCriticalSection(&pg_signal_crit_sec);
 	break;		/* Restart outer loop, in case signal mask or
  * queue has been modified inside signal
@@ -187,22 +201,25 @@ pqsigprocmask(int how, const sigset_t *set, sigset_t *oset)
 	return 0;
 }
 
-
 /*
  * Unix-like signal handler installation
  *
  * Only called on main thread, no sync required
  */
-pqsigfunc
-pqsignal(int signum, pqsigfunc handler)
+int
+pqsigaction(int signum, const struct sigaction *act,
+			struct sigaction *oldact)
 {
-	pqsigfunc	prevfunc;
-
 	if (signum >= PG_SIGNAL_COUNT || signum < 0)
-		return SIG_ERR;
-	prevfunc = pg_signal_array[signum];
-	pg_signal_array[signum] = handler;
-	return prevfunc;
+	{
+		errno = EINVAL;

Re: MultiXact\SLRU buffers configuration

2022-08-16 Thread i . lazarev

Andrey Borodin wrote 2022-07-23 11:39:

Yura, thank you for your benchmarks!
We already knew that patch can save the day on pathological workloads,
now we have a proof of this.
Also there's the evidence that user can blindly increase size of SLRU
if they want (with the second patch). So there's no need for hard
explanations on how to tune the buffers size.


Hi @Andrey.Borodin, with some considerations and performance checks from
@Yura.Sokolov we simplified your approach as follows:


1. Preamble. We feel free to increase any SLRU, since there's no
performance degradation at large buffer counts when using your SLRU
bucket solution.
2. `slru_buffers_size_scale` is the only config param introduced, shared
by all SLRUs. It scales each SLRU's upper cap by a power of 2.
3. All SLRU buffer counts are capped by both `MBuffers (shared_buffers)`
and `slru_buffers_size_scale`.
4. The initial constants `NUM_*_BUFFERS << slru_buffers_size_scale` are
applied for every SLRU (see the worked example below).
5. All SLRU buffer counts are always powers of 2, and their hash bucket
size is always 8.
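
As a worked example of the resulting sizes (the numbers follow directly
from the defines, assuming BLCKSZ = 8192):

/* with the default slru_buffers_size_scale = 2: */
NUM_SUBTRANS_BUFFERS = 32 << 2 = 128 buffers
banks = 128 / SLRU_BANK_SIZE = 16 banks of 8 slots each
/* hard cap: SLRU_MAX_ALLOWED_BUFFERS = 1 GB / 8192 = 131072 buffers */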


There's the attached patch for your consideration. It gathers and
simplifies both `v21-0001-Make-all-SLRU-buffer-sizes-configurable.patch`
and `v21-0002-Divide-SLRU-buffers-into-8-associative-banks.patch` into a
much simpler approach.


Thank you, Yours,
- Ivan
Index: src/include/miscadmin.h
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
--- a/src/include/miscadmin.h	(revision 020258fbd30d37ddd03d0ec68264d1544f8d2838)
+++ b/src/include/miscadmin.h	(date 1660678108730)
@@ -176,6 +176,7 @@
 extern PGDLLIMPORT int MaxConnections;
 extern PGDLLIMPORT int max_worker_processes;
 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int slru_buffers_size_scale;
 
 extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
Index: src/include/access/subtrans.h
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
diff --git a/src/include/access/subtrans.h b/src/include/access/subtrans.h
--- a/src/include/access/subtrans.h	(revision 020258fbd30d37ddd03d0ec68264d1544f8d2838)
+++ b/src/include/access/subtrans.h	(date 1660678108698)
@@ -12,7 +12,7 @@
 #define SUBTRANS_H
 
 /* Number of SLRU buffers to use for subtrans */
-#define NUM_SUBTRANS_BUFFERS	32
+#define NUM_SUBTRANS_BUFFERS	(32 << slru_buffers_size_scale)
 
 extern void SubTransSetParent(TransactionId xid, TransactionId parent);
 extern TransactionId SubTransGetParent(TransactionId xid);
Index: src/backend/utils/init/globals.c
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
--- a/src/backend/utils/init/globals.c	(revision 020258fbd30d37ddd03d0ec68264d1544f8d2838)
+++ b/src/backend/utils/init/globals.c	(date 1660678064048)
@@ -150,3 +150,5 @@
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
+
+int			slru_buffers_size_scale = 2;	/* power 2 scale for SLRU buffers */
Index: src/backend/access/transam/slru.c
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
--- a/src/backend/access/transam/slru.c	(revision 020258fbd30d37ddd03d0ec68264d1544f8d2838)
+++ b/src/backend/access/transam/slru.c	(date 1660678063980)
@@ -59,6 +59,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "storage/shmem.h"
+#include "port/pg_bitutils.h"
 
 #define SlruFileName(ctl, path, seg) \
 	snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
@@ -71,6 +72,17 @@
  */
 #define MAX_WRITEALL_BUFFERS	16
 
+/*
+ * To avoid overflowing internal arithmetic and the size_t data type, the
+ * number of buffers should not exceed this number.
+ */
+#define SLRU_MAX_ALLOWED_BUFFERS ((1024 * 1024 * 1024) / BLCKSZ)
+
+/*
+ * SLRU bank size for slotno hash banks
+ */
+#define SLRU_BANK_SIZE 8
+
 typedef struct SlruWriteAllData
 {
 	int			num_files;		/* # files actually open */
@@ -134,7 +146,7 @@
 static SlruErrorCause slru_errcause;
 static int	slru_errno;
 
-
+static void SlruAdjustNSlots(int *nslots, int *bankmask);
 static void SimpleLruZeroLSNs(SlruCtl ctl, int slotno);
 static void SimpleLruWaitIO(SlruCtl ctl, int slotno);
 static void SlruInternalWritePage(SlruCtl ctl, int slotno, SlruWriteAll fdata);
@@ -148,6 +160,25 @@
 	  int segpage, void *data);
 static void SlruInternalDeleteSegment(SlruCtl ctl, int segno);
 
+/*
+ * Pick number of slots and bank size optimal for hashed associative SLRU buffers.
+ * We declare SLRU 

Add support for DEFAULT specification in COPY FROM

2022-08-16 Thread Israel Barth Rubio
Hello all,

With the current implementation of COPY FROM in PostgreSQL we are able to
load the DEFAULT value/expression of a column if the column is absent in
the list of specified columns. We are not able to explicitly ask that
PostgreSQL uses the DEFAULT value/expression in a column that is being
fetched from the input file, though.

This patch adds support for handling DEFAULT values in COPY FROM. It works
similarly to NULL in COPY FROM: whenever the marker that was set for DEFAULT
value/expression is read from the input stream, it will evaluate the DEFAULT
value/expression of the corresponding column.
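
As a usage sketch (the option spelling here is illustrative only; see
the patch for the authoritative syntax):

    CREATE TABLE t (id int, created timestamptz DEFAULT now());

    COPY t (id, created) FROM stdin WITH (DEFAULT '\D');
    1	2022-01-01 00:00:00+00
    2	\D
    \.
    -- row 2 gets now() rather than a literal value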

I'm currently working as a support engineer, and both I and some customers
have already faced a situation where we missed an implementation like this
in COPY FROM, and had to work around it by using an input file where the
column which has a DEFAULT value/expression was removed.

That does not solve all issues though, as it might be the case that we
just want a DEFAULT value to take effect if no other value was set for the
column in the input file, meaning we would like to have a column in the
input file that sometimes assumes the DEFAULT value/expression, and
sometimes assumes an actual given value.

The implementation was performed about one month ago and included all
regression tests regarding the changes that were introduced. It was just
rebased on top of the master branch before submitting this patch, and all
tests are still succeeding.

The implementation takes advantage of the logic that was already
implemented to handle DEFAULT values for missing columns in COPY FROM. I
just modified it to make the DEFAULT values/expressions available for all
columns instead of only the ones that were missing in the specification.
I had to change the variables accordingly, so they would index the
correct positions in the new array of DEFAULT values/expressions.

Besides that, I also copied and pasted most of the checks that are
performed for the NULL feature of COPY FROM, as the DEFAULT behaves
somewhat similarly.

Best regards,
Israel.


v1-0001-Added-support-for-DEFAULT-in-COPY-FROM.patch
Description: Binary data


Re: Making Vars outer-join aware

2022-08-16 Thread Tom Lane
I wrote:
> Richard Guo  writes:
>> If the two issues are indeed something we need to fix, maybe we can
>> change add_placeholders_to_joinrel() to search the PHVs from
>> outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
>> if needed, just like what we do in build_joinrel_tlist(). The PHVs there
>> should have the correct phnullingrels (at least the PHVs in base rels'
>> targetlists have correctly set phnullingrels to NULL).

> Yeah, maybe we should do something more invasive and make use of the
> input targetlists rather than doing this from scratch.  Not sure.
> I'm worried that doing it that way would increase the risk of getting
> different join tlist contents depending on which pair of input rels
> we happen to consider first.

After chewing on that for awhile, I've concluded that that is the way
to go.  0001 attached is a standalone patch to rearrange the way that
PHVs are added to joinrel targetlists.  It results in one cosmetic
change in the regression tests, where the targetlist order for an
intermediate join node changes.  I think that's fine; if anything,
the new order is more sensible than the old because it matches the
inputs' targetlist orders better.

I believe the reason I didn't do it like this to start with is that
I was concerned about the cost of searching the placeholder_list
repeatedly.  With a lot of PHVs, build_joinrel_tlist becomes O(N^2)
just from the repeated find_placeholder_info lookups.  We can fix
that by adding an index array to go straight from phid to the
PlaceHolderInfo.  While thinking about where to construct such
an index array, I decided it'd be a good idea to have an explicit
step to "freeze" the set of PlaceHolderInfos, at the start of
deconstruct_jointree.  This allows getting rid of the create_new_ph
flags for find_placeholder_info and add_vars_to_targetlist, which
I've always feared were bugs waiting to happen: they require callers
to have a very clear understanding of when they're invoked.  There
might be some speed gain over existing code too, but I've not really
tried to measure it.  I did drop a couple of hacks that were only
meant to short-circuit find_placeholder_info calls; that function
should now be cheap enough to not matter.
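
With the array in place, the lookup itself reduces to something like
this (a sketch; the field names are mine):

    /* O(1) replacement for the old linear search of placeholder_list */
    if (phid < root->placeholder_array_size &&
        root->placeholder_array[phid] != NULL)
        return root->placeholder_array[phid];
    elog(ERROR, "no PlaceHolderInfo found with id %d", phid);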

Barring objections, I'd like to push these soon and then rebase
the main outer-join-vars patch set over them.

regards, tom lane

diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c
index 3b0f0584f0..f0e8cd9965 100644
--- a/src/backend/optimizer/util/placeholder.c
+++ b/src/backend/optimizer/util/placeholder.c
@@ -400,14 +400,16 @@ add_placeholders_to_base_rels(PlannerInfo *root)
 
 /*
  * add_placeholders_to_joinrel
- *		Add any required PlaceHolderVars to a join rel's targetlist;
- *		and if they contain lateral references, add those references to the
- *		joinrel's direct_lateral_relids.
+ *		Add any newly-computable PlaceHolderVars to a join rel's targetlist;
+ *		and if computable PHVs contain lateral references, add those
+ *		references to the joinrel's direct_lateral_relids.
  *
  * A join rel should emit a PlaceHolderVar if (a) the PHV can be computed
  * at or below this join level and (b) the PHV is needed above this level.
- * However, condition (a) is sufficient to add to direct_lateral_relids,
- * as explained below.
+ * Our caller build_join_rel() has already added any PHVs that were computed
+ * in either join input rel, so we need add only newly-computable ones to
+ * the targetlist.  However, direct_lateral_relids must be updated for every
+ * PHV computable at or below this join, as explained below.
  */
 void
 add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
@@ -426,13 +428,10 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 			/* Is it still needed above this joinrel? */
 			if (bms_nonempty_difference(phinfo->ph_needed, relids))
 			{
-/* Yup, add it to the output */
-joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs,
-	phinfo->ph_var);
-joinrel->reltarget->width += phinfo->ph_width;
-
 /*
- * Charge the cost of evaluating the contained expression if
+ * Yes, but only add to tlist if it wasn't computed in either
+ * input; otherwise it should be there already.  Also, we
+ * charge the cost of evaluating the contained expression if
  * the PHV can be computed here but not in either input.  This
  * is a bit bogus because we make the decision based on the
  * first pair of possible input relations considered for the
@@ -445,12 +444,15 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 if (!bms_is_subset(phinfo->ph_eval_at, outer_rel->relids) &&
 	!bms_is_subset(phinfo->ph_eval_at, inner_rel->relids))
 {
+	PlaceHolderVar *phv = phinfo->ph_var;
 	QualCost	cost;
 
-	cost_qual_eval_node(&cost, (Node *) phinfo->ph_var->phexpr,
-		root);
+	joinrel->reltarget->exprs = 

Re: XLogBeginRead's header comment lies

2022-08-16 Thread Robert Haas
Forgot the attachment.


waldump-beginread.patch
Description: Binary data


XLogBeginRead's header comment lies

2022-08-16 Thread Robert Haas
It claims that:

 * 'RecPtr' should point to the beginning of a valid WAL record.  Pointing at
 * the beginning of a page is also OK, if there is a new record right after
 * the page header, i.e. not a continuation.

But this actually doesn't seem to work. This function doesn't itself
have any problem with any LSNs you want to pass it, so if you call
this function with an LSN that is at the beginning of a page, you'll
end up with EndRecPtr set to the LSN you specify and DecodeRecPtr set
to NULL. When you then call XLogReadRecord, you'll reach
XLogDecodeNextRecord, which will do this:

if (state->DecodeRecPtr != InvalidXLogRecPtr)
{
/* read the record after the one we just read */

/*
 * NextRecPtr is pointing to end+1 of the previous WAL record.  If
 * we're at a page boundary, no more records can fit on the current
 * page. We must skip over the page header, but we can't do that until
 * we've read in the page, since the header size is variable.
 */
}
else
{
/*
 * Caller supplied a position to start at.
 *
 * In this case, NextRecPtr should already be pointing to a valid
 * record starting position.
 */
Assert(XRecOffIsValid(RecPtr));
randAccess = true;
}

Since DecodeRecPtr is NULL, you take the else branch, and then you
fail an assertion.

I tried adding a --beginread argument to pg_waldump (patch attached)
to further verify this:

[rhaas pgsql]$ pg_waldump -n1
/Users/rhaas/pgstandby/pg_wal/0002000500A0
rmgr: Heap        len (rec/tot):     72/    72, tx:    5778572, lsn:
5/A0000028, prev 5/9FFFFFB8, desc: HOT_UPDATE off 39 xmax 5778572
flags 0x20 ; new off 62 xmax 0, blkref #0: rel 1663/16388/16402 blk 1
[rhaas pgsql]$ pg_waldump -n1 --beginread
/Users/rhaas/pgstandby/pg_wal/0002000500A0
Assertion failed: (((RecPtr) % 8192 >= (((uintptr_t)
((sizeof(XLogPageHeaderData))) + ((8) - 1)) & ~((uintptr_t) ((8) -
1), function XLogDecodeNextRecord, file xlogreader.c, line 582.
Abort trap: 6 (core dumped)

The WAL record begins at offset 0x28 in the block, which I believe is
the length of a long page header, so this is indeed a WAL segment that
begins with a brand new record, not a continuation record.

There are two ways we could fix this, I believe. One is to correct the
comment at the start of XLogBeginRead() to reflect the way things
actually work at present. The other is to correct the code to do what
the header comment claims. I would prefer the latter, because I'd like
to be able to use the EndRecPtr of the last record read by one
xlogreader as the starting point for a new xlogreader created at a
later time. I've found that, when there's no record spanning the block
boundary, the EndRecPtr points to the start of the next block, not the
start of the first record in the next block. I could dodge the problem
here by just always using XLogFindNextRecord() rather than
XLogBeginRead(), but I'd actually like it to go boom if I somehow
end up trying to start from an LSN that's in the middle of a record
somewhere (or the middle of the page header) because those cases
shouldn't happen. But if I just have an LSN that happens to be the
start of the block header rather than the start of the record that
follows the block header, I'd like that case to be tolerated, because
the LSN I'm using came from the xlogreader machinery.
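
For reference, the XLogFindNextRecord() dodge looks roughly like this
(sketch):

    XLogRecPtr	first;

    /* skip over a page header if start_lsn is a page boundary */
    first = XLogFindNextRecord(xlogreader, start_lsn);
    if (XLogRecPtrIsInvalid(first))
        elog(ERROR, "could not find a valid record after %X/%X",
             LSN_FORMAT_ARGS(start_lsn));
    XLogBeginRead(xlogreader, first);

but, as I said, that would also silently accept an LSN in the middle of
a record, which is what I want to catch.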

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Column Filtering in Logical Replication

2022-08-16 Thread vignesh C
On Mon, Aug 8, 2022 at 2:08 PM Peter Smith  wrote:
>
> PSA patch version v1* for a new "Column Lists" pgdocs section
>
> This is just a first draft, but I wanted to post it as-is, with the
> hope that I can get some feedback while continuing to work on it.

Few comments:
1) The row filter documentation mentions that "It has no effect on
TRUNCATE commands.", but the same is not present in the case of column
filters. We should make the changes similarly for consistency.
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -90,8 +90,7 @@ CREATE PUBLICATION name
  
   When a column list is specified, only the named columns are replicated.
   If no column list is specified, all columns of the table are replicated
-  through this publication, including any columns added later.  If a column
-  list is specified, it must include the replica identity columns.
+  through this publication, including any columns added later.

2) The document says that "if the table uses REPLICA IDENTITY FULL,
specifying a column list is not allowed.":
+   publishes only INSERT operations. Furthermore, if the
+   table uses REPLICA IDENTITY FULL, specifying a column
+   list is not allowed.
+  

Did you mean specifying a column list during create publication for
REPLICA IDENTITY FULL table like below scenario:
postgres=# create table t2(c1 int, c2 int, c3 int);
CREATE TABLE
postgres=# alter table t2 replica identity full ;
ALTER TABLE
postgres=# create publication pub1 for table t2(c1,c2);
CREATE PUBLICATION

If so, the document says specifying a column list is not allowed, but
creating a publication with a column list on a replica identity full
table was successful.

Regards,
Vignesh




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-08-16 Thread Jacob Champion
On Tue, Aug 16, 2022 at 2:02 AM Drouvot, Bertrand  wrote:
> On 8/14/22 11:57 AM, Michael Paquier wrote:
> >One thing was itching me about the serialization and
> > deserialization logic though: could it be more readable if we used an
> > intermediate structure to store the length of the serialized strings?
> > We use this approach in other areas, like for the snapshot data in
> > snapmgr.c.  This would handle the case of an empty and NULL string, by
> > storing -1 as length for NULL and >= 0 for the string length if there
> > is something set, while making the addition of more fields a
> > no-brainer.
>
> I think that's a good idea and I think that would be more readable (as
> compare to storing a "hint" in the first byte).

Sounds good. v3, attached, should make the requested changes:
- declare `struct ClientConnectionInfo`
- use an intermediate serialization struct
- switch to length-"prefixing" for the string

I do like the way this reads compared to before.

Thanks,
--Jacob
commit 753c46352adc967a903a60ea65a3068252d685e6
Author: Jacob Champion 
Date:   Tue Aug 16 09:14:58 2022 -0700

squash! Allow parallel workers to read authn_id

Per review,
- add an intermediate struct for serialization,
- switch to length-prefixing for the authn_id string, and
- make sure `struct ClientConnectionInfo` is declared for use elsewhere.

diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 155ba92c67..58772d0a4a 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -943,19 +943,29 @@ GetUserNameFromId(Oid roleid, bool noerr)
 
 ClientConnectionInfo MyClientConnectionInfo;
 
+/*
+ * Intermediate representation of ClientConnectionInfo for easier 
serialization.
+ * Variable-length fields are allocated right after this header.
+ */
+typedef struct SerializedClientConnectionInfo
+{
+   int32   authn_id_len; /* strlen(authn_id), or -1 if NULL */
+   UserAuthauth_method;
+} SerializedClientConnectionInfo;
+
 /*
  * Calculate the space needed to serialize MyClientConnectionInfo.
  */
 Size
 EstimateClientConnectionInfoSpace(void)
 {
-   Sizesize = 1;
+   Sizesize = 0;
+
+   size = add_size(size, sizeof(SerializedClientConnectionInfo));
 
 	if (MyClientConnectionInfo.authn_id)
 		size = add_size(size, strlen(MyClientConnectionInfo.authn_id) + 1);
 
-   size = add_size(size, sizeof(UserAuth));
-
return size;
 }
 
@@ -965,32 +975,29 @@ EstimateClientConnectionInfoSpace(void)
 void
 SerializeClientConnectionInfo(Size maxsize, char *start_address)
 {
-   /*
-	 * First byte is an indication of whether or not authn_id has been set to
-* non-NULL, to differentiate that case from the empty string.
-*/
-   Assert(maxsize > 0);
-   start_address[0] = MyClientConnectionInfo.authn_id ? 1 : 0;
-   start_address++;
-   maxsize--;
+   SerializedClientConnectionInfo serialized = {0};
+
+   serialized.authn_id_len = -1;
+   serialized.auth_method = MyClientConnectionInfo.auth_method;
 
if (MyClientConnectionInfo.authn_id)
-   {
-   Size len;
+		serialized.authn_id_len = strlen(MyClientConnectionInfo.authn_id);
 
-		len = strlcpy(start_address, MyClientConnectionInfo.authn_id, maxsize) + 1;
-   Assert(len <= maxsize);
-   maxsize -= len;
-   start_address += len;
-   }
+   /* Copy serialized representation to buffer */
+   Assert(maxsize >= sizeof(serialized));
+	memcpy(start_address, &serialized, sizeof(serialized));
 
-   {
-   UserAuth   *auth_method = (UserAuth*) start_address;
+   maxsize -= sizeof(serialized);
+   start_address += sizeof(serialized);
 
-   Assert(sizeof(*auth_method) <= maxsize);
-   *auth_method = MyClientConnectionInfo.auth_method;
-   maxsize -= sizeof(*auth_method);
-   start_address += sizeof(*auth_method);
+   /* Copy authn_id into the space after the struct. */
+   if (serialized.authn_id_len >= 0)
+   {
+   Assert(maxsize >= (serialized.authn_id_len + 1));
+   memcpy(start_address,
+  MyClientConnectionInfo.authn_id,
+			   /* include the NULL terminator to ease deserialization */
+  serialized.authn_id_len + 1);
}
 }
 
@@ -1000,25 +1007,19 @@ SerializeClientConnectionInfo(Size maxsize, char *start_address)
 void
 RestoreClientConnectionInfo(char *conninfo)
 {
-   if (conninfo[0] == 0)
-   {
-   MyClientConnectionInfo.authn_id = NULL;
-   conninfo++;
-   }
-   else
-   {
-   conninfo++;
-		MyClientConnectionInfo.authn_id = MemoryContextStrdup(TopMemoryContext,
-   

Re: SYSTEM_USER reserved word implementation

2022-08-16 Thread Jacob Champion
On Fri, Aug 12, 2022 at 6:32 AM Drouvot, Bertrand  wrote:
> It has been agreed that the work on this patch is on hold until the
> ClientConnectionInfo related work is finished (see the discussion in [1]).
>
> Having said that I'm attaching a new patch
> "v2-0004-system_user-implementation.patch" for the SYSTEM_USER.

(Not a full review.) Now that the implementation has increased in
complexity, the original tests for the parallel workers have become
underpowered. As a concrete example, I forgot to serialize auth_method
during my most recent rewrite, and the tests still passed.

I think it'd be better to check the contents of SYSTEM_USER, where we
can, rather than only testing for existence. Something like the
attached, maybe? And it would also be good to add a similar test to
the authentication suite, so that you don't have to have Kerberos
enabled to fully test SYSTEM_USER.

--Jacob
commit adaff75cb96ec842d15e15df2ee42dd4b3fa1349
Author: Jacob Champion 
Date:   Tue Aug 16 09:32:45 2022 -0700

squash! SYSTEM_USER implementation

Check the contents of SYSTEM_USER in the Kerberos tests, not just its
existence.

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 7ce3b33da6..be0dd7c62d 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -178,11 +178,11 @@ $node->safe_psql('postgres', 'CREATE USER test1;');
 
 # Set up a table for SYSTEM_USER parallel worker testing.
 $node->safe_psql('postgres',
-'CREATE TABLE nulls (n) AS SELECT NULL FROM generate_series(1, 20);'
+"CREATE TABLE ids (id) AS SELECT 'gss:test1\@$realm' FROM 
generate_series(1, 20);"
 );
 
 $node->safe_psql('postgres',
-'GRANT SELECT ON nulls TO public;'
+'GRANT SELECT ON ids TO public;'
 );
 
 note "running tests";
@@ -333,7 +333,7 @@ test_query(
. "SET parallel_setup_cost TO 0;\n"
. "SET parallel_tuple_cost TO 0;\n"
. "SET max_parallel_workers_per_gather TO 2;\n"
-   . "SELECT bool_and(SYSTEM_USER IS DISTINCT FROM n) FROM nulls;",
+   . "SELECT bool_and(SYSTEM_USER = id) FROM ids;",
qr/^t$/s,
'gssencmode=require',
'testing system_user with parallel workers');


Re: Add find_in_log() and advance_wal() perl functions to core test framework (?)

2022-08-16 Thread Alvaro Herrera
On 2022-Aug-16, Andrew Dunstan wrote:

> I don't think there's a hard and fast rule about it. Certainly the case
> would be more compelling if the functions were used across different TAP
> suites. The SSL suite has suite-specific modules. That's a pattern also
> worth considering. e.g something like.
> 
>     use FindBin qw($Bin);
>     use lib $Bin;
>     use MySuite;
> 
> and then you put your common routines in MySuite.pm in the same
> directory as the TAP test files.

Yeah, I agree with that for advance_wal.  Regarding find_in_log, that
one seems general enough to warrant being in Cluster.pm -- consider
issues_sql_like, which also slurps_file($log).  That could be unified a
little bit, I think.
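
As a sketch (modulo naming), the Cluster.pm version could be as small as
the copies in the two test files:

    # Returns true if $pattern appears in the server log file, optionally
    # starting the search at $offset bytes.
    sub find_in_log
    {
        my ($self, $pattern, $offset) = @_;

        return PostgreSQL::Test::Utils::slurp_file($self->logfile, $offset)
          =~ m/$pattern/;
    }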

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




pg_walinspect: ReadNextXLogRecord's first_record argument

2022-08-16 Thread Robert Haas
Hi,

I was looking at the code for pg_walinspect today and I think I may
have found a bug (or else I'm confused about how this all works, which
is also possible). ReadNextXLogRecord() takes an argument first_record
of type XLogRecPtr which is used only for error reporting purposes: if
we fail to read the next record for a reason other than end-of-WAL, we
complain that we couldn't read the WAL at the LSN specified by
first_record.

ReadNextXLogRecord() has three callers. In pg_get_wal_record_info(),
we're just reading a single record, and the LSN passed as first_record
is the LSN at which that record starts. Cool. But in the other two
callers, GetWALRecordsInfo() and GetWalStats(), we're reading multiple
records, and first_record is always passed as the LSN of the first
record. That's logical enough given the name of the argument, but the
effect of it seems to be that an error while reading any of the
records will be reported using the LSN of the first record, which does
not seem right.

By contrast, pg_rewind's extractPageMap() reports the error using
xlogreader->EndRecPtr. I think that's correct. The toplevel xlogreader
function that we're calling here is XLogReadRecord(), which in turn
calls XLogNextRecord(), which has this comment:

/*
 * state->EndRecPtr is expected to have been set by the last call to
 * XLogBeginRead() or XLogNextRecord(), and is the location of the
 * error.
 */
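
Concretely, I'd expect the read loops to report errors more along these
lines (a sketch, not the current pg_walinspect code):

    XLogRecord *record;
    char	   *errormsg;

    for (;;)
    {
        record = XLogReadRecord(xlogreader, &errormsg);
        if (record == NULL)
        {
            if (errormsg)
                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not read WAL at %X/%X: %s",
                                LSN_FORMAT_ARGS(xlogreader->EndRecPtr),
                                errormsg)));
            break;
        }
        /* accumulate stats for this record */
    }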

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_receivewal and SIGTERM

2022-08-16 Thread Bharath Rupireddy
On Tue, Aug 16, 2022 at 7:26 PM Christoph Berg  wrote:
>
> Re: Bharath Rupireddy
> > Don't we need a similar explanation [1] for pg_recvlogical docs?
> >
> > [1]
> >
> > In the absence of fatal errors, pg_receivewal
> > -   will run until terminated by the SIGINT signal
> -   (ControlC).
> > +   will run until terminated by the SIGINT
> +   (ControlC)
> > +   or SIGTERM signal.
>
> Copied that from pg_receivewal(1) now.

Thanks.

pg_receivewal will exit with status 0 when
-   terminated by the SIGINT signal.  (That is the
+   terminated by the SIGINT or
+   SIGTERM signal.  (That is the
normal way to end it.  Hence it is not an error.)  For fatal errors or
other signals, the exit status will be nonzero.

Can we specify in the docs the reason why SIGTERM (which would typically
cause a program to end with a non-zero exit code) makes pg_receivewal and
pg_recvlogical exit with a zero exit code? Having this in the commit
message would help developers, but the documentation would help users out
there.

Thoughts?

[1]
pg_receivewal, pg_recvlogical: Exit cleanly on SIGTERM

In pg_receivewal, compressed output is only flushed on clean exits. The
reason to support SIGTERM here as well is that pg_receivewal might well
be running as a daemon, and systemd's default KillSignal is SIGTERM.

Since pg_recvlogical is also supposed to run as a daemon, teach it about
SIGTERM as well.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: Add find_in_log() and advance_wal() perl functions to core test framework (?)

2022-08-16 Thread Andrew Dunstan


On 2022-08-15 Mo 22:25, Bharath Rupireddy wrote:
> Hi,
>
> It seems like find_in_log() and advance_wal() are worth generalizing
> (they are now being used in at least 2 places). find_in_log() is
> defined and being used in 2 places: 019_replslot_limit.pl and
> 033_replay_tsp_drops.pl. The functionality of advancing WAL is
> implemented in 019_replslot_limit.pl with advance_wal() and in
> 001_stream_repl.pl with the same logic as advance_wal() but no function
> there, and an in-progress feature [1] needs advance_wal() as-is for
> tests.
>
> Do these functions qualify to be added to the core test framework in
> Cluster.pm? Or do we need more usages of these functions before we
> generalize and add to the core test framework? If added, a bit of
> duplicate code can be reduced and they become more usable across the
> entire tests for future use.
>
> Thoughts?
>
> [1] 
> https://www.postgresql.org/message-id/calj2acuyz1z6qpdugn5gguckfd-ko44j4hkcomtp6fzv9xe...@mail.gmail.com
>

I don't think there's a hard and fast rule about it. Certainly the case
would be more compelling if the functions were used across different TAP
suites. The SSL suite has suite-specific modules. That's a pattern also
worth considering. e.g something like.

    use FindBin qw($Bin);
    use lib $Bin;
    use MySuite;

and then you put your common routines in MySuite.pm in the same
directory as the TAP test files.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-08-16 Thread Jacob Champion
Hello,

On Fri, Aug 12, 2022 at 6:34 AM Drouvot, Bertrand  wrote:
> +typedef struct
> +{
> +   /*
> +* Authenticated identity.  The meaning of this identifier is
> dependent on
>
> has to be replaced by:
>
> +typedef struct ClientConnectionInfo
> +{
> +   /*
> +* Authenticated identity.  The meaning of this identifier is
> dependent on

Okay, will do in the next patch (coming soon).

> +* Authenticated identity.  The meaning of this identifier is
> dependent on
>
> There is one extra space before "The"

This comment block was just moved verbatim; the double-spaced
sentences were there before.

>  From a coding style point of view, shouldn't "ClientConnectionInfo
> MyClientConnectionInfo;" be moved to the top of the file?

The style in this file seems to be to declare the variables at the top
of the section in which they're used. See the sections for "User ID
state" and "Library preload support".

Thanks,
--Jacob




Re: Making Vars outer-join aware

2022-08-16 Thread Tom Lane
Richard Guo  writes:
> When we add required PlaceHolderVars to a join rel's targetlist, if the
> PHV can be computed in the nullable-side of the join, we would add the
> join's RT index to phnullingrels. This is correct as we know the PHV's
> value can be nulled at this join. But I'm wondering if it is necessary
> since we have already propagated any varnullingrels into the PHV when we
> apply pullup variable replacement in perform_pullup_replace_vars().

I'm not seeing the connection there?  Any nullingrels that are set
during perform_pullup_replace_vars would refer to outer joins within the
pulled-up subquery, whereas what you are talking about here is what
happens when the PHV's value propagates up through outer joins of the
parent query.  There's no overlap between those relid sets, or if there
is we have some fault in the logic that constrains join order to ensure
that there's a valid place to compute each PHV.

> On the other hand, the phnullingrels may contain RT indexes of outer
> joins above this join level. It seems not good to add such a PHV to the
> joinrel's targetlist.

Hmm, yeah, add_placeholders_to_joinrel is doing this wrong.  The
intent was to match what build_joinrel_tlist does with plain Vars,
but in that case we're adding the join's relid to what we find in
varnullingrels in the input tlist.  Using the phnullingrels from
the placeholder_list entry is wrong.  (I wonder whether a
placeholder_list entry's phnullingrels is meaningful at all?
Maybe it isn't.)  I think it might work to take the intersection
of the join's relids with root->outer_join_rels.

> If the two issues are indeed something we need to fix, maybe we can
> change add_placeholders_to_joinrel() to search the PHVs from
> outer_rel/inner_rel's targetlist, and add the ojrelid to phnullingrels
> if needed, just like what we do in build_joinrel_tlist(). The PHVs there
> should have the correct phnullingrels (at least the PHVs in base rels'
> targetlists have correctly set phnullingrels to NULL).

Yeah, maybe we should do something more invasive and make use of the
input targetlists rather than doing this from scratch.  Not sure.
I'm worried that doing it that way would increase the risk of getting
different join tlist contents depending on which pair of input rels
we happen to consider first.

regards, tom lane




Re: pg_upgrade test writes to source directory

2022-08-16 Thread Andres Freund
Hi,

On 2022-08-16 11:33:12 -0400, Andrew Dunstan wrote:
> On 2022-08-15 Mo 23:20, Andres Freund wrote:
> >
> > I've also attached a 0003 that splits the log location from the data
> > location. That could be used to make the log file location symmetrical 
> > between
> > pg_regress (log/) and tap tests (tmp_check/log).  But it'd break the
> > buildfarm's tap test log file collection, so I don't think that's something 
> > we
> > really can do soon-ish?
> 
> 
> Where would you like to have the buildfarm client search? Currently it
> does this:
> 
> 
>     my @logs = glob("$dir/tmp_check/log/*");
> 
>     $log->add_log($_) foreach (@logs);
> 
> I can add another pattern in that glob expression. I'm intending to put
> out a new release pretty soon (before US Labor Day).

$dir/log, so it's symmetric to the location of log files of regress/isolation
tests.

Thanks!

Andres




Re: Include the dependent extension information in describe command.

2022-08-16 Thread Bruce Momjian
On Mon, Aug 15, 2022 at 10:09:29PM +0530, vignesh C wrote:
> I have updated the patch to display "Objects depending on extension"
> as describe extension footer. The changes for the same are available
> in the v2 version patch attached. Thoughts?

I wonder if we would be better off with a backslash command that showed
the dependencies of any object.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: pg_upgrade test writes to source directory

2022-08-16 Thread Andrew Dunstan


On 2022-08-15 Mo 23:20, Andres Freund wrote:
>
> I've also attached a 0003 that splits the log location from the data
> location. That could be used to make the log file location symmetrical between
> pg_regress (log/) and tap tests (tmp_check/log).  But it'd break the
> buildfarm's tap test log file collection, so I don't think that's something we
> really can do soon-ish?


Where would you like to have the buildfarm client search? Currently it
does this:


    my @logs = glob("$dir/tmp_check/log/*");

    $log->add_log($_) foreach (@logs);

I can add another pattern in that glob expression. I'm intending to put
out a new release pretty soon (before US Labor Day).


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade test writes to source directory

2022-08-16 Thread Tom Lane
Andres Freund  writes:
> On 2022-08-11 11:26:39 -0400, Tom Lane wrote:
>> Given that it's no longer going to be the same tmp_check dir used
>> elsewhere, maybe we could s/tmp_check/tab_comp_dir/g or something
>> like that?  That'd add some clarity I think.

> Done in the attached patch (0001).

I was confused by 0001, because with the present test setup that will
result in creating an extra tab_comp_dir that isn't inside tmp_check,
leading to needing cleanup infrastructure that isn't there.  However,
0002 clarifies that: you're redefining TESTDIR.  I think 0001 is OK
as long as you apply it after, or integrate it into, 0002.

> patch 0002 just moves the addition of /tmp_check from Utils.pm to the places
> in which TESTDIR is defined.

I see some references to TESTDIR in src/tools/msvc/ecpg_regression.proj.
It looks like those are not references to this variable but uses of the

  <TESTDIR>..\..\interfaces\ecpg\test</TESTDIR>

thingy at the top of the file.  Still, it's a bit confusing --- should
we rename that?  Maybe not worth the trouble given the short expected
lifespan of the MSVC test scripts.  0002 seems fine otherwise.

> I've also attached a 0003 that splits the log location from the data
> location. That could be used to make the log file location symmetrical between
> pg_regress (log/) and tap tests (tmp_check/log).  But it'd break the
> buildfarm's tap test log file collection, so I don't think that's something we
> really can do soon-ish?

No particular opinion about 0003 -- as you say, that's going to be
gated by the buildfarm.

regards, tom lane




Re: Patch proposal: New hooks in the connection path

2022-08-16 Thread Gurjeet Singh
On Tue, Aug 16, 2022 at 3:16 AM Bharath Rupireddy  wrote:
>
> On Tue, Aug 16, 2022 at 1:55 PM Drouvot, Bertrand  wrote:
> >
> > Hi,
> >
> > On 8/16/22 10:10 AM, Bharath Rupireddy wrote:
> > > On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand  wrote:
> > >> On 8/14/22 7:52 AM, Gurjeet Singh wrote:
> > >>> On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand  wrote:
> > >>> I think we can reduce the number of places the hook is called, if we
> > >>> call the hook from proc_exit(), and at all the other places we simply set
> > >>> a global variable to signify the reason for the failure. The case of
> > >>> _exit(1) from the signal-handler cannot use such a mechanism, but I
> > >>> think all the other cases of interest can simply register one of the
> > >>> FCET_* values, and let the call from proc_exit() pass that value
> > >>> to the hook.
> > >> That looks like a good idea to me. I'm tempted to rewrite the patch that
> > >> way (and addressing the first comment in the same time).
> > >>
> > >> Curious to hear about others hackers thoughts too.

I agree that we need feedback from long-timers here on the decision
of whether to use proc_exit() for this purpose.

> > > IMO, calling the hook from proc_exit() is not a good design as
> > > proc_exit() is a generic code called from many places in the source
> > > code, even the simple code of kind  if(call_failed_conn_hook) {
> > > falied_conn_hook(params);} can come in the way of many exit code paths
> > > which is undesirable, and the likelihood of introducing new bugs may
> > > increase.
> >
> > Thanks for the feedback.
> >
> > What do you think about calling the hook only if the new global variable
> > is not equal to its default value (which would mean don't trigger the
> > hook)?
>
> IMO, that's not a good design as explained above. Why should the
> failed connection hook related code get hit for each and every
> proc_exit() call? Here, the code duplication i.e. the number of places
> the failed connection hook gets called mustn't be the reason to move
> that code to proc_exit().

I agree, it doesn't feel _clean_, having to maintain a global
variable, pass it to the hook at exit, etc. But the alternative feels
even less clean.

This hook needs to be called when the process has decided to exit, so
it makes sense to place this call in the stack above proc_exit(), whose
sole job is to let the process die gracefully and take care of things
on the way out.

There are quite a few places in core that leverage proc_exit()'s
facilities (by registering on_proc_exit callbacks), so an
extension/hook doing so wouldn't be out of the ordinary (apparently
contrib/sepgsql has already set the precedent of an extension using
the on_proc_exit callback). Admittedly, in this case the core will be
managing and passing the hook the additional global variable needed to
record the connection failure reason, FCET_*.

If we agree that proc_exit() is a good place to place this call, then
this hook can be converted into a on_proc_exit callback. If the global
variable is exported, then the extension(s) can access it in the
callback to ascertain why the process is exiting, and proc_exit()
won't have to know anything special about the extension, or hook, or
the global variable.
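
For concreteness, a minimal sketch of an extension using that mechanism
could look like the following.  FCET_* and FailedConnReason are
hypothetical names standing in for whatever the patch ends up exporting;
on_proc_exit() and the callback signature are the existing ipc.h APIs.
Core would only have to set the variable at the failure sites.

    #include "postgres.h"

    #include "fmgr.h"
    #include "storage/ipc.h"

    PG_MODULE_MAGIC;

    /* Hypothetical: would be defined, and set, by core per the proposal */
    typedef enum FailedConnEventType
    {
        FCET_NONE = 0,
        FCET_BAD_STARTUP_PACKET,
        FCET_AUTHENTICATION_FAILURE
    } FailedConnEventType;

    extern FailedConnEventType FailedConnReason;

    /* Runs during proc_exit(); reports why the connection failed, if it did */
    static void
    failed_conn_callback(int code, Datum arg)
    {
        if (FailedConnReason != FCET_NONE)
            ereport(LOG,
                    (errmsg("connection failed, reason code %d",
                            (int) FailedConnReason)));
    }

    void
    _PG_init(void)
    {
        on_proc_exit(failed_conn_callback, (Datum) 0);
    }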

The on_proc_exit callback method wouldn't work for the _exit() called
in StartupPacketTimeoutHandler(), so that will need to be handled
separately.

Best regards,
Gurjeet
http://Gurje.et




Re: Propose a new function - list_is_empty

2022-08-16 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 15, 2022 at 9:28 PM Tom Lane  wrote:
>> That's because the *correct* way to write it is either "alist == NIL"
>> or just "!alist".

> I think the alist == NIL (or alist != NIL) style often makes the code
> easier to read. I recommend we standardize on that one.

I have a general preference for comparing to NIL because (as Daniel
noted nearby) it reminds you of what data type you're dealing with.
However, I'm not up for trying to forbid the bare-boolean-test style
altogether.  It'd be near impossible to find all the instances;
besides which we don't insist that other pointer checks be written
as explicit comparisons to NULL --- we do whichever of those seems
clearest in context.  So I'm happy for this patch to leave either
of those existing usages alone.  I agree though that while simplifying
list_length() calls, I'd lean to using explicit comparisons to NIL.
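
To spell out the spellings in question (illustration only; all three are
equivalent because NIL is simply the empty, i.e. NULL, List):

    #include "postgres.h"
    #include "nodes/pg_list.h"

    static void
    emptiness_test_styles(List *alist)
    {
        /* explicit comparison: reminds the reader it's a List */
        Assert((alist == NIL) == !alist);
        /* the bare boolean test and list_length() agree with it too */
        Assert((alist == NIL) == (list_length(alist) == 0));
    }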

regards, tom lane




Re: pg_receivewal and SIGTERM

2022-08-16 Thread Christoph Berg
Re: Bharath Rupireddy
> Don't we need a similar explanation [1] for pg_recvlogical docs?
> 
> [1]
>
> In the absence of fatal errors, <application>pg_receivewal</application>
> -   will run until terminated by the <systemitem>SIGINT</systemitem> signal
> -   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>).
> +   will run until terminated by the <systemitem>SIGINT</systemitem>
> +   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>)
> +   or <systemitem>SIGTERM</systemitem> signal.

Copied that from pg_receivewal(1) now.

Christoph
From 6c51c559a86754623cd33fe4cef3563c18fccea3 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Mon, 15 Aug 2022 14:29:43 +0200
Subject: [PATCH] pg_receivewal, pg_recvlogical: Exit cleanly on SIGTERM

In pg_receivewal, compressed output is only flushed on clean exits. The
reason to support SIGTERM here as well is that pg_receivewal might well
be running as a daemon, and systemd's default KillSignal is SIGTERM.

Since pg_recvlogical is also supposed to run as a daemon, teach it about
SIGTERM as well.
---
 doc/src/sgml/ref/pg_receivewal.sgml| 8 +---
 doc/src/sgml/ref/pg_recvlogical.sgml   | 7 +++
 src/bin/pg_basebackup/pg_receivewal.c  | 7 ---
 src/bin/pg_basebackup/pg_recvlogical.c | 7 ---
 4 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 4fe9e1a874..5f83ba1893 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -118,8 +118,9 @@ PostgreSQL documentation
 
   <para>
    In the absence of fatal errors, <application>pg_receivewal</application>
-   will run until terminated by the <systemitem>SIGINT</systemitem> signal
-   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>).
+   will run until terminated by the <systemitem>SIGINT</systemitem>
+   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>)
+   or <systemitem>SIGTERM</systemitem> signal.
   </para>
  </refsect1>
 
@@ -457,7 +458,8 @@ PostgreSQL documentation
 
   <para>
    <application>pg_receivewal</application> will exit with status 0 when
-   terminated by the <systemitem>SIGINT</systemitem> signal.  (That is the
+   terminated by the <systemitem>SIGINT</systemitem> or
+   <systemitem>SIGTERM</systemitem> signal.  (That is the
    normal way to end it.  Hence it is not an error.)  For fatal errors or
    other signals, the exit status will be nonzero.
   </para>
diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml
index 1a88225409..012ee4468b 100644
--- a/doc/src/sgml/ref/pg_recvlogical.sgml
+++ b/doc/src/sgml/ref/pg_recvlogical.sgml
@@ -46,6 +46,13 @@ PostgreSQL documentation
    a slot without consuming it, use
    <function>pg_logical_slot_peek_changes</function>.
   </para>
+
+  <para>
+   In the absence of fatal errors, <application>pg_recvlogical</application>
+   will run until terminated by the <systemitem>SIGINT</systemitem>
+   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>)
+   or <systemitem>SIGTERM</systemitem> signal.
+  </para>
  </refsect1>
 
  <refsect1>
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index f064cff4ab..e2cf924017 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -673,13 +673,13 @@ StreamLog(void)
 }
 
 /*
- * When sigint is called, just tell the system to exit at the next possible
+ * When SIGINT/SIGTERM are caught, just tell the system to exit at the next possible
  * moment.
  */
 #ifndef WIN32
 
 static void
-sigint_handler(int signum)
+sigexit_handler(int signum)
 {
 	time_to_stop = true;
 }
@@ -932,7 +932,8 @@ main(int argc, char **argv)
 	 * if one is needed, in GetConnection.)
 	 */
 #ifndef WIN32
-	pqsignal(SIGINT, sigint_handler);
+	pqsignal(SIGINT, sigexit_handler);
+	pqsignal(SIGTERM, sigexit_handler);
 #endif
 
 	/*
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 2a4c8b130a..363102cf85 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -650,11 +650,11 @@ error:
 #ifndef WIN32
 
 /*
- * When sigint is called, just tell the system to exit at the next possible
+ * When SIGINT/SIGTERM are caught, just tell the system to exit at the next possible
  * moment.
  */
 static void
-sigint_handler(int signum)
+sigexit_handler(int signum)
 {
 	time_to_abort = true;
 }
@@ -922,7 +922,8 @@ main(int argc, char **argv)
 	 * if one is needed, in GetConnection.)
 	 */
 #ifndef WIN32
-	pqsignal(SIGINT, sigint_handler);
+	pqsignal(SIGINT, sigexit_handler);
+	pqsignal(SIGTERM, sigexit_handler);
 	pqsignal(SIGHUP, sighup_handler);
 #endif
 
-- 
2.35.1



Re: SQL/JSON features for v15

2022-08-16 Thread Robert Haas
On Mon, Aug 15, 2022 at 6:39 PM Andres Freund  wrote:
> I don't think it's sane from a performance view to start a subtransaction for
> every coercion, particularly because most coercion paths will never trigger an
> error, leaving things like out-of-memory or interrupts aside. And those are
> re-thrown by ExecEvalJsonExprSubtrans().  A quick and dirty benchmark shows
> ERROR ON ERROR nearly 2xing speed.  I'm worried about the system impact of
> using subtransactions this heavily, it's not exactly the best performing
> system - the only reason it's kind of ok here is that it's going to be very
> rare to allocate a subxid, I think.

I agree. It kinda surprises me that we thought it was OK to commit
something that uses that many subtransactions. I feel like that's
going to cause people to hose themselves in ways that we can't really
do anything about. Like they'll test it out, it will work, and then
when they put it into production, they'll have constant wraparound
issues for which the only real solution is to not use the feature they
relied on to build the application.
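
For reference, the per-coercion error-trapping pattern at issue has
roughly this shape (a simplified sketch modeled on
ExecEvalJsonExprSubtrans; evaluate_coercion() is a hypothetical stand-in,
while the subtransaction calls are the real xact.c APIs):

    #include "postgres.h"

    #include "access/xact.h"
    #include "utils/resowner.h"

    extern Datum evaluate_coercion(void);   /* hypothetical stand-in */

    /*
     * Evaluate one coercion, swallowing any error.  Every call pays the
     * subtransaction setup/teardown cost, and each error actually trapped
     * rolls back a subtransaction; any subxids allocated along the way
     * are what feed the wraparound concern above.
     */
    static Datum
    coerce_safely(bool *error)
    {
        MemoryContext oldcontext = CurrentMemoryContext;
        ResourceOwner oldowner = CurrentResourceOwner;
        Datum       result = (Datum) 0;

        BeginInternalSubTransaction(NULL);

        PG_TRY();
        {
            result = evaluate_coercion();
            ReleaseCurrentSubTransaction();
        }
        PG_CATCH();
        {
            MemoryContextSwitchTo(oldcontext);
            FlushErrorState();
            RollbackAndReleaseCurrentSubTransaction();
            *error = true;
        }
        PG_END_TRY();

        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;

        return result;
    }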

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Move NON_EXEC_STATIC from c.h

2022-08-16 Thread Tom Lane
Peter Eisentraut  writes:
> Looking to tidy up c.h a bit, I think the NON_EXEC_STATIC #define 
> doesn't need to be known globally, and it's not related to establishing 
> a portable C environment, so I propose to move it to a more localized 
> header, such as postmaster.h, as in the attached patch.

Hmm, postgres.h seems like a better choice, since in principle any
backend file might need this.  This arrangement could require
postmaster.h to be included just for this macro.

Also, the macro was severely underdocumented already, and I don't
find "no comment at all" to be better.  Can't we afford a couple
of lines of explanation?
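
For instance, a couple of possible lines (wording mine, not taken from
the patch):

    /*
     * NON_EXEC_STATIC: mark an object "static" in normal builds, but give
     * it external linkage in EXEC_BACKEND builds, where a forked-then-exec'd
     * child does not inherit the postmaster's static state and the
     * save/restore code must reference it from another file.
     */
    #ifdef EXEC_BACKEND
    #define NON_EXEC_STATIC
    #else
    #define NON_EXEC_STATIC static
    #endif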

regards, tom lane




Re: Propose a new function - list_is_empty

2022-08-16 Thread Robert Haas
On Mon, Aug 15, 2022 at 9:28 PM Tom Lane  wrote:
> Peter Smith  writes:
> > During a recent code review I was going to suggest that some new code
> > would be more readable if the following:
> > if (list_length(alist) == 0) ...
>
> > was replaced with:
> > if (list_is_empty(alist)) ...
>
> > but then I found that actually no such function exists.
>
> That's because the *correct* way to write it is either "alist == NIL"
> or just "!alist".

I think the alist == NIL (or alist != NIL) style often makes the code
easier to read. I recommend we standardize on that one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Propose a new function - list_is_empty

2022-08-16 Thread Tom Lane
Daniel Gustafsson  writes:
> I think these are nice cleanups to simplify and streamline the code, just a
> few small comments from reading the patch:

>   /* If no subcommands, don't collect */
> - if (list_length(currentEventTriggerState->currentCommand->d.alterTable.subcmds) != 0)
> + if (currentEventTriggerState->currentCommand->d.alterTable.subcmds)
> Here the current coding gives context about the data structure used for the
> subcmds member which is now lost.  I don't mind the change but rewording the
> comment above to indicate that subcmds is a list would be good IMHO.

I think testing for equality to NIL is better where that's a concern.

> Might be personal taste, but I think the parenthesis should be kept here as a
> visual aid for the reader.

+1

regards, tom lane




Re: pg_receivewal and SIGTERM

2022-08-16 Thread Bharath Rupireddy
On Tue, Aug 16, 2022 at 5:15 PM Daniel Gustafsson  wrote:
>
> > On 16 Aug 2022, at 13:36, Christoph Berg  wrote:
>
> >>  pqsignal(SIGINT, sigint_handler);
> >> +pqsignal(SIGTERM, sigint_handler);
> >> Tiny nitpick, I think we should rename sigint_handler to just sig_handler
> >> as it does handle more than sigint.
> >
> > I went with sigexit_handler since pg_recvlogical also has a
> > sighup_handler and "sig_handler" would be confusing there.
>
> Good point, sigexit_handler is a better name here.

+1.

Don't we need a similar explanation [1] for pg_recvlogical docs?

[1]
   
    In the absence of fatal errors, <application>pg_receivewal</application>
-   will run until terminated by the <systemitem>SIGINT</systemitem> signal
-   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>).
+   will run until terminated by the <systemitem>SIGINT</systemitem>
+   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>)
+   or <systemitem>SIGTERM</systemitem> signal.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: Support logical replication of global object commands

2022-08-16 Thread Amit Kapila
On Tue, Aug 16, 2022 at 11:35 AM Zheng Li  wrote:
>
> > Can we think of relying to send WAL of such DDLs just based on whether
> > there is a corresponding publication (ex. publication of ALL OBJECTS)?
> > I mean avoid database-specific filtering in decoding for such DDL
> > commands but not sure how much better it is than the current proposal?
>
> I think a publication of ALL OBJECTS sounds intuitive. Does it mean we'll
> publish all DDL commands, all commit and abort operations in every
> database if there is such publication of ALL OBJECTS?
>

Actually, I intended that for global objects. But the main thing
that worries me about this is that we don't have a clean way to
untie global object replication from database-specific object
replication.

-- 
With Regards,
Amit Kapila.




Re: pg_receivewal and SIGTERM

2022-08-16 Thread Daniel Gustafsson
> On 16 Aug 2022, at 13:36, Christoph Berg  wrote:

>>  pqsignal(SIGINT, sigint_handler);
>> +pqsignal(SIGTERM, sigint_handler);
>> Tiny nitpick, I think we should rename sigint_handler to just sig_handler
>> as it does handle more than sigint.
> 
> I went with sigexit_handler since pg_recvlogical also has a
> sighup_handler and "sig_handler" would be confusing there.

Good point, sigexit_handler is a better name here.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_receivewal and SIGTERM

2022-08-16 Thread Christoph Berg
Re: Daniel Gustafsson
> In general that's a good idea, but they are so trivial that I don't really see
> much point in doing that in this particular case.

Plus the variables they set are named differently...

Christoph




Re: pg_receivewal and SIGTERM

2022-08-16 Thread Daniel Gustafsson
> On 16 Aug 2022, at 13:40, Bharath Rupireddy  wrote:
> 
> On Tue, Aug 16, 2022 at 5:06 PM Christoph Berg  wrote:

>> I went with sigexit_handler since pg_recvlogical also has a
>> sighup_handler and "sig_handler" would be confusing there.
> 
> Can we move these signal handlers to streamutil.h/.c so that both
> pg_receivewal and pg_recvlogical can make use of it avoiding duplicate
> code?

In general that's a good idea, but they are so trivial that I don't really see
much point in doing that in this particular case.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_receivewal and SIGTERM

2022-08-16 Thread Bharath Rupireddy
On Tue, Aug 16, 2022 at 5:06 PM Christoph Berg  wrote:
>
> Re: Daniel Gustafsson
> > Do you think pg_recvlogical should support SIGTERM as well?  (The signals 
> > which
> > it does trap should be added to the documentation which just now says "until
> > terminated by a signal" but that's a separate thing.)
>
> Ack, that makes sense, added in the attached updated patch.
>
> >   pqsignal(SIGINT, sigint_handler);
> > + pqsignal(SIGTERM, sigint_handler);
> > Tiny nitpick, I think we should rename sigint_handler to just sig_handler
> > as it does handle more than sigint.
>
> > I went with sigexit_handler since pg_recvlogical also has a
> > sighup_handler and "sig_handler" would be confusing there.

Can we move these signal handlers to streamutil.h/.c so that both
pg_receivewal and pg_recvlogical can make use of it avoiding duplicate
code?

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: pg_receivewal and SIGTERM

2022-08-16 Thread Christoph Berg
Re: Daniel Gustafsson
> Do you think pg_recvlogical should support SIGTERM as well?  (The signals 
> which
> it does trap should be added to the documentation which just now says "until
> terminated by a signal" but that's a separate thing.)

Ack, that makes sense, added in the attached updated patch.

>   pqsignal(SIGINT, sigint_handler);
> + pqsignal(SIGTERM, sigint_handler);
> Tiny nitpick, I think we should rename sigint_handler to just sig_handler
> as it does handle more than sigint.

I went with sigexit_handler since pg_recvlogical also has a
sighup_handler and "sig_handler" would be confusing there.

Christoph
From beb3bcb32a9eabf2bf83e259706f89ccdac276d3 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Mon, 15 Aug 2022 14:29:43 +0200
Subject: [PATCH] pg_receivewal, pg_recvlogical: Exit cleanly on SIGTERM

In pg_receivewal, compressed output is only flushed on clean exits. The
reason to support SIGTERM here as well is that pg_receivewal might well
be running as a daemon, and systemd's default KillSignal is SIGTERM.

Since pg_recvlogical is also supposed to run as a daemon, teach it about
SIGTERM as well.
---
 doc/src/sgml/ref/pg_receivewal.sgml| 8 +---
 src/bin/pg_basebackup/pg_receivewal.c  | 7 ---
 src/bin/pg_basebackup/pg_recvlogical.c | 7 ---
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 4fe9e1a874..5f83ba1893 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -118,8 +118,9 @@ PostgreSQL documentation
 
   <para>
    In the absence of fatal errors, <application>pg_receivewal</application>
-   will run until terminated by the <systemitem>SIGINT</systemitem> signal
-   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>).
+   will run until terminated by the <systemitem>SIGINT</systemitem>
+   (<keycombo action="simul"><keycap>Control</keycap><keycap>C</keycap></keycombo>)
+   or <systemitem>SIGTERM</systemitem> signal.
   </para>
  </refsect1>
 
@@ -457,7 +458,8 @@ PostgreSQL documentation
 
   <para>
    <application>pg_receivewal</application> will exit with status 0 when
-   terminated by the <systemitem>SIGINT</systemitem> signal.  (That is the
+   terminated by the <systemitem>SIGINT</systemitem> or
+   <systemitem>SIGTERM</systemitem> signal.  (That is the
    normal way to end it.  Hence it is not an error.)  For fatal errors or
    other signals, the exit status will be nonzero.
   </para>
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index f064cff4ab..e2cf924017 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -673,13 +673,13 @@ StreamLog(void)
 }
 
 /*
- * When sigint is called, just tell the system to exit at the next possible
+ * When SIGINT/SIGTERM are caught, just tell the system to exit at the next possible
  * moment.
  */
 #ifndef WIN32
 
 static void
-sigint_handler(int signum)
+sigexit_handler(int signum)
 {
 	time_to_stop = true;
 }
@@ -932,7 +932,8 @@ main(int argc, char **argv)
 	 * if one is needed, in GetConnection.)
 	 */
 #ifndef WIN32
-	pqsignal(SIGINT, sigint_handler);
+	pqsignal(SIGINT, sigexit_handler);
+	pqsignal(SIGTERM, sigexit_handler);
 #endif
 
 	/*
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 2a4c8b130a..363102cf85 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -650,11 +650,11 @@ error:
 #ifndef WIN32
 
 /*
- * When sigint is called, just tell the system to exit at the next possible
+ * When SIGINT/SIGTERM are caught, just tell the system to exit at the next possible
  * moment.
  */
 static void
-sigint_handler(int signum)
+sigexit_handler(int signum)
 {
 	time_to_abort = true;
 }
@@ -922,7 +922,8 @@ main(int argc, char **argv)
 	 * if one is needed, in GetConnection.)
 	 */
 #ifndef WIN32
-	pqsignal(SIGINT, sigint_handler);
+	pqsignal(SIGINT, sigexit_handler);
+	pqsignal(SIGTERM, sigexit_handler);
 	pqsignal(SIGHUP, sighup_handler);
 #endif
 
-- 
2.35.1



Move NON_EXEC_STATIC from c.h

2022-08-16 Thread Peter Eisentraut
Looking to tidy up c.h a bit, I think the NON_EXEC_STATIC #define 
doesn't need to be known globally, and it's not related to establishing 
a portable C environment, so I propose to move it to a more localized 
header, such as postmaster.h, as in the attached patch.

From 29f2f1a8f8111b3d232c776a145f57759131396a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 16 Aug 2022 10:56:41 +0200
Subject: [PATCH] Move NON_EXEC_STATIC from c.h to postmaster.h

It is not needed at the scope of c.h, so move it to a more localized
place.
---
 src/backend/storage/lmgr/proc.c | 1 +
 src/include/c.h | 7 ---
 src/include/postmaster/postmaster.h | 6 ++
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 37aaab1338..448e7a2efa 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -41,6 +41,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
+#include "postmaster/postmaster.h"
 #include "replication/slot.h"
 #include "replication/syncrep.h"
 #include "replication/walsender.h"
diff --git a/src/include/c.h b/src/include/c.h
index 65e91a6b89..17d1cc28e3 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1348,13 +1348,6 @@ typedef intptr_t sigjmp_buf[5];
 #endif /* __MINGW64__ */
 #endif /* WIN32 */
 
-/* EXEC_BACKEND defines */
-#ifdef EXEC_BACKEND
-#define NON_EXEC_STATIC
-#else
-#define NON_EXEC_STATIC static
-#endif
-
 /* /port compatibility functions */
 #include "port.h"
 
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 90e333ccd2..03c624fb1b 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -55,6 +55,12 @@ extern int   MaxLivePostmasterChildren(void);
 
 extern bool PostmasterMarkPIDForWorkerNotify(int);
 
+#ifdef EXEC_BACKEND
+#define NON_EXEC_STATIC
+#else
+#define NON_EXEC_STATIC static
+#endif
+
 #ifdef EXEC_BACKEND
 extern pid_t postmaster_forkexec(int argc, char *argv[]);
 extern void SubPostmasterMain(int argc, char *argv[]) pg_attribute_noreturn();
-- 
2.37.1



Re: build remaining Flex files standalone

2022-08-16 Thread John Naylor
For v3, I addressed some comments and added .h files to the
headerscheck exceptions.

On Tue, Aug 16, 2022 at 1:11 AM Andres Freund  wrote:

> On 2022-08-13 15:39:06 +0700, John Naylor wrote:
> > Here are the rest. Most of it was pretty straightforward, with the
> > main exception of jsonpath_scan.c, which is not quite finished. That
> > one passes tests but still has one compiler warning. I'm unsure how
> > much of what is there already is really necessary or was cargo-culted
> > from elsewhere without explanation. For starters, I'm not sure why the
> > grammar has a forward declaration of "union YYSTYPE". It's noteworthy
> > that it used to compile standalone, but with a bit more stuff, and
> > that was reverted in 550b9d26f80fa30. I can hack on it some more later
> > but I ran out of steam today.

I've got it in half-way decent shape now, with an *internal.h header
and some cleanups.

> > - Include order seems to matter for the grammar's .h file. I didn't
> > test if that was the case every time, and after a few miscompiles just
> > always made it the last inclusion, but I'm wondering if we should keep
> > those inclusions outside %top{} and put it at the start of the next
> > %{} ?
>
> I think we have a few of those dependencies already, see e.g.
> /*
>  * NB: include gram.h only AFTER including scanner.h, because scanner.h
>  * is what #defines YYLTYPE.
>  */

Went with something like this in all cases:

/*
 * NB: include bootparse.h only AFTER including bootstrap.h, because bootstrap.h
 * includes node definitions needed for YYSTYPE.
 */

Future cleanup: I see this in headerscheck:

# We can't make these Bison output files compilable standalone
# without using "%code require", which old Bison versions lack.
# parser/gram.h will be included by parser/gramparse.h anyway.

That directive has been supported in Bison since 2.4.2.

> > From d723ba14acf56fd432e9e263db937fcc13fc0355 Mon Sep 17 00:00:00 2001
> > From: John Naylor 
> > Date: Thu, 11 Aug 2022 19:38:37 +0700
> > Subject: [PATCH v201 1/9] Build guc-file.c standalone
>
> Might be worth doing some of the moving around here separately from the
> parser/scanner specific bits.

Done in 0001/0003.

> > +/* functions shared between guc.c and guc-file.l */
> > [...]
> I think I prefer your suggestion of a guc_internal.h upthread.

Started in 0002, but left open the headerscheck failure.

Also, if such a thing is meant to be #include'd only by two generated
files, maybe it should just live in the directory where they live, and
not in the src/include dir?

> > From 7d4ecfcb3e91f3b45e94b9e64c7c40f1bbd22aa8 Mon Sep 17 00:00:00 2001
> > From: John Naylor 
> > Date: Fri, 12 Aug 2022 15:45:24 +0700
> > Subject: [PATCH v201 2/9] Build booscanner.c standalone
>
> > -# bootscanner is compiled as part of bootparse
> > -bootparse.o: bootscanner.c
> > +# See notes in src/backend/parser/Makefile about the following two rules
> > +bootparse.h: bootparse.c
> > + touch $@
> > +
> > +bootparse.c: BISONFLAGS += -d
> > +
> > +# Force these dependencies to be known even without dependency info built:
> > +bootparse.o bootscan.o: bootparse.h
>
> Wonder if we could / should wrap this is something common. It's somewhat
> annoying to repeat this stuff everywhere.

I haven't looked at the Meson effort recently, but if the build rule
is less annoying there, I'm inclined to leave this as a wart until
autotools are retired.

> > diff --git a/src/test/isolation/specscanner.l 
> > b/src/test/isolation/specscanner.l
> > index aa6e89268e..2dc292c21d 100644
> > --- a/src/test/isolation/specscanner.l
> > +++ b/src/test/isolation/specscanner.l
> > @@ -1,4 +1,4 @@
> > -%{
> > +%top{
> >  /*-
> >   *
> >   * specscanner.l
> > @@ -9,7 +9,14 @@
> >   *
> >   *-
> >   */
> > +#include "postgres_fe.h"
>
> Miniscule nitpick: I think we typically leave an empty line between header and
> first include.

In a small unscientific sample it seems like the opposite is true
actually, but I'll at least try to be consistent within the patch set.

> > diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
> > index dbe7d4f742..0b373048b5 100644
> > --- a/contrib/cube/cubedata.h
> > +++ b/contrib/cube/cubedata.h
> > @@ -67,3 +67,7 @@ extern void cube_scanner_finish(void);
> >
> >  /* in cubeparse.y */
> >  extern int   cube_yyparse(NDBOX **result);
> > +
> > +/* All grammar constructs return strings */
> > +#define YYSTYPE char *
>
> Why does this need to be defined in a semi-public header? If we do this in
> multiple files we'll end up with the danger of macro redefinition warnings.

I tried to put all the Flex/Bison stuff in another *_internal header,
but that breaks the build. Putting just this one symbol in a header is
silly, but done that way for now. Maybe two copies of the symbol?

Another future cleanup: "%define api.prefix {cube_yy}" etc would 

Re: pg_receivewal and SIGTERM

2022-08-16 Thread Daniel Gustafsson
> On 15 Aug 2022, at 14:45, Christoph Berg  wrote:

> The problem was that systemd's default KillSignal is SIGTERM, while
> pg_receivewal flushes the output compression buffers on SIGINT only.

Supporting SIGTERM here makes sense, especially given how systemd works.

> The attached patch makes it do the same for SIGTERM as well. (Most
> places in PG that install a SIGINT handler also install a SIGTERM
> handler already.)

Not really when it comes to utilities though; initdb, pg_dump and pg_test_fsync
seem to be the ones doing so.  (That's probably mostly due to them not running
in a daemon-like way as what's discussed here.)

Do you think pg_recvlogical should support SIGTERM as well?  (The signals which
it does trap should be added to the documentation which just now says "until
terminated by a signal" but that's a separate thing.)

pqsignal(SIGINT, sigint_handler);
+   pqsignal(SIGTERM, sigint_handler);
Tiny nitpick, I think we should rename sigint_handler to just sig_handler as it
does handle more than sigint.

In relation to this.  Reading over this and looking around I realized that the
documentation for pg_waldump lacks a closing parenthesis on Ctrl+C so I will be
pushing the below to fix it:

--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -263,7 +263,7 @@ PostgreSQL documentation

 If pg_waldump is terminated by signal
 SIGINT
-(ControlC,
+(ControlC),
 the summary of the statistics computed is displayed up to the
 termination point. This operation is not supported on
 Windows.

--
Daniel Gustafsson   https://vmware.com/





Re: Patch proposal: New hooks in the connection path

2022-08-16 Thread Bharath Rupireddy
On Tue, Aug 16, 2022 at 1:55 PM Drouvot, Bertrand  wrote:
>
> Hi,
>
> On 8/16/22 10:10 AM, Bharath Rupireddy wrote:
> > On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand  wrote:
> >> On 8/14/22 7:52 AM, Gurjeet Singh wrote:
> >>> On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand  wrote:
> >>> I think we can reduce the number of places the hook is called, if we
> >>> call the hook from proc_exit(), and at all the other places we simply set
> >>> a global variable to signify the reason for the failure. The case of
> >>> _exit(1) from the signal-handler cannot use such a mechanism, but I
> >>> think all the other cases of interest can simply register one of the
> >>> FCET_* values, and let the call from proc_exit() pass that value
> >>> to the hook.
> >> That looks like a good idea to me. I'm tempted to rewrite the patch that
> >> way (and addressing the first comment in the same time).
> >>
> >> Curious to hear about others hackers thoughts too.
> > IMO, calling the hook from proc_exit() is not a good design as
> > proc_exit() is a generic code called from many places in the source
> > code, even the simple code of kind  if(call_failed_conn_hook) {
> > falied_conn_hook(params);} can come in the way of many exit code paths
> > which is undesirable, and the likelihood of introducing new bugs may
> > increase.
>
> Thanks for the feedback.
>
> What do you think about calling the hook only if the new global variable
> is not equal to its default value (which would mean don't trigger the
> hook)?

IMO, that's not a good design as explained above. Why should the
failed connection hook related code get hit for each and every
proc_exit() call? Here, the code duplication i.e. the number of places
the failed connection hook gets called mustn't be the reason to move
that code to proc_exit().

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: Logical WAL sender unresponsive during decoding commit

2022-08-16 Thread Masahiko Sawada
On Tue, Aug 16, 2022 at 2:32 PM Masahiko Sawada  wrote:
>
> On Tue, Aug 16, 2022 at 2:31 PM Amit Kapila  wrote:
> >
> > On Tue, Aug 16, 2022 at 10:56 AM Masahiko Sawada  wrote:
> > >
> > > On Tue, Aug 16, 2022 at 2:08 PM Amit Kapila  wrote:
> > > >
> > > > On Tue, Aug 16, 2022 at 9:28 AM Andrey Borodin  wrote:
> > > > >
> > > > > Hi hackers!
> > > > >
> > > > > Some time ago I've seen a hanging logical replication that was trying 
> > > > > to send transaction commit after doing table pg_repack.
> > > > > I understand that those things do not mix well. Yet walsender was 
> > > > > ignoring pg_terminate_backend() and I think this worth fixing.
> > > > > Can we add CHECK_FOR_INTERRUPTS(); somewhere in this backtrace?
> > > > >
> > > >
> > > > I think if we want to do this in this code path then it may be it is
> > > > better to add it in ReorderBufferProcessTXN where we are looping to
> > > > process each change.
> > >
> > > +1
> > >
> > > The same issue is recently reported[1] on -bugs and I proposed the
> > > patch that adds CHECK_FOR_INTERRUPTS() to the loop in
> > > ReorderBufferProcessTXN(). I think it should be backpatched.
> > >
> >
> > I agree that it is better to backpatch this as well. Would you like to
> > verify if your patch works for all branches or if it needs some tweaks?
> >
>
> Yes, I've confirmed v10 and master but will do that for other branches
> and send patches for all supported branches.
>

I've attached patches for all supported branches.
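
For readers following along, the shape of the fix is simply this
(abbreviated from ReorderBufferProcessTXN() in reorderbuffer.c; the
surrounding code is elided, only the placement of the added call is
shown):

    while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
    {
        /* the added call: react to pg_terminate_backend() and friends */
        CHECK_FOR_INTERRUPTS();

        /* ... existing dispatch on change->action continues as before ... */
    }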

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


REL11_v2-0001-Add-CHECK_FOR_INTERRUPTS-in-logical-decoding-s-pr.patch
Description: Binary data


REL15_v2-0001-Add-CHECK_FOR_INTERRUPTS-in-logical-decoding-s-pr.patch
Description: Binary data


REL14_v2-0001-Add-CHECK_FOR_INTERRUPTS-in-logical-decoding-s-pr.patch
Description: Binary data


REL13_v2-0001-Add-CHECK_FOR_INTERRUPTS-in-logical-decoding-s-pr.patch
Description: Binary data


REL12_v2-0001-Add-CHECK_FOR_INTERRUPTS-in-logical-decoding-s-pr.patch
Description: Binary data


REL10_v2-0001-Add-CHECK_FOR_INTERRUPTS-in-logical-decoding-s-pr.patch
Description: Binary data


master_v2-0001-Add-CHECK_FOR_INTERRUPTS-in-logical-decoding-s-pr.patch
Description: Binary data


Re: Logical WAL sender unresponsive during decoding commit

2022-08-16 Thread Andrey Borodin



> On 16 Aug 2022, at 10:25, Masahiko Sawada  wrote:
> 
> The same issue is recently reported[1] on -bugs
Oh, I missed that thread.

> and I proposed the
> patch that adds CHECK_FOR_INTERRUPTS() to the loop in
> ReorderBufferProcessTXN().
I agree that it's a good place for check.

> I think it should be backpatched.
Yes, I think so too.

> [1] 
> https://www.postgresql.org/message-id/CAD21AoD%2BaNfLje%2B9JOqWbTiq1GL4BOp9_f7FxLADm8rS8cDhCQ%40mail.gmail.com

The patch in this thread looks good to me.


Thank you!

Best regards, Andrey Borodin.



Re: [PG15 Doc] remove "tty" connect string from manual

2022-08-16 Thread Daniel Gustafsson
> On 16 Aug 2022, at 07:27, Shinoda, Noriyoshi (PN Japan FSIP)  wrote:
> 
> Hello, hackers.
> 
> As of PostgreSQL 14, "tty" in the libpq connection string has already been 
> removed with the commit below.
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=14d9b37607ad30c3848ea0f2955a78436eff1268
> 
> But https://www.postgresql.org/docs/15/libpq-connect.html#LIBPQ-CONNSTRING 
> still says "Ignored (formerly, this specified where to send server debug 
> output)". The attached patch removes the "tty" item.

Ah, nice catch, I missed removing this in my original patch.  I'll take care
of committing this shortly.

--
Daniel Gustafsson   https://vmware.com/





Re: Propose a new function - list_is_empty

2022-08-16 Thread Daniel Gustafsson
> On 16 Aug 2022, at 07:29, Peter Smith  wrote:
> On Tue, Aug 16, 2022 at 11:27 AM Tom Lane  wrote:

>> if you want to get rid of overcomplicated uses of
>> list_length() in favor of one of those spellings, have at it.
> 
> Done, and tested OK with make check-world.

I think these are nice cleanups to simplify and streamline the code, just a few
small comments from reading the patch:

/* If no subcommands, don't collect */
-   if (list_length(currentEventTriggerState->currentCommand->d.alterTable.subcmds) != 0)
+   if (currentEventTriggerState->currentCommand->d.alterTable.subcmds)
Here the current coding gives context about the data structure used for the
subcmds member which is now lost.  I don't mind the change but rewording the
comment above to indicate that subcmds is a list would be good IMHO.


-   build_expressions = (list_length(stxexprs) > 0);
+   build_expressions = stxexprs != NIL;
Might be personal taste, but I think the parenthesis should be kept here as a
visual aid for the reader.


-   Assert(list_length(publications) > 0);
+   Assert(publications);
The more common (and clearer IMO) pattern would be Assert(publications != NIL);
I think.  The same applies for a few hunks in the patch.


-   Assert(clauses != NIL);
-   Assert(list_length(clauses) >= 1);
+   Assert(clauses);
Just removing the list_length() assertion would be enough here.


makeIndexArray() in jsonpath_gram.y has another Assert(list_length(list) > 0);
construction as well.  The other I found is in create_groupingsets_path() but
there I think it makes sense to keep the current coding based on the assertion
just prior to it being very similar and requiring list_length().

--
Daniel Gustafsson   https://vmware.com/





Re: [PoC] Reducing planning time when tables have many partitions

2022-08-16 Thread David Rowley
On Tue, 9 Aug 2022 at 19:10, David Rowley  wrote:
> I've not had a chance to look at the 0003 patch yet.

I've looked at the 0003 patch now.

The performance numbers look quite impressive; however, there were a
few things about the patch where I struggled to figure out why they
were done the way you did them:

+ root->eq_not_children_indexes = bms_add_member(root->eq_not_children_indexes,

Why is that in PlannerInfo rather than in the EquivalenceClass?

  if (bms_equal(rel->relids, em->em_relids))
  {
  rel->eclass_member_indexes =
bms_add_member(rel->eclass_member_indexes, em_index);
  }

Why are you only adding the eclass_member_index to the RelOptInfo when
the em_relids contain a singleton relation?

I ended up going and fixing the patch to be more like how I imagined it.

I've ended up with 3 Bitmapset fields in EquivalenceClass;
ec_member_indexes, ec_nonchild_indexes, ec_norel_indexes.  I also
trimmed the number of helper functions down for obtaining the minimal
set of matching EquivalenceMember indexes to just:

Bitmapset *
get_ecmember_indexes(PlannerInfo *root, EquivalenceClass *ec, Relids relids,
bool with_children, bool with_norel_members)

Bitmapset *
get_ecmember_indexes_strict(PlannerInfo *root, EquivalenceClass *ec,
Relids relids, bool with_children,
bool with_norel_members)

I'm not so much a fan of the bool parameters, but it seemed better
than having 8 different functions with each combination of the bool
parameters instead of 2.

The "strict" version of the function takes the intersection of
eclass_member_indexes for each rel mentioned in relids, whereas the
non-strict version does a union of those.  Each then intersect that
with all members in the 'ec', or just the non-child members when
'with_children' is false.  They both then optionally bms_add_members()
the ec_norel_members if with_norel_members is true.  I found it
difficult to figure out the best order to do the intersection. That
really depends on whether the particular query has many EquivalenceClasses
with few EquivalenceMembers or few EquivalenceClasses with many
EquivalenceMembers. bms_int_members() always recycles the left input.
Ideally, that would always be the smallest Bitmapset. Maybe it's worth
inventing a new version of bms_int_members() that recycles the input
with the least nwords. That would give the subsequent
bms_next_member() calls an easier time. Right now they'll need to loop
over a bunch of 0 words at the end for many queries.

A few problems I ran into along the way:

1. generate_append_tlist() generates Vars with varno=0.  That causes
problems when we add Exprs from those in add_eq_member() as there is
no element at root->simple_rel_array[0] to add eclass_member_indexes
to.
2. The existing comment for EquivalenceMember.em_relids claims "all
relids appearing in em_expr", but that's just not true when it comes
to em_is_child members.

So far, I fixed #1 by adding a hack to setup_simple_rel_arrays() to do
"root->simple_rel_array[0] = makeNode(RelOptInfo);" I'm not suggesting
that's the correct fix. It might be possible to set the varnos to the
varnos from the first Append child instead.

The fact that #2 is not true adds quite a bit of complexity to the
patch and I think the patch might even misbehave as a result. It seems
there are cases where a child em_relids can contain additional relids
that are not present in the em_expr. For example, when a UNION ALL
child has a Const in the targetlist, as explained in a comment in
add_child_rel_equivalences(). However, there also seem to be cases
where the opposite is true.  I had to add the following code in
add_eq_member() to stop a regression test failing:

if (is_child)
expr_relids = bms_add_members(expr_relids, relids);

That's to make sure we add eclass_member_indexes to each RelOptInfo
mentioned in the em_expr.

After doing all that, I noticed that your benchmark was showing that
create_join_clause() was the new bottleneck. This was due to having to
loop so many times over the ec_sources to find an already built
RestrictInfo. I went off and added some new code to optimize the
lookup of those in a similar way by adding a new Bitmapset field in
RelOptInfo to index which ec_sources it mentioned, which meant having
to move ec_sources into PlannerInfo. I don't think this part of the
patch is quite right yet as the code I have relies on em_relids being
the same as the ones mentioned in the RestrictInfo. That seems not
true for em_is_child EMs, so I think we probably need to add a new
field to EquivalenceMember that truly is just pull_varnos from
em_expr, or else look into some way to make em_relids mean that (like
the comment claims).

Here are my results from running your benchmark on master (@f6c750d31)
with and without the attached patch.

npart  master (ms)  patched (ms)  speedup
2      0.28         0.29          95.92%
4      0.37         0.38          96.75%
8      0.53         0.56          94.43%
16     0.92         0.91

Re: Patch proposal: New hooks in the connection path

2022-08-16 Thread Drouvot, Bertrand

Hi,

On 8/16/22 10:10 AM, Bharath Rupireddy wrote:

On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand  wrote:

On 8/14/22 7:52 AM, Gurjeet Singh wrote:

On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand  wrote:
I think we can reduce the number of places the hook is called, if we
call the hook from proc_exit(), and at all the other places we simply set
a global variable to signify the reason for the failure. The case of
_exit(1) from the signal-handler cannot use such a mechanism, but I
think all the other cases of interest can simply register one of the
FCET_* values, and let the call from proc_exit() pass that value
to the hook.

That looks like a good idea to me. I'm tempted to rewrite the patch that
way (and addressing the first comment in the same time).

Curious to hear about others hackers thoughts too.

IMO, calling the hook from proc_exit() is not a good design as
proc_exit() is a generic code called from many places in the source
code, even the simple code of kind  if(call_failed_conn_hook) {
falied_conn_hook(params);} can come in the way of many exit code paths
which is undesirable, and the likelihood of introducing new bugs may
increase.


Thanks for the feedback.

What do you think about calling the hook only if the new global variable 
is not equal to its default value (which would mean don't trigger the 
hook)?


Regards,

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com





Re: Patch proposal: New hooks in the connection path

2022-08-16 Thread Bharath Rupireddy
On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand  wrote:
>
> On 8/14/22 7:52 AM, Gurjeet Singh wrote:
> > On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand  
> > wrote:
>
> > I think we can reduce the number of places the hook is called, if we
> > call the hook from proc_exit(), and at all the other places we simply set
> > a global variable to signify the reason for the failure. The case of
> > _exit(1) from the signal-handler cannot use such a mechanism, but I
> > think all the other cases of interest can simply register one of the
> > FCET_* values, and let the call from proc_exit() pass that value
> > to the hook.
>
> That looks like a good idea to me. I'm tempted to rewrite the patch that
> way (and addressing the first comment in the same time).
>
> Curious to hear about others hackers thoughts too.

IMO, calling the hook from proc_exit() is not a good design as
proc_exit() is a generic code called from many places in the source
code, even the simple code of kind  if(call_failed_conn_hook) {
falied_conn_hook(params);} can come in the way of many exit code paths
which is undesirable, and the likelihood of introducing new bugs may
increase.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-16 Thread wangw.f...@fujitsu.com
On Fri, August 12, 2022 17:22 PM Peter Smith  wrote:
> Here are some review comments for v20-0004:
> 
> (This completes my reviews of the v20* patch set. Sorry, the reviews
> are time consuming, so I am lagging slightly behind the latest posted
> version)

Thanks for your comments.

> 1. doc/src/sgml/ref/create_subscription.sgml
> 
> @@ -245,6 +245,11 @@ CREATE SUBSCRIPTION  class="parameter">subscription_namealso be the unique column on the publisher-side; 2) there cannot be
>any non-immutable functions used by the subscriber-side replicated
>table.
> +  When applying a streaming transaction, if either requirement is not
> +  met, the background worker will exit with an error.
> +  The parallel mode is disregarded when retrying;
> +  instead the transaction will be applied using <literal>on</literal>
> +  mode.
>   
> 
> The "on mode" still sounds strange to me. Maybe it's just my personal
> opinion, but I don’t really consider 'on' and 'off' to be "modes".
> Anyway I already posted the same comment several times before [1,
> #4.3]. Let's see what others think.
> 
> SUGGESTION
> "using on mode" -> "using streaming = on"

Okay, I will address the relevant comments later.

> 2. src/backend/replication/logical/worker.c - start_table_sync
> 
> @@ -3902,20 +3925,28 @@ start_table_sync(XLogRecPtr *origin_startpos,
> char **myslotname)
>   }
>   PG_CATCH();
>   {
> + /*
> + * Emit the error message, and recover from the error state to an idle
> + * state
> + */
> + HOLD_INTERRUPTS();
> +
> + EmitErrorReport();
> + AbortOutOfAnyTransaction();
> + FlushErrorState();
> +
> + RESUME_INTERRUPTS();
> +
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid, false);
> +
>   if (MySubscription->disableonerr)
> - DisableSubscriptionAndExit();
> - else
> - {
> - /*
> - * Report the worker failed during table synchronization. Abort
> - * the current transaction so that the stats message is sent in an
> - * idle state.
> - */
> - AbortOutOfAnyTransaction();
> - pgstat_report_subscription_error(MySubscription->oid, false);
> + DisableSubscriptionOnError();
> 
> - PG_RE_THROW();
> - }
> + /* Set the retry flag. */
> + set_subscription_retry(true);
> +
> + proc_exit(0);
>   }
>   PG_END_TRY();
> 
> Perhaps current code is OK, but I am not 100% sure if we should set
> the retry flag when the disable_on_error is set, because the
> subscription is not going to be retried (because it is disabled). And
> later, if/when the user does enable the subscription, presumably that
> will be after they have already addressed the problem that caused the
> error/disablement in the first place.

I think it is okay. Because even after addressing the problem, the worker is
still *retrying* the failed transaction. And, in the worst case, it just
applies the first failed streaming transaction using "on" mode instead of
"parallel" mode.

> 3. src/backend/replication/logical/worker.c - start_apply
> 
>   PG_CATCH();
>   {
> + /*
> + * Emit the error message, and recover from the error state to an idle
> + * state
> + */
> + HOLD_INTERRUPTS();
> +
> + EmitErrorReport();
> + AbortOutOfAnyTransaction();
> + FlushErrorState();
> +
> + RESUME_INTERRUPTS();
> +
> + /* Report the worker failed while applying changes */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> +
>   if (MySubscription->disableonerr)
> - DisableSubscriptionAndExit();
> - else
> - {
> - /*
> - * Report the worker failed while applying changes. Abort the
> - * current transaction so that the stats message is sent in an
> - * idle state.
> - */
> - AbortOutOfAnyTransaction();
> - pgstat_report_subscription_error(MySubscription-
> >oid, !am_tablesync_worker());
> + DisableSubscriptionOnError();
> 
> - PG_RE_THROW();
> - }
> + /* Set the retry flag. */
> + set_subscription_retry(true);
>   }
>   PG_END_TRY();
>  }
> 
> 3a.
> Same comment as #2
> 
> 3b.
> This PG_CATCH used to leave by either proc_exit(0) or PG_RE_THROW but
> what does it do now? My first impression is there is a bug here due to
> some missing code, because AFAICT the exception is caught and gobbled
> up and then what...?

=>3a.
See the reply to #2.
=>3b.
The function `proc_exit(0)` is invoked after the call to start_apply()
returns; see function ApplyWorkerMain.

> 4. src/backend/replication/logical/worker.c - set_subscription_retry
> 
> + if (MySubscription->retry == retry ||
> + am_apply_bgworker())
> + return;
> 
> 4a.
> I think this quick exit can be split and given some appropriate comments
> 
> SUGGESTION (for example)
> /* Fast path - if no state change then nothing to do */
> if (MySubscription->retry == retry)
> return;
> 
> /* Fast path - skip for apply background workers */
> if (am_apply_bgworker())
> return;

Changed.

> 5. .../subscription/t/032_streaming_apply.pl
> 
> @@ -78,9 +78,13 @@ my $timer =
> 

Re: Expose Parallelism counters planned/execute in pg_stat_statements

2022-08-16 Thread Julien Rouhaud
Hi,

On Fri, Jul 29, 2022 at 08:36:44AM -0500, Daymel Bonne Solís wrote:
>
> We have rewritten the patch and added the necessary columns to have the
> number of times a parallel query plan was not executed using parallelism.
>
> We are investigating how to add more information related to the workers
> created
> by the Gather/GatherMerge nodes, but it is not a trivial task.

As far as I can see the scope of the counters is now different.  You said you
wanted to be able to identify when a parallel query plan cannot be executed
with parallelism, but what the fields are now showing is simply whether no
workers were launched at all.  It could be because of the dbeaver behavior you
mentioned (the !es_use_parallel_mode case), but also because the executor
tried to launch parallel workers and didn't get any.

I don't think that's an improvement.  With this patch if you see the
"paral_planned_not_exec" counter going up, you still don't know if this is
because of the !es_use_parallel_mode or if you simply have too many parallel
queries running at the same time, or both, and therefore can't do much with
that information.  Both situations are different and in my opinion require
different (and specialized) counters to properly handle them.
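
For instance, a rough sketch of telling the two cases apart in an
ExecutorEnd hook (the counter names are made up; parallelModeNeeded and
es_use_parallel_mode are existing fields, and catching "parallel plan but
zero workers launched" would additionally require inspecting the
Gather/GatherMerge instrumentation):

    #include "postgres.h"
    #include "executor/executor.h"

    static uint64 parallel_exec_count = 0;
    static uint64 forced_non_parallel_exec_count = 0;

    /* sketch only: assume this is installed as ExecutorEnd_hook from _PG_init() */
    static void
    my_ExecutorEnd(QueryDesc *queryDesc)
    {
        if (queryDesc->plannedstmt->parallelModeNeeded)
        {
            /* planner chose a parallel plan ... */
            if (!queryDesc->estate->es_use_parallel_mode)
                forced_non_parallel_exec_count++;   /* ... executor ran it serially */
            else
                parallel_exec_count++;
        }

        standard_ExecutorEnd(queryDesc);
    }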

Also, I don't think that paral_planned_exec and paral_planned_not_exec are good
column (and variable) names.  Maybe something like
"parallel_exec_count" and "forced_non_parallel_exec_count" (assuming it's based
on a parallel plan and !es_use_parallel_mode).




Re: Allow file inclusion in pg_hba and pg_ident files

2022-08-16 Thread Julien Rouhaud
Hi,

On Sat, Jul 30, 2022 at 04:09:36PM +0800, Julien Rouhaud wrote:

>
> - 0001: the rule_number / mapping_number addition in the views in a separate
>   commit
> - 0002: the main file inclusion patch.  Only a few minor bugfix since
>   previous version discovered thanks to the tests (a bit more about it after),
>   and documentation tweaks based on previous discussions
> - 0003: the pg_hba_matches() POC, no changes

Attached v9, simple rebase after multiple conflicts and part of the patchset
applied.
From 5430fd3b829b9c7600b61095752d4f7f85d1150d Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 30 May 2022 10:59:51 +0800
Subject: [PATCH v9 1/3] Add rule_number / mapping_number to the
 pg_hba/pg_ident views.

Author: Julien Rouhaud
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya%40jrouhaud
---
 doc/src/sgml/system-views.sgml  | 22 +
 src/backend/utils/adt/hbafuncs.c| 50 ++---
 src/include/catalog/pg_proc.dat | 11 ---
 src/test/regress/expected/rules.out | 10 +++---
 4 files changed, 72 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 44aa70a031..1d619427c1 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -991,6 +991,18 @@
      </para></entry>
     </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>rule_number</structfield> <type>int4</type>
+     </para>
+     <para>
+      Rule number of this rule among all rules if the rule is valid, otherwise
+      null. This indicates the order in which each rule will be considered
+      until the first matching one, if any, is used to perform authentication
+      with the client.
+     </para></entry>
+    </row>
+
     <row>
      <entry role="catalog_table_entry"><para role="column_definition">
       <structfield>line_number</structfield> <type>int4</type>
@@ -1131,6 +1143,16 @@
      </para></entry>
     </row>
 
+    <row>
+     <entry role="catalog_table_entry"><para role="column_definition">
+      <structfield>mapping_number</structfield> <type>int4</type>
+     </para>
+     <para>
+      Mapping number, in priority order, of this mapping if the mapping is
+      valid, otherwise null
+     </para></entry>
+    </row>
+
     <row>
      <entry role="catalog_table_entry"><para role="column_definition">
       <structfield>line_number</structfield> <type>int4</type>
diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index 9e5794071c..c9be4bff1f 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -26,10 +26,12 @@
 
 static ArrayType *get_hba_options(HbaLine *hba);
 static void fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
-                          int lineno, HbaLine *hba, const char *err_msg);
+                          int rule_number, int lineno, HbaLine *hba,
+                          const char *err_msg);
 static void fill_hba_view(Tuplestorestate *tuple_store, TupleDesc tupdesc);
 static void fill_ident_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
-                            int lineno, IdentLine *ident, const char *err_msg);
+                            int mapping_number, int lineno, IdentLine *ident,
+                            const char *err_msg);
 static void fill_ident_view(Tuplestorestate *tuple_store, TupleDesc tupdesc);
 
 
@@ -157,7 +159,7 @@ get_hba_options(HbaLine *hba)
 }
 
 /* Number of columns in pg_hba_file_rules view */
-#define NUM_PG_HBA_FILE_RULES_ATTS  9
+#define NUM_PG_HBA_FILE_RULES_ATTS  10
 
 /*
  * fill_hba_line
@@ -165,6 +167,7 @@ get_hba_options(HbaLine *hba)
  *
  * tuple_store: where to store data
  * tupdesc: tuple descriptor for the view
+ * rule_number: unique rule identifier among all valid rules
  * lineno: pg_hba.conf line number (must always be valid)
  * hba: parsed line data (can be NULL, in which case err_msg should be set)
  * err_msg: error message (NULL if none)
@@ -174,7 +177,8 @@ get_hba_options(HbaLine *hba)
  */
 static void
 fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
- int lineno, HbaLine *hba, const char *err_msg)
+ int rule_number, int lineno, HbaLine *hba,
+ const char *err_msg)
 {
Datum   values[NUM_PG_HBA_FILE_RULES_ATTS];
boolnulls[NUM_PG_HBA_FILE_RULES_ATTS];
@@ -193,6 +197,11 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
memset(nulls, 0, sizeof(nulls));
index = 0;
 
+   /* rule_number */
+   if (err_msg)
+   nulls[index++] = true;
+   else
+   values[index++] = Int32GetDatum(rule_number);
/* line_number */
values[index++] = Int32GetDatum(lineno);
 
@@ -336,7 +345,7 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
else
{
/* no parsing result, so set relevant fields to nulls */
-		memset(&nulls[1], true, (NUM_PG_HBA_FILE_RULES_ATTS - 2) * sizeof(bool));
+		memset(&nulls[2], true, (NUM_PG_HBA_FILE_RULES_ATTS - 3) * sizeof(bool));
}
 
/* error */
@@ -359,6 +368,7 @@ fill_hba_view(Tuplestorestate *tuple_store, TupleDesc tupdesc)
 

Re: Support logical replication of global object commands

2022-08-16 Thread Zheng Li
> Can we think of relying to send WAL of such DDLs just based on whether
> there is a corresponding publication (ex. publication of ALL OBJECTS)?
> I mean avoid database-specific filtering in decoding for such DDL
> commands but not sure how much better it is than the current proposal?

I think a publication of ALL OBJECTS sounds intuitive. Does it mean we'll
publish all DDL commands, all commit and abort operations in every
database if there is such publication of ALL OBJECTS?

Best,
Zheng