Re: [PATCH] pg_hba.conf error messages for logical replication connections

2021-01-29 Thread Amit Kapila
On Sat, Jan 30, 2021 at 12:24 AM Paul Martinez  wrote:
>
> On Thu, Jan 28, 2021 at 8:17 PM Amit Kapila  wrote:
> >
> > What exactly are you bothered about here? Is your concern that the
> > database name is not present in the message, or that the message uses
> > 'replication' even though it doesn't relate to the 'replication'
> > keyword in pg_hba.conf? I think with the current scheme one might
> > say that the word "replication" in the error message helps users
> > distinguish a logical replication connection error from a regular
> > connection error. I am not saying that what you are proposing is
> > wrong, just that the current scheme might be helpful to some users.
> > If you can explain a bit how the current message misled you and how
> > the proposed one resolves that confusion, that would be better.
> >
> >
>
> My main confusion arose from conflating the word "replication" in the
> error message with the "replication" keyword in pg_hba.conf.
>
> In my case, I was actually confused because I was creating logical
> replication connections that weren't getting rejected, despite a lack
> of any "replication" rules in my pg_hba.conf. I had the faulty
> assumption that a replication connection requires the "replication" keyword,
> and my change to the documentation makes it explicit that logical
> replication connections do not match the "replication" keyword.
>

I think it is good to be more explicit in the documentation, but we
already mention "physical replication connection" in that sentence. So
it might be better to add a separate sentence about logical
replication.

> I was digging through the code trying to understand why it was working,
> and also making manual connections before I stumbled upon these error
> messages.
>
> The fact that the error message doesn't include the database name
> definitely contributed to my confusion. It only mentions the word
> "replication", and not a database name, and that reinforces the idea
> that the "replication" keyword rule should apply, because it seems
> "replication" has replaced the database name.
>
> But overall, I would agree that the current messages aren't wrong,
> and my fix could still cause confusion because now logical replication
> connections won't be described as "replication" connections.
>
> How about explicitly specifying physical vs. logical replication in the
> error message, and also adding hints for clarifying the use of
> the "replication" keyword in pg_hba.conf?
>

Yeah, hints or more details might improve the situation, but I am not
sure we want to add more branching here. Can we write something
similar to HOSTNAME_LOOKUP_DETAIL for hints? Also, I think what you
are proposing to write is more of an errdetail kind of message. See
the error reporting routines in the docs [1].

[1] - https://www.postgresql.org/docs/devel/error-message-reporting.html
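To make the errdetail idea concrete, here is a rough sketch of what the rejected-connection report might look like with a detail line attached. The wording and the variable names (hostinfo, port) are hypothetical and borrowed loosely from auth.c; this assumes PostgreSQL's ereport()/errdetail() machinery and is not a tested patch:

```c
ereport(FATAL,
        (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
         errmsg("pg_hba.conf rejects connection for host \"%s\", user \"%s\", database \"%s\"",
                hostinfo, port->user_name, port->database_name),
         /* hypothetical detail clarifying the "replication" keyword semantics */
         errdetail("Logical replication connections are matched against pg_hba.conf rules "
                   "by database name, not by the \"replication\" keyword.")));
```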

-- 
With Regards,
Amit Kapila.




Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2021 at 6:44 PM Tom Lane  wrote:
> Pointer width is interesting, but really it's a solved problem
> compared to these.

What about USE_FLOAT8_BYVAL?

-- 
Peter Geoghegan




Re: logical replication worker accesses catalogs in error context callback

2021-01-29 Thread Andres Freund
Hi,

On 2021-01-28 11:14:03 +0530, Amit Kapila wrote:
> On Wed, Jan 27, 2021 at 9:38 AM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Jan 27, 2021 at 7:48 AM Zhihong Yu  wrote:
> >
> > Thanks for pointing to the changes in the commit message. I corrected
> > them. Attaching v4 patch set, consider it for further review.
> >
> 
> About 0001, have we tried to reproduce the actual bug here, i.e. hit
> an actual problem when the error_callback is called? I feel
> with the correct testcase we should hit the Assert
> (Assert(IsTransactionState());) in SearchCatCacheInternal because
> there we expect the transaction to be in a valid state. I understand
> that the transaction is in a broken state at that time but having a
> testcase to hit the actual bug makes it easy to test the fix.

I think it's easy to hit - add a trivial DEBUG message to XLogWrite(),
and run anything involving logical rep using a low enough log
level. There might even be enough messages with a low enough level
already somewhere in the relevant paths.

Greetings,

Andres Freund




Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Tom Lane
Peter Geoghegan  writes:
> Broad trends have made it easier to write portable C code, but that
> doesn't apply to 32-bit machines, I imagine. Including even the
> extremely low power 32-bit chips that are not yet fully obsolete, like
> the Raspberry Pi Zero's chip.

Meh.  To my mind, the most interesting aspects of different hardware
platforms for our purposes are

* alignment sensitivity (particularly, is unaligned access expensive);
* spinlock support, and after that various other atomic instructions;
* endianness

Pointer width is interesting, but really it's a solved problem
compared to these.

regards, tom lane




Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2021 at 6:33 PM Michael Paquier  wrote:
> Those 32-bit modules are still being sold actively by the RPI
> foundation, and used as cheap machines for education purposes, so I
> think that it is still useful for Postgres to have active buildfarm
> members for 32-bit architectures.

But I'm not arguing against that. I'm merely arguing that it is okay
to regress 32-bit platforms (within reason) in order to make them more
like 64-bit platforms. This makes them less prone to subtle
portability bugs that the regression tests won't catch, so even 32-bit
Postgres may well come out ahead, in a certain sense.

-- 
Peter Geoghegan




Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2021 at 5:53 PM Tom Lane  wrote:
> Hmph ... three of my five buildfarm animals are 32-bit, plus I
> have got 32-bit OSes for my Raspberry Pi ;-).  Admittedly, none
> of those represent hardware someone would put a serious database
> on today.  But in terms of testing diversity, I think they're
> a lot more credible than thirty-one flavors of Linux on x86_64.

Fair enough.

To be clear I meant testing in the deepest and most general sense --
not simply running the tests. If you happen to be using approximately
the same platform as most Postgres hackers, it's reasonable to expect
to run into fewer bugs tied to portability issues, regardless of
whether the minority platform you were considering has theoretical
testing parity.

Broad trends have made it easier to write portable C code, but that
doesn't apply to 32-bit machines, I imagine. Including even the
extremely low power 32-bit chips that are not yet fully obsolete, like
the Raspberry Pi Zero's chip.

-- 
Peter Geoghegan




Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Michael Paquier
On Fri, Jan 29, 2021 at 08:53:49PM -0500, Tom Lane wrote:
> Peter Geoghegan  writes:
>> I don't know anybody that still runs Postgres
>> (or anything like it) on a 32-bit platform. I think that Michael
>> Paquier owns a Raspberry Pi zero, but that hardly counts.

hamster died a couple of years ago, it was a RPI1 and I have not
bought one after.  RIP to it.  I still have dangomushi, a RPI2, based
on armv7l and that's 32-bit.  Heikki has chipmunk, which was a RPI1
the last time we discussed it.  The good thing about those
machines is that they are low-energy consumers, and silent.  So it is
easy to forget about them and just let them be.

> Hmph ... three of my five buildfarm animals are 32-bit, plus I
> have got 32-bit OSes for my Raspberry Pi ;-).  Admittedly, none
> of those represent hardware someone would put a serious database
> on today.  But in terms of testing diversity, I think they're
> a lot more credible than thirty-one flavors of Linux on x86_64.

Those 32-bit modules are still being sold actively by the RPI
foundation, and used as cheap machines for education purposes, so I
think that it is still useful for Postgres to have active buildfarm
members for 32-bit architectures.
--
Michael




Re: LogwrtResult contended spinlock

2021-01-29 Thread Andres Freund
Hi,

On 2021-01-29 22:47:56 -0300, Alvaro Herrera wrote:
> So I tried this, but -- perhaps not surprisingly -- I can't get it to
> work properly; the synchronization fails.

What do you mean by "synchronization fails"?


> Strangely, all tests work for me, but the pg_upgrade one in particular
> fails.

It's one of the few tests using fsync=on.


> +static inline void
> +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
> +{
> +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> + AssertPointerAlignment(ptr, 8);
> +#endif
> + /* FIXME is this algorithm correct if we have u64 simulation? */

I don't see a problem.


> + while (true)
> + {
> + uint64  currval;
> +
> + currval = pg_atomic_read_u64(ptr);
> + if (currval > target_)
> + break;  /* already done by somebody else */
> + if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
> + break;  /* successfully done */
> + }
> +}

I suggest writing this as

currval = pg_atomic_read_u64(ptr);
while (currval < target_)
{
if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
break;
}

>  /*
>   * Inserting to WAL is protected by a small fixed number of WAL insertion
> @@ -596,8 +599,10 @@ typedef struct XLogCtlData
>  {
>   XLogCtlInsert Insert;
>  
> + XLogwrtAtomic LogwrtRqst;
> + XLogwrtAtomic LogwrtResult;
> +
>   /* Protected by info_lck: */
> - XLogwrtRqst LogwrtRqst;

Not sure putting these into the same cacheline is a good idea.


> @@ -2166,12 +2163,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
>   if (opportunistic)
>   break;
>  
> - /* Before waiting, get info_lck and update LogwrtResult */
> - SpinLockAcquire(&XLogCtl->info_lck);
> - if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
> - XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
> - LogwrtResult = XLogCtl->LogwrtResult;
> - SpinLockRelease(&XLogCtl->info_lck);
> + /* Before waiting, update LogwrtResult */
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, OldPageRqstPtr);
> +
> + LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> + LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);

I don't think it's quite as easy as this. Write/Flush now aren't
guaranteed to be coherent with each other - previously they were. And
because it's in a global variable used everywhere, we can't just be more
careful about update protocols in one place...

We also shouldn't re-read a variable that we just did via the
pg_atomic_monotonic_advance_u64().

I think we should stop updating both the Write/Flush position at the
same time. That way we don't have an expectation of them being coherent
with each other. Most places really don't need both anyway.


>   {
> - SpinLockAcquire(&XLogCtl->info_lck);
> - XLogCtl->LogwrtResult = LogwrtResult;
> - if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
> - XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
> - if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
> - XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
> - SpinLockRelease(&XLogCtl->info_lck);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Flush, LogwrtResult.Flush);
> +
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Flush, LogwrtResult.Flush);

Hm. We went from one cheap atomic operation (SpinLockAcquire) to four
expensive ones in the happy path. That's not free...

I don't think we need to manipulate XLogCtl->LogwrtResult.* using atomic
ops - they can only be updated with WALWriteLock held, right?

XLogCtl->LogwrtResult was updated with plain assignment before, why did
you change it to pg_atomic_monotonic_advance_u64()?


> @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
>   {
>   /* advance global request to include new block(s) */
>   pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> + pg_memory_barrier();
> +

That's not really useful - the path that actually updates already
implies a barrier. It'd probably be better to add a barrier to a "never
executed cmpxchg" fastpath.



> @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
>   WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
>   LogwrtResult.Write = 
> pg_atomic_read

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-29 Thread Michael Paquier
On Fri, Jan 29, 2021 at 08:56:47PM +0300, Alexey Kondratov wrote:
> On 2021-01-28 14:42, Alexey Kondratov wrote:
>> No, you are right, we are doing REINDEX with AccessExclusiveLock as it
>> was before. This part is a more specific one. It only applies to
>> partitioned indexes, which do not hold any data, so we do not reindex
>> them directly, only their leaf partitions. However, if we are doing a
>> TABLESPACE change, we have to record it in their pg_class entry, so
>> that all future leaf partitions will be created in the proper tablespace.
>> 
>> That way, we open partitioned index relation only for a reference,
>> i.e. read-only, but modify pg_class entry under a proper lock
>> (RowExclusiveLock). That's why I thought that ShareLock will be
>> enough.
>> 
>> IIUC, 'ALTER TABLE ... SET TABLESPACE' uses AccessExclusiveLock even
>> for relations with no storage, since AlterTableGetLockLevel() chooses
>> it if AT_SetTableSpace is met. This is very similar to our case, so
>> probably we should do the same?
>> 
>> Actually it is not completely clear for me why
>> ShareUpdateExclusiveLock is sufficient for newly added
>> SetRelationTableSpace() as Michael wrote in the comment.

Nay, it was not fine.  That's something Alvaro has mentioned, leading
to 2484329.  This also means that the main patch of this thread should
refresh the comments at the top of CheckRelationTableSpaceMove() and
SetRelationTableSpace() to mention that this is used by REINDEX
CONCURRENTLY with a lower lock.

> Changed patch to use AccessExclusiveLock in this part for now. This is what
> 'ALTER TABLE/INDEX ... SET TABLESPACE' and 'REINDEX' usually do. Anyway, all
> real leaf partitions are processed in the independent transactions later.

+   if (partkind == RELKIND_PARTITIONED_INDEX)
+   {
+   Relation iRel = index_open(partoid, AccessExclusiveLock);
+
+   if (CheckRelationTableSpaceMove(iRel, params->tablespaceOid))
+   SetRelationTableSpace(iRel,
+ params->tablespaceOid,
+ InvalidOid);
+   index_close(iRel, NoLock);
Are you sure that this does not represent a risk of deadlocks as EAL
is not taken consistently across all the partitions?  A second issue
here is that this breaks the assumption of REINDEX CONCURRENTLY kicked
on partitioned relations that should use ShareUpdateExclusiveLock for
all its steps.  This would make the first transaction invasive for the
user, but we don't want that.

This makes me really wonder if we would not be better to restrict this
operation for partitioned relation as part of REINDEX as a first step.
Another thing, mentioned upthread, is that we could do this part of
the switch at the last transaction, or we could silently *not* do the
switch for partitioned indexes in the flow of REINDEX, letting users
handle that with an extra ALTER TABLE SET TABLESPACE once REINDEX has
finished on all the partitions, cascading the command only on the
partitioned relation of a tree.  It may be interesting to look as well
at if we could lower the lock used for partitioned relations with
ALTER TABLE SET TABLESPACE from AEL to SUEL, choosing AEL only if at
least one partition with storage is involved in the command,
CheckRelationTableSpaceMove() discarding anything that has no need to
change.
--
Michael




Re: Thoughts on "killed tuples" index hint bits support on standby

2021-01-29 Thread Peter Geoghegan
On Thu, Jan 28, 2021 at 10:16 AM Michail Nikolaev wrote:
> > I wonder if it would help to not actually use the LP_DEAD bit for
> > this. Instead, you could use the currently-unused-in-indexes
> > LP_REDIRECT bit.
>
> Hm… Sounds very promising - an additional bit is a lot in this situation.

Yeah, it would help a lot. But those bits are precious. So it makes
sense to think about what to do with both of them in index AMs at the
same time.  Otherwise we risk missing some important opportunity.

> > Whether or not "recently dead" means "dead to my
> > particular MVCC snapshot" can be determined using some kind of
> > in-memory state that won't survive a crash (or a per-index-page
> > epoch?).
>
> Do you have any additional information about this idea? (maybe some thread). 
> What kind of “in-memory state that won't survive a crash” and how to deal 
> with flushed bits after the crash?

Honestly, that part wasn't very well thought out. A lot of things might work.

Some kind of "recently dead" bit is easier on the primary. If we have
recently dead bits set on the primary (using a dedicated LP bit for
original execution recently-dead-ness), then we wouldn't even
necessarily have to change anything about index scans/visibility at
all. There would still be a significant benefit if we simply used the
recently dead bits when considering which heap blocks nbtree simple
deletion will visit inside _bt_deadblocks() -- in practice there would
probably be no real downside from assuming that the recently dead bits
are now fully dead (it would sometimes be wrong, but not enough to
matter, probably only when there is a snapshot held for way way too
long).

Deletion in indexes can work well while starting off with only an
*approximate* idea of which index tuples will be safe to delete --
this is a high level idea behind my recent commit d168b666823. It
seems very possible that that could be pushed even further here on the
primary.

On standbys (which set standby recently dead bits) it will be
different, because you need "index hint bits" set that are attuned to
the workload on the standby, and because you don't ever use the bit to
help with deleting anything on the standby (that all happens during
original execution).

BTW, what happens when the page splits on the primary? Does your
patch "move over" the LP_DEAD bits to each half of the split?

> Hm. What is about this way:
>
> 10 - dead to all on standby (LP_REDIRECT)
> 11 - dead to all on primary (LP_DEAD)
> 01 - future “recently DEAD” on primary (LP_NORMAL)

Not sure.

> Also, looks like both GIST and HASH indexes also do not use LP_REDIRECT.

Right -- if we were to do this, the idea would be that it would apply
to all index AMs that currently have (or will ever have) something
like the LP_DEAD bit stuff. The GiST and hash support for index
deletion is directly based on the original nbtree version, and there
is no reason why we cannot eventually do all this stuff in at least
those three AMs.

There are already some line-pointer level differences in index AMs:
LP_DEAD items have storage in index AMs, but not in heapam. This
all-table-AMs/all-index-AMs divide in how item pointers work would be
preserved.

> Also, btw, do you know any reason to keep minRecoveryPoint at a low value?

Not offhand.


--
Peter Geoghegan




Re: LogwrtResult contended spinlock

2021-01-29 Thread Andres Freund
Hi,


On 2021-01-29 12:40:18 -0300, Alvaro Herrera wrote:
> On 2020-Aug-31, Andres Freund wrote:
> 
> > Wouldn't the better fix here be to allow reading of individual members 
> > without a lock? E.g. by wrapping each in a 64bit atomic.
> 
> So I've been playing with this and I'm annoyed about having two
> datatypes to represent Write/Flush positions:
> 
> typedef struct XLogwrtRqst
> {
>   XLogRecPtr  Write;  /* last byte + 1 to write out */
>   XLogRecPtr  Flush;  /* last byte + 1 to flush */
> } XLogwrtRqst;
> 
> typedef struct XLogwrtResult
> {
>   XLogRecPtr  Write;  /* last byte + 1 written out */
>   XLogRecPtr  Flush;  /* last byte + 1 flushed */
> } XLogwrtResult;
> 
> Don't they look, um, quite similar?  I am strongly tempted to remove
> that distinction, since it seems quite pointless, and introduce a
> different one:

Their spelling drives me nuts. Like, one is *Rqst, the other *Result?
Come on.


> Now, I do wonder if there's a point in keeping LogwrtResult as a local
> variable at all; maybe since the ones in shared memory are going to use
> unlocked access, we don't need it anymore?  I'd prefer to defer that
> decision to after this patch is done, since ISTM that it'd merit more
> careful benchmarking.

I think doing that might be a bit harmful - we update LogwrtResult
fairly granularly in XLogWrite(). Doing that in those small steps in
shared memory will increase the likelihood of cache misses in other
backends.

Greetings,

Andres Freund




Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Jan 29, 2021 at 4:01 PM Tom Lane  wrote:
>> We do still have x86 in the buildfarm, as well as some other
>> 32-bit platforms, so I don't agree that it's that much less
>> tested than non-mainstream 64-bit platforms.

> I don't know anybody that still runs Postgres
> (or anything like it) on a 32-bit platform. I think that Michael
> Paquier owns a Raspberry Pi zero, but that hardly counts.

Hmph ... three of my five buildfarm animals are 32-bit, plus I
have got 32-bit OSes for my Raspberry Pi ;-).  Admittedly, none
of those represent hardware someone would put a serious database
on today.  But in terms of testing diversity, I think they're
a lot more credible than thirty-one flavors of Linux on x86_64.

regards, tom lane




Re: LogwrtResult contended spinlock

2021-01-29 Thread Alvaro Herrera
So I tried this, but -- perhaps not surprisingly -- I can't get it to
work properly; the synchronization fails.  I suspect I need some
barriers, but I tried adding a few (probably some that are not really
necessary) and that didn't have the expected effect.  Strangely, all
tests work for me, but the pg_upgrade one in particular fails.

(The attached is of course POC quality at best.)

I'll have another look next week.

-- 
Álvaro Herrera    39°49'30"S 73°17'W
From 90507185357391a661cd856fc231b140079c8d78 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 29 Jan 2021 22:34:23 -0300
Subject: [PATCH v2 1/3] add pg_atomic_monotonic_advance_u64

---
 src/include/port/atomics.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index 856338f161..752f39d66e 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -519,6 +519,25 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
 	return pg_atomic_sub_fetch_u64_impl(ptr, sub_);
 }
 
+static inline void
+pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
+{
+#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
+	AssertPointerAlignment(ptr, 8);
+#endif
+	/* FIXME is this algorithm correct if we have u64 simulation? */
+	while (true)
+	{
+		uint64		currval;
+
+		currval = pg_atomic_read_u64(ptr);
+		if (currval > target_)
+			break;	/* already done by somebody else */
+		if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
+			break;	/* successfully done */
+	}
+}
+
 #undef INSIDE_ATOMICS_H
 
 #endif			/* ATOMICS_H */
-- 
2.20.1

From 84a62ac0594f8f0b13c06958cf6e94e2a2723082 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 29 Jan 2021 22:34:04 -0300
Subject: [PATCH v2 2/3] atomics in xlog.c

---
 src/backend/access/transam/xlog.c | 129 ++
 1 file changed, 61 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f03bd473e2..8073a92ceb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -413,7 +413,8 @@ static XLogRecPtr RedoStartLSN = InvalidXLogRecPtr;
  * which is updated when convenient.
  *
  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
- * (protected by info_lck), but we don't need to cache any copies of it.
+ * (which use atomic access, so no locks are needed), but we don't need to
+ * cache any copies of it.
  *
  * info_lck is only held long enough to read/update the protected variables,
  * so it's a plain spinlock.  The other locks are held longer (potentially
@@ -433,17 +434,19 @@ static XLogRecPtr RedoStartLSN = InvalidXLogRecPtr;
  *--
  */
 
-typedef struct XLogwrtRqst
+/* Write/Flush position to be used in shared memory */
+typedef struct XLogwrtAtomic
 {
-	XLogRecPtr	Write;			/* last byte + 1 to write out */
-	XLogRecPtr	Flush;			/* last byte + 1 to flush */
-} XLogwrtRqst;
+	pg_atomic_uint64 Write;			/* last byte + 1 of write position */
+	pg_atomic_uint64 Flush;			/* last byte + 1 of flush position */
+} XLogwrtAtomic;
 
-typedef struct XLogwrtResult
+/* Write/Flush position to be used in process-local memory */
+typedef struct XLogwrt
 {
 	XLogRecPtr	Write;			/* last byte + 1 written out */
 	XLogRecPtr	Flush;			/* last byte + 1 flushed */
-} XLogwrtResult;
+} XLogwrt;
 
 /*
  * Inserting to WAL is protected by a small fixed number of WAL insertion
@@ -596,8 +599,10 @@ typedef struct XLogCtlData
 {
 	XLogCtlInsert Insert;
 
+	XLogwrtAtomic LogwrtRqst;
+	XLogwrtAtomic LogwrtResult;
+
 	/* Protected by info_lck: */
-	XLogwrtRqst LogwrtRqst;
 	XLogRecPtr	RedoRecPtr;		/* a recent copy of Insert->RedoRecPtr */
 	FullTransactionId ckptFullXid;	/* nextXid of latest checkpoint */
 	XLogRecPtr	asyncXactLSN;	/* LSN of newest async commit/abort */
@@ -613,12 +618,6 @@ typedef struct XLogCtlData
 	pg_time_t	lastSegSwitchTime;
 	XLogRecPtr	lastSegSwitchLSN;
 
-	/*
-	 * Protected by info_lck and WALWriteLock (you must hold either lock to
-	 * read it, but both to update)
-	 */
-	XLogwrtResult LogwrtResult;
-
 	/*
 	 * Latest initialized page in the cache (last byte position + 1).
 	 *
@@ -779,7 +778,7 @@ static int	UsableBytesInSegment;
  * Private, possibly out-of-date copy of shared LogwrtResult.
  * See discussion above.
  */
-static XLogwrtResult LogwrtResult = {0, 0};
+static XLogwrt LogwrtResult = {0, 0};
 
 /*
  * Codes indicating where we got a WAL file from during recovery, or where
@@ -910,7 +909,7 @@ static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
 static void AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic);
 static bool XLogCheckpointNeeded(XLogSegNo new_segno);
-static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible);
+static void XLogWrite(XLogwrt WriteRqst, bool flexible);
 static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
    bool find_free, XLogS

Re: Support for NSS as a libpq TLS backend

2021-01-29 Thread Michael Paquier
On Fri, Jan 29, 2021 at 02:13:30PM +0100, Daniel Gustafsson wrote:
> I'm still not convinced that adding --with-ssl=openssl is worth it before the
> rest of NSS goes in (and more importantly, *if* it goes in).
>
> On the one hand, we already have pluggable (for some value of) support for
> adding TLS libraries, and adding --with-ssl is one more piece of that puzzle.
> We could of course have endless --with-X options instead but as you say,
> --with-uuid has set the tone here (and I believe that's good).  On the other
> hand, if we never add any other library than OpenSSL then it's just complexity
> without benefit.

IMO, one could say the same thing for any piece of refactoring we have
done in the past to make the TLS/crypto code more modular.  There is
demand for being able to choose among multiple SSL libs at build time,
and we are still in a phase where we evaluate the options at hand.
This refactoring is just careful progress, and this is one step in
this direction.  The piece about refactoring the SSL tests is
similar.

> As mentioned elsewhere in the thread, the current v23 patchset has the
> --with-ssl change as a separate commit, to at least make visible what
> it looks like.  The documentation changes are in the main NSS patch though since
> documenting --with-ssl when there is only one possible value didn't seem to be
> helpful to users who are fully expected to use --with-openssl still.

The documentation changes should be part of the patch introducing the
switch IMO: a description of the new switch, as well as a paragraph
about the old value being deprecated.  That's done this way for UUID.
--
Michael




Re: New IndexAM API controlling index vacuum strategies

2021-01-29 Thread Peter Geoghegan
On Tue, Jan 26, 2021 at 10:59 PM Masahiko Sawada  wrote:
> What value is set to fillfactor?

90, same as before.

> That's very good. I'm happy that this patch efficiently utilizes
> bottom-up index deletion feature.

Me too!

> Looking at the relation size growth, there is almost no difference
> between master and patched in spite of skipping some vacuums in the
> patched test, which is also good.

Right. Stability is everything here. Actually I think that most
performance problems in Postgres are really about stability, if you
look into it.

> > I did notice a problem, though. I now think that the criteria for
> > skipping an index vacuum in the third patch from the series is too
> > conservative, and that this led to an excessive number of index
> > vacuums with the patch.
>
> Maybe that's why there are 5 autovacuum runs on pgbench_accounts in
> the master branch whereas there are 7 runs in the patched?

Probably, but it might also be due to some other contributing factor.
There is still very little growth in the size of the indexes, and the
PK still has zero growth. The workload consists of 32 hours of a
10ktps workload, so I imagine that there is opportunity for some
extremely rare event to happen a few times. Too tired to think about
it in detail right now.

It might also be related to the simple fact that only one VACUUM
process may run against a table at any given time! With a big enough
table, and several indexes, and reasonably aggressive av settings,
it's probably almost impossible for autovacuum to "keep up" (in the
exact sense that the user asks for by having certain av settings).
This must be taken into account in some general way --

It's a bit tricky to interpret results here, generally speaking,
because there are probably a few things like that. To me, the most
important thing is that the new behavior "makes sense" in some kind of
general way, that applies across a variety of workloads. It may not be
possible to directly compare master and patch like this and arrive at
one simple number that is fair. If we really wanted one simple
benchmark number, maybe we'd have to tune the patch and master
separately -- which doesn't *seem* fair.

> Also makes sense to me. The patch I recently submitted doesn't include
> it but I'll do that in the next version patch.

Great!

> Maybe the same is true for heap? I mean that skipping heap vacuum on a
> too-small table will not bring the benefit but bloat. I think we could
> proceed with heap vacuum if a table is smaller than a threshold, even
> if one of the indexes wanted to skip.

I think that you're probably right about that. It isn't a problem for
v2 in practice because the bloat will reliably cause LP_DEAD line
pointers to accumulate in heap pages, so you VACUUM anyway -- this is
certainly what you *always* see in the small pgbench tables with the
default workload. But even then -- why not be careful? I agree that
there should be some kind of limit on table size that applies here --
a size at which we'll never apply any of these optimizations, no
matter what.

> Yeah, I think the following information would also be helpful:
>
> * did vacuum heap? or skipped it?
> * how many indexes did/didn't bulk-deletion?
> * time spent for each vacuum phase.

That list looks good -- in general I don't like that log_autovacuum
cannot ever have the VACUUM VERBOSE per-index output -- maybe that
could be revisited soon? I remember reading your e-mail about this on
a recent thread, and I imagine that you already saw the connection
yourself.

It'll be essential to have good instrumentation as we do more
benchmarking. We're probably going to have to make subjective
assessments of benchmark results, based on multiple factors. That will
probably be the only practical way to assess how much better (or
worse) the patch is compared to master. This patch is more about
efficiency and predictability than performance per se. Which is good,
because that's where most of the real world problems actually are.

-- 
Peter Geoghegan




Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2021 at 4:01 PM Tom Lane  wrote:
> > It's probably much riskier to use 32-bit x86 today than
> > it is to use (say) POWER8, or some other contemporary minority
> > platform.
>
> We do still have x86 in the buildfarm, as well as some other
> 32-bit platforms, so I don't agree that it's that much less
> tested than non-mainstream 64-bit platforms.  But I do agree
> it's not our main development focus anymore, and shouldn't be.

I was arguing that it's much less tested *in effect*. It seems like
the trend is very much in the direction of less and less ISA level
differentiation.

Consider (just to pick one example) the rationale behind the RISC-V initiative:

https://en.wikipedia.org/wiki/RISC-V#Rationale

In many ways my x86-64 MacBook is closer to the newer M1 MacBook than

it is to some old 32-bit x86 machine. I suspect that this matters. I
am speculating here, of course -- I have to because there is no
guidance to work off of. I don't know anybody that still runs Postgres
(or anything like it) on a 32-bit platform. I think that Michael
Paquier owns a Raspberry Pi Zero, but that hardly counts.

-- 
Peter Geoghegan




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-29 Thread Bharath Rupireddy
On Sat, Jan 30, 2021 at 12:14 AM Fujii Masao
 wrote:
> +   /*
> +* It doesn't make sense to show this entry in the output with a
> +* NULL server_name as it will be closed at the xact end.
> +*/
> +   continue;
>
> -1 with this change because I still think that it's more useful to list
> all the open connections.

If postgres_fdw_get_connections doesn't have a "valid" column, then I
thought it better not to show a NULL server_name in the output. Do you
think that we need to output some fixed strings for such connections
like "" or "" or "" or ""? I'm not sure whether
we are allowed to have fixed strings as column output.

> This makes me think that more discussion would be necessary before
> changing the interface of postgres_fdw_get_connections(). On the other
> hand, we should address the issue ASAP to make the buildfarm member fine.
> So at first I'd like to push only the change of regression test.
> Patch attached. I tested it both with CLOBBER_CACHE_ALWAYS set and unset,
> and the results were stable.

Thanks, the postgres_fdw.patch looks good to me.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Proposal: Save user's original authenticated identity for logging

2021-01-29 Thread Jacob Champion
On Fri, 2021-01-29 at 18:40 -0500, Tom Lane wrote:
> Ah.  So basically, this comes into play when you consider that some
> outside-the-database entity is your "real" authenticated identity.
> That seems reasonable when using Kerberos or the like, though it's
> not real meaningful for traditional password-type authentication.

Right.

> So, if we store this "real" identity, is there any security issue
> involved in exposing it to other users (via pg_stat_activity or
> whatever)?

I think that could be a concern for some, yeah. Besides being able to
get information on other logged-in users, the ability to connect an
authenticated identity to a username also gives you some insight into
the pg_hba configuration.

--Jacob


Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Jan 29, 2021 at 12:12 PM Tom Lane  wrote:
>> I got annoyed (not for the first time) by the fact that the
>> partitioned_rels field of AppendPath and MergeAppendPath is a list of
>> Relids, i.e., Bitmapsets.  This is problematic because a Bitmapset is
>> not a type of Node, and thus a List of them is really an invalid data
>> structure.  The main practical consequence is that pprint() fails to
>> print these path types accurately, which is an issue for debugging.

> So we don't actually require T_List-type Lists to only contain entries
> of type Node already?

I seem to recall that there are some places that use Lists to store
plain "char *" strings (not wrapped in T_String), and there are
definitely places that use lists of non-Node structs.  That's a kluge,
but I don't really object to it in narrowly-scoped data structures.
I think it's a good bit south of acceptable in anything declared in
include/nodes/*.h, though.  Publicly visible Node types ought to be
fully manipulable by the standard backend/nodes/ functions.

> ISTM that T_List-type Lists cannot *mostly* be a
> Node that consists of a collection of linked Nodes. It has to be
> all-or-nothing.

Right.  Any situation where you have a List of things that aren't
Nodes has to be one where you know a-priori that everything in this
List is a $whatever.  If the List is only used within a small bit
of code, that's fine, and adding the overhead to make the contents
be real Nodes wouldn't be worth the trouble.

> It's probably much riskier to use 32-bit x86 today than
> it is to use (say) POWER8, or some other contemporary minority
> platform.

We do still have x86 in the buildfarm, as well as some other
32-bit platforms, so I don't agree that it's that much less
tested than non-mainstream 64-bit platforms.  But I do agree
it's not our main development focus anymore, and shouldn't be.

regards, tom lane




Re: Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2021 at 12:12 PM Tom Lane  wrote:
> I got annoyed (not for the first time) by the fact that the
> partitioned_rels field of AppendPath and MergeAppendPath is a list of
> Relids, i.e., Bitmapsets.  This is problematic because a Bitmapset is
> not a type of Node, and thus a List of them is really an invalid data
> structure.  The main practical consequence is that pprint() fails to
> print these path types accurately, which is an issue for debugging.

So we don't actually require T_List-type Lists to only contain entries
of type Node already? ISTM that T_List-type Lists cannot *mostly* be a
Node that consists of a collection of linked Nodes. It has to be
all-or-nothing. The "Node-ness" of a List should never be situational
or implicit -- allowing that seems like a recipe for disaster. This
kind of "code reuse" is not a virtue at all.

If tightening things up here turns out to be a problem someplace, then
I'm okay with that code using some other solution. That could mean
expanding the definition of a Node in some way that was not originally
considered (when it nevertheless makes sense), or it could mean using
some other data structure instead.

Might be good to Assert() that this rule is followed in certain key
list.c functions.

> We've had some related problems before, so I'm wondering if it's time
> to bite the bullet and turn Bitmapsets into legal Nodes.  We'd have
> to add a nodetag field to them, which is free on 64-bit machines due
> to alignment considerations, but would increase the size of most
> Bitmapsets on 32-bit machines.  OTOH, I do not think we're optimizing
> for 32-bit machines anymore.

+1 from me.

I'm prepared to say that 32-bit performance shouldn't be a concern
these days, except perhaps with really significant regressions. And
even then, only when there is no clear upside. If anybody really does
run Postgres 14 on a 32-bit platform, they should be much more
concerned about bugs that slip in because nobody owns hardware like
that anymore. It's probably much riskier to use 32-bit x86 today than
it is to use (say) POWER8, or some other contemporary minority
platform.

-- 
Peter Geoghegan




Re: Proposal: Save user's original authenticated identity for logging

2021-01-29 Thread Tom Lane
Jacob Champion  writes:
> On Fri, 2021-01-29 at 17:30 -0500, Tom Lane wrote:
>> What happens if ALTER USER RENAME is done while the session is still
>> alive?

> IMO the authenticated identity should be write-once. Especially since
> one of my goals is to have greater auditability into events as they've
> actually happened. So ALTER USER RENAME should have no effect.

> This also doesn't really affect third-party auth methods. If I'm bound
> as pchamp...@example.com and a superuser changes my username to tlane,
> you _definitely_ don't want to see my authenticated identity change to 
> tl...@example.com. That's not who I am.

Ah.  So basically, this comes into play when you consider that some
outside-the-database entity is your "real" authenticated identity.
That seems reasonable when using Kerberos or the like, though it's
not real meaningful for traditional password-type authentication.
I'd misunderstood your point before.

So, if we store this "real" identity, is there any security issue
involved in exposing it to other users (via pg_stat_activity or
whatever)?

I remain concerned about the cost and inconvenience of exposing
it via log_line_prefix, but at least that shouldn't be visible
to anyone who's not entitled to know who's logged in ...

regards, tom lane




Re: Proposal: Save user's original authenticated identity for logging

2021-01-29 Thread Jacob Champion
On Fri, 2021-01-29 at 17:30 -0500, Tom Lane wrote:
> What happens if ALTER USER RENAME is done while the session is still
> alive?

IMO the authenticated identity should be write-once. Especially since
one of my goals is to have greater auditability into events as they've
actually happened. So ALTER USER RENAME should have no effect.

This also doesn't really affect third-party auth methods. If I'm bound
as pchamp...@example.com and a superuser changes my username to tlane,
you _definitely_ don't want to see my authenticated identity change to 
tl...@example.com. That's not who I am.

So the potential confusion would come into play with first-party authn.
From an audit perspective, I think it's worth it. I did authenticate as
pchampion, not tlane.

> More generally, exposing this in log_line_prefix seems like an awfully
> narrow-minded view of what people will want it for.  I'd personally
> think pg_stat_activity a better place to look, for example.
> [...]
> Yeah, this seems like about the most expensive way that we could possibly
> choose to make the info available.

I'm happy as long as it's _somewhere_. :D It's relatively easy to
expose a single location through multiple avenues, but currently there
is no single location.

--Jacob


Re: Proposal: Save user's original authenticated identity for logging

2021-01-29 Thread Jacob Champion
On Fri, 2021-01-29 at 17:01 -0500, Stephen Frost wrote:
> > - for LDAP, the bind DN is discarded entirely;
> 
> We don't support pg_ident.conf-style entries for LDAP, meaning that the
> user provided has to match what we check, so I'm not sure what would be
> improved with this change..?

For simple binds, this gives you almost nothing. For bind+search,
logging the actual bind DN is still important, in my opinion, since the
mechanism for determining it is more opaque (and may change over time).

But as Tom noted -- for both cases, if the role name changes, this
mechanism can still help you audit who the user _actually_ bound as,
not who you think they should have bound as based on their current role
name.

(There's also the fact that I think pg_ident mapping for LDAP would be
just as useful as it is for GSS or certs. That's for a different
conversation.)

> I'm also just generally not thrilled with
> putting much effort into LDAP as it's a demonstrably insecure
> authentication mechanism.

Because Postgres has to proxy the password? Or is there something else?

> > I propose that every auth method should store the string it uses to
> > identify a user -- what I'll call an "authenticated identity" -- into
> > one central location in Port, after authentication succeeds but before
> > any pg_ident authorization occurs. This field can then be exposed in
> > log_line_prefix. (It could additionally be exposed through a catalog
> > table or SQL function, if that were deemed useful.) This would let a
> > DBA more easily audit user activity when using more complicated
> > pg_ident setups.
> 
> This seems like it would be good to include the CSV format log files
> also.

Agreed in principle... Is the CSV format configurable? Forcing it into
CSV logs by default seems like it'd be a hard sell, especially for
people not using pg_ident.

> For some auth methods, eg: GSS, we've recently added information into
> the authentication method which logs what the authenticated identity
> was.  The advantage with that approach is that it avoids bloating the
> log by only logging that information once upon connection rather than
> on every log line...  I wonder if we should be focusing on a similar
> approach for other pg_ident.conf use-cases instead of having it via
> log_line_prefix, as the latter means we'd be logging the same value over
> and over again on every log line.

As long as the identity can be easily logged and reviewed by DBAs, I'm
happy.

--Jacob


Re: [sqlsmith] Failed assertion during partition pruning

2021-01-29 Thread Tom Lane
Zhihong Yu  writes:
> I wonder if the (failed) assertion should be converted to an if statement:

As I said, I'm now thinking it's not the Assert that's faulty.
If I'm right about that, it's likely that the mistaken labeling
of these paths has other consequences beyond triggering this
assertion.  (If it has none, I think we'd be better off to remove
these Path fields altogether, and re-identify the parent rels
here from the RelOptInfo data.)

regards, tom lane




Re: [sqlsmith] Failed assertion during partition pruning

2021-01-29 Thread Zhihong Yu
Hi,
I wonder if the (failed) assertion should be converted to an if statement:

diff --git a/src/backend/partitioning/partprune.c
b/src/backend/partitioning/partprune.c
index fac921eea5..d646f08a07 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -585,7 +585,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
  * partitioned tables that we have no sub-paths or
  * sub-PartitionedRelPruneInfo for.
  */
-Assert(!bms_is_empty(present_parts));
+if (bms_is_empty(present_parts)) return NIL;

 /* Record the maps and other information. */
 pinfo->present_parts = present_parts;

Cheers

On Fri, Jan 29, 2021 at 12:28 PM Tom Lane  wrote:

> I wrote:
> >> What it looks like to me is that the code for setting up run-time
> >> partition pruning has failed to consider the possibility of nested
> >> partitioning: it's expecting that every partitioned table will have
> >> at least one direct child that is a leaf.  I'm not sure though
> >> whether just the Assert is wrong, or there's more fundamental
> >> issues here.
>
> > After looking into the git history I realized that this assertion is
> > quite new, stemming from David's a929e17e5a8 of 2020-11-02.  So there's
> > something not right about that.
>
> I took some more time to poke at this today, and I now think that
> the assertion in make_partitionedrel_pruneinfo is probably OK,
> and what it's pointing out is a bug upstream in path creation.
> Specifically, I noted that in
>
> select a from trigger_parted where pg_trigger_depth() <> a order by a;
>
> we arrive at make_partitionedrel_pruneinfo with partrelids equal
> to (b 1 2), which seems to be correct.  The RTE list is
>
> RTE 1: trigger_parted
> RTE 2: trigger_parted_p1
> RTE 3: trigger_parted_p1_1
>
> Like so much else of the partitioning code, AppendPath.partitioned_rels
> is abysmally underdocumented, but what I think it means is the set of
> non-leaf partitioned tables that are notionally scanned by the
> AppendPath.  The only table directly mentioned by the AppendPath's
> subpath is RTE 3, so that all seems fine.
>
> However, upon adding a LIMIT:
>
> select a from trigger_parted where pg_trigger_depth() <> a order by a
> limit 40;
> server closed the connection unexpectedly
>
> we arrive at make_partitionedrel_pruneinfo with partrelids equal
> to just (b 1); trigger_parted_p1 has been left out.  The Path
> in this case has been made by generate_orderedappend_paths, which
> is what's responsible for computing AppendPath.partitioned_rels that
> eventually winds up as the argument to make_partitionedrel_pruneinfo.
> So I think that that code is somehow failing to account for nested
> partitioning, while the non-ordered-append code is doing it right.
> But I didn't spot exactly where the discrepancy is.
>
> regards, tom lane
>
>
>


Re: Key management with tests

2021-01-29 Thread Stephen Frost
Greetings,

* Masahiko Sawada (sawada.m...@gmail.com) wrote:
> On Fri, Jan 29, 2021 at 5:22 AM Bruce Momjian  wrote:
> > On Thu, Jan 28, 2021 at 02:41:09PM -0500, Tom Kincaid wrote:
> > > I would also like to add a "not wanted" entry for this feature on the
> > > TODO list, baaed on the feature's limited usefulness, but I already
> > > asked about that and no one seems to feel we don't want it.
> > >
> > >
> > > I want to avoid seeing this happen. As a result of a lot of customer and
> > > user discussions, around their criteria for choosing a database, I believe
> > > TDE is an important feature and having it appear with a "not-wanted" tag
> > > will keep the version of PostgreSQL released by the community out of
> > > certain (and possibly growing) number of deployment scenarios which I
> > > don't think anybody wants to see.
> >
> > With pg_upgrade, I could work on it out of the tree until it became
> > popular, with a small non-user-visible part in the backend.  With the
> > Windows port, the port wasn't really visible to users until it we ready.
> >
> > For the key management part of TDE, it can't be done outside the tree,
> > and it is user-visible before it is useful, so that restricts how much
> > incremental work can be committed to the tree for TDE.  I highlighted
> > that concern emails months ago, but never got any feedback --- now it
> > seems people are realizing the ramifications of that.
> >
> > > I think the current situation to be as follows (if I missed something 
> > > please
> > > let me know):
> > >
> > > 1) We need to get the current patch for Key Management reviewed and tested
> > > further.
> > >
> > > I spoke to Bruce just now he will see if can get somebody to do this.
> >
> > Well, if we don't get anyone committed to working on the data encryption
> > part of TDE, the key management part is useless, so why review/test it
> > further?
> >
> > Although Sawada-san and Stephen Frost worked on the patch, they have not
> > commented much on my additions, and only a few others have commented on
> > the code, and there has been no discussion on who is working on the next
> > steps.  This indicates to me that there is little interest in moving
> > this feature forward,
> 
> TBH I’m confused a bit about the recent situation of this patch, but I
> can contribute to KMS work by discussing, writing, reviewing, and
> testing the patch. Also, I can work on the data encryption part of TDE
> (we need more discussion on that though). If the community concerns
> about the high-level design and thinks the design reviews by
> cryptography experts are still needed, we would need to do that first
> since the data encryption part of TDE depends on KMS. As far as I
> know, we have done that many times on pgsql-hackers, on offl-line and
> including the discussion on the past proposal, etc but given that the
> community still has a concern, it seems that we haven’t been able to
> share the details of the discussion enough that led to the design
> decision or the design is still not good. Honestly, I’m not sure how
> this feature can get consensus. But maybe we would need to have a
> break from refining the patch now and we need to marshal the
> discussions so far and the point behind the design so that everyone
> can understand why this feature is designed in that way. To do that,
> it might be a good start to sort the wiki page since it has data
> encryption part, KMS, and ToDo mixed.

I hope it's pretty clear that I'm also very much in support of both this
effort with the KMS and of TDE in general- TDE is specifically,
repeatedly, called out as a capability whose lack is blocking PG from
being able to be used for certain use-cases that it would otherwise be
well suited for, and that's really unfortunate.

I appreciate the recent discussion and reviews of the KMS in particular,
and of the patches which have been sent enabling TDE based on the KMS
patches.  Having them be relatively independent seems to be an ongoing
concern and perhaps we should figure out a way to more clearly put them
together.  That is- the KMS patches have been posted on one thread, and
TDE PoC patches which use the KMS patches have been on another thread,
leading some to not realize that there's already been TDE PoC work done
based on the KMS patches.  Seems like it might make sense to get one
patch set which goes all the way from the KMS and includes the TDE PoC,
even if they don't all go in at once.

I'm happy to go look over the KMS patches again if that'd be helpful and
to comment on the TDE PoC.  I can also spend some time trying to improve
on each, as I've already done.  A few of the larger concerns that I have
revolve around how to store integrity information (I've tried to find a
way to make room for such information in our existing page layout and,
perhaps unsuprisingly, it's far from trivial to do so in a way that will
avoid breaking the existing page layout, or where the same

Re: Proposal: Save user's original authenticated identity for logging

2021-01-29 Thread Tom Lane
Stephen Frost  writes:
> * Jacob Champion (pchamp...@vmware.com) wrote:
>> I propose that every auth method should store the string it uses to
>> identify a user -- what I'll call an "authenticated identity" -- into
>> one central location in Port, after authentication succeeds but before
>> any pg_ident authorization occurs. This field can then be exposed in
>> log_line_prefix. (It could additionally be exposed through a catalog
>> table or SQL function, if that were deemed useful.) This would let a
>> DBA more easily audit user activity when using more complicated
>> pg_ident setups.

> This seems like it would be good to include the CSV format log files
> also.

What happens if ALTER USER RENAME is done while the session is still
alive?

More generally, exposing this in log_line_prefix seems like an awfully
narrow-minded view of what people will want it for.  I'd personally
think pg_stat_activity a better place to look, for example.

> on every log line...  I wonder if we should be focusing on a similar
> approach for other pg_ident.conf use-cases instead of having it via
> log_line_prefix, as the latter means we'd be logging the same value over
> and over again on every log line.

Yeah, this seems like about the most expensive way that we could possibly
choose to make the info available.

regards, tom lane




Re: Proposal: Save user's original authenticated identity for logging

2021-01-29 Thread Stephen Frost
Greetings,

* Jacob Champion (pchamp...@vmware.com) wrote:
> First, the context: recently I've been digging into the use of third-
> party authentication systems with Postgres. One sticking point is the
> need to have a Postgres role corresponding to the third-party user
> identity, which becomes less manageable at scale. I've been trying to
> come up with ways to make that less painful, and to start peeling off
> smaller feature requests.

Yeah, it'd be nice to improve things in this area.

> = Problem =
> 
> For auth methods that allow pg_ident mapping, there's a way around the
> one-role-per-user problem, which is to have all users that match some
> pattern map to a single role. For Kerberos, you might specify that all
> user principals under @EXAMPLE.COM are allowed to connect as some
> generic user role, and that everyone matching */ad...@example.com is
> additionally allowed to connect as an admin role.
> 
> Unfortunately, once you've been assigned a role, Postgres either makes
> the original identity difficult to retrieve, or forgets who you were
> entirely:
> 
> - for GSS, the original principal is saved in the Port struct, and you
> need to either pull it out of pg_stat_gssapi, or enable log_connections
> and piece the log line together with later log entries;

This has been improved on of late, but it's been done piece-meal.

> - for LDAP, the bind DN is discarded entirely;

We don't support pg_ident.conf-style entries for LDAP, meaning that the
user provided has to match what we check, so I'm not sure what would be
improved with this change..?  I'm also just generally not thrilled with
putting much effort into LDAP as it's a demonstrably insecure
authentication mechanism.

> - for TLS client certs, the DN has to be pulled from pg_stat_ssl or the
> sslinfo extension (and it's truncated to 64 characters, so good luck if
> you have a particularly verbose PKI tree);

Yeah, it'd be nice to improve on this.

> - for peer auth, the username of the peereid is discarded;

Would be good to improve this too.

> = Proposal =
> 
> I propose that every auth method should store the string it uses to
> identify a user -- what I'll call an "authenticated identity" -- into
> one central location in Port, after authentication succeeds but before
> any pg_ident authorization occurs. This field can then be exposed in
> log_line_prefix. (It could additionally be exposed through a catalog
> table or SQL function, if that were deemed useful.) This would let a
> DBA more easily audit user activity when using more complicated
> pg_ident setups.

This seems like it would be good to include the CSV format log files
also.

> Would this be generally useful for those of you using pg_ident in
> production? Have I missed something that already provides this
> functionality?

For some auth methods, eg: GSS, we've recently added information into
the authentication method which logs what the authenticated identity
was.  The advantage with that approach is that it avoids bloating the
log by only logging that information once upon connection rather than
on every log line...  I wonder if we should be focusing on a similar
approach for other pg_ident.conf use-cases instead of having it via
log_line_prefix, as the latter means we'd be logging the same value over
and over again on every log line.

Thanks,

Stephen




Re: Assertion fail with window function and partitioned tables

2021-01-29 Thread Jaime Casanova
On Thu, Jan 28, 2021 at 9:45 PM Jaime Casanova
 wrote:
>
> Hi,
>
> Just found another crash.
>
> Seems that commit a929e17e5a8c9b751b66002c8a89fdebdacfe194 broke something.
> Attached is a minimal case and the stack trace.
>

Hi,

Seems this is the same that Andreas reported in
https://www.postgresql.org/message-id/87sg8tqhsl@aurora.ydns.eu so
consider this one as noise


--
Jaime Casanova
Director of Professional Services
SystemGuards - PostgreSQL Consultants







Re: [sqlsmith] Failed assertion during partition pruning

2021-01-29 Thread Tom Lane
I wrote:
>> What it looks like to me is that the code for setting up run-time
>> partition pruning has failed to consider the possibility of nested
>> partitioning: it's expecting that every partitioned table will have
>> at least one direct child that is a leaf.  I'm not sure though
>> whether just the Assert is wrong, or there's more fundamental
>> issues here.

> After looking into the git history I realized that this assertion is
> quite new, stemming from David's a929e17e5a8 of 2020-11-02.  So there's
> something not right about that.

I took some more time to poke at this today, and I now think that
the assertion in make_partitionedrel_pruneinfo is probably OK,
and what it's pointing out is a bug upstream in path creation.
Specifically, I noted that in

select a from trigger_parted where pg_trigger_depth() <> a order by a;

we arrive at make_partitionedrel_pruneinfo with partrelids equal
to (b 1 2), which seems to be correct.  The RTE list is

RTE 1: trigger_parted
RTE 2: trigger_parted_p1
RTE 3: trigger_parted_p1_1

Like so much else of the partitioning code, AppendPath.partitioned_rels
is abysmally underdocumented, but what I think it means is the set of
non-leaf partitioned tables that are notionally scanned by the
AppendPath.  The only table directly mentioned by the AppendPath's
subpath is RTE 3, so that all seems fine.

However, upon adding a LIMIT:

select a from trigger_parted where pg_trigger_depth() <> a order by a limit 40;
server closed the connection unexpectedly

we arrive at make_partitionedrel_pruneinfo with partrelids equal
to just (b 1); trigger_parted_p1 has been left out.  The Path
in this case has been made by generate_orderedappend_paths, which
is what's responsible for computing AppendPath.partitioned_rels that
eventually winds up as the argument to make_partitionedrel_pruneinfo.
So I think that that code is somehow failing to account for nested
partitioning, while the non-ordered-append code is doing it right.
But I didn't spot exactly where the discrepancy is.

regards, tom lane




Should we make Bitmapsets a kind of Node?

2021-01-29 Thread Tom Lane
I got annoyed (not for the first time) by the fact that the
partitioned_rels field of AppendPath and MergeAppendPath is a list of
Relids, i.e., Bitmapsets.  This is problematic because a Bitmapset is
not a type of Node, and thus a List of them is really an invalid data
structure.  The main practical consequence is that pprint() fails to
print these path types accurately, which is an issue for debugging.

We've had some related problems before, so I'm wondering if it's time
to bite the bullet and turn Bitmapsets into legal Nodes.  We'd have
to add a nodetag field to them, which is free on 64-bit machines due
to alignment considerations, but would increase the size of most
Bitmapsets on 32-bit machines.  OTOH, I do not think we're optimizing
for 32-bit machines anymore.

Another issue is that the outfuncs/readfuncs print representation
currently looks like "(b 1 2 ...)" which isn't a normal
representation for a Node.  I'd be inclined to try to preserve that
representation, because I think we'd have to special-case Bitmapsets
anyway given their variable number of unnamed entries.  But I've not
tried to actually code anything for it.

Thoughts?

regards, tom lane




Re: Support for NSS as a libpq TLS backend

2021-01-29 Thread Jacob Champion
On Fri, 2021-01-29 at 13:57 +0100, Daniel Gustafsson wrote:
> > On 21 Jan 2021, at 06:21, Michael Paquier  wrote:
> > I really need to study more the choide of the options chosen for
> > NSS_InitContext()...  But based on the docs I can read on the matter I
> > think that saving nsscontext in pg_cryptohash_ctx is right for each
> > cryptohash built.
> 
> It's a safe but slow option, NSS wasn't really made for running a single
> crypto operation.  Since we are opening a context which isn't backed by an NSS
> database we could have a static context, which indeed speeds up processing a
> lot.  The problem with that is that there is no good callsite for closing the
> context as the backend is closing down.  Since you are kneedeep in the
> cryptohash code, do you have any thoughts on this?  I've included 0008 which
> implements this, with a commented out dummy stub for cleaning up.
> 
> Making nss_context static in cryptohash_nss.c is
> appealing but there is no good option for closing it there.  Any thoughts on
> how to handle global contexts like this?

I'm completely new to this code, so take my thoughts with a grain of
salt...

I think the bad news is that the static approach will need support for
ENABLE_THREAD_SAFETY. (It looks like the NSS implementation of
pgtls_close() needs some thread support too?)

The good(?) news is that I don't understand why OpenSSL's
implementation of cryptohash doesn't _also_ need the thread-safety
code. (Shouldn't we need to call CRYPTO_set_locking_callback() et al
before using any of its cryptohash implementation?) So maybe we can
implement the same global setup/teardown API for OpenSSL too and not
have to one-off it for NSS...

--Jacob


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-29 Thread Fujii Masao



On 2021/01/29 19:45, Bharath Rupireddy wrote:

On Fri, Jan 29, 2021 at 1:24 PM Bharath Rupireddy
 wrote:


On Fri, Jan 29, 2021 at 1:17 PM Fujii Masao  wrote:

But if the issue is only the inconsistency of test results,
we can go with the option (2)? Even with (2), we can make the test
stable by removing "valid" column and executing
postgres_fdw_get_connections() within the transaction?


Hmmm, and we should have the tests at the start of the file
postgres_fdw.sql, before we even make any foreign server connections.


We don't need to move the test if we always call postgres_fdw_disconnect_all() 
just before starting new transaction and calling postgres_fdw_get_connections() 
as follows?

SELECT 1 FROM postgres_fdw_disconnect_all();
BEGIN;
...
SELECT * FROM postgres_fdw_get_connections();
...


Yes, that works, but we cannot show true/false for the
postgres_fdw_disconnect_all output.

I will post the patch soon. Thanks a lot.


Attaching a patch with the following changes: 1) postgres_fdw_get_connections
now returns only the set of active connections' server names, not their
valid state. 2) The functions postgres_fdw_get_connections,
postgres_fdw_disconnect and postgres_fdw_disconnect_all are now tested
within an explicit xact block; this way the tests are more stable even
with clobber-cache-always builds.

I tested the patch here on my development system with
-DCLOBBER_CACHE_ALWAYS configuration, the tests look consistent.

Please review the patch.


Thanks for the patch!

--- Return false as loopback2 connectin is closed already.
-SELECT postgres_fdw_disconnect('loopback2');
- postgres_fdw_disconnect
--
- f
-(1 row)
-
--- Return an error as there is no foreign server with given name.
-SELECT postgres_fdw_disconnect('unknownserver');
-ERROR:  server "unknownserver" does not exist

Why do we need to remove these? These seem to work fine even in
CLOBBER_CACHE_ALWAYS.

+   /*
+    * It doesn't make sense to show this entry in the output with a
+    * NULL server_name as it will be closed at the xact end.
+    */
+   continue;

-1 with this change because I still think that it's more useful to list
all the open connections.

This makes me think that more discussion would be necessary before
changing the interface of postgres_fdw_get_connections(). On the other
hand, we should address the issue ASAP to turn the buildfarm member green.
So, at first, I'd like to push only the regression test change.
Patch attached. I tested it both with CLOBBER_CACHE_ALWAYS set and unset,
and the results were stable.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 07e06e5bf7..b09dce63f5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -17,10 +17,6 @@ DO $d$
 OPTIONS (dbname '$$||current_database()||$$',
  port '$$||current_setting('port')||$$'
 )$$;
-EXECUTE $$CREATE SERVER loopback4 FOREIGN DATA WRAPPER postgres_fdw
-OPTIONS (dbname '$$||current_database()||$$',
- port '$$||current_setting('port')||$$'
-)$$;
 END;
 $d$;
 CREATE USER MAPPING FOR public SERVER testserver1
@@ -28,7 +24,6 @@ CREATE USER MAPPING FOR public SERVER testserver1
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
 CREATE USER MAPPING FOR CURRENT_USER SERVER loopback2;
 CREATE USER MAPPING FOR public SERVER loopback3;
-CREATE USER MAPPING FOR public SERVER loopback4;
 -- ===
 -- create objects used through FDW loopback server
 -- ===
@@ -144,11 +139,6 @@ CREATE FOREIGN TABLE ft7 (
c2 int NOT NULL,
c3 text
 ) SERVER loopback3 OPTIONS (schema_name 'S 1', table_name 'T 4');
-CREATE FOREIGN TABLE ft8 (
-   c1 int NOT NULL,
-   c2 int NOT NULL,
-   c3 text
-) SERVER loopback4 OPTIONS (schema_name 'S 1', table_name 'T 4');
 -- ===
 -- tests for validator
 -- ===
@@ -220,8 +210,7 @@ ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS 
(column_name 'C 1');
  public | ft5   | loopback  | (schema_name 'S 1', table_name 'T 4') | 
  public | ft6   | loopback2 | (schema_name 'S 1', table_name 'T 4') | 
  public | ft7   | loopback3 | (schema_name 'S 1', table_name 'T 4') | 
- public | ft8   | loopback4 | (schema_name 'S 1', table_name 'T 4') | 
-(7 rows)
+(6 rows)
 
 -- Test that alteration of server options causes reconnection
 -- Remote's errors might be non-English, so hide them to ensure stable results
@@ -

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-29 Thread Alexey Kondratov

On 2021-01-28 14:42, Alexey Kondratov wrote:

On 2021-01-28 00:36, Alvaro Herrera wrote:



I didn't look at the patch closely enough to understand why you're
trying to do something like CLUSTER, VACUUM FULL or REINDEX without
holding full AccessExclusiveLock on the relation.  But do keep in mind
that once you hold a lock on a relation, trying to grab a weaker lock
afterwards is pretty pointless.



No, you are right, we are doing REINDEX with AccessExclusiveLock as it
was before. This part is more specific. It only applies to partitioned
indexes, which do not hold any data, so we do not reindex them
directly, only their leaf partitions. However, if we are doing a
TABLESPACE change, we have to record it in their pg_class entry, so
that all future leaf partitions are created in the proper tablespace.

That way, we open the partitioned index relation only for reference,
i.e. read-only, but modify its pg_class entry under a proper lock
(RowExclusiveLock). That's why I thought that ShareLock would be
enough.

IIUC, 'ALTER TABLE ... SET TABLESPACE' uses AccessExclusiveLock even
for relations with no storage, since AlterTableGetLockLevel() chooses
it if AT_SetTableSpace is met. This is very similar to our case, so
probably we should do the same?

Actually it is not completely clear for me why
ShareUpdateExclusiveLock is sufficient for newly added
SetRelationTableSpace() as Michael wrote in the comment.



Changed patch to use AccessExclusiveLock in this part for now. This is 
what 'ALTER TABLE/INDEX ... SET TABLESPACE' and 'REINDEX' usually do. 
Anyway, all real leaf partitions are processed in the independent 
transactions later.


Also changed some doc/comment parts Justin pointed me to.

+  then all "mapped" and system relations will be skipped and a single
+  WARNING will be generated. Indexes on TOAST tables
+  are reindexed, but not moved the new tablespace.


moved *to* the new tablespace.



Fixed.



I don't know if that needs to be said at all.  We talked about it a lot 
to
arrive at the current behavior, but I think that's only due to the 
difficulty

of correcting the initial mistake.



I do not think that it will be a big deal to move indexes on TOAST 
tables as well. I just thought that since 'ALTER TABLE/INDEX ... SET 
TABLESPACE' only moves them together with the host table, we should not 
do that either. Yet, I am ready to change this logic if requested.



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
From 6e9db8d362e794edf421733bc7cade38c917bff4 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Wed, 27 Jan 2021 00:46:17 +0300
Subject: [PATCH v9] Allow REINDEX to change tablespace

REINDEX already does a full relation rewrite; this patch adds the
possibility to specify a new tablespace where the new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml |  31 +++-
 src/backend/catalog/index.c   |  47 +-
 src/backend/commands/indexcmds.c  | 112 -
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/index.h   |   9 +-
 src/test/regress/input/tablespace.source  | 106 +
 src/test/regress/output/tablespace.source | 181 ++
 7 files changed, 478 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 627b36300c..2b39699d42 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 CONCURRENTLY [ boolean ]
 VERBOSE [ boolean ]
+TABLESPACE new_tablespace
 
  
 
@@ -187,6 +188,21 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  Specifies that indexes will be rebuilt on a new tablespace.
+  Cannot be used with "mapped" or (unless allow_system_table_mods)
+  system relations. If SCHEMA,
+  DATABASE or SYSTEM are specified,
+  then all "mapped" and system relations will be skipped and a single
+  WARNING will be generated. Indexes on TOAST tables
+  are reindexed, but not moved to the new tablespace.
+ 
+
+   
+

 VERBOSE
 
@@ -210,6 +226,14 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+new_tablespace
+
+ 
+  The tablespace where indexes will be rebuilt.
+ 
+
+   
   
  
 
@@ -292,7 +316,12 @@ REINDEX [ ( option [, ...] ) ] { IN
with REINDEX INDEX or REINDEX TABLE,
respectively. Each partition of the specified partitioned relation is
reindexed in a separate transaction. Those commands cannot be used inside
-   a transaction block when working on a partitioned table or index.
+   a transaction block when working on a partitioned table or index. If
+   a REINDEX command fails when run on a partitioned
+   relation, and TABLESPACE was specified, then it may have
+   moved indexes on some partitions to the new tablespace.  Re-running the command
+   will reindex all 

Re: Dumping/restoring fails on inherited generated column

2021-01-29 Thread Tom Lane
Peter Eisentraut  writes:
> I've had another go at this, and I've found a solution that appears to 
> address all the issues I'm aware of.  It's all very similar to the 
> previously discussed patches.  The main difference is that previous 
> patches had attempted to use something like tbinfo->attislocal to 
> determine whether a column was inherited, but that's not correct.  This 
> patch uses the existing logic in flagInhAttrs() to find whether there is 
> a matching parent column with a generation expression.  I've added 
> pg_dump test cases here to check the different variations that the code 
> addresses.

This is a clear improvement on the current situation, and given that
this issue is over a year old, I think we should push and back-patch
this in time for February's releases.

However ... this doesn't solve all the cases noted in this thread.
In the first example I gave at [1],

d3=# create table parent (f1 int default 2);
CREATE TABLE
d3=# create table child (f1 int default 3) inherits(parent);
NOTICE:  merging column "f1" with inherited definition
CREATE TABLE
d3=# create table child2() inherits(parent);
CREATE TABLE
d3=# alter table child2 alter column f1 set default 42;
ALTER TABLE

pg_dump still fails to restore child2.f1's non-inherited default.
That's probably a pre-existing problem, since it doesn't involve
GENERATED at all, but we shouldn't forget about it.

Also, in the example from [2],

d3=# create table pp1 (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE
d3=# create table cc1 (a int, b int GENERATED ALWAYS AS (a * 3) STORED);
CREATE TABLE
d3=# alter table cc1 inherit pp1;
ALTER TABLE

pg_dump now omits to dump cc1's generation expression, which seems
strictly worse than before.  Admittedly, the backend likely ought to
be rejecting this scenario, but it doesn't do so today.

Neither of these points seem like a reason to reject this patch,
they're just adjacent work that remains to be done.

regards, tom lane

[1] https://www.postgresql.org/message-id/660925.1601397436%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/661371.1601398006%40sss.pgh.pa.us




Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-29 Thread Alexander Korotkov
On Thu, Jan 21, 2021 at 11:14 PM Pavel Stehule  wrote:
>> Looks good, I've applied it, thanks.
>
> I tested last set of patches
>
> 1. There is no problem with patching and compilation
> 2. make check-world passed
> 3. build doc without problems
> 4. I have no objections to the implemented functionality, 
> implementation and tests
>
> I'll mark this patch as ready for committers
>
> Thank you for your work. It will be a nice feature

I've skimmed through the thread; it seems that consensus over the
functionality has been reached.  The patchset itself looks good to me.
I'm going to push this if there are no objections.

--
Regards,
Alexander Korotkov




Re: LogwrtResult contended spinlock

2021-01-29 Thread Alvaro Herrera
On 2020-Aug-31, Andres Freund wrote:

> Wouldn't the better fix here be to allow reading of individual members 
> without a lock? E.g. by wrapping each in a 64bit atomic.

So I've been playing with this and I'm annoyed about having two
datatypes to represent Write/Flush positions:

typedef struct XLogwrtRqst
{
XLogRecPtr  Write;  /* last byte + 1 to write out */
XLogRecPtr  Flush;  /* last byte + 1 to flush */
} XLogwrtRqst;

typedef struct XLogwrtResult
{
XLogRecPtr  Write;  /* last byte + 1 written out */
XLogRecPtr  Flush;  /* last byte + 1 flushed */
} XLogwrtResult;

Don't they look, um, quite similar?  I am strongly tempted to remove
that distinction, since it seems quite pointless, and introduce a
different one:

typedef struct XLogwrtAtomic
{
    pg_atomic_uint64    Write;  /* last byte + 1 of write position */
    pg_atomic_uint64    Flush;  /* last byte + 1 of flush position */
} XLogwrtAtomic;

this one, with atomics, would be used for the XLogCtl struct members
LogwrtRqst and LogwrtResult, and are always accessed using atomic ops.
On the other hand we would have

typedef struct XLogwrt
{
XLogRecPtr  Write;  /* last byte + 1 of write position */
XLogRecPtr  Flush;  /* last byte + 1 of flush position */
} XLogwrt;

to be used for the local-memory-only LogwrtResult, using normal
assignment.

Now, I do wonder if there's a point in keeping LogwrtResult as a local
variable at all; maybe since the ones in shared memory are going to use
unlocked access, we don't need it anymore?  I'd prefer to defer that
decision to after this patch is done, since ISTM that it'd merit more
careful benchmarking.

Thoughts?

-- 
Álvaro Herrera   Valdivia, Chile




Re: Allow matching whole DN from a client certificate

2021-01-29 Thread Andrew Dunstan


On 1/28/21 5:10 PM, Andrew Dunstan wrote:
>
>> (I'd still recommend switching to use the RFC
>> flag to OpenSSL, to ease future improvements.) There should be a bunch
>> of warning documentation saying not to do anything more complex unless
>> you're an expert, and that the exact regular expression needed may
>> change depending on the TLS backend.
>
> I'll play around with the RFC flag.
>
>
>> If you want to add UTF-8 support to the above, so that things outside
>> ASCII can be matched nicely, then the ASN1_STRFLGS_ESC_MSB flag should
>> be removed from the _print_ex() call per OpenSSL documentation, and we
>> should investigate how it plays with other non-UTF-8 database
>> encodings. That seems like work but not a huge amount of it.
>
> How about if we remove ASN1_STRFLGS_ESC_MSB when the server encoding is
> UTF8? That should be an extremely simple change.
>
>
>

Of course, we don't have any idea what the database is at this stage, so
that wasn't well thought out.


I'm inclined at this stage just to turn on RFC2253. If someone wants to
deal with UTF8 (or other encodings) at a later stage they can.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Is Recovery actually paused?

2021-01-29 Thread Yugo NAGATA
On Fri, 29 Jan 2021 16:33:32 +0530
Dilip Kumar  wrote:

> On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
> >
> > On Thu, 28 Jan 2021 09:55:42 +0530
> > Dilip Kumar  wrote:
> >
> > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > >  wrote:
> > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when 
> > > > > > > > > "is" exists
> > > > > > > > > in a function, I expect a boolean output. Others may have 
> > > > > > > > > better
> > > > > > > > > thoughts.
> > > > > > > >
> > > > > > > > Maybe we should leave the existing function 
> > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > alone and add a new one with the name you suggest that returns 
> > > > > > > > text.
> > > > > > > > That would create less burden for tool authors.
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > >
> > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > >
> > > > > This means pg_is_wal_replay_paused is left without any change and this
> > > > > returns whether pause is requested or not? If so, it seems good to 
> > > > > modify
> > > > > the documentation of this function in order to note that this could 
> > > > > not
> > > > > return the actual pause state.
> > > >
> > > > Yes, we can say that it will return true if the replay pause is
> > > > requested.  I am changing that in my new patch.
> > >
> > > I have modified the patch, changes
> > >
> > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > the pause request state
> > > - Now, we are not waiting for the recovery to actually get paused so I
> > > think it doesn't make sense to put a lot of checkpoints to check the
> > > pause requested so I have removed that check from the
> > > recoveryApplyDelay but I think it better we still keep that check in
> > > the WaitForWalToBecomeAvailable because it can wait forever before the
> > > next wal get available.
> >
> > I think basically the check in WaitForWalToBecomeAvailable is independent
> > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > actual pause state.  This function could just return 'pause requested'
> > if a pause is requested during waiting for WAL.
> >
> > However, I agree the change to allow recovery to transit the state to
> > 'paused' during WAL waiting because 'paused' has more useful information
> > for users than 'pause requested'.  Returning 'paused' lets users know
> > clearly that no more WAL are applied until recovery is resumed.  On the
> > other hand, when 'pause requested' is returned, user can't say whether
> > the next WAL will be applied or not from this information.
> >
> > For the same reason, I think it is also useful to call recoveryPausesHere
> > in recoveryApplyDelay.
> 
> IMHO WaitForWalToBecomeAvailable can wait until the next WAL becomes
> available, which cannot be controlled by the user, so it is good to put a
> check for the recovery pause there; however, recoveryApplyDelay waits for
> the apply delay, which is configured by the user and is a value the user
> can predict.  I don't have much objection to putting that check in
> recoveryApplyDelay as well, but I feel it is not necessary.  Any other
> thoughts on this?

I'm not sure the user can easily figure out that the reason why
pg_get_wal_replay_pause_state returns 'pause requested' is
recovery_min_apply_delay, because that would need knowledge of the
internal mechanism of recovery.  However, if there are no other
opinions on it, I don't mind if recoveryApplyDelay is left as is,
because such a check and state transition is independent of the goal of
pg_get_wal_replay_pause_state itself, as I mentioned above.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Allow matching whole DN from a client certificate

2021-01-29 Thread Andrew Dunstan


On 1/29/21 8:18 AM, Daniel Gustafsson wrote:
>> On 28 Jan 2021, at 23:10, Andrew Dunstan  wrote:
>> On 1/28/21 11:39 AM, Jacob Champion wrote:
>>> Unfortunately I don't really know what that solution should look like.
>>> A DSL for filtering on RDNs would be a lot of work, but it could
>>> potentially allow LDAP to be mapped through pg_ident as well
>> In the end it will be up to users to come up with expressions that meet
>> their usage. Yes they could get it wrong, but then they can get so many
>> things wrong ;-)
> My main concern with this isn't that it's easy to get it wrong, but that it 
> may
> end up being hard to get it right (with false positives in the auth path as a
> result). Right now I'm not sure where it leans.
>
> Maybe it will be easier to judge the proposal when the documentation has been
> updated with warnings for the potential pitfalls?
>

Feel free to make suggestions for wording :-)


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Phrase search vs. multi-lexeme tokens

2021-01-29 Thread Alexander Korotkov
On Tue, Jan 26, 2021 at 4:31 AM Neil Chen  wrote:
> On Mon, Jan 25, 2021 at 11:25 PM Alexander Korotkov  
> wrote:
>>
>>
>> BTW, you mentioned you read the documentation.  Do you think it needs
>> to be adjusted accordingly to the patch?
>>
>
> Yes, I checked section 8.11, section 9.13 and Chapter 12 of the document. The 
> change in this patch does not conflict with the documentation, because the 
> behavior is not mentioned in the documentation at all. We can simply leave it 
> unmodified, or we can add coverage of these situations.

I've checked the docs myself and I think you're right (though that's
surprising to me).  It seems that this patch just changes
undocumented aspects of full-text search to be more consistent and
intuitive.

The revised patch is attached.  This revision just adds a comment and a
commit message.  I'm going to push this if there are no objections.

--
Regards,
Alexander Korotkov


tsquery_phrase_fix_v2.patch
Description: Binary data


Re: Printing backtrace of postgres processes

2021-01-29 Thread vignesh C
Thanks Bharath for your review comments. Please find my comments inline below.

On Thu, Jan 28, 2021 at 7:40 PM Bharath Rupireddy
 wrote:
>
> On Thu, Jan 28, 2021 at 5:22 PM vignesh C  wrote:
> > Thanks for the comments, I have fixed and attached an updated patch
> > with the fixes for the same.
> > Thoughts?
>
> Thanks for the patch. Here are few comments:
>
> 1) I think it's return SIGNAL_BACKEND_SUCCESS; instead of return 0; in
> check_valid_pid?
>

I did not want to use SIGNAL_BACKEND_SUCCESS as we have not yet
signalled the backend process at this point. I have added a
BACKEND_VALIDATION_SUCCESS macro and used it here for better
readability.

> 2) How about following in pg_signal_backend for more readability
> +if (ret != SIGNAL_BACKEND_SUCCESS)
> +return ret;
> instead of
> +if (ret)
> +return ret;
>

Modified it to ret != BACKEND_VALIDATION_SUCCESS

> 3) How about validate_backend_pid or some better name instead of
> check_valid_pid?
>

Modified it to validate_backend_pid

> 4) How about following
> + errmsg("must be a superuser to print backtrace
> of backend process")));
> instead of
> + errmsg("must be a superuser to print backtrace
> of superuser query process")));
>

Here the message should still mention superuser; we cannot remove it. A
non-superuser can log the backtrace of a non-superuser's process,
provided the user has permission for it.

> 5) How about following
>  errmsg("must be a member of the role whose backend
> process's backtrace is being printed or member of
> pg_signal_backend")));
> instead of
> + errmsg("must be a member of the role whose
> backtrace is being logged or member of pg_signal_backend")));
>

Modified it.

> 6) I'm not sure whether "backtrace" or "call stack" is a generic term
> from the user/developer perspective. In the patch, the function name
> and documentation says callstack (I think it is "call stack" actually),
> but the error/warning messages says backtrace. IMHO, having
> "backtrace" everywhere in the patch, even the function name changed to
> pg_print_backtrace, looks better and consistent. Thoughts?
>

Modified it to pg_print_backtrace.

> 7) How about following in pg_print_callstack?
> {
> intbt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
> boolresult = false;
>
> if (r == SIGNAL_BACKEND_SUCCESS)
> {
> if (EmitProcSignalPrintCallStack(bt_pid))
> result = true;
> else
> ereport(WARNING,
> (errmsg("failed to send signal to postmaster: %m")));
> }
>
> PG_RETURN_BOOL(result);
> }
>

Modified similarly with slight change.

> 8) How about following
> +(errmsg("backtrace generation is not supported by
> this PostgresSQL installation")));
> instead of
> +(errmsg("backtrace generation is not supported by
> this installation")));
>

I used the existing message to maintain consistency with
set_backtrace. I feel we can keep it the same.

> 9) Typo - it's "example" +2) Using "addr2line -e postgres address", For 
> exmple:
>

Modified it.

> 10) How about
> + * Handle print backtrace signal
> instead of
> + * Handle receipt of an print backtrace.
>

I used the existing message to maintain consistency similar to
HandleProcSignalBarrierInterrupt. I feel we can keep it the same.

> 11) Isn't below in documentation specific to Linux platform. What
> happens if GDB is not there on the platform?
> +
> +1)  "info line *address" from gdb on postgres executable. For example:
> +gdb ./postgres
> +GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>

I have changed it to: "You can get the file name and line number by
using gdb/addr2line on Linux platforms; as a prerequisite, users must
ensure gdb/addr2line is already installed".

User will get an error like this in windows:
select pg_print_backtrace(pg_backend_pid());
WARNING:  backtrace generation is not supported by this installation
 pg_print_callstack
--------------------
 f
(1 row)

On Windows the backtrace will not be logged; instead it will throw the
warning "backtrace generation is not supported by this installation".
Thoughts?

> 12) +The callstack will be logged in the log file. What happens if the
> server is started without a log file , ./pg_ctl -D data start? Where
> will the backtrace go?
>

Updated to: The backtrace will be logged to the log file if logging is
enabled; if logging is disabled, the backtrace will be logged to the
console where the postmaster was started.

> 13) Not sure, if it's an overkill, but how about pg_print_callstack
> returning a warning/notice along with true, which just says, "See
> <<>>". Thoughts?

As you rightly pointed out it will be an overkill, I feel the existing
is easily understandable.

Attached v5 patch has the fixes for the same.
Thoughts?

Regards,
Vignesh
From 13564e153f4154d90a0b53e2b77c3e8d2df97544 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: 

Re: Allow matching whole DN from a client certificate

2021-01-29 Thread Daniel Gustafsson
> On 28 Jan 2021, at 23:10, Andrew Dunstan  wrote:
> On 1/28/21 11:39 AM, Jacob Champion wrote:
>> 
>> Unfortunately I don't really know what that solution should look like.
>> A DSL for filtering on RDNs would be a lot of work, but it could
>> potentially allow LDAP to be mapped through pg_ident as well
> 
> In the end it will be up to users to come up with expressions that meet
> their usage. Yes they could get it wrong, but then they can get so many
> things wrong ;-)

My main concern with this isn't that it's easy to get it wrong, but that it may
end up being hard to get it right (with false positives in the auth path as a
result). Right now I'm not sure where it leans.

Maybe it will be easier to judge the proposal when the documentation has been
updated with warnings for the potential pitfalls?

--
Daniel Gustafsson   https://vmware.com/





Re: Dumping/restoring fails on inherited generated column

2021-01-29 Thread Peter Eisentraut
I've had another go at this, and I've found a solution that appears to 
address all the issues I'm aware of.  It's all very similar to the 
previously discussed patches.  The main difference is that previous 
patches had attempted to use something like tbinfo->attislocal to 
determine whether a column was inherited, but that's not correct.  This 
patch uses the existing logic in flagInhAttrs() to find whether there is 
a matching parent column with a generation expression.  I've added 
pg_dump test cases here to check the different variations that the code 
addresses.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
From 6cace6e6e7935844272fb201fc0096d8f2381d00 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 29 Jan 2021 13:20:41 +0100
Subject: [PATCH v4] pg_dump: Fix dumping of inherited generated columns

Generation expressions of generated columns are always inherited, so
there is no need to set them separately in child tables, and there is
no syntax to do so either.  The code previously used the code paths
for the handling of default values, for which different rules apply;
in particular it might want to set a default value explicitly for an
inherited column.  This resulted in unrestorable dumps.  For generated
columns, just skip them in inherited tables.

Discussion: 
https://www.postgresql.org/message-id/flat/15830.1575468847%40sss.pgh.pa.us
---
 src/bin/pg_dump/common.c | 31 -
 src/bin/pg_dump/pg_dump.c| 37 +
 src/bin/pg_dump/t/002_pg_dump.pl | 46 
 3 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index b0f02bc1f6..1a261a5545 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -467,12 +467,22 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int 
numTables)
 /* flagInhAttrs -
  *  for each dumpable table in tblinfo, flag its inherited attributes
  *
- * What we need to do here is detect child columns that inherit NOT NULL
- * bits from their parents (so that we needn't specify that again for the
- * child) and child columns that have DEFAULT NULL when their parents had
- * some non-null default.  In the latter case, we make up a dummy AttrDefInfo
- * object so that we'll correctly emit the necessary DEFAULT NULL clause;
- * otherwise the backend will apply an inherited default to the column.
+ * What we need to do here is:
+ *
+ * - Detect child columns that inherit NOT NULL bits from their parents, so
+ *   that we needn't specify that again for the child.
+ *
+ * - Detect child columns that have DEFAULT NULL when their parents had some
+ *   non-null default.  In this case, we make up a dummy AttrDefInfo object so
+ *   that we'll correctly emit the necessary DEFAULT NULL clause; otherwise
+ *   the backend will apply an inherited default to the column.
+ *
+ * - Detect child columns that have a generation expression when their parents
+ *   also have one.  Generation expressions are always inherited, so there is
+ *   no need to set them again in child tables, and there is no syntax for it
+ *   either.  (Exception: In binary upgrade mode we dump them because
+ *   inherited tables are recreated standalone first and then reattached to
+ *   the parent.)
  *
  * modifies tblinfo
  */
@@ -510,6 +520,7 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int 
numTables)
{
		bool		foundNotNull;	/* Attr was NOT NULL in a parent */
		bool		foundDefault;	/* Found a default in a parent */
+		bool		foundGenerated;	/* Found a generated in a parent */
 
/* no point in examining dropped columns */
if (tbinfo->attisdropped[j])
@@ -517,6 +528,7 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int 
numTables)
 
foundNotNull = false;
foundDefault = false;
+   foundGenerated = false;
for (k = 0; k < numParents; k++)
{
TableInfo  *parent = parents[k];
@@ -528,7 +540,8 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int 
numTables)
if (inhAttrInd >= 0)
{
				foundNotNull |= parent->notnull[inhAttrInd];
-				foundDefault |= (parent->attrdefs[inhAttrInd] != NULL);
+				foundDefault |= (parent->attrdefs[inhAttrInd] != NULL &&
+								 !parent->attgenerated[inhAttrInd]);
+				foundGenerated |= parent->attgenerated[inhAttrInd];
}
}
 
@@ -570,6 +583,10 @@ flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int 
numTables)
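For illustration, a small SQL example of the generated-column case described in the comment above (the example is mine, not part of the patch): the generation expression is inherited automatically, so dumping it again for the child would be redundant, and there is no syntax for it anyway.

```sql
-- Example (not part of the patch): the child inherits the parent's
-- generation expression automatically.
CREATE TABLE parent (
    a int,
    b int GENERATED ALWAYS AS (a * 2) STORED
);
CREATE TABLE child () INHERITS (parent);

INSERT INTO child (a) VALUES (21);
SELECT b FROM child;   -- b is computed by the inherited expression
```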

Re: Support for NSS as a libpq TLS backend

2021-01-29 Thread Daniel Gustafsson
> On 29 Jan 2021, at 07:01, Michael Paquier  wrote:
> 
> On Fri, Jan 29, 2021 at 12:20:21AM +0100, Daniel Gustafsson wrote:
>> SSL is admittedly an obsolete technical term, but it's one that enough people
>> have decided is interchangeable with TLS that it's not a hill worth dying on
>> IMHO.  Since postgres won't allow for using libnss or OpenSSL for cryptohash
>> *without* compiling SSL/TLS support (used or not), I think --with-ssl=LIB is
>> more descriptive and less confusing.
> 
> Okay, let's use --with-ssl then for the new switch name.  The previous
> patch is backward-compatible, and will simplify the rest of the set,
> so let's move on with it.  Once this is done, my guess is that it
> would be cleaner to have a new patch that includes only the
> ./configure and MSVC changes, and then the rest: test refactoring,
> cryptohash, strong random and lastly TLS (we may want to cut this a
> bit more though and perhaps have some restrictions depending on the
> scope of options a first patch set could support).
> 
> I'll wait a bit first to see if there are any objections to this
> change.

I'm still not convinced that adding --with-ssl=openssl is worth it before the
rest of NSS goes in (and more importantly, *if* it goes in).

On the one hand, we already have pluggable (for some value of) support for
adding TLS libraries, and adding --with-ssl is one more piece of that puzzle.
We could of course have endless --with-X options instead but as you say,
--with-uuid has set the tone here (and I believe that's good).  On the other
hand, if we never add any other library than OpenSSL then it's just complexity
without benefit.

As mentioned elsewhere in the thread, the current v23 patchset has the
--with-ssl change as a separate commit to at least make it visual what it looks
like.  The documentation changes are in the main NSS patch though since
documenting --with-ssl when there is only one possible value didn't seem to be
helpful to users, who are still fully expected to use --with-openssl.

--
Daniel Gustafsson   https://vmware.com/





Re: Is Recovery actually paused?

2021-01-29 Thread Dilip Kumar
On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
>
> On Thu, 28 Jan 2021 09:55:42 +0530
> Dilip Kumar  wrote:
>
> > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> > > >
> > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > Dilip Kumar  wrote:
> > > >
> > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > >  wrote:
> > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" 
> > > > > > > > exists
> > > > > > > > in a function, I expect a boolean output. Others may have better
> > > > > > > > thoughts.
> > > > > > >
> > > > > > > Maybe we should leave the existing function 
> > > > > > > pg_is_wal_replay_paused()
> > > > > > > alone and add a new one with the name you suggest that returns 
> > > > > > > text.
> > > > > > > That would create less burden for tool authors.
> > > > > >
> > > > > > +1
> > > > > >
> > > > >
> > > > > Yeah, we can do that, I will send an updated patch soon.
> > > >
> > > > This means pg_is_wal_replay_paused is left without any change and this
> > > > returns whether pause is requested or not? If so, it seems good to 
> > > > modify
> > > > the documentation of this function in order to note that this could not
> > > > return the actual pause state.
> > >
> > > Yes, we can say that it will return true if the replay pause is
> > > requested.  I am changing that in my new patch.
> >
> > I have modified the patch, changes
> >
> > - I have added a new interface pg_get_wal_replay_pause_state to get
> > the pause request state
> > - Now, we are not waiting for the recovery to actually get paused so I
> > think it doesn't make sense to put a lot of checkpoints to check the
> > pause requested so I have removed that check from the
> > recoveryApplyDelay but I think it better we still keep that check in
> > the WaitForWalToBecomeAvailable because it can wait forever before the
> > next wal get available.
>
> I think basically the check in WaitForWalToBecomeAvailable is independent
> of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> actual pause state.  This function could just return 'pause requested'
> if a pause is requested during waiting for WAL.
>
> However, I agree with the change to allow recovery to transition the state
> to 'paused' while waiting for WAL, because 'paused' carries more useful
> information for users than 'pause requested'.  Returning 'paused' lets users
> know clearly that no more WAL is applied until recovery is resumed.  On the
> other hand, when 'pause requested' is returned, the user can't tell from
> this information whether the next WAL will be applied or not.
>
> For the same reason, I think it is also useful to call recoveryPausesHere
> in recoveryApplyDelay.

IMHO WaitForWalToBecomeAvailable can wait until the next WAL becomes
available, which cannot be controlled by the user, so it is good to put a
check for the recovery pause there.  However, recoveryApplyDelay waits for
the apply delay, which is configured by the user and is therefore a
predictable value.  I don't have much objection to putting that check in
recoveryApplyDelay as well, but I feel it is not necessary.  Any other
thoughts on this?

> In addition, in RecoveryRequiresIntParameter, recovery should get paused
> if a parameter value has a problem.  However, pg_get_wal_replay_pause_state
> will return 'pause requested' in this case. So, I think, we should pass
> RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
> or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().

Yeah, absolutely right, it must pass RECOVERY_PAUSED.  I will change
this, thanks for noticing this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Single transaction in the tablesync worker?

2021-01-29 Thread Peter Smith
On Thu, Jan 28, 2021 at 9:37 PM Amit Kapila  wrote:
>
> On Thu, Jan 28, 2021 at 12:32 PM Peter Smith  wrote:
> >
> > On Wed, Jan 27, 2021 at 2:53 PM Amit Kapila  wrote:
> > >
> > > On Sat, Jan 23, 2021 at 5:56 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Sat, Jan 23, 2021 at 4:55 AM Peter Smith  
> > > > wrote:
> > > > >
> > > > > PSA the v18 patch for the Tablesync Solution1.
> > > >
> > > > 7. Have you tested with the new patch the scenario where we crash
> > > > after FINISHEDCOPY and before SYNCDONE, is it able to pick up the
> > > > replication using the new temporary slot? Here, we need to test the
> > > > case where during the catchup phase we have received few commits and
> > > > then the tablesync worker is crashed/errored out? Basically, check if
> > > > the replication is continued from the same point?
> > > >
> > >
> > > I have tested this and it didn't work, see the below example.
> > >
> > > Publisher-side
> > > 
> > > CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text 
> > > varchar(120));
> > >
> > > BEGIN;
> > > INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
> > > INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
> > > COMMIT;
> > >
> > > CREATE PUBLICATION mypublication FOR TABLE mytbl1;
> > >
> > > Subscriber-side
> > > 
> > > - Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
> > > worker stops.
> > >
> > > CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text 
> > > varchar(120));
> > >
> > >
> > > CREATE SUBSCRIPTION mysub
> > >  CONNECTION 'host=localhost port=5432 dbname=postgres'
> > > PUBLICATION mypublication;
> > >
> > > During debug, stop after we mark FINISHEDCOPY state.
> > >
> > > Publisher-side
> > > 
> > > INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
> > > INSERT INTO mytbl1(somedata, text) VALUES (1, 4);
> > >
> > >
> > > Subscriber-side
> > > 
> > > - Have a breakpoint in apply_dispatch
> > > - continue in debugger;
> > > - After we replay first commit (which will be for values(1,3), note
> > > down the origin position in apply_handle_commit_internal and somehow
> > > error out. I have forced the debugger to set to the last line in
> > > apply_dispatch where the error is raised.
> > > - After the error, again the tablesync worker is restarted and it
> > > starts from the position noted in the previous step
> > > - It exits without replaying the WAL for (1,4)
> > >
> > > So, on the subscriber-side, you will see 3 records. Fourth is missing.
> > > Now, if you insert more records on the publisher, it will anyway
> > > replay those but the fourth one got missing.
> > >
> ...
> > >
> > > At this point, I can't think of any way to fix this problem except for
> > > going back to the previous approach of permanent slots but let me know
> > > if you have any ideas to salvage this approach?
> > >
> >
> > OK. The latest patch [v21] now restores the permanent slot (and slot
> > cleanup) approach as it was implemented in an earlier version [v17].
> > Please note that this change also re-introduces some potential slot
> > cleanup problems for some race scenarios.
> >
>
> I am able to reproduce the race condition where slot/origin will
> remain on the publisher node even when the corresponding subscription
> is dropped. Basically, if we error out in the 'catchup' phase in
> tablesync worker then either it will restart and cleanup slot/origin
> or if in the meantime we have dropped the subscription and stopped
> apply worker then probably the slot and origin will be dangling on the
> publisher.
>
> I have used exactly the same test procedure as was used to expose the
> problem in the temporary slots with some minor changes as mentioned
> below:
> Subscriber-side
> 
> - Have a while(1) loop in LogicalRepSyncTableStart so that tablesync
> worker stops.
> - Have a while(1) loop in wait_for_relation_state_change so that we
> can control apply worker via debugger at the right time.
>
> Subscriber-side
> 
> - Have a breakpoint in apply_dispatch
> - continue in debugger;
> - After we replay first commit somehow error out. I have forced the
> debugger to set to the last line in apply_dispatch where the error is
> raised.
> - Now, the table sync worker won't restart because the apply worker is
> looping in wait_for_relation_state_change.
> - Execute DropSubscription;
> - We can allow apply worker to continue by skipping the while(1) and
> it will exit because DropSubscription would have sent a terminate
> signal.
>
> After the above steps, check the publisher (select * from
> pg_replication_slots) and you will find the dangling tablesync slot.
>
> I think to solve the above problem we should drop tablesync
> slot/origin at the Drop/Alter Subscription time and additionally we
> need to ensure that apply worker doesn't let tablesync workers restart
> (or it must not do any work to access the slot because the slots are
> dropped) once we st

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-29 Thread Bharath Rupireddy
On Fri, Jan 29, 2021 at 1:24 PM Bharath Rupireddy
 wrote:
>
> On Fri, Jan 29, 2021 at 1:17 PM Fujii Masao  
> wrote:
> > >> But if the issue is only the inconsistency of test results,
> > >> we can go with the option (2)? Even with (2), we can make the test
> > >> stable by removing "valid" column and executing
> > >> postgres_fdw_get_connections() within the transaction?
> > >
> > > Hmmm, and we should have the tests at the start of the file
> > > postgres_fdw.sql before even we make any foreign server connections.
> >
> > We don't need to move the test if we always call 
> > postgres_fdw_disconnect_all() just before starting new transaction and 
> > calling postgres_fdw_get_connections() as follows?
> >
> > SELECT 1 FROM postgres_fdw_disconnect_all();
> > BEGIN;
> > ...
> > SELECT * FROM postgres_fdw_get_connections();
> > ...
>
> Yes, that works, but we cannot show true/false for the
> postgres_fdw_disconnect_all output.
>
> I will post the patch soon. Thanks a lot.

Attaching a patch with the following changes: 1) postgres_fdw_get_connections
now returns only the server names of the active connections, not their valid
state. 2) The functions postgres_fdw_get_connections, postgres_fdw_disconnect
and postgres_fdw_disconnect_all are now tested within an explicit transaction
block; this way the tests are more stable, even with clobber-cache-always
builds.

I tested the patch here on my development system with the
-DCLOBBER_CACHE_ALWAYS configuration; the tests look consistent.

Please review the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Don-t-return-valid-state-in-postgres_fdw_get_conn.patch
Description: Binary data


Re: Single transaction in the tablesync worker?

2021-01-29 Thread Peter Smith
Hi Amit.

PSA the v22 patch for the Tablesync Solution1.

Differences from v21:
+ Patch is rebased to latest OSS HEAD @ 29/Jan.
+ Includes new code as suggested [ak0128] to ensure no dangling slots
at Drop/AlterSubscription.
+ Removes the slot/origin cleanup done by the process-interrupt logic
(cleanup_at_shutdown function).
+ Addresses some minor review comments.


[ak0128] 
https://www.postgresql.org/message-id/CAA4eK1LMYXZY1SpzgW-WyFdy%2BFTMZ4BMz1dj0rT2rxGv-zLwFA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v22-0001-Tablesync-Solution1.patch
Description: Binary data


v22-0002-Tablesync-extra-logging.patch
Description: Binary data


Re: "has_column_privilege()" issue with attnums and non-existent columns

2021-01-29 Thread Joe Conway
On 1/29/21 12:13 AM, Ian Lawrence Barwick wrote:
> 2021年1月28日(木) 17:18 Peter Eisentraut:
> I'm not convinced the current behavior is wrong.  Is there some
> practical use case that is affected by this behavior?
> 
>  
> I was poking around at the function with a view to using it for something and 
> was
> curious what it did with bad input.
> 
> Providing the column name of a dropped column:
> 
>     Me: "Hey Postgres, do I have privileges on the dropped column 'bar' of my
> table 'foo'?"
>     Pg: "That column doesn't even exist - here, have an error instead."
>     Me: "Hey Postgres, does some other less-privileged user have privileges 
> on the
>          dropped column 'bar' of my table 'foo'?
>     Pg: "That column doesn't even exist - here, have an error instead."
> 
> Providing the attnum of a dropped column:
> 
>     Me: "Hey Postgres, here's the attnum of the dropped column 'bar', does 
> some
>          other less-privileged user have privileges on that column?"
>     Pg: "That column doesn't even exist - here, have a NULL".
>     Me: "Hey Postgres, here's the attnum of the dropped column 'bar' on this 
> table
>          I own, do I have privileges on that column?"
>     Pg: "Yup. And feel free to throw any other smallint at me, I'll pretend 
> that
>          represents a column too even if it never existed.".
> 
> Looking at the code, particularly the cited comment, it does seems the intent 
> was
> to return NULL in all cases where an invalid attnum was provided, but that 
> gets
> short-circuited by the assumption table owner = has privilege on any column.

Nicely illustrated :-)

> Not the most urgent or exciting of issues, but seems inconsistent to me.

+1

Joe


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Is Recovery actually paused?

2021-01-29 Thread Yugo NAGATA
On Thu, 28 Jan 2021 09:55:42 +0530
Dilip Kumar  wrote:

> On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
> >
> > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> > >
> > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > Dilip Kumar  wrote:
> > >
> > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  
> > > > > wrote:
> > > > > >
> > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > >  wrote:
> > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" 
> > > > > > > exists
> > > > > > > in a function, I expect a boolean output. Others may have better
> > > > > > > thoughts.
> > > > > >
> > > > > > Maybe we should leave the existing function 
> > > > > > pg_is_wal_replay_paused()
> > > > > > alone and add a new one with the name you suggest that returns text.
> > > > > > That would create less burden for tool authors.
> > > > >
> > > > > +1
> > > > >
> > > >
> > > > Yeah, we can do that, I will send an updated patch soon.
> > >
> > > This means pg_is_wal_replay_paused is left without any change and this
> > > returns whether pause is requested or not? If so, it seems good to modify
> > > the documentation of this function in order to note that this could not
> > > return the actual pause state.
> >
> > Yes, we can say that it will return true if the replay pause is
> > requested.  I am changing that in my new patch.
> 
> I have modified the patch, changes
> 
> - I have added a new interface pg_get_wal_replay_pause_state to get
> the pause request state
> - Now, we are not waiting for the recovery to actually get paused so I
> think it doesn't make sense to put a lot of checkpoints to check the
> pause requested so I have removed that check from the
> recoveryApplyDelay but I think it better we still keep that check in
> the WaitForWalToBecomeAvailable because it can wait forever before the
> next wal get available.

I think basically the check in WaitForWalToBecomeAvailable is independent
of the feature of pg_get_wal_replay_pause_state, that is, reporting the
actual pause state.  This function could just return 'pause requested' 
if a pause is requested during waiting for WAL.

However, I agree with the change to allow recovery to transition the state
to 'paused' while waiting for WAL, because 'paused' carries more useful
information for users than 'pause requested'.  Returning 'paused' lets users
know clearly that no more WAL is applied until recovery is resumed.  On the
other hand, when 'pause requested' is returned, the user can't tell from
this information whether the next WAL will be applied or not.

For the same reason, I think it is also useful to call recoveryPausesHere
in recoveryApplyDelay. 

In addition, in RecoveryRequiresIntParameter, recovery should get paused
if a parameter value has a problem.  However, pg_get_wal_replay_pause_state
will return 'pause requested' in this case. So, I think, we should pass
RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: [PoC] Non-volatile WAL buffer

2021-01-29 Thread Takashi Menjo
Hi Tomas,

I'd answer your questions. (Not all for now, sorry.)


> Do I understand correctly that the patch removes "regular" WAL buffers
and instead writes the data into the non-volatile PMEM buffer, without
writing that to the WAL segments at all (unless in archiving mode)?
> Firstly, I guess many (most?) instances will have to write the WAL
segments anyway because of PITR/backups, so I'm not sure we can save much
here.

Mostly yes. My "non-volatile WAL buffer" patchset removes the regular
volatile WAL buffers and brings in non-volatile ones. All WAL goes into the
non-volatile buffers and persists there; no write-out of the buffers to WAL
segment files is required. However, in archiving mode or when the buffers
fill up (described later), both the non-volatile buffers and the segment
files are used.

In archiving mode with my patchset, each time one segment (16MB by default)
is finalized in the non-volatile buffers, that segment is written out to a
segment file asynchronously (by XLogBackgroundFlush). It is then archived by
the existing archiving functionality.


> But more importantly - doesn't that mean the nvwal_size value is
essentially a hard limit? With max_wal_size, it's a soft limit i.e. we're
allowed to temporarily use more WAL when needed. But with a pre-allocated
file, that's clearly not possible. So what would happen in those cases?

Yes, nvwal_size is a hard limit, and I see it's a major weak point of my
patchset.

When all non-volatile WAL buffers are filled, the oldest segment in the
buffers is written (by XLogWrite) to a regular WAL segment file, then those
buffers are cleared (by AdvanceXLInsertBuffer) for new records. All WAL
record insertions into the buffers block until that write and clear are
complete, and so all write transactions block as well.

To make matters worse, if a checkpoint eventually occurs in such a
buffer-full case, record insertions would block for a certain time at the
end of the checkpoint, because a large amount of the non-volatile buffers
will be cleared (see PreallocNonVolatileXlogBuffer). From a client's point
of view, it would look as if the postgres server freezes for a while.

Proper checkpointing would prevent such cases, but it could be hard to
control. When I reproduced Gang's case reported in this thread, such a
buffer-full freeze occurred.


> Also, is it possible to change nvwal_size? I haven't tried, but I wonder
what happens with the current contents of the file.

The value of nvwal_size should be equal to the actual size of the nvwal_path
file when postgres starts up. If they are not equal, postgres will panic at
MapNonVolatileXLogBuffer (see nv_xlog_buffer.c), and the WAL contents of the
file will remain as they were. So, if an admin accidentally changes the
nvwal_size value, they just cannot get postgres up.

The file size may be extended or shrunk offline with the truncate(1)
command, but the WAL contents of the file must also be moved to the proper
offsets, because the insertion/recovery offset is calculated by modulo, that
is, a record's LSN % nvwal_size; otherwise we lose WAL. An offline tool for
such an operation might be required, but does not exist yet.


> The way I understand the current design is that we're essentially
switching from this architecture:
>
>clients -> wal buffers (DRAM) -> wal segments (storage)
>
> to this
>
>clients -> wal buffers (PMEM)
>
> (Assuming there we don't have to write segments because of archiving.)

Yes. Let me describe the current PostgreSQL design and how the patchsets
and work discussed in this thread change it, AFAIU:

  - Current PostgreSQL:
clients -[memcpy]-> buffers (DRAM) -[write]-> segments (disk)

  - Patch "pmem-with-wal-buffers-master.patch" Tomas posted:
clients -[memcpy]-> buffers (DRAM) -[pmem_memcpy]-> mmap-ed segments
(PMEM)

  - My "non-volatile WAL buffer" patchset:
clients -[pmem_memcpy(*)]-> buffers (PMEM)

  - My another patchset mmap-ing segments as buffers:
clients -[pmem_memcpy(*)]-> mmap-ed segments as buffers (PMEM)

  - "Non-volatile Memory Logging" in PGcon 2016 [1][2][3]:
clients -[memcpy]-> buffers (WC[4] DRAM as pseudo PMEM) -[async
write]-> segments (disk)

  (* or memcpy + pmem_flush)

And I'd say that our previous work "Introducing PMDK into PostgreSQL",
presented at PGCon 2018 [5], and its patchset [6 is the latest] are based
on the same idea as Tomas's patch above.


That's all for this mail. Please be patient for the next mail.

Best regards,
Takashi

[1] https://www.pgcon.org/2016/schedule/track/Performance/945.en.html
[2] https://github.com/meistervonperf/postgresql-NVM-logging
[3] https://github.com/meistervonperf/pseudo-pram
[4] https://www.kernel.org/doc/html/latest/x86/pat.html
[5] https://pgcon.org/2018/schedule/events/1154.en.html
[6]
https://www.postgresql.org/message-id/CAOwnP3ONd9uXPXKoc5AAfnpCnCyOna1ru6sU=ey_4wfmjak...@mail.gmail.com

-- 
Takashi Menjo 


Re: About to add WAL write/fsync statistics to pg_stat_wal view

2021-01-29 Thread Masahiro Ikeda

On 2021-01-27 00:14, David G. Johnston wrote:

On Mon, Jan 25, 2021 at 11:56 PM Masahiro Ikeda
 wrote:


(wal_write)
The number of times WAL buffers were written out to disk via

XLogWrite




Thanks.

I thought it's better to omit "The" and "XLogWrite" because other
views'
description
omits "The" and there is no description of "XlogWrite" in the
documents.
What do you think?


The documentation for WAL does get into the public API level of detail
and doing so here makes what this measures crystal clear.  The
potential absence of sufficient detail elsewhere should be corrected
instead of making this description more vague.  Specifically, probably
XLogWrite should be added to the WAL overview as part of this update
and probably even have the descriptive section of the documentation
note that the number of times that said function is executed is
exposed as a counter in the wal statistics table - thus closing the
loop.


Thanks for your comments.

I added the descriptions to the documents and split the patch into the two
attached patches. The first adds WAL I/O activity statistics. The second
makes the WAL receiver report the WAL statistics.

Please let me know if you have any comments.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

From a93516dd9836345f97a6f4081597a7079dff4932 Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 29 Jan 2021 16:41:34 +0900
Subject: [PATCH 1/2] Add statistics related to write/sync wal records.

This patch adds following statistics to pg_stat_wal view
to track WAL I/O activity.

- the total number of times writing/syncing WAL data.
- the total amount of time spent writing/syncing WAL data.

Since tracking I/O timing may lead to significant overhead,
a GUC parameter "track_wal_io_timing" is introduced.
The I/O timing is measured only when it is on.

The statistics related to sync are zero when "wal_sync_method"
is "open_datasync" or "open_sync", because those settings don't
invoke a separate sync call.

(This requires a catversion bump, as well as an update to
 PGSTAT_FILE_FORMAT_ID)

Author: Masahiro Ikeda
Reviewed-By: Japin Li, Hayato Kuroda, Masahiko Sawada, David Johnston
Discussion: https://postgr.es/m/0509ad67b585a5b86a83d445dfa75...@oss.nttdata.com
---
 doc/src/sgml/config.sgml  | 21 +++
 doc/src/sgml/monitoring.sgml  | 50 
 doc/src/sgml/wal.sgml | 12 +++-
 src/backend/access/transam/xlog.c | 59 ++-
 src/backend/catalog/system_views.sql  |  4 ++
 src/backend/postmaster/checkpointer.c |  2 +-
 src/backend/postmaster/pgstat.c   |  4 ++
 src/backend/postmaster/walwriter.c|  3 +
 src/backend/utils/adt/pgstatfuncs.c   | 24 +++-
 src/backend/utils/misc/guc.c  |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h |  1 +
 src/include/catalog/pg_proc.dat   | 14 ++---
 src/include/pgstat.h  |  8 +++
 src/test/regress/expected/rules.out   |  6 +-
 15 files changed, 202 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f1037df5a9..3f1b3c1715 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7416,6 +7416,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  track_wal_io_timing (boolean)
+  
+   track_wal_io_timing configuration parameter
+  
+  
+  
+   
+Enables timing of WAL I/O calls. This parameter is off by default,
+because it will repeatedly query the operating system for
+the current time, which may cause significant overhead on some
+platforms.  You can use the  tool to
+measure the overhead of timing on your system.
+I/O timing information is
+displayed in 
+pg_stat_wal.  Only superusers can
+change this setting.
+   
+  
+ 
+
  
   track_functions (enum)
   
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..2435f401db 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3487,6 +3487,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   wal_write bigint
+  
+  
+   Number of times WAL buffers were written out to disk via 
+   XLogWrite, which is normally called by an 
+   XLogFlush request (see )
+  
+ 
+
+ 
+  
+   wal_write_time double precision
+  
+  
+   Total amount of time spent writing WAL data to disk, excluding sync time unless 
+is either open_datasync or 
+   open_sync. Units are in milliseconds with microsecond resolution.
+   This is zero when  is disabled.
+  
+ 
+
+ 
+  
+   wal_sync bigint
+  
+  
+   Number of times WAL files were sy

Re: Snapbuild woes followup

2021-01-29 Thread Amit Kapila
On Tue, Jan 26, 2021 at 2:18 AM Andres Freund  wrote:
>
> Hi,
>
> On 2021-01-25 12:00:08 -0800, Andres Freund wrote:
> > > >   /*
> > > >* For backward compatibility reasons this has to be stored in the 
> > > > wrongly
> > > >* named field.  Will be fixed in next major version.
> > > >*/
> > > >   return builder->was_running.was_xmax;
> > >
> > > We could fix that now... Andres, what did you have in mind for a proper
> > > name?
> >
> > next_phase_at seems like it'd do the trick?
>

The new proposed name sounds good to me.

> See attached patch...

LGTM.

-- 
With Regards,
Amit Kapila.




[WIP] Reduce likelihood of fdw prepared statement collisions

2021-01-29 Thread Marco
Hi All,

I realize using foreign data wrappers with transaction pooling may not be
strictly supported, but for the most part they work. I am however
occasionally noticing errors stemming from the prepared statement names
created by the fdw modify code colliding between sessions/DBs.

Would the development team be open to a patch which somehow makes this less
likely? Something like the attached patch works, but probably isn't ideal?
Perhaps there is a better unique identifier I can use here. I am very new
to the postgres codebase.


Best
*Marco Montagna*
From c63630d5ebf3dd10fdf4de2faafb741acf71e6e0 Mon Sep 17 00:00:00 2001
From: Marco Montagna 
Date: Thu, 28 Jan 2021 23:55:38 -0800
Subject: [PATCH] Reduce likelihood of fdw prepared statement collisions.

---
 contrib/postgres_fdw/postgres_fdw.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d1..c826d6e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3725,9 +3725,14 @@ prepare_foreign_modify(PgFdwModifyState *fmstate)
 	char	   *p_name;
 	PGresult   *res;
 
-	/* Construct name we'll use for the prepared statement. */
-	snprintf(prep_name, sizeof(prep_name), "pgsql_fdw_prep_%u",
-			 GetPrepStmtNumber(fmstate->conn));
+	/* Construct name we'll use for the prepared statement.
+	 *
+	 * Prepend a random number to reduce the likelihood of
+	 * prepared statement collisions when using transaction pooling.
+	 */
+	snprintf(prep_name, sizeof(prep_name), "pgsql_fdw_prep_%u_%u",
+			 GetPrepStmtNumber(fmstate->conn),
+			 rand());
 	p_name = pstrdup(prep_name);
 
 	/*
-- 
2.7.4



Re: Key management with tests

2021-01-29 Thread Masahiko Sawada
On Fri, Jan 29, 2021 at 5:22 AM Bruce Momjian  wrote:
>
> On Thu, Jan 28, 2021 at 02:41:09PM -0500, Tom Kincaid wrote:
> > I would also like to add a "not wanted" entry for this feature on the
> > TODO list, based on the feature's limited usefulness, but I already
> > asked about that and no one seems to feel we don't want it.
> >
> >
> > I want to avoid seeing this happen. As a result of a lot of customer and
> > user discussions around their criteria for choosing a database, I believe
> > TDE is an important feature, and having it appear with a "not-wanted" tag
> > will keep the version of PostgreSQL released by the community out of a
> > certain (and possibly growing) number of deployment scenarios, which I
> > don't think anybody wants to see.
>
> With pg_upgrade, I could work on it out of the tree until it became
> popular, with a small non-user-visible part in the backend.  With the
> Windows port, the port wasn't really visible to users until it was ready.
>
> For the key management part of TDE, it can't be done outside the tree,
> and it is user-visible before it is useful, so that restricts how much
> incremental work can be committed to the tree for TDE.  I highlighted
> that concern in emails months ago, but never got any feedback --- now it
> seems people are realizing the ramifications of that.
>
> > I think the current situation to be as follows (if I missed something please
> > let me know):
> >
> > 1) We need to get the current patch for Key Management reviewed and tested
> > further.
> >
> > I spoke to Bruce just now he will see if can get somebody to do this.
>
> Well, if we don't get anyone committed to working on the data encryption
> part of TDE, the key management part is useless, so why review/test it
> further?
>
> Although Sawada-san and Stephen Frost worked on the patch, they have not
> commented much on my additions, and only a few others have commented on
> the code, and there has been no discussion on who is working on the next
> steps.  This indicates to me that there is little interest in moving
> this feature forward,

TBH I’m a bit confused about the recent situation of this patch, but I
can contribute to the KMS work by discussing, writing, reviewing, and
testing the patch. I can also work on the data encryption part of TDE
(though that needs more discussion). If the community has concerns
about the high-level design and thinks design reviews by cryptography
experts are still needed, we would need to do that first, since the
data encryption part of TDE depends on the KMS. As far as I know, we
have done such reviews many times, on pgsql-hackers, offline, and in
discussions of past proposals, but given that the community still has
concerns, it seems that either we haven’t shared enough of the detail
of the discussions that led to the design decisions, or the design is
still not good. Honestly, I’m not sure how this feature can reach
consensus. But perhaps we should take a break from refining the patch
and instead marshal the discussions so far and the reasoning behind
the design, so that everyone can understand why this feature is
designed the way it is. A good starting point might be to reorganize
the wiki page, since it currently mixes the data encryption part, the
KMS, and the ToDo items.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/