Re: TAP tests aren't using the magic words for Windows file access

2020-12-16 Thread r . zharkov

Thank you very much!

On 2020-12-15 20:47, Andrew Dunstan wrote:

On 12/15/20 12:05 AM, r.zhar...@postgrespro.ru wrote:

Are there any plans to backport the patch to earlier versions
of the Postgres?
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=114541d58e5970e51b78b77b65de16210beaab43



Oops, looks like that slipped off my radar somehow, I'll see about
backpatching it right away.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


--
regards,

Roman Zharkov




Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

2020-12-16 Thread Fujii Masao



On 2020/12/16 16:24, Kyotaro Horiguchi wrote:

At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao  
wrote in

I pushed the following two patches.
- v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
- v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch


As I said in the other thread [1], I'm thinking of reverting this patch
because this change could have a bad side effect on the startup
process waiting for a recovery conflict.

Before the patch, the latch that the startup process used to wait
for a recovery conflict was different from the latch that the SIGHUP
signal handler, the walreceiver process, etc. set to wake the startup
process up. So SIGHUP or the walreceiver did not wake up the startup
process unnecessarily while it was waiting for a recovery conflict.

But the patch got rid of the dedicated latch for signaling
the startup process. This change forced us to use the same latch
both to make the startup process wait and to wake it up, which
caused the SIGHUP signal handler or the walreceiver process to wake
the startup process waiting on the latch for a recovery conflict
unnecessarily often.

While waiting for a recovery conflict on a buffer pin, deadlock needs
to be checked at least every deadlock_timeout. But those frequent
wakeups could prevent the deadlock timer from being triggered and
could delay the deadlock checks.


I thought that spurious wakeups did no harm. But actually
ResolveRecoveryConflictWithBufferPin doesn't consider spurious
wakeups.  Before the patch, only the timer woke up
ResolveRecoveryConflictWithBufferPin. Currently SIGHUP and XLogFlush()
(in the walreceiver) also wake up the startup process.

For a moment I thought that ResolveRecoveryConflictWithBufferPin
should be woken up at shutdown time by the old recovery latch, but that's
not the case since it wakes up after all blockers go away.  It seems
simpler to me to revert the patches than to make the function properly
handle spurious wakeups.


Therefore, I'm thinking of reverting commits ac22929a26 and
113d3591b8.

[1]
https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726...@oss.nttdata.com


As a result, I agree to revert them. But I think we need to add a
comment explaining why we don't use MyLatch for recovery wakeup after
reverting them.


Agreed. Attached is the patch that reverts those patches and
adds the comments about why both procLatch and recoveryWakeupLatch
are necessary.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8dd225c2e1..b1e5d2dbff 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -681,8 +681,18 @@ typedef struct XLogCtlData
 * recoveryWakeupLatch is used to wake up the startup process to continue
 * WAL replay, if it is waiting for WAL to arrive or failover trigger file
 * to appear.
+ *
+ * Note that the startup process also uses another latch, its procLatch,
+ * to wait for recovery conflict. We could get rid of recoveryWakeupLatch
+ * for signaling the startup process in favor of using its procLatch,
+ * which comports better with possible generic signal handlers using that
+ * latch. But we should not do that because the startup process doesn't
+ * assume that it's woken up by the walreceiver process or the SIGHUP
+ * signal handler while it's waiting for a recovery conflict. The separate
+ * latches, recoveryWakeupLatch and procLatch, should be used for
+ * inter-process communication for WAL replay and recovery conflict,
+ * respectively.
 */
-   Latch   *recoveryWakeupLatch;
+   Latch   recoveryWakeupLatch;
 
/*
 * During recovery, we keep a copy of the latest checkpoint record here.
@@ -5186,6 +5196,7 @@ XLOGShmemInit(void)
SpinLockInit(&XLogCtl->Insert.insertpos_lck);
SpinLockInit(&XLogCtl->info_lck);
SpinLockInit(&XLogCtl->ulsn_lck);
+   InitSharedLatch(&XLogCtl->recoveryWakeupLatch);
 }
 
 /*
@@ -6121,7 +6132,7 @@ recoveryApplyDelay(XLogReaderState *record)
 
while (true)
{
-   ResetLatch(MyLatch);
+   ResetLatch(&XLogCtl->recoveryWakeupLatch);
 
/* might change the trigger file's location */
HandleStartupProcInterrupts();
@@ -6140,7 +6151,7 @@ recoveryApplyDelay(XLogReaderState *record)
 
elog(DEBUG2, "recovery apply delay %ld milliseconds", msecs);
 
-   (void) WaitLatch(MyLatch,
+   (void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
 WL_LATCH_SET | WL_TIMEOUT | 
WL_EXIT_ON_PM_DEATH,
 msecs,
 
WAIT_EVENT_RECOVERY_APPLY_DELAY);
@@ -6469,11 +6480,11 @@ StartupXLOG(void)
}
 
/*
-* Adver

Re: [HACKERS] logical decoding of two-phase transactions

2020-12-16 Thread Amit Kapila
On Wed, Dec 16, 2020 at 1:04 PM Masahiko Sawada  wrote:
>
> Thank you for updating the patch. I have two questions:
>
> -
> @@ -239,6 +239,19 @@ CREATE SUBSCRIPTION  class="parameter">subscription_name   
>  
> 
> +   
> +two_phase (boolean)
> +
> + 
> +  Specifies whether two-phase commit is enabled for this 
> subscription.
> +  The default is false.
> +  When two-phase commit is enabled then the decoded
> transactions are sent
> +  to the subscriber on the PREPARE TRANSACTION. When
> two-phase commit is not
> +  enabled then PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED are 
> not
> +  decoded on the publisher.
> + 
> +
> +   
>
> The user will need to specify the 'two_phase’ option on CREATE
> SUBSCRIPTION. It would mean the user will need to control what data is
> streamed both on publication side for INSERT/UPDATE/DELETE/TRUNCATE
> and on subscriber side for PREPARE. Looking at the implementation of
> the ’two_phase’ option of CREATE SUBSCRIPTION, it ultimately just
> passes the ‘two_phase' option to the publisher. Why don’t we set it on
> the publisher side?
>

There could be multiple subscriptions for the same publication, some
of which want to decode the transaction at prepare time while others
might want to decode it at commit time only.  Also, one subscription
could subscribe to multiple publications, so I'm not sure it is even
feasible to set this at the publication level (consider that one txn can
have changes belonging to multiple publications). This option controls
how the data is streamed from a publication, similar to other options
like 'streaming'. Why do you think this one should be any different?
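
For reference, a minimal sketch of how this would look at the SQL level,
assuming the option is passed through WITH (...) like the existing
'streaming' option and using made-up object names:

-- on the publisher
CREATE PUBLICATION pub_orders FOR TABLE orders;

-- on the subscriber; two_phase comes from the patch under discussion
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub_orders
    WITH (two_phase = true);

So each subscription decides for itself whether prepared transactions
are decoded at PREPARE time, independently of other subscriptions on
the same publication.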

> Also, I guess we can improve the description of
> ’two_phase’ option of CREATE SUBSCRIPTION in the doc by adding the
> fact that when this option is not enabled the transaction prepared on
> the publisher is decoded as a normal transaction:
>

Sounds reasonable.

> --
> +   if (LookupGXact(begin_data.gid))
> +   {
> +   /*
> +* If this gid has already been prepared then we dont want to apply
> +* this txn again. This can happen after restart where upstream can
> +* send the prepared transaction again. See
> +* ReorderBufferFinishPrepared. Don't update remote_final_lsn.
> +*/
> +   skip_prepared_txn = true;
> +   return;
> +   }
>
> When PREPARE arrives at the subscriber node but there is already a prepared
> transaction with the same transaction identifier, the apply worker
> skips the whole transaction. So if the users prepared a transaction
> with the same identifier on the subscriber, the prepared transaction
> that came from the publisher would be ignored without any messages. On
> the other hand, if applying other operations such as HEAP_INSERT
> conflicts (such as when violating the unique constraint) the apply
> worker raises an ERROR and stops logical replication until the
> conflict is resolved. IIUC since we can know that the prepared
> transaction came from the same publisher again by checking origin_lsn
> in TwoPhaseFileHeader I guess we can skip the PREPARE message only
> when the existing prepared transaction has the same LSN and the same
> identifier. To be exact, it’s still possible that the subscriber gets
> two PREPARE messages having the same LSN and name from two different
> publishers, but it’s unlikely to happen in practice.
>

The idea sounds reasonable. I'll try and see if this works.

Thanks.

-- 
With Regards,
Amit Kapila.




Re: Confusing behavior of psql's \e

2020-12-16 Thread Laurenz Albe
On Tue, 2020-12-01 at 11:34 -0500, Chapman Flack wrote:
> On 12/01/20 11:21, Laurenz Albe wrote:
> > On Tue, 2020-12-01 at 11:03 -0500, Chapman Flack wrote:
> > > > I propose to clear the query buffer if the
> > > > editor exits without modifying the file.
> > > 
> > > Or how about keeping the query buffer content unchanged, but not
> > > executing it? Then it's still there if you have another idea about
> > > editing it. It's easy enough to \r if you really want it gone.
> > 
> > What if the buffer was empty?  Would you want to get the previous
> > query in the query buffer?  I'd assume not...
> 
> I took your proposal to be specifically about what happens if the editor
> is exited with no change to the buffer, and in that case, I would suggest
> making no change to the buffer, but not re-executing it.
> 
> If the editor is exited after deliberately emptying the editor buffer,
> I would expect that to be treated as emptying the query buffer.
> 
> I don't foresee any case that would entail bringing a /previous/ query
> back into the query buffer.

I see I'll have to try harder.

The attached patch changes the behavior as follows:

- If the current query buffer is edited, and you quit the editor,
  the query buffer is retained.  This is as it used to be.

- If the query buffer is empty and you run \e, the previous query
  is edited (as it used to be), but quitting the editor will empty
  the query buffer and execute nothing.

- Similarly, if you "\e file" and quit the editor, nothing will
  be executed and the query buffer is emptied.

- The same behavior change applies to \ef and \ev.
  There is no need to retain the definition in the query buffer;
  you can always run \ef or \ev again.

I consider this a bug fix, but one that shouldn't be backpatched.
Re-executing the previous query when the editor is quit without saving
is annoying at best and dangerous at worst.

I'll add this patch to the next commitfest.

Yours,
Laurenz Albe
From ecbf6cb81282ea5646b087290ca60a50daabe22c Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Wed, 16 Dec 2020 10:34:14 +0100
Subject: [PATCH] Improve \e, \ef and \ev if the editor is quit

Before, quitting the editor without saving would cause
the *previous query* to be executed, which is certainly not
what anyone would expect.

It is better to execute nothing and clear the query buffer
in this case.

The exception is if the current query buffer is edited:
in this case, the behavior is unchanged, and the query buffer
is retained.

Reviewed-by: Chapman Flack
Discussion: https://postgr.es/m/0ba3f2a658bac6546d9934ab6ba63a805d46a49b.camel%40cybertec.at
---
 doc/src/sgml/ref/psql-ref.sgml | 10 +---
 src/bin/psql/command.c | 43 +++---
 2 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 221a967bfe..44eebbf141 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1949,7 +1949,9 @@ testdb=>
 
 
 
-The new contents of the query buffer are then re-parsed according to
+If you edit a file or the previous query, and you quit the editor without
+modifying the file, the query buffer is cleared.
+Otherwise, the new contents of the query buffer are re-parsed according to
 the normal rules of psql, treating the
 whole buffer as a single line.  Any complete queries are immediately
 executed; that is, if the query buffer contains or ends with a
@@ -2018,7 +2020,8 @@ Tue Oct 26 21:40:57 CEST 1999
  in the form of a CREATE OR REPLACE FUNCTION or
  CREATE OR REPLACE PROCEDURE command.
  Editing is done in the same way as for \edit.
- After the editor exits, the updated command is executed immediately
+ If you quit the editor without saving, the statement is discarded.
+ If you save and exit the editor, the updated command is executed immediately
  if you added a semicolon to it.  Otherwise it is redisplayed;
  type semicolon or \g to send it, or \r
  to cancel.
@@ -2094,7 +2097,8 @@ Tue Oct 26 21:40:57 CEST 1999
  This command fetches and edits the definition of the named view,
  in the form of a CREATE OR REPLACE VIEW command.
  Editing is done in the same way as for \edit.
- After the editor exits, the updated command is executed immediately
+ If you quit the editor without saving, the statement is discarded.
+ If you save and exit the editor, the updated command is executed immediately
  if you added a semicolon to it.  Otherwise it is redisplayed;
  type semicolon or \g to send it, or \r
  to cancel.
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 38b52d..15d7c343e9 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -151,7 +151,7 @@ static void copy_previous_query(PQExpBuffer query_buf, PQExpBuffer previ

Re: Rethinking plpgsql's assignment implementation

2020-12-16 Thread Pavel Stehule
út 15. 12. 2020 v 21:18 odesílatel Tom Lane  napsal:

> I realized that the speedup patch I posted yesterday is flawed: it's
> too aggressive about applying the R/W param mechanism, instead of
> not aggressive enough.
>
> To review, the point of that logic is that if we have an assignment
> like
> arrayvar := array_append(arrayvar, some-scalar-expression);
> a naive implementation would have array_append construct an entire
> new array, which we'd then have to copy into plpgsql's variable
> storage.  Instead, if the array variable is in expanded-array
> format (which plpgsql encourages it to be) then we can pass the
> array parameter as a "read/write expanded datum", which array_append
> recognizes as license to scribble right on its input and return the
> modified input; that takes only O(1) time not O(N).  Then plpgsql's
> assignment code notices that the expression result datum is the same
> pointer already stored in the variable, so it does nothing.
>
> With the patch at hand, a subscripted assignment a[i] := x becomes,
> essentially,
> a := subscriptingref(a, i, x);
> and we need to make the same sort of transformation to allow
> array_set_element to scribble right on the original value of "a"
> instead of making a copy.
>
> However, we can't simply not consider the source expression "x",
> as I proposed yesterday.  For example, if we have
> a := subscriptingref(a, i, f(array_append(a, x)));
> it's not okay for array_append() to scribble on "a".  The R/W
> param mechanism normally disallows any additional references to
> the target variable, which would prevent this error, but I broke
> that safety check with the 0007 patch.
>
> After thinking about this awhile, I decided that plpgsql's R/W param
> mechanism is really misdesigned.  Instead of requiring the assignment
> source expression to be such that *all* its references to the target
> variable could be passed as R/W, we really want to identify *one*
> reference to the target variable to be passed as R/W, allowing any other
> ones to be passed read/only as they would be by default.  As long as the
> R/W reference is a direct argument to the top-level (hence last to be
> executed) function in the expression, there is no harm in R/O references
> being passed to other lower parts of the expression.  Nor is there any
> use-case for more than one argument of the top-level function being R/W.
>
> So the attached rewrite of the 0007 patch reimplements that logic to
> identify one single Param that references the target variable, and
> make only that Param pass a read/write reference, not any other
> Params referencing the target variable.  This is a good change even
> without considering the assignment-reimplementation proposal, because
> even before this patchset we could have cases like
> arrayvar := array_append(arrayvar, arrayvar[i]);
> The existing code would be afraid to optimize this, but it's in fact
> safe.
>
> I also re-attach the 0001-0006 patches, which have not changed, just
> to keep the cfbot happy.
>
>
I ran some performance tests and it looks very good.
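
A sketch of the kind of micro-benchmark that exercises this fast path
(illustrative only, not necessarily the exact test that was run):

-- repeated array_append on a plpgsql array variable, the pattern the
-- R/W expanded-datum optimization targets
DO $$
DECLARE
    a int[] := '{}';
BEGIN
    FOR i IN 1..100000 LOOP
        a := array_append(a, i);
    END LOOP;
END
$$;

With the fast path, each iteration modifies the expanded array in place
instead of copying it, so the loop runs in roughly linear rather than
quadratic time.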


regards, tom lane
>
>


Re: Minor documentation error regarding streaming replication protocol

2020-12-16 Thread Brar Piening

Jeff Davis wrote:

Attached a simple patch that enforces an all-ASCII restore point name
rather than documenting the possibility of non-ASCII text.


+1

This is probably the best solution because it gives guarantees to the
client without causing compatibility issues with old clients.
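
For reference, restore points are created with pg_create_restore_point(),
so an all-ASCII name (illustrative) is easy to guarantee on the client side:

SELECT pg_create_restore_point('before_schema_migration_2020_12_16');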

Thanks!

Brar





Deadlock between backend and recovery may not be detected

2020-12-16 Thread Fujii Masao

Hi,

While reviewing the patch proposed at [1], I found that there is a case
where a deadlock involving a recovery conflict on a lock may not be
detected. This deadlock can happen between a backend and the startup
process on a standby server. Please see the following procedure to
reproduce the deadlock.

#1. Set up streaming replication.

#2. Set max_standby_streaming_delay to -1 in the standby.

#3. Create two tables in the primary.

[PRIMARY: SESSION1]
CREATE TABLE t1 ();
CREATE TABLE t2 ();

#4. Start a transaction and access the table t1.

[STANDBY: SESSION2]
BEGIN;
SELECT * FROM t1;

#5. Start a transaction and lock table t2 in access exclusive mode,
in the primary. Also execute pg_switch_wal() to transfer the WAL record
for the access exclusive lock to the standby.

[PRIMARY: SESSION1]
BEGIN;
LOCK TABLE t2 IN ACCESS EXCLUSIVE MODE;
SELECT pg_switch_wal();

#6. Access the table t2 within the transaction started in #4,
in the standby.

[STANDBY: SESSION2]
SELECT * FROM t2;

#7. Lock table t1 in access exclusive mode within the transaction
started in #5, in the primary. Also execute pg_switch_wal() to transfer
the WAL record for the access exclusive lock to the standby.

[PRIMARY: SESSION1]
LOCK TABLE t1 IN ACCESS EXCLUSIVE MODE;
SELECT pg_switch_wal();

After this procedure, you can see that the startup process and the backend
are each waiting for the other's table lock, i.e., a deadlock. But this
deadlock remains even after deadlock_timeout passes.

This seems a bug to me.
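
For what it's worth, the stuck waiters can be observed on the standby with
a query like the following (illustrative; the exact wait_event values vary
by version):

SELECT pid, backend_type, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE backend_type = 'startup' OR wait_event_type = 'Lock';

Both the startup process and the backend from session 2 stay in a 'Lock'
wait indefinitely.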


* Deadlocks involving the Startup process and an ordinary backend process
* will be detected by the deadlock detector within the ordinary backend.


The cause of this issue seems to be that ResolveRecoveryConflictWithLock(),
which the startup process calls when a recovery conflict on a lock happens,
doesn't take care of the deadlock case at all. You can see this by reading
the above source code comment for ResolveRecoveryConflictWithLock().

To fix this issue, I think that we should enable the STANDBY_DEADLOCK_TIMEOUT
timer in ResolveRecoveryConflictWithLock() so that the startup process can
send the PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
Then, if the PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
the backend should check whether the deadlock actually happens or not.
Attached is a POC patch implementing this.

Thoughts?

Regards,

[1] https://postgr.es/m/9a60178c-a853-1440-2cdc-c3af916cf...@amazon.com

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4ea3cf1f5c..c398f9a3a5 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -407,7 +407,7 @@ ResolveRecoveryConflictWithLock(LOCKTAG locktag)
 
ltime = GetStandbyLimitTime();
 
-   if (GetCurrentTimestamp() >= ltime)
+   if (GetCurrentTimestamp() >= ltime && ltime > 0)
{
/*
 * We're already behind, so clear a path as quickly as possible.
@@ -431,12 +431,21 @@ ResolveRecoveryConflictWithLock(LOCKTAG locktag)
/*
 * Wait (or wait again) until ltime
 */
-   EnableTimeoutParams timeouts[1];
+   EnableTimeoutParams timeouts[2];
+   int i = 0;
 
-   timeouts[0].id = STANDBY_LOCK_TIMEOUT;
-   timeouts[0].type = TMPARAM_AT;
-   timeouts[0].fin_time = ltime;
-   enable_timeouts(timeouts, 1);
+   if (ltime > 0)
+   {
+   timeouts[i].id = STANDBY_LOCK_TIMEOUT;
+   timeouts[i].type = TMPARAM_AT;
+   timeouts[i].fin_time = ltime;
+   i++;
+   }
+   timeouts[i].id = STANDBY_DEADLOCK_TIMEOUT;
+   timeouts[i].type = TMPARAM_AFTER;
+   timeouts[i].delay_ms = DeadlockTimeout;
+   i++;
+   enable_timeouts(timeouts, i);
}
 
/* Wait to be signaled by the release of the Relation Lock */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3679799e50..3717381a6a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2923,7 +2923,11 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 * more to do.
 */
if (!HoldingBufferPinThatDelaysRecovery())
+   {
+   if (reason == 
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK)
+   CheckDeadLockAlert();
return;
+   }
 
MyProc->recoveryConflictPending = true;
 


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2020-12-16 Thread Bharath Rupireddy
On Mon, Dec 14, 2020 at 8:03 PM Fujii Masao  wrote:
> On 2020/12/14 14:36, Bharath Rupireddy wrote:
> > On Mon, Dec 14, 2020 at 9:38 AM Fujii Masao  > > wrote:
> >  > On 2020/12/12 15:05, Bharath Rupireddy wrote:
> >  > > On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao 
> > mailto:masao.fu...@oss.nttdata.com> 
> > >> 
> > wrote:
> >  > >  > I was thinking that in the case of drop of user mapping or server, 
> > hash_search(ConnnectionHash) in GetConnection() cannot find the cached 
> > connection entry invalidated by that drop. Because "user->umid" used as 
> > hash key is changed. So I was thinking that that invalidated connection 
> > will not be closed nor reconnected.
> >  > >  >
> >  > >
> >  > > You are right in saying that the connection leaks.
> >  > >
> >  > > Use case 1:
> >  > > 1) Run foreign query in session1 with server1, user mapping1
> >  > > 2) Drop user mapping1 in another session2, invalidation message gets 
> > logged which will have to be processed by other sessions
> >  > > 3) Run foreign query again in session1, at the start of txn, the 
> > cached entry gets invalidated via pgfdw_inval_callback(). Whatever may be 
> > the type of foreign query (select, update, explain, delete, insert, analyze 
> > etc.), upon next call to GetUserMapping() from postgres_fdw.c, the cache 
> > lookup fails(with ERROR:  user mapping not found for "") since the user 
> > mapping1 has been dropped in session2 and the query will also fail before 
> > reaching GetConnection() where the connections associated with invalidated 
> > entries would have got disconnected.
> >  > >
> >  > > So, the connection associated with invalidated entry would remain 
> > until the local session exits which is a problem to solve.
> >  > >
> >  > > Use case 2:
> >  > > 1) Run foreign query in session1 with server1, user mapping1
> >  > > 2) Try to drop foreign server1, then we would not be allowed to do so 
> > because of dependency. If we use CASCADE, then the dependent user mapping1 
> > and foreign tables get dropped too [1].
> >  > > 3) Run foreign query again in session1, at the start of txn, the 
> > cached entry gets invalidated via pgfdw_inval_callback(), it fails because 
> > there is no foreign table and user mapping1.
> >  > >
> >  > > But, note that the connection remains open in session1, which is again 
> > a problem to solve.
> >  > >
> >  > > To solve the above connection leak problem, it looks like the right 
> > place to close all the invalid connections is pgfdw_xact_callback(), once 
> > registered, which gets called at the end of every txn in the current 
> > session(by then all the sub txns also would have been finished). Note that 
> > if there are too many invalidated entries, then one of the following txn 
> > has to bear running this extra code, but that's okay than having leaked 
> > connections. Thoughts? If okay, I can code a separate patch.
> >  >
> >  > Thanks for further analysis! Sounds good. Also +1 for making it as 
> > separate patch. Maybe only this patch needs to be back-patched.
> >
> > Thanks. Yeah, once we agree on the fix, +1 to back-patch. Shall I start a 
> > separate thread for the connection leak issue and patch, so that others might 
> > share their thoughts?
>
> Yes, of course!

Thanks. I posted the patch in a separate thread[1] for fixing the
connection leak problem.

[1] - 
https://www.postgresql.org/message-id/flat/CALj2ACVNcGH_6qLY-4_tXz8JLvA%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com
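
For reference, use case 1 above boils down to roughly the following
(a sketch; the foreign table, server, and user mapping names are made up):

-- session 1: run a query through postgres_fdw; the remote connection is cached
SELECT count(*) FROM ft_orders;

-- session 2: drop the user mapping that session 1's cached connection used
DROP USER MAPPING FOR CURRENT_USER SERVER remote_srv;

-- session 1: the next foreign query fails with "user mapping not found",
-- but the invalidated remote connection stays open until the session exits
SELECT count(*) FROM ft_orders;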

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: On login trigger: take three

2020-12-16 Thread Konstantin Knizhnik



On 15.12.2020 20:16, Pavel Stehule wrote:



Please notice that we still need a GUC to disable on-login
triggers: to make it possible for a superuser who made a mistake
and defined an incorrect on-login trigger to log in to the system.
Do we need a GUC to disable all other event triggers? Maybe I am
wrong, but I do not see much need for such a GUC: an error in any
of those event triggers is non-fatal and can easily be reverted.
So the only question is whether
"disable_client_connection_trigger" should be true by default
or not...

I agree with you that @2 is a little bit chaotic and @1 looks
like a workaround.
But from my point of view @3 is not the best solution but
overkill: maintaining yet another shared hash just to save
a few milliseconds on login seems too high a price.
Actually, there are many things which are loaded by a new
backend from the database on start: for example, the catalog.
This is why launching a new backend is an expensive operation.
Certainly, if we execute "select 1", then the system catalog is
not needed...
But does anybody start a new backend just to execute "select 1"
and exit?



I understand that the implementation of a new shared cache can be
a lot of work. A better way is to enhance pg_database with one
column carrying information about login triggers
(dathaslogontriggers). At init time these data are in the syscache
and can easily be checked. For comparison, pg_attribute has an
atthasdef column, and pg_class has similar fields - relhasrules,
relhastriggers, ... With that, the overhead of this design should
be essentially zero.

What do you think about it?


I like this approach more than implementing a shared hash.
But I still have some concerns:

1. The pg_database table format has to be changed. Certainly that is
not completely unacceptable, but IMHO we should try to avoid modifying
such commonly used catalog tables as much as possible.


Yes, I know. Unfortunately, I don't see any other correct solution.
There should be a wider discussion about this topic before this work.
On the other hand, this change should not break anything (if the new
field is placed as the last field). The logon trigger really looks like
a database trigger - so I think this semantic is correct. I have no
idea whether it is acceptable to committers :-/. I hope so.


The fact that the existence of a logon trigger can be visible from a 
shared database object can be an interesting feature too.



2. It is not so easy to maintain this flag. There can be multiple
on-login triggers defined. If such a trigger is dropped, we cannot
just clear this flag; we have to check whether other triggers exist.
Now assume that there are two triggers and two concurrent
transactions, each dropping one of them. According to their
snapshots, they do not see the changes made by the other transaction,
so they remove both triggers but neither clears the flag.
Certainly we could use a counter instead of a flag, but I am not sure
there would be no other problems with maintaining a counter.


I don't think that is necessary.  My opinion is not too strong, but if 
pg_class doesn't need to hold the number of triggers, then I think 
pg_database doesn't need to hold this number either.


Attached please find a new version of the patch based on 
on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch.
So there is still only the "disable_client_connection_trigger" GUC, 
because we need the possibility to disable client connection triggers 
and there is no such need for other event types.


As you suggested, I have added a "dathaslogontriggers" flag which is set 
when a client connection trigger is created.
This flag is never cleared (to avoid the visibility issues mentioned in my 
previous mail).




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
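
For readers following the thread, a minimal sketch of the feature being
discussed, assuming the client_connection event and the dathaslogontriggers
column from the attached patch (the function body is made up):

CREATE FUNCTION on_login_proc() RETURNS event_trigger AS $$
BEGIN
    RAISE NOTICE 'welcome, %', current_user;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER on_login_trigger ON client_connection
    EXECUTE FUNCTION on_login_proc();

-- the proposed flag marking databases that have logon triggers
SELECT datname, dathaslogontriggers FROM pg_database;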

diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml
index 60366a9..00f69b8 100644
--- a/doc/src/sgml/event-trigger.sgml
+++ b/doc/src/sgml/event-trigger.sgml
@@ -28,6 +28,7 @@
  An event trigger fires whenever the event with which it is associated
  occurs in the database in which it is defined. Currently, the only
  supported events are
+ client_connection,
  ddl_command_start,
  ddl_command_end,
  table_rewrite
@@ -36,6 +37,29 @@

 

+ The client_connection event occurs when a client connection
+ to the server is established.
+ There are two mechanisms for dealing with any bugs in a trigger procedure for
+ this event which might prevent successful login to the system:
+ 
+  
+   
+ The configuration parameter disable_client_connection_trigger
+ makes it possible to disable firing the client_connection
+ trigger when a client connects

Re: Deadlock between backend and recovery may not be detected

2020-12-16 Thread Victor Yegorov
ср, 16 дек. 2020 г. в 13:49, Fujii Masao :

> After doing this procedure, you can see the startup process and backend
> wait for the table lock each other, i.e., deadlock. But this deadlock
> remains
> even after deadlock_timeout passes.
>
> This seems a bug to me.
>
> > * Deadlocks involving the Startup process and an ordinary backend process
> > * will be detected by the deadlock detector within the ordinary backend.
>
> The cause of this issue seems that ResolveRecoveryConflictWithLock() that
> the startup process calls when recovery conflict on lock happens doesn't
> take care of deadlock case at all. You can see this fact by reading the
> above
> source code comment for ResolveRecoveryConflictWithLock().
>
> To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
> timer in ResolveRecoveryConflictWithLock() so that the startup process can
> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
> the backend should check whether the deadlock actually happens or not.
> Attached is the POC patch implimenting this.
>

I agree that this is a bug.

Unfortunately, we've been hit by it in production.
Such a deadlock will, eventually, make all sessions wait on the startup
process, making the streaming replica unusable. In cases where the replica
is used to offload RO queries from the primary, this causes downtime for
the project.

If I understand things right, the session will release its locks
when max_standby_streaming_delay is reached.
But it'd be much better if the conflict were resolved faster,
around deadlock_timeout.

So — huge +1 from me for fixing it.


-- 
Victor Yegorov


Re: ResourceOwner refactoring

2020-12-16 Thread Heikki Linnakangas

On 26/11/2020 10:52, Kyotaro Horiguchi wrote:

+   for (int i = 0; i < owner->narr; i++)
{
+   if (owner->arr[i].kind->phase == phase)
+   {
+   Datum   value = owner->arr[i].item;
+   ResourceOwnerFuncs *kind = owner->arr[i].kind;
+
+   if (printLeakWarnings)
+   kind->PrintLeakWarning(value);
+   kind->ReleaseResource(value);
+   found = true;
+   }
+   /*
+* If any resources were released, check again because some of 
the
+* elements might have moved by the callbacks. We don't want to
+* miss them.
+*/
+   } while (found && owner->narr > 0);

Couldn't that miss be avoided by just not incrementing i if found?


Hmm, perhaps. ResourceOwnerForget() can move an entry from the end of 
the array to the beginning, but if it's removing the entry that we're 
just looking at, it probably can't move anything before that entry. I'm 
not very comfortable with that, though. What if the callback releases 
something else as a side effect?


This isn't super-critical for performance, and given that the array is 
very small, it's very cheap to loop through it. So I'm inclined to keep 
it safe.



+   /*
+* Like with the array, we must check again after we reach the
+* end, if any callbacks were called. XXX: We could probably
+* stipulate that the callbacks may not do certain thing, like
+* remember more references in the same resource owner, to avoid
+* that.
+*/

If I read this patch correctly, ResourceOwnerForget doesn't seem to do
such a thing for hash?


Hmm, true. I tried removing the loop and hit another issue, however: if 
the same resource has been remembered twice in the same resource owner, 
the callback might remove a different reference to it than the one we're 
looking at. So we need to loop anyway to handle that. Also, what if the 
callback remembers some other resource in the same resowner, causing the 
hash to grow? I'm not sure if any of the callbacks currently do that, 
but better safe than sorry. I updated the code and the comments accordingly.



Here's an updated version of this patch. I fixed the bit rot, and 
addressed all the other comment issues and such that people pointed out 
(thanks!).


TODO:

1. Performance testing. We discussed this a little bit, but I haven't 
done any more testing.


2. Before this patch, the ResourceOwnerEnlarge() calls enlarged the 
array for the particular "kind" of resource, but now they are all stored 
in the same structure. That could lead to trouble if we do something 
like this:


ResourceOwnerEnlargeAAA()
ResourceOwnerEnlargeBBB()
ResourceOwnerRememberAAA()
ResourceOwnerRememberBBB()

Previously, this was OK, because resources AAA and BBB were kept in 
separate arrays. But after this patch, it's not guaranteed that the 
ResourceOwnerRememberBBB() will find an empty slot.


I don't think we do that currently, but to be sure, I'm planning to grep 
ResourceOwnerRemember and look at each call carefully. And perhaps we 
can add an assertion for this, although I'm not sure where.


- Heikki
>From 71a372b9a82a5b50a40386b403d256b4f6fac794 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 16 Dec 2020 17:26:03 +0200
Subject: [PATCH v2 1/1] Make resowners more easily extensible.

Use a single array and hash, instead of one for each object kind.
---
 src/backend/access/common/tupdesc.c   |   42 +-
 src/backend/jit/jit.c |2 -
 src/backend/jit/llvm/llvmjit.c|   40 +-
 src/backend/storage/buffer/bufmgr.c   |   43 +-
 src/backend/storage/buffer/localbuf.c |2 +-
 src/backend/storage/file/fd.c |   44 +-
 src/backend/storage/ipc/dsm.c |   44 +-
 src/backend/storage/lmgr/lock.c   |2 +-
 src/backend/utils/cache/catcache.c|   98 ++-
 src/backend/utils/cache/plancache.c   |   50 +-
 src/backend/utils/cache/relcache.c|   39 +-
 src/backend/utils/cache/syscache.c|2 +-
 src/backend/utils/resowner/README |   19 +-
 src/backend/utils/resowner/resowner.c | 1139 +++--
 src/backend/utils/time/snapmgr.c  |   38 +-
 src/common/cryptohash_openssl.c   |   45 +-
 src/include/storage/buf_internals.h   |   15 +
 src/include/utils/catcache.h  |3 -
 src/include/utils/plancache.h |2 +
 src/include/utils/resowner.h  |   41 +-
 src/include/utils/resowner_private.h  |  105 ---
 21 files changed, 795 insertions(+), 1020 deletions(-)
 delete mode 100644 src/include/utils/resowner_private.h

diff --git a/src/backend/access/common/tupdesc.c b/src/backend/access/common/tupdesc.c
index 30c30cf3a2e..c099f04f551 1006

Re: Proposed patch for key managment

2020-12-16 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote:
> > Yeah, looking at what's been done there seems like the right approach to
> > me as well, ideally we'd have one set of APIs that'll support all these
> > use cases (not unlike what curl does..).
> 
> Ooh...  This is interesting.  What did curl do wrong here?  It is
> always good to hear from such past experiences.

Not sure that came across very well- curl did something right in terms
of their vTLS layer which allows for building curl with a number of
different SSL/TLS libraries without the core code having to know about
all the differences.  I was suggesting that we might want to look at how
they did that, and naturally discuss with Daniel and ask him what
thoughts he has from having worked with curl and the vTLS layer.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: SELECT INTO deprecation

2020-12-16 Thread Peter Eisentraut

On 2020-12-15 23:13, Bruce Momjian wrote:

Do we really want to carry around confusing syntax for compatibility?  I
doubt we would ever add INTO now, even for compatibility.


Right, we would very likely not add it now.  But it doesn't seem to 
cause a lot of ongoing maintenance burden, so if there is a use case, 
it's not unreasonable to keep it around.  I have no other motive here, I 
was just curious.
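
For context, the syntax in question and its preferred equivalent
(illustrative table names):

-- the form whose deprecation is being discussed
SELECT id, name INTO archived_users FROM users WHERE NOT active;

-- the recommended spelling of the same operation
CREATE TABLE archived_users AS
    SELECT id, name FROM users WHERE NOT active;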





Re: Deadlock between backend and recovery may not be detected

2020-12-16 Thread Fujii Masao



On 2020/12/16 23:28, Drouvot, Bertrand wrote:

Hi,

On 12/16/20 2:36 PM, Victor Yegorov wrote:




ср, 16 дек. 2020 г. в 13:49, Fujii Masao mailto:masao.fu...@oss.nttdata.com>>:

After doing this procedure, you can see the startup process and backend
wait for the table lock each other, i.e., deadlock. But this deadlock 
remains
even after deadlock_timeout passes.

This seems a bug to me.


+1



> * Deadlocks involving the Startup process and an ordinary backend process
> * will be detected by the deadlock detector within the ordinary backend.

The cause of this issue seems that ResolveRecoveryConflictWithLock() that
the startup process calls when recovery conflict on lock happens doesn't
take care of deadlock case at all. You can see this fact by reading the 
above
source code comment for ResolveRecoveryConflictWithLock().

To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
timer in ResolveRecoveryConflictWithLock() so that the startup process can
send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
the backend should check whether the deadlock actually happens or not.
Attached is the POC patch implimenting this.


good catch!

I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be 
set in ResolveRecoveryConflictWithLock() too (it is already set in 
ResolveRecoveryConflictWithBufferPin()).

So +1 for considering this a bug and for the way the patch proposes to fix it.


Thanks Victor and Bertrand for agreeing!
Attached is the updated version of the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ee912b9d5e..7c1a3f8b3f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3324,6 +3324,13 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid 
dbOid)
  */
 pid_t
 CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
+{
+   return SignalVirtualTransaction(vxid, sigmode, true);
+}
+
+pid_t
+SignalVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode,
+bool conflictPending)
 {
ProcArrayStruct *arrayP = procArray;
int index;
@@ -3342,7 +3349,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, 
ProcSignalReason sigmode)
if (procvxid.backendId == vxid.backendId &&
procvxid.localTransactionId == vxid.localTransactionId)
{
-   proc->recoveryConflictPending = true;
+   proc->recoveryConflictPending = conflictPending;
pid = proc->pid;
if (pid != 0)
{
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4ea3cf1f5c..4b7762644e 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,10 +42,15 @@ int max_standby_streaming_delay = 30 * 1000;
 
 static HTAB *RecoveryLockLists;
 
+/* Is a deadlock check pending? */
+static volatile sig_atomic_t got_standby_deadlock_timeout;
+
 static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId 
*waitlist,

   ProcSignalReason reason,

   uint32 wait_event_info,

   bool report_waiting);
+static void SendRecoveryConflictWithLock(ProcSignalReason reason,
+   
 LOCKTAG locktag);
 static void SendRecoveryConflictWithBufferPin(ProcSignalReason reason);
 static XLogRecPtr LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
 static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
@@ -395,8 +400,10 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
  * lock.  As we are already queued to be granted the lock, no new lock
  * requests conflicting with ours will be granted in the meantime.
  *
- * Deadlocks involving the Startup process and an ordinary backend process
- * will be detected by the deadlock detector within the ordinary backend.
+ * We also must check for deadlocks involving the Startup process and
+ * hot-standby backend processes. If deadlock_timeout is reached in
+ * this function, the backends are requested to check them

Re: pg_upgrade test for binary compatibility of core data types

2020-12-16 Thread Justin Pryzby
On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
> I meant to notice if the binary format is accidentally changed again, which 
> was
> what happened here:
> 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.
> 
> I added a table to the regression tests so it's processed by pg_upgrade tests,
> run like:
> 
> | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 
> oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin

Per cfbot, this avoids testing ::xml (support for which may not be enabled).
It also now tests oid types.

I think the per-version hacks should be grouped by logical change rather than
by version, which I've started doing here.

-- 
Justin
>From 6a5bcdf6b3c9244e164455792cec612e317cb8d3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 5 Dec 2020 22:31:19 -0600
Subject: [PATCH v2 1/3] WIP: pg_upgrade/test.sh: changes needed to allow
 testing upgrade from v11

---
 src/bin/pg_upgrade/test.sh | 92 ++
 1 file changed, 84 insertions(+), 8 deletions(-)

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index 04aa7fd9f5..9733217535 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -23,7 +23,7 @@ standard_initdb() {
 	# To increase coverage of non-standard segment size and group access
 	# without increasing test runtime, run these tests with a custom setting.
 	# Also, specify "-A trust" explicitly to suppress initdb's warning.
-	"$1" -N --wal-segsize 1 -g -A trust
+	"$1" -N -A trust
 	if [ -n "$TEMP_CONFIG" -a -r "$TEMP_CONFIG" ]
 	then
 		cat "$TEMP_CONFIG" >> "$PGDATA/postgresql.conf"
@@ -108,6 +108,9 @@ export EXTRA_REGRESS_OPTS
 mkdir "$outputdir"
 mkdir "$outputdir"/testtablespace
 
+mkdir "$outputdir"/sql
+mkdir "$outputdir"/expected
+
 logdir=`pwd`/log
 rm -rf "$logdir"
 mkdir "$logdir"
@@ -172,16 +175,83 @@ if "$MAKE" -C "$oldsrc" installcheck-parallel; then
 		fix_sql=""
 		case $oldpgversion in
 			804??)
-fix_sql="DROP FUNCTION public.myfunc(integer); DROP FUNCTION public.oldstyle_length(integer, text);"
+fix_sql="$fix_sql DROP FUNCTION public.myfunc(integer);"
 ;;
-			*)
-fix_sql="DROP FUNCTION public.oldstyle_length(integer, text);"
+		esac
+
+		# Removed in v10 commit 5ded4bd21
+		case $oldpgversion in
+			804??|9)
+fix_sql="$fix_sql DROP FUNCTION public.oldstyle_length(integer, text);"
+;;
+		esac
+
+		# commit 068503c76511cdb0080bab689662a20e86b9c845
+		case $oldpgversion in
+			10) # XXX
+fix_sql="$fix_sql DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;"
+;;
+		esac
+
+		# commit db3af9feb19f39827e916145f88fa5eca3130cb2
+		case $oldpgversion in
+			10) # XXX
+fix_sql="$fix_sql DROP FUNCTION boxarea(box);"
+fix_sql="$fix_sql DROP FUNCTION funny_dup17();"
 ;;
 		esac
+
+		# commit cda6a8d01d391eab45c4b3e0043a1b2b31072f5f
+		case $oldpgversion in
+			10) # XXX
+fix_sql="$fix_sql DROP TABLE abstime_tbl;"
+fix_sql="$fix_sql DROP TABLE reltime_tbl;"
+fix_sql="$fix_sql DROP TABLE tinterval_tbl;"
+;;
+		esac
+
+		# Various things removed for v14
+		case $oldpgversion in
+			804??|9|10|11|12|13)
+# commit 76f412ab3
+# This one is only needed for v11+ ??
+# (see below for more operators removed that also apply to older versions)
+fix_sql="$fix_sql DROP OPERATOR public.!=- (pg_catalog.int8, NONE);"
+;;
+		esac
+		case $oldpgversion in
+			804??|9|10|11|12|13)
+# commit 76f412ab3
+fix_sql="$fix_sql DROP OPERATOR public.#@# (pg_catalog.int8, NONE);"
+fix_sql="$fix_sql DROP OPERATOR public.#%# (pg_catalog.int8, NONE);"
+fix_sql="$fix_sql DROP OPERATOR public.#@%# (pg_catalog.int8, NONE);"
+
+# commit 9e38c2bb5 and 97f73a978
+# fix_sql="$fix_sql DROP AGGREGATE array_larger_accum(anyarray);"
+fix_sql="$fix_sql DROP AGGREGATE array_cat_accum(anyarray);"
+fix_sql="$fix_sql DROP AGGREGATE first_el_agg_any(anyelement);"
+
+# commit 76f412ab3
+#fix_sql="$fix_sql DROP OPERATOR @#@(bigint,NONE);"
+fix_sql="$fix_sql DROP OPERATOR @#@(NONE,bigint);"
+;;
+		esac
+
+		# commit 578b22971: OIDS removed in v12
+		case $oldpgversion in
+			804??|9|10|11)
+fix_sql="$fix_sql ALTER TABLE public.tenk1 SET WITHOUT OIDS;"
+fix_sql="$fix_sql ALTER TABLE public.tenk1 SET WITHOUT OIDS;"
+#fix_sql="$fix_sql ALTER TABLE public.stud_emp SET WITHOUT OIDS;" # inherited
+fix_sql="$fix_sql ALTER TABLE public.emp SET WITHOUT OIDS;"
+fix_sql="$fix_sql ALTER TABLE public.tt7 SET WITHOUT OIDS;"
+;;
+		esac
+
 		psql -X -d regression -c "$fix_sql;" || psql_fix_sql_status=$?
 	fi
 
-	pg_dumpall --no-sync -f "$temp_root"/dump1.sql || pg_dumpall1_status=$?
+	pg_dumpall --extra-float-digits=0 --no-sync -f "$temp_root"/dump1.sql || pg_dumpall1_status=$?
 
 	if [ "$newsrc" != "$oldsrc" ]; then
 		# update references to old source tree's regress.so etc
@@ -227,23 +297,29 @@ pg_upgrade $PG_UPGRAD

Re: Proposed patch for key managment

2020-12-16 Thread Alastair Turner
Hi Bruce

On Wed, 16 Dec 2020 at 00:12, Bruce Momjian  wrote:
>
...
>
> The second approach is to make a new API for what you want

I am trying to motivate for an alternate API. Specifically, an API
which allows any potential adopter of Postgres and Cluster File
Encryption to adopt them without having to accept any particular
approach to key management, key derivation, wrapping, validation, etc.
A passphrase key-wrapper with validation will probably be very useful
to a lot of people, but making it mandatory and requiring twists and
turns to integrate with already-established security infrastructure
sounds like a barrier to adoption.

> ...It would be
> similar to the cluster_passphrase_command, but it would be simple in
> some ways, but complex in others since we need at least two DEKs, and
> key rotation might be very risky since there isn't validation in the
> server.
>

I guess that the risk you're talking about here is encryption with a
bogus key and bricking the data? In a world where the wrapped keys are
opaque to the application, a key would be validated through a
roundtrip: request the unwrapping of the key, encrypt a known string,
request the unwrapping again, decrypt the known string, compare. If
any of those steps fail, don't use the wrapped key provided.
Validating that the stored keys have not been fiddled/damaged is the
KMS/HSM's responsibility.

>
>... I don't think this can be accomplished by a contrib module, but
> would actually be easy to implement, but perhaps hard to document
> because the user API might be tricky.
>

If integration through a pipeline isn't good enough (it would be a
pain for the passphrase, with multiple prompts), what else do you see
an API having to provide?

>
> I think my big question is about your sentence, "A key feature of these
> key management approaches is that the application handling the
> encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
> after unwrapping it."  It is less secure for the HSM to return a KEK
> rather than a DEK?  I can't see why it would be.  The application never
> sees the KEK used to wrap the DEK it has stored in the file system,
> though that DEK is actually used as a passphrase by Postgres.  This is
> the same with the Yubikey --- Postgres never sees the private key used
> to unlock what it locked with the Yubikey public key.
>

The protocols and practices are designed for a lot of DEKs and a small
number of KEKs. So the compromise of a KEK would, potentially, lead
to compromise of thousands of DEKs. In this particular case with 2
DEKs wrapped by one KEK, it doesn't sound like much of a difference, I
agree. From an acceptance and adoption point of view, I'm just
inclined to use the security ecosystem in an established and
understood way.

Regards
Alastair




Re: Proposed patch for key managment

2020-12-16 Thread Bruce Momjian
On Wed, Dec 16, 2020 at 06:07:26PM +, Alastair Turner wrote:
> Hi Bruce
> 
> On Wed, 16 Dec 2020 at 00:12, Bruce Momjian  wrote:
> >
> ...
> >
> > The second approach is to make a new API for what you want
> 
> I am trying to motivate for an alternate API. Specifically, an API
> which allows any potential adopter of Postgres and Cluster File
> Encryption to adopt them without having to accept any particular
> approach to key management, key derivation, wrapping, validation, etc.
> A passphrase key-wrapper with validation will probably be very useful
> to a lot of people, but making it mandatory and requiring twists and
> turns to integrate with already-established security infrastructure
> sounds like a barrier to adoption.

Attached is a script that uses the AWS Secrets Manager, and it does key
rotation with the new pg_altercpass tool too, just like all the other
methods.

I am not inclined to add any more APIs unless there is clear value for
it, and I am not seeing that yet.

> > ...It would be
> > similar to the cluster_passphrase_command, but it would be simple in
> > some ways, but complex in others since we need at least two DEKs, and
> > key rotation might be very risky since there isn't validation in the
> > server.
> >
> 
> I guess that the risk you're talking about here is encryption with a
> bogus key and bricking the data? In a world where the wrapped keys are
> opaque to the application, a key would be validated through a
> roundtrip: request the unwrapping of the key, encrypt a known string,
> request the unwrapping again, decrypt the known string, compare. If
> any of those steps fail, don't use the wrapped key provided.
> Validating that the stored keys have not been fiddled/damaged is the
> KMS/HSM's responsibility.

OK, but we already have validation.

> >... I don't think this can be accomplished by a contrib module, but
> > would actually be easy to implement, but perhaps hard to document
> > because the user API might be tricky.
> >
> 
> If integration through a pipeline isn't good enough (it would be a
> pain for the passphrase, with multiple prompts), what else do you see
> an API having to provide?

The big problem is that at bootstrap time you have to call the command
at a specific time, and I don't see how that could be done via contrib.
Also, I am trying to see value in offering another API.  We don't need
to serve every need.

> > I think my big question is about your sentence, "A key feature of these
> > key management approaches is that the application handling the
> > encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
> > after unwrapping it."  It is less secure for the HSM to return a KEK
> > rather than a DEK?  I can't see why it would be.  The application never
> > sees the KEK used to wrap the DEK it has stored in the file system,
> > though that DEK is actually used as a passphrase by Postgres.  This is
> > the same with the Yubikey --- Postgres never sees the private key used
> > to unlock what it locked with the Yubikey public key.
> 
> The protocols and practices are designed for a lot of DEKs and small
> number of KEK's. So the compromise of a KEK would, potentially, lead
> to compromise of thousands of DEKs. In this particular case with 2
> DEKs wrapped by one KEK, it doesn't sound like much of a difference, I
> agree. From an acceptance and adoption point of view, I'm just
> inclined to use the security ecosystem in an established and
> understood way.

Right, if there were many DEKs, I could see your point.  Using many DEKs in
a database is a big problem since so many parts are interrelated.  We
looked at per-user or per-tablespace DEKs, but found them unworkable.  We
use two DEKs so we can fail over to a standby for DEK rotation purposes.

I think for your purposes, your KMS DEK ends up being a KEK for
Postgres.  I am guessing most applications don't have the validation and
key rotation needs Postgres has, so a DEK is fine; but for Postgres,
because we already support four different authentication methods
via a single command, we have those features, so getting a DEK
from a KMS that we treat as a KEK seems natural to me.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



pass_aws.sh
Description: Bourne shell script


Re: On login trigger: take three

2020-12-16 Thread Pavel Stehule
Attached please find a new version of the patch based on
on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch.

> So there is still only  "disable_client_connection_trigger" GUC? because
> we need possibility to disable client connect triggers and there is no such
> need for other event types.
>
> As you suggested  have added "dathaslogontriggers" flag which is set when
> client connection trigger is created.
> This flag is never cleaned (to avoid visibility issues mentioned in my
> previous mail).
>

This is much better - I don't see any slowdown when the logon trigger is not
defined.

I did some benchmarks, and it looks like starting the language handler is
relatively expensive - it is about 25%, and starting the handler as an event
trigger costs about 35%. But this problem can be solved later and elsewhere.

I prefer the inverse form of disable_connection_trigger. Almost all GUCs are
in positive form - so enable_connection_trigger is better.

I don't think the current handling of dathaslogontriggers is good for
production usage. The protection against the race condition can be provided
by a lock on pg_event_trigger.

Regards

Pavel



>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


\gsetenv

2020-12-16 Thread David Fetter
Hi,

We have \gset to set psql variables, but not environment variables,
so I fixed this with a new analogous command, \gsetenv. I considered
refactoring SetVariable to include environment variables, but for a
first cut, I just made a separate function and an extra if.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 04059e68ffcd8cf4052ccb6a013f0cf2e0095eb8 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 16 Dec 2020 13:17:32 -0800
Subject: [PATCH v1] Implement \gsetenv, similar to \gset
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.28.0"

This is a multi-part message in MIME format.
--2.28.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


In passing, make \setenv without arguments dump the environment in a way
similar to what bare \set and \pset do.

diff --git doc/src/sgml/ref/psql-ref.sgml doc/src/sgml/ref/psql-ref.sgml
index 221a967bfe..97d5a2ed19 100644
--- doc/src/sgml/ref/psql-ref.sgml
+++ doc/src/sgml/ref/psql-ref.sgml
@@ -2297,6 +2297,49 @@ hello 10
 
   
 
+  
+\gsetenv [ prefix ]
+
+
+
+ Sends the current query buffer to the server and stores the
+ query's output into environment variables.
+ The query to be executed must return exactly one row.  Each column of
+ the row is stored into a separate variable, named the same as the
+ column.  For example:
+
+=> SELECT 'hello' AS var1, 10 AS var2
+-> \gsetenv
+=> \! echo $var1 $var2
+hello 10
+
+
+
+ If you specify a prefix,
+ that string is prepended to the query's column names to create the
+ variable names to use:
+
+=> SELECT 'hello' AS var1, 10 AS var2
+-> \gsetenv result_
+=> \! echo $result_var1 $result_var2
+hello 10
+
+
+
+ If a column result is NULL, the corresponding environment variable is unset
+ rather than being set.
+
+
+ If the query fails or does not return one row,
+ no variables are changed.
+
+
+ If the current query buffer is empty, the most recently sent query
+ is re-executed instead.
+
+
+  
+
 
   
 \gx [ (option=value [...]) ] [ filename ]
diff --git src/bin/psql/command.c src/bin/psql/command.c
index 38b52d..8b8749aca6 100644
--- src/bin/psql/command.c
+++ src/bin/psql/command.c
@@ -94,7 +94,9 @@ static backslashResult process_command_g_options(char *first_option,
  const char *cmd);
 static backslashResult exec_command_gdesc(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_gexec(PsqlScanState scan_state, bool active_branch);
-static backslashResult exec_command_gset(PsqlScanState scan_state, bool active_branch);
+static backslashResult exec_command_gset(PsqlScanState scan_state,
+		 bool active_branch,
+		 GSET_TARGET gset_target);
 static backslashResult exec_command_help(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_html(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_include(PsqlScanState scan_state, bool active_branch,
@@ -346,7 +348,9 @@ exec_command(const char *cmd,
 	else if (strcmp(cmd, "gexec") == 0)
 		status = exec_command_gexec(scan_state, active_branch);
 	else if (strcmp(cmd, "gset") == 0)
-		status = exec_command_gset(scan_state, active_branch);
+		status = exec_command_gset(scan_state, active_branch, GSET_TARGET_PSET);
+	else if (strcmp(cmd, "gsetenv") == 0)
+		status = exec_command_gset(scan_state, active_branch, GSET_TARGET_ENV);
 	else if (strcmp(cmd, "h") == 0 || strcmp(cmd, "help") == 0)
 		status = exec_command_help(scan_state, active_branch);
 	else if (strcmp(cmd, "H") == 0 || strcmp(cmd, "html") == 0)
@@ -1451,10 +1455,11 @@ exec_command_gexec(PsqlScanState scan_state, bool active_branch)
 }
 
 /*
- * \gset [prefix] -- send query and store result into variables
+ * \gset[env] [prefix] -- send query and store result into variables
  */
 static backslashResult
-exec_command_gset(PsqlScanState scan_state, bool active_branch)
+exec_command_gset(PsqlScanState scan_state, bool active_branch,
+  GSET_TARGET gset_target)
 {
 	backslashResult status = PSQL_CMD_SKIP_LINE;
 
@@ -1463,6 +1468,8 @@ exec_command_gset(PsqlScanState scan_state, bool active_branch)
 		char	   *prefix = psql_scan_slash_option(scan_state,
 	OT_NORMAL, NULL, false);
 
+		pset.gset_target = gset_target;
+
 		if (prefix)
 			pset.gset_prefix = prefix;
 		else
@@ -2275,39 +2282,8 @@ exec_command_setenv(PsqlScanState scan_state, bool active_branch,
 	OT_NORMAL, NULL, false);
 		char	   *envval = psql_scan_slash_option(scan_state,
 	OT_NORMAL, NULL, false);
+		success = SetEnvVariable(cmd, envvar, env

Re: Proposed patch for key managment

2020-12-16 Thread Stephen Frost
Greetings,

* Alastair Turner (min...@decodable.me) wrote:
> On Wed, 16 Dec 2020 at 00:12, Bruce Momjian  wrote:
> > The second approach is to make a new API for what you want
> 
> I am trying to motivate for an alternate API. Specifically, an API
> which allows any potential adopter of Postgres and Cluster File
> Encryption to adopt them without having to accept any particular
> approach to key management, key derivation, wrapping, validation, etc.
> A passphrase key-wrapper with validation will probably be very useful
> to a lot of people, but making it mandatory and requiring twists and
> turns to integrate with already-established security infrastructure
> sounds like a barrier to adoption.

Yeah, I mentioned this in one of the other threads where we were
discussing KEKs and DEKs and such.  Forcing one solution for working
with the KEK/DEK isn't really ideal.

That said, maybe there's an option to have the user indicate to PG if
they're passing in a passphrase or a DEK, but we otherwise more-or-less
keep things as they are where the DEK that the user provides actually
goes to encrypting the sub-keys that we use for tables vs. WAL..?  That
avoids the complication of having to have an API that somehow provides
more than one key, while also using the primary DEK key as-is from the
key management service and the KEK never being seen on the system where
PG is running.

> > ...It would be
> > similar to the cluster_passphrase_command, but it would be simple in
> > some ways, but complex in others since we need at least two DEKs, and
> > key rotation might be very risky since there isn't validation in the
> > server.
> 
> I guess that the risk you're talking about here is encryption with a
> bogus key and bricking the data? In a world where the wrapped keys are
> opaque to the application, a key would be validated through a
> roundtrip: request the unwrapping of the key, encrypt a known string,
> request the unwrapping again, decrypt the known string, compare. If
> any of those steps fail, don't use the wrapped key provided.
> Validating that the stored keys have not been fiddled/damaged is the
> KMS/HSM's responsibility.

I'm not really sure what the validation concern being raised here is,
but I can understand Bruce's point about having to set up an API that
allows us to ask for two distinct DEK's- but I'd think that keeping the
two keys we have today as sub-keys addresses that as I outline above.

> >... I don't think this can be accomplished by a contrib module, but
> > would actually be easy to implement, but perhaps hard to document
> > because the user API might be tricky.
> 
> If integration through a pipeline isn't good enough (it would be a
> pain for the passphrase, with multiple prompts), what else do you see
> an API having to provide?

I certainly agree that it'd be good to have a way to get an encrypted
cluster set up and running which doesn't involve actual prompts.

> > I think my big question is about your sentence, "A key feature of these
> > key management approaches is that the application handling the
> > encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
> > after unwrapping it."  It is less secure for the HSM to return a KEK
> > rather than a DEK?  I can't see why it would be.  The application never
> > sees the KEK used to wrap the DEK it has stored in the file system,
> > though that DEK is actually used as a passphrase by Postgres.  This is
> > the same with the Yubikey --- Postgres never sees the private key used
> > to unlock what it locked with the Yubikey public key.
> 
> The protocols and practices are designed for a lot of DEKs and small
> number of KEK's. So the compromise of a KEK would, potentially, lead
> to compromise of thousands of DEKs. In this particular case with 2
> DEKs wrapped by one KEK, it doesn't sound like much of a difference, I
> agree. From an acceptance and adoption point of view, I'm just
> inclined to use the security ecosystem in an established and
> understood way.

I'm not quite following this- I agree that we don't want PG, or anything
really, to see the private key that's on the Yubikey, as that wouldn't
be good- instead, we should be able to ask for the Yubikey to decrypt
the DEK for us and then use that, which I had thought was happening but
it's not entirely clear from the above discussion (and, admittedly, I've
not tried using the patch with my yubikey yet, but I do plan to...).

Are we sure we're talking about the same thing here..?  That's really
the question I'm asking myself.

There's an entirely independent discussion to be had about figuring out
a way to actually off-load *entirely* the encryption/decryption of data
to the linux crypto system or hardware devices, but unless someone can
step up and write code for those today, I'd suggest that we table that
effort until after we get this initial capability of having TDE with PG
doing all of the encryption/decryption.

Thanks,

Stephen



range_agg

2020-12-16 Thread Zhihong Yu
Hi,
w.r.t. patch v27.

+* The idea is to prepend underscores as needed until we make a name that
+* doesn't collide with anything ...

I wonder if other characters (e.g. [a-z0-9]) can be used so that a name
without a collision can be found without calling truncate_identifier().

+   else if (strcmp(defel->defname, "multirange_type_name") == 0)
+   {
+   if (multirangeTypeName != NULL)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("conflicting or redundant options")));

Maybe make the error message a bit different from occurrences of similar
error messages elsewhere (such as by including multirangeTypeName)?

Thanks


Re: range_agg

2020-12-16 Thread Alexander Korotkov
On Thu, Dec 17, 2020 at 12:54 AM Zhihong Yu  wrote:
> +* The idea is to prepend underscores as needed until we make a name that
> +* doesn't collide with anything ...
>
> I wonder if other characters (e.g. [a-z0-9]) can be used so that name without 
> collision can be found without calling truncate_identifier().

Probably.  But multiranges just share the naming logic already existing
for arrays.  If we're going to change this, I think we should change
it for arrays too.  And this change shouldn't be part of the multirange
patch.

> +   else if (strcmp(defel->defname, "multirange_type_name") == 0)
> +   {
> +   if (multirangeTypeName != NULL)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("conflicting or redundant options")));
>
> Maybe make the error message a bit different from occurrences of similar 
> error message (such as including multirangeTypeName) ?

This again isn't an invention of multirange.  We use this message
many times in DefineRange() and other places.  At first glance,
I've nothing against changing this to a more informative message, but
that should be done globally.  And this change isn't directly related
to multirange.  Feel free to propose a patch improving this.

--
Regards,
Alexander Korotkov




Re: Proposed patch for key managment

2020-12-16 Thread Bruce Momjian
On Wed, Dec 16, 2020 at 10:23:23AM +0900, Michael Paquier wrote:
> On Tue, Dec 15, 2020 at 04:02:12PM -0500, Bruce Momjian wrote:
> > I thought this was going to be a huge job, but once I looked at it, it
> > was clear exactly what you were saying.  Comparing cryptohash.c and
> > cryptohash_openssl.c I saw exactly what you wanted, and I think I have
> > completed it in the attached patch.  cryptohash.c implemented the hash
> > in C code if OpenSSL is not present --- I assume you didn't want me to
> > do that, but rather to split the API so it was easy to add another
> > implementation.
> 
> Hmm.  I don't think that this is right.  First, this still mixes
> ciphers and HMAC.

I looked at the code.  The HMAC function body is just one function call
to OpenSSL.  Do you want it moved to cryptohash.c and
cryptohash_openssl.c?  To a new C file?

> Second, it is only possible here to use HMAC with
> SHA512 but we want to be able to use SHA256 as well with SCRAM (per se
> the scram_HMAC_* business in scram-common.c).  Third, this lacks a

I looked at the SCRAM HMAC code.  It looks complex, but I assume ideally
that would be moved into a separate C file and used by cluster file
encryption and SCRAM, right?  I need help to do that because I am
unclear how much of the SCRAM hash code is specific to SCRAM.

> fallback implementation.  Finally, pgcrypto is not touched, but we

I have a fallback implementation --- it fails?  ;-)  Did you want me to
include an AES implementation?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposed patch for key managment

2020-12-16 Thread Bruce Momjian
On Wed, Dec 16, 2020 at 04:32:00PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * Alastair Turner (min...@decodable.me) wrote:
> > On Wed, 16 Dec 2020 at 00:12, Bruce Momjian  wrote:
> > > The second approach is to make a new API for what you want
> > 
> > I am trying to motivate for an alternate API. Specifically, an API
> > which allows any potential adopter of Postgres and Cluster File
> > Encryption to adopt them without having to accept any particular
> > approach to key management, key derivation, wrapping, validation, etc.
> > A passphrase key-wrapper with validation will probably be very useful
> > to a lot of people, but making it mandatory and requiring twists and
> > turns to integrate with already-established security infrastructure
> > sounds like a barrier to adoption.
> 
> Yeah, I mentioned this in one of the other threads where we were
> discussing KEKs and DEKs and such.  Forcing one solution for working
> with the KEK/DEK isn't really ideal.
> 
> That said, maybe there's an option to have the user indicate to PG if
> they're passing in a passphrase or a DEK, but we otherwise more-or-less
> keep things as they are where the DEK that the user provides actually
> goes to encrypting the sub-keys that we use for tables vs. WAL..?  That

Yes, that is what I suggested in an earlier email.

> avoids the complication of having to have an API that somehow provides
> more than one key, while also using the primary DEK key as-is from the
> key management service and the KEK never being seen on the system where
> PG is running.

How is that different from what we have now?  Did you read my reply today
to Alastair with the AWS script?

> > > ...It would be
> > > similar to the cluster_passphrase_command, but it would be simple in
> > > some ways, but complex in others since we need at least two DEKs, and
> > > key rotation might be very risky since there isn't validation in the
> > > server.
> > 
> > I guess that the risk you're talking about here is encryption with a
> > bogus key and bricking the data? In a world where the wrapped keys are
> > opaque to the application, a key would be validated through a
> > roundtrip: request the unwrapping of the key, encrypt a known string,
> > request the unwrapping again, decrypt the known string, compare. If
> > any of those steps fail, don't use the wrapped key provided.
> > Validating that the stored keys have not been fiddled/damaged is the
> > KMS/HSM's responsibility.
> 
> I'm not really sure what the validation concern being raised here is,
> but I can understand Bruce's point about having to set up an API that
> allows us to ask for two distinct DEK's- but I'd think that keeping the
> two keys we have today as sub-keys addresses that as I outline above.

Right, how is a passphrase different than subkeys?

> > >... I don't think this can be accomplished by a contrib module, but
> > > would actually be easy to implement, but perhaps hard to document
> > > because the user API might be tricky.
> > 
> > If integration through a pipeline isn't good enough (it would be a
> > pain for the passphrase, with multiple prompts), what else do you see
> > an API having to provide?
> 
> I certainly agree that it'd be good to have a way to get an encrypted
> cluster set up and running which doesn't involve actual prompts.

Uh, two of my four scripts do not require prompts.

> > > I think my big question is about your sentence, "A key feature of these
> > > key management approaches is that the application handling the
> > > encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back
> > > after unwrapping it."  It is less secure for the HSM to return a KEK
> > > rather than a DEK?  I can't see why it would be.  The application never
> > > sees the KEK used to wrap the DEK it has stored in the file system,
> > > though that DEK is actually used as a passphrase by Postgres.  This is
> > > the same with the Yubikey --- Postgres never sees the private key used
> > > to unlock what it locked with the Yubikey public key.
> > 
> > The protocols and practices are designed for a lot of DEKs and small
> > number of KEK's. So the compromise of a KEK would, potentially, lead
> > to compromise of thousands of DEKs. In this particular case with 2
> > DEKs wrapped by one KEK, it doesn't sound like much of a difference, I
> > agree. From an acceptance and adoption point of view, I'm just
> > inclined to use the security ecosystem in an established and
> > understood way.
> 
> I'm not quite following this- I agree that we don't want PG, or anything
> really, to see the private key that's on the Yubikey, as that wouldn't
> be good- instead, we should be able to ask for the Yubikey to decrypt
> the DEK for us and then use that, which I had thought was happening but
> it's not entirely clear from the above discussion (and, admittedly, I've
> not tried using the patch with my yubikey yet, but I do plan to...).

What it does is to encrypt the passphrase on boot, store it on disk,

Re: Proposed patch for key managment

2020-12-16 Thread Bruce Momjian
On Wed, Dec 16, 2020 at 01:42:57PM -0500, Bruce Momjian wrote:
> On Wed, Dec 16, 2020 at 06:07:26PM +, Alastair Turner wrote:
> > Hi Bruce
> > 
> > On Wed, 16 Dec 2020 at 00:12, Bruce Momjian  wrote:
> > >
> > ...
> > >
> > > The second approach is to make a new API for what you want
> > 
> > I am trying to motivate for an alternate API. Specifically, an API
> > which allows any potential adopter of Postgres and Cluster File
> > Encryption to adopt them without having to accept any particular
> > approach to key management, key derivation, wrapping, validation, etc.
> > A passphrase key-wrapper with validation will probably be very useful
> > to a lot of people, but making it mandatory and requiring twists and
> > turns to integrate with already-established security infrastructure
> > sounds like a barrier to adoption.
> 
> Attached is a script that uses the AWS Secrets Manager, and it does key
> rotation with the new pg_altercpass tool too, just like all the other
> methods.

Attached is an improved script that does not pass the secret on the
command line.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



pass_aws.sh
Description: Bourne shell script


Re: Proposed patch for key managment

2020-12-16 Thread Alastair Turner
On Wed, 16 Dec 2020 at 21:32, Stephen Frost  wrote:
>
> Greetings,
>
> * Alastair Turner (min...@decodable.me) wrote:
> > On Wed, 16 Dec 2020 at 00:12, Bruce Momjian  wrote:
> > > The second approach is to make a new API for what you want
> >
> > I am trying to motivate for an alternate API. Specifically, an API
> > which allows any potential adopter of Postgres and Cluster File
> > Encryption to adopt them without having to accept any particular
> > approach to key management, key derivation, wrapping, validation, etc.
> > A passphrase key-wrapper with validation will probably be very useful
> > to a lot of people, but making it mandatory and requiring twists and
> > turns to integrate with already-established security infrastructure
> > sounds like a barrier to adoption.
>
> Yeah, I mentioned this in one of the other threads where we were
> discussing KEKs and DEKs and such.  Forcing one solution for working
> with the KEK/DEK isn't really ideal.
>
> That said, maybe there's an option to have the user indicate to PG if
> they're passing in a passphrase or a DEK, ...

An option very much like the current cluster_passphrase_command, but
one that takes an input, sounds to me like it would meet that
requirement. The challenge sent to the command could be just a text
prompt, or it could be the wrapped DEK. The choice of command
would indicate what the user was expected to pass. The return would be
a DEK.

> ...but we otherwise more-or-less
> keep things as they are where the DEK that the user provides actually
> goes to encrypting the sub-keys that we use for tables vs. WAL..?  ...

The idea of subkeys sounds great, if it can merge the two potential
interactions into one and not complicate rotating the two parts
separately. But having Postgres encrypt them leaves us open to
many of the same issues. That still feels like managing the keys, so
corporates who have strong opinions on how that should be done will
still ask all sorts of pointy questions, complicating adoption.

> ...That
> avoids the complication of having to have an API that somehow provides
> more than one key, while also using the primary DEK key as-is from the
> key management service and the KEK never being seen on the system where
> PG is running.
>

Other than calling out (and therefore potentially prompting) twice,
what do you see as the complications of having more than one key?

> > > ...It would be
> > > similar to the cluster_passphrase_command, but it would be simple in
> > > some ways, but complex in others since we need at least two DEKs, and
> > > key rotation might be very risky since there isn't validation in the
> > > server.
> >
...
>
> I'm not quite following this- I agree that we don't want PG, or anything
> really, to see the private key that's on the Yubikey, as that wouldn't
> be good- instead, we should be able to ask for the Yubikey to decrypt
> the DEK for us and then use that, which I had thought was happening but
> it's not entirely clear from the above discussion (and, admittedly, I've
> not tried using the patch with my yubikey yet, but I do plan to...).
>
> Are we sure we're talking about the same thing here..?  That's really
> the question I'm asking myself.
>

I'd describe what the current patch does as using YubiKey to encrypt
and then decrypt an intermediate secret, which is then used to
generate/derive a KEK, which is then used to unwrap the stored,
wrapped DEK.

> There's an entirely independent discussion to be had about figuring out
> a way to actually off-load *entirely* the encryption/decryption of data
> to the linux crypto system or hardware devices, but unless someone can
> step up and write code for those today, I'd suggest that we table that
> effort until after we get this initial capability of having TDE with PG
> doing all of the encryption/decryption.

I'm hopeful that the work on abstracting OpenSSL, NSS TLS, etc. is going
to help in this direction.

Regards

Alastair




Re: Add docs stub for recovery.conf

2020-12-16 Thread Bruce Momjian
On Fri, Dec  4, 2020 at 02:00:23PM -0500, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > Yes, that is pretty much the same thing I was suggesting, except that
> > each rename has its own _original_ URL link, which I think is also
> > acceptable.  My desire is for these items to all exist in one place, and
> > an appendix of them seems fine.
> 
> Alright, so, to try and move this forward I'll list out (again) the
> renames that we have in pgweb:
> 
> catalog-pg-replication-slots.html <-> view-pg-replication-slots.html
> pgxlogdump.html <-> pgwaldump.html
> app-pgresetxlog.html <-> app-pgresetwal.html
> app-pgreceivexlog.html <-> app-pgreceivewal.html
> 
> (excluding the 'legal notice' one)
> 
> Bruce, are you saying that we need to take Craig's patch and then add to
> it entries for all of the above, effectively removing the need for the
> web page aliases and redirects?  If that was done, would that be

Yes, I think putting the compatibility section headings in our main
documentation flow will make it too hard to read and cause unnecessary
complexity, but if we have a separate section for them, adding the
section headings seems fine.  This way, we don't have to add a redirect
every time we add a new entry.

> sufficient to get this committed?  Are there other things that people
> can think of off-hand that we should include, I think Craig might have
> mentioned something else earlier on..?  I don't think we should require
> that someone troll through everything that ever existed, just to be
> clear, as we can always add to this later if other things come up.  If
> that's the expectation though, then someone needs to say so, in which
> case I'll assume it's status quo unless/until someone steps up to do
> that.

Agreed.  I just wanted something that could scale going forward, and be
easily identified as compatibility, so maybe one day we can remove them.
However, if they are in a separate section, we might never do that.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: \gsetenv

2020-12-16 Thread Tom Lane
David Fetter  writes:
> We have \gset to set some parameters, but not ones in the environment,
> so I fixed this with a new analogous command, \gsetenv.

In view of the security complaints we just had about \gset
(CVE-2020-25696), I cannot fathom why we'd consider adding another
way to cause similar problems.

We were fortunate enough to be able to close off the main security risk
of \gset without deleting the feature altogether ... but how exactly
would we distinguish "safe" from "unsafe" environment variables?  It kind
of seems like anything that would be worth setting at all would tend to
fall into the "unsafe" category, because the implications of setting it
would be unclear.  But *for certain* we're not taking a patch that allows
remotely setting PATH and things like that.

Besides which, you haven't bothered with even one word of positive
justification.  What's the non-hazardous use case?

regards, tom lane




Re: A few new options for CHECKPOINT

2020-12-16 Thread Bruce Momjian
On Mon, Dec  7, 2020 at 11:22:01AM +0900, Michael Paquier wrote:
> On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
> > * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> >> You keep making this statement, and I don't necessarily disagree, but if
> >> that is the case, please explain why don't we have
> >> checkpoint_completion_target set to 0.9 by default?  Should we change
> >> that?
> > 
> > Yes, I do think we should change that..
> 
> Agreed.  FWIW, no idea for others, but it is one of those parameters I
> keep telling to update after a default installation.

+1 for making it 0.9.  FYI, Robert Haas has argued that our estimation
of completion isn't great, which is why it defaults to 0.5.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposed patch for key managment

2020-12-16 Thread Stephen Frost
Greetings,

* Alastair Turner (min...@decodable.me) wrote:
> On Wed, 16 Dec 2020 at 21:32, Stephen Frost  wrote:
> > * Alastair Turner (min...@decodable.me) wrote:
> > > On Wed, 16 Dec 2020 at 00:12, Bruce Momjian  wrote:
> > > > The second approach is to make a new API for what you want
> > >
> > > I am trying to motivate for an alternate API. Specifically, an API
> > > which allows any potential adopter of Postgres and Cluster File
> > > Encryption to adopt them without having to accept any particular
> > > approach to key management, key derivation, wrapping, validation, etc.
> > > A passphrase key-wrapper with validation will probably be very useful
> > > to a lot of people, but making it mandatory and requiring twists and
> > > turns to integrate with already-established security infrastructure
> > > sounds like a barrier to adoption.
> >
> > Yeah, I mentioned this in one of the other threads where we were
> > discussing KEKs and DEKs and such.  Forcing one solution for working
> > with the KEK/DEK isn't really ideal.
> >
> > That said, maybe there's an option to have the user indicate to PG if
> > they're passing in a passphrase or a DEK, ...
> 
> Some options very much like the current cluster_passphrase_command,
> but that takes an input sounds to me like it would meet that
> requirement. The challenge sent to the command could be just a text
> prompt, or it could be the wrapped DEK. The selection of the command
> would indicate what the user was expected to pass. The return would be
> a DEK.

If I'm following, you're suggesting something like:

cluster_passphrase_command = 'aws get %q'

and then '%q' gets replaced with "Please provide the WAL DEK: ", or
something like that?  Prompting the user for each key?  Not sure how
well that would work if we want to automate everything, though.

If you have specific ideas, it'd really be helpful to give examples of
what you're thinking.

> > ...but we otherwise more-or-less
> > keep things as they are where the DEK that the user provides actually
> > goes to encrypting the sub-keys that we use for tables vs. WAL..?  ...
> 
> The idea of subkeys sounds great, if it can merge the two potential
> interactions into one and not complicate rotating the two parts
> separately. But having Postgres encrypting them, leaves us open to
> many of the same issues. That still feels like managing the keys, so a
> corporate who have strong opinions on how that should be done will
> still ask all sorts of pointy questions, complicating adoption.

This really needs more detail- what exactly is the concern?  What are
the 'pointy questions'?  What's complicated about using sub-keys?  One
of the great advantages of using sub-keys is that you can rotate the KEK
(or whatever is passed to PG) without having to actually go through the
entire system and decrypt/re-encrypt everything.  There's, of course,
the downside that then you don't get the keys rotated as often as you
might like to, but I am, at least, hopeful that we might be able to find
a way to do that in the future.

> > ...That
> > avoids the complication of having to have an API that somehow provides
> > more than one key, while also using the primary DEK key as-is from the
> > key management service and the KEK never being seen on the system where
> > PG is running.
> 
> Other than calling out (and therefore potentially prompting) twice,
> what do you see as the complications of having more than one key?

Mainly just a concern about the API and about what happens if, say, we
decide that we want to have another sub-key, for example.  If we're
handling them then there's really no issue- we just add another key in,
but if that's not happening then it's going to mean changes for
administrators.  If there's a good justification for it, then perhaps
that's alright, but hand waving at what the issue is doesn't really
help.

> > > > ...It would be
> > > > similar to the cluster_passphrase_command, but it would be simple in
> > > > some ways, but complex in others since we need at least two DEKs, and
> > > > key rotation might be very risky since there isn't validation in the
> > > > server.
> > >
> ...
> >
> > I'm not quite following this- I agree that we don't want PG, or anything
> > really, to see the private key that's on the Yubikey, as that wouldn't
> > be good- instead, we should be able to ask for the Yubikey to decrypt
> > the DEK for us and then use that, which I had thought was happening but
> > it's not entirely clear from the above discussion (and, admittedly, I've
> > not tried using the patch with my yubikey yet, but I do plan to...).
> >
> > Are we sure we're talking about the same thing here..?  That's really
> > the question I'm asking myself.
> 
> I'd describe what the current patch does as using YubiKey to encrypt
> and then decrypt an intermediate secret, which is then used to
> generate/derive a KEK, which is then used to unwrap the stored,
> wrapped DEK.

This seems like a crux of at least one concern- that the

Re: Added missing copy related data structures to typedefs.list

2020-12-16 Thread Bruce Momjian
On Mon, Dec  7, 2020 at 01:56:50PM +0530, vignesh C wrote:
> Hi,
> 
> Added missing copy related data structures to typedefs.list, these
> data structures were added while copy files were split during the
> recent commit. I found this while running pgindent for parallel copy
> patches.
> The Attached patch has the changes for the same.
> Thoughts?

Uh, we usually only update the typedefs file before we run pgindent on
the master branch.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Proposed patch for key managment

2020-12-16 Thread Michael Paquier
On Wed, Dec 16, 2020 at 05:04:12PM -0500, Bruce Momjian wrote:
> On Wed, Dec 16, 2020 at 10:23:23AM +0900, Michael Paquier wrote:
>> Hmm.  I don't think that this is right.  First, this still mixes
>> ciphers and HMAC.
> 
> I looked at the code.  The HMAC function body is just one function call
> to OpenSSL.  Do you want it moved to cryptohash.c and
> cryptohash_openssl.c?  To a new C file?
> 
>> Second, it is only possible here to use HMAC with
>> SHA512 but we want to be able to use SHA256 as well with SCRAM (per se
>> the scram_HMAC_* business in scram-common.c).  Third, this lacks a
> 
> I looked at the SCRAM HMAC code.  It looks complex, but I assume ideally
> that would be moved into a separate C file and used by cluster file
> encryption and SCRAM, right?  I need help to do that because I am
> unclear how much of the SCRAM hash code is specific to SCRAM.

I have gone through the same exercise for HMAC, with more success it
seems:
https://www.postgresql.org/message-id/x9m0nkejezipx...@paquier.xyz

This includes a fallback implementation that returns the same results
as OpenSSL, and the same results as samples on Wikipedia and similar
sources.  The set of APIs in this refactoring could be reused here and
plugged into any SSL library, and my patch has changed SCRAM to
reuse that.  With this, it is also possible to remove all the HMAC
code from pgcrypto, but this cannot happen without SHA1 support in
cryptohash.c, another patch I have submitted to -hackers recently.  So I
think that it is a win as a whole.
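
For reference, the API in that refactoring is roughly of the following
shape, and a caller would use it like this (sketch only; see the thread
above for the authoritative definitions):

    pg_hmac_ctx *ctx = pg_hmac_create(PG_SHA256);

    if (ctx == NULL ||
        pg_hmac_init(ctx, key, keylen) < 0 ||
        pg_hmac_update(ctx, data, datalen) < 0 ||
        pg_hmac_final(ctx, result, sizeof(result)) < 0)
        elog(ERROR, "could not compute HMAC");

    pg_hmac_free(ctx);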

>> fallback implementation.  Finally, pgcrypto is not touched, but we
> 
> I have a fallback implemention --- it fails?  ;-)  Did you want me to
> include an AES implementation?

No idea about this one yet.  There are no direct users of AES except
pgcrypto in core.  One thing that would be good IMO is to properly
split the patch of this thread into individual parts that could be
reviewed separately, using for example "git format-patch" to generate
a patch series.  What's presented is a mixed bag, so it is harder to
look at and consider how this stuff should work, and whether there are
pieces that should be designed better or not.
--
Michael




Re: range_agg

2020-12-16 Thread Alexander Korotkov
On Thu, Dec 17, 2020 at 1:03 AM Alexander Korotkov  wrote:
> On Thu, Dec 17, 2020 at 12:54 AM Zhihong Yu  wrote:
> > +* The idea is to prepend underscores as needed until we make a name 
> > that
> > +* doesn't collide with anything ...
> >
> > I wonder if other characters (e.g. [a-z0-9]) can be used so that name 
> > without collision can be found without calling truncate_identifier().
>
> Probably.  But multiranges just shares naming logic already existing
> in arrays.  If we're going to change this, I think we should change
> this for arrays too.  And this change shouldn't be part of multirange
> patch.

I gave this another thought.  Now we have facility to name multirange
types manually.  I think we should give up with underscore naming
completely.  If both replacing "range" with "mutlirange" in the
typename and appending "_multirange" to the type name failed (very
unlikely), then let user manually name the multirange.  Any thoughts?

--
Regards,
Alexander Korotkov




Re: range_agg

2020-12-16 Thread Zhihong Yu
Letting the user manually name the multirange (after a few automatic attempts)
seems reasonable.

Cheers

On Wed, Dec 16, 2020 at 3:34 PM Alexander Korotkov 
wrote:

> On Thu, Dec 17, 2020 at 1:03 AM Alexander Korotkov 
> wrote:
> > On Thu, Dec 17, 2020 at 12:54 AM Zhihong Yu  wrote:
> > > +* The idea is to prepend underscores as needed until we make a
> name that
> > > +* doesn't collide with anything ...
> > >
> > > I wonder if other characters (e.g. [a-z0-9]) can be used so that name
> without collision can be found without calling truncate_identifier().
> >
> > Probably.  But multiranges just shares naming logic already existing
> > in arrays.  If we're going to change this, I think we should change
> > this for arrays too.  And this change shouldn't be part of multirange
> > patch.
>
> I gave this another thought.  Now we have facility to name multirange
> types manually.  I think we should give up with underscore naming
> completely.  If both replacing "range" with "mutlirange" in the
> typename and appending "_multirange" to the type name failed (very
> unlikely), then let user manually name the multirange.  Any thoughts?
>
> --
> Regards,
> Alexander Korotkov
>


Re: range_agg

2020-12-16 Thread Alexander Korotkov
On Thu, Dec 17, 2020 at 2:37 AM Zhihong Yu  wrote:
> Letting user manually name the multirange (after a few automatic attempts) 
> seems reasonable.

Accepted.  Thank you for your feedback.

--
Regards,
Alexander Korotkov




Re: Proposed patch for key managment

2020-12-16 Thread Alastair Turner
On Wed, 16 Dec 2020 at 22:43, Stephen Frost  wrote:
>
> Greetings,
...
>
> If I'm following, you're suggesting something like:
>
> cluster_passphrase_command = 'aws get %q'
>
> and then '%q' gets replaced with "Please provide the WAL DEK: ", or
> something like that?  Prompting the user for each key?  Not sure how
> well that's work if want to automate everything though.
>
> If you have specific ideas, it'd really be helpful to give examples of
> what you're thinking.

I can think of three specific ideas off the top of my head: the
passphrase key wrapper, the secret store and the cloud/HW KMS.

Since the examples expand the purpose of cluster_passphrase_command,
let's call it cluster_key_challenge_command in the examples.

Starting with the passphrase key wrapper, since it's what's in place now.

 - cluster_key_challenge_command = 'password_key_wrapper %q'
 - Supplied on stdin to the process is the wrapped DEK (either a
combined key for db files and WAL or one for each, on separate calls)
 - %q is "Please provide WAL key wrapper password" or just "...provide
key wrapper password"
 - Returned on stdout is the unwrapped DEK

For a secret store

 - cluster_key_challenge_command = 'vault_key_fetch'
 - Supplied on stdin to the process is the secret's identifier (pg_dek_xxUUIDxx)
 - Returned on stdout is the DEK, which may never have been wrapped,
depending on implementation
 - Access control to the secret store is managed through the challenge
command's own config, certs, HBA, ...

For an HSM or cloud KMS

 - cluster_key_challenge_command = 'gcp_kms_key_fetch'
 - Supplied on stdin to the process is the wrapped DEK (individual
or combined)
 - Returned on stdout is the DEK (individual or combined)
 - Access control to the kms is managed through the challenge
command's own config, certs, HBA, ...

The secret store and HSM/KMS options may be promptless, and therefore
amenable to automation, depending on the setup of those clients.
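
To make the contract concrete, a challenge command can be as small as the
following sketch (hypothetical code, not part of any patch;
fetch_key_from_store() is a placeholder for the real secret-store or KMS
client call, and a text, line-oriented challenge is assumed for simplicity):

    #include <stdio.h>
    #include <string.h>

    /* Placeholder standing in for the real secret-store/KMS client call. */
    static int
    fetch_key_from_store(const char *challenge, unsigned char *dek,
                         size_t deklen)
    {
        memset(dek, 0, deklen);    /* a real implementation fills in the key */
        return 0;
    }

    int
    main(void)
    {
        char        challenge[8192];
        unsigned char dek[32];

        /* The server writes the challenge (prompt, key id, or wrapped DEK). */
        if (fgets(challenge, sizeof(challenge), stdin) == NULL)
            return 1;
        challenge[strcspn(challenge, "\n")] = '\0';

        /* The server expects the raw DEK back on stdout. */
        if (fetch_key_from_store(challenge, dek, sizeof(dek)) != 0)
            return 1;
        fwrite(dek, 1, sizeof(dek), stdout);
        return 0;
    }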

>
...
>
> > > ...That
> > > avoids the complication of having to have an API that somehow provides
> > > more than one key, while also using the primary DEK key as-is from the
> > > key management service and the KEK never being seen on the system where
> > > PG is running.
> >
> > Other than calling out (and therefore potentially prompting) twice,
> > what do you see as the complications of having more than one key?
>
> Mainly just a concern about the API and about what happens if, say, we
> decide that we want to have another sub-key, for example.  If we're
> handling them then there's really no issue- we just add another key in,
> but if that's not happening then it's going to mean changes for
> administrators.  If there's a good justification for it, then perhaps
> that's alright, but hand waving at what the issue is doesn't really
> help.
>

Sorry, I wasn't trying to hand wave it away. For automated
interactions, like big iron HSMs or cloud KMSs, the difference between
2 operations and 10 operations to start a DB server won't matter. For
an admin/operator having to type 10 passwords or get 10 clean
thumbprint scans, it would be horrible. My underlying question was, is
that toil the only problem to be solved, or is there another reason to
get into key combination, key splitting and the related issues which
are less documented and less well understood than key wrapping.

>
...
> >
> > I'd describe what the current patch does as using YubiKey to encrypt
> > and then decrypt an intermediate secret, which is then used to
> > generate/derive a KEK, which is then used to unwrap the stored,
> > wrapped DEK.
>
> This seems like a crux of at least one concern- that the current patch
> is deriving the actual KEK from the passphrase and not just taking the
> provided value (at least, that's what it looks like from a *very* quick
> look into it), and that's the part that I was suggesting that we might
> add an option for- to indicate if the cluster passphrase command is
> actually returning a passphrase which should be used to derive a key, or
> if it's returning a key directly to be used.  That does seem to be a
> material difference to me and one which we should care about.
>

Yes. Caring about that is the reason I've been making a nuisance of myself.

> > > There's an entirely independent discussion to be had about figuring out
> > > a way to actually off-load *entirely* the encryption/decryption of data
> > > to the linux crypto system or hardware devices, but unless someone can
> > > step up and write code for those today, I'd suggest that we table that
> > > effort until after we get this initial capability of having TDE with PG
> > > doing all of the encryption/decryption.
> >
> > I'm hopeful that the work on abstracting OpenSSL, nsstls, etc is going
> > to help in this direct.
>
> Yes, I agree with that general idea but it's a 'next step' kind of
> thing, not something we need to try and bake in today.
>

Agreed.

Thanks
Alastair




Re: Proposed patch for key managment

2020-12-16 Thread Daniel Gustafsson
> On 16 Dec 2020, at 17:26, Stephen Frost  wrote:
> 
> Greetings,
> 
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote:
>>> Yeah, looking at what's been done there seems like the right approach to
>>> me as well, ideally we'd have one set of APIs that'll support all these
>>> use cases (not unlike what curl does..).
>> 
>> Ooh...  This is interesting.  What did curl do wrong here?  It is
>> always good to hear from such past experiences.
> 
> Not sure that came across very well- curl did something right in terms
> of their vTLS layer which allows for building curl with a number of
> different SSL/TLS libraries without the core code having to know about
> all the differences.

The vtls layer in libcurl is actually quite similar to what we have in libpq
with be-secure.c and be-secure-*.c for TLS and with what Michael has been doing
lately with cryptohash.

In vtls, library contexts are abstracted away from the core code, with
implementations supplying a struct with a set of function pointers implementing
the functionality (this difference is due to libcurl supporting multiple TLS
libraries compiled in at the same time, something postgres IMO shouldn't do).
We do give implementations a bit more leeway in how feature-complete they must
be, mainly due to the wide variety of libraries supported (from OpenSSL to IBM
GSKit and most ones in between).  While basic, it has served us quite well and
we have had first-time contributors successfully come up with a new TLS library
as a patch.
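
Roughly, the dispatch structure being described looks like this
(illustrative only, not libcurl's actual definitions):

    #include <stddef.h>
    #include <sys/types.h>

    /*
     * Each TLS backend fills in one of these, and the core code only ever
     * calls through the table, never touching library-specific types.
     */
    struct tls_backend
    {
        const char *name;
        int         (*init) (void);
        int         (*open_connection) (void *conn);
        ssize_t     (*read) (void *conn, void *buf, size_t len);
        ssize_t     (*write) (void *conn, const void *buf, size_t len);
        void        (*close) (void *conn);
    };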

What we have so far in the tree (an abstract *reasonably generic* API hiding
all library context interactions) makes implementing support for a new TLS
library less daunting as no core code needs to be touched really.  Having
kicked the tyres on this a fair bit I really think we should maintain that
separation, it's complicated enough as it is given just how much code needs to
be written.

cheers ./daniel



Re: On login trigger: take three

2020-12-16 Thread Zhihong Yu
Hi,
For EventTriggerOnConnect():

+   PG_CATCH();
+   {
...
+   AbortCurrentTransaction();
+   return;

Should runlist be freed in the catch block ?
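
For illustration, the kind of change this suggests (whether it is actually
needed depends on which memory context runlist lives in):

    PG_CATCH();
    {
        /* Release the runlist before bailing out. */
        list_free(runlist);

        AbortCurrentTransaction();
        return;
    }
    PG_END_TRY();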

+   gettext_noop("In case of errors in the ON client_connection
EVENT TRIGGER procedure, this parameter can be used to disable trigger
activation and provide access to the database."),

I think the text should be on two lines (current line too long).

Cheers


Re: Proposed patch for key managment

2020-12-16 Thread Michael Paquier
On Thu, Dec 17, 2020 at 01:15:37AM +0100, Daniel Gustafsson wrote:
> In vtls library contexts are abstracted to the core code, with implementations
> supplying a struct with a set of function pointers implementing functionality
> (this difference is due to libcurl supporting multiple TLS libraries compiled
> at the same time, something postgres IMO shouldn't do).  We do give
> implementations a bit more leeway with how feature complete they must be,
> mainly due to the wide variety of libraries supported (from OpenSSL to IBM
> GSKit and most ones in between).  While basic it has served us quite well and
> we have had first time contributors successfully come with a new TLS library 
> as
> a patch.

This infrastructure has been chosen because curl requires being able
to use multiple types of libraries at run-time, right?  I don't think
we need to go down to that for Postgres; we can keep things so that we
are only able to use one TLS library at a time, the one we compiled
with.  This makes the protocol simpler.  But perhaps I just lack
ambition and vision.
--
Michael




[PATCH] nbtree: Do not show debugmessage if deduplication is disabled

2020-12-16 Thread Justin Pryzby
Even though the message literally says whether the index "can safely" or
"cannot" use deduplication, the function specifically avoids the debug message
for system relations, so I think it also makes sense to hide it when
deduplication is turned off.

diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 2f5f14e527..b78b542429 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -2710,6 +2710,9 @@ _bt_allequalimage(Relation rel, bool debugmessage)
if (IsSystemRelation(rel))
return false;
 
+   if (!BTGetDeduplicateItems(rel))
+   return false;
+
for (int i = 0; i < IndexRelationGetNumberOfKeyAttributes(rel); i++)
{
Oid opfamily = rel->rd_opfamily[i];
-- 
2.17.0




Re: SELECT INTO deprecation

2020-12-16 Thread Michael Paquier
On Wed, Dec 16, 2020 at 06:07:08PM +0100, Peter Eisentraut wrote:
> Right, we would very likely not add it now.  But it doesn't seem to cause a
> lot of ongoing maintenance burden, so if there is a use case, it's not
> unreasonable to keep it around.  I have no other motive here, I was just
> curious.

From the point of view of the code, this would simplify the grammar
rules of SELECT, which is a good thing.  And this would also simplify
some code in the rewriter where we create some CreateTableAsStmt
on-the-fly where an IntoClause is specified, but my take is that this
is not really a burden when it comes to long-term maintenance.  And
the git history tells us, or at least it seems so to me, that this is not
manipulated much.
--
Michael




Re: [HACKERS] logical decoding of two-phase transactions

2020-12-16 Thread Ajin Cherian
> v31-0009-Support-2PC-txn-Subscription-option
> 1.
> --- a/src/include/catalog/catversion.h
> +++ b/src/include/catalog/catversion.h
> @@ -53,6 +53,6 @@
>   */
>
>  /* mmddN */
> -#define CATALOG_VERSION_NO 202011251
> +#define CATALOG_VERSION_NO 202011271
>
> No need to change catversion as this gets changed frequently and that
> leads to conflict in the patch. We can change it either in the final
> version or normally committers take care of this. If you want to
> remember it, maybe adding a line for it in the commit message should
> be okay. For now, I have removed this from the patch.
>
>
> --
> With Regards,
> Amit Kapila.

I have reviewed the changes and did not have any new comments.
While testing, I found an issue in this patch. During initialisation,
pgoutput is not initialised fully and the subscription parameters
are not all read. As a result, ctx->twophase could be
set to true, even if the subscription does not specify so. For this,
we need to make the following change in pgoutput.c's
pgoutput_startup(), similar to how streaming is handled.

/*
 * This is replication start and not slot initialization.
 *
 * Parse and validate options passed by the client.
 */
if (!is_init)
{
:
:
}
 else
{
   /* Disable the streaming during the slot initialization mode. */
ctx->streaming = false;
+ctx->twophase = false;
 }

regards,
Ajin




STANDBY_LOCK_TIMEOUT may not interrupt ProcWaitForSignal()?

2020-12-16 Thread Fujii Masao

Hi,

When the startup process needs to wait for recovery conflict on lock,
STANDBY_LOCK_TIMEOUT is enabled to interrupt ProcWaitForSignal()
if necessary. If this timeout happens, StandbyLockTimeoutHandler() is
called and this function does nothing as follows.

/*
 * StandbyLockTimeoutHandler() will be called if STANDBY_LOCK_TIMEOUT is exceeded.
 * This doesn't need to do anything, simply waking up is enough.
 */
void
StandbyLockTimeoutHandler(void)
{
}

But if STANDBY_LOCK_TIMEOUT happens just before entering ProcWaitForSignal(),
the timeout fails to interrupt that wait. Also, a signal sent by this timeout
doesn't interrupt the poll() used in ProcWaitForSignal() on all platforms.

So I think that StandbyLockTimeoutHandler() should do SetLatch(MyLatch)
so that the timeout can interrupt ProcWaitForSignal() even in those cases.
Thought?
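
In code form, the proposal amounts to something like this (sketch):

    void
    StandbyLockTimeoutHandler(void)
    {
        /* Wake up ProcWaitForSignal() even if the signal arrives too early. */
        SetLatch(MyLatch);
    }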

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [HACKERS] logical decoding of two-phase transactions

2020-12-16 Thread Amit Kapila
On Thu, Dec 17, 2020 at 7:02 AM Ajin Cherian  wrote:
>
> > v31-0009-Support-2PC-txn-Subscription-option
> > 1.
> > --- a/src/include/catalog/catversion.h
> > +++ b/src/include/catalog/catversion.h
> > @@ -53,6 +53,6 @@
> >   */
> >
> >  /* mmddN */
> > -#define CATALOG_VERSION_NO 202011251
> > +#define CATALOG_VERSION_NO 202011271
> >
> > No need to change catversion as this gets changed frequently and that
> > leads to conflict in the patch. We can change it either in the final
> > version or normally committers take care of this. If you want to
> > remember it, maybe adding a line for it in the commit message should
> > be okay. For now, I have removed this from the patch.
> >
> >
> > --
> > With Regards,
> > Amit Kapila.
>
> I have reviewed the changes, did not have any new comments.
> While testing, I found an issue in this patch. During initialisation,
> the pg_output is not initialised fully and the subscription parameters
> are not all read. As a result, ctx->twophase could be
> set to true , even if the subscription does not specify so. For this,
> we need to make the following change in pgoutput.c:
> pgoutput_startup(), similar to how streaming is handled.
>
> /*
>  * This is replication start and not slot initialization.
>  *
>  * Parse and validate options passed by the client.
>  */
> if (!is_init)
> {
> :
> :
> }
>  else
> {
>/* Disable the streaming during the slot initialization mode. */
> ctx->streaming = false;
> +ctx->twophase = false
>  }
>

makes sense. I can take care of this in the next version, where I am
planning to address Sawada-San's comments and a few other cleanups.

-- 
With Regards,
Amit Kapila.




Re: [HACKERS] logical decoding of two-phase transactions

2020-12-16 Thread Amit Kapila
On Thu, Dec 17, 2020 at 9:02 AM Amit Kapila  wrote:
>
> On Thu, Dec 17, 2020 at 7:02 AM Ajin Cherian  wrote:
> >
> >
> > I have reviewed the changes, did not have any new comments.
> > While testing, I found an issue in this patch. During initialisation,
> > the pg_output is not initialised fully and the subscription parameters
> > are not all read. As a result, ctx->twophase could be
> > set to true , even if the subscription does not specify so. For this,
> > we need to make the following change in pgoutput.c:
> > pgoutput_startup(), similar to how streaming is handled.
> >
> > /*
> >  * This is replication start and not slot initialization.
> >  *
> >  * Parse and validate options passed by the client.
> >  */
> > if (!is_init)
> > {
> > :
> > :
> > }
> >  else
> > {
> >/* Disable the streaming during the slot initialization mode. */
> > ctx->streaming = false;
> > +ctx->twophase = false
> >  }
> >
>
> makes sense.
>

Thinking about this again, I think it is good to disable it during
slot initialization, but will it create any problem, given that during slot
initialization we don't stream any xact and stop processing WAL as
soon as we reach CONSISTENT_STATE? Did you observe any problem with
this?

-- 
With Regards,
Amit Kapila.




Re: \gsetenv

2020-12-16 Thread David Fetter
On Wed, Dec 16, 2020 at 05:30:13PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > We have \gset to set some parameters, but not ones in the environment,
> > so I fixed this with a new analogous command, \gsetenv.
> 
> In view of the security complaints we just had about \gset
> (CVE-2020-25696), I cannot fathom why we'd consider adding another
> way to cause similar problems.

The RedHat site says, in part:

the attacker can execute arbitrary code as the operating system
account running psql

This is true of literally everything that requires a login shell to
use. I remember a "virus" my friend Keith McMillan wrote in
TeX back in 1994. You can download the PostScript file detailing
the procedure, bearing in mind that PostScript also contains ways to
execute arbitrary code if opened:

ftp://ftp.cerias.purdue.edu/pub/doc/viruses/KeithMcMillan-PlatformIndependantVirus.ps

That one got itself a remote code execution by fiddling with a
person's .emacs, and it got Keith a master's degree in CS.  I suspect
that equally nasty things are possible when it comes to \i and \o in
psql. It would be a terrible idea to hobble psql in the attempt to
prevent such attacks.

> We were fortunate enough to be able to close off the main security
> risk of \gset without deleting the feature altogether ... but how
> exactly would we distinguish "safe" from "unsafe" environment
> variables?  It kind of seems like anything that would be worth
> setting at all would tend to fall into the "unsafe" category,
> because the implications of setting it would be unclear.  But *for
> certain* we're not taking a patch that allows remotely setting PATH
> and things like that.

Would you be so kind as to explain what the actual problem is here
that not doing this would mitigate?

If people run code they haven't seen from a server they don't trust,
neither psql nor anything else[1] can protect them from the
consequences. Seeing what they're about to run is dead easy in this
case because \gsetenv, like \gset and what in my view is the much more
dangerous \gexec, is something anyone with the tiniest modicum of
caution would run only after testing it with \g.

> Besides which, you haven't bothered with even one word of positive
> justification.  What's the non-hazardous use case?

Thanks for asking, and my apologies for not including it.

I ran into a situation where we sometimes got a very heavily loaded
and also well-protected PostgreSQL server. At times, just getting a
shell on it could take a few tries. To mitigate situations like that,
I used a method that's a long way from new, abstruse, or secret: have
psql open in a long-lasting tmux or screen session. It could
access the database at a time when getting a new connection would be
somewhere between difficult and impossible.  The bit that's unlikely
to be new was when I noticed that it could also shell out
and send information off to other places, but only when I put together
a pretty baroque procedure that involved using combinations of \gset,
\o, and \!. All of the same things \gsetenv could do were doable with
those, just less convenient, so I drafted up a patch in the hope that
fewer others would find themselves jumping through the hoops I did to
get that set up.

Separately, I confess to some bafflement at the reasoning behind the
CVE you referenced. By the time an attacker has compromised a database
server, it's already game over. Code running on the compromised
database is capable of doing much nastier things than crashing a
client machine, and very likely has better access to other high-value targets
on its own say-so than said client does.

Best,
David.

[1] search for "gods themselves" here:
https://en.wikiquote.org/wiki/Friedrich_Schiller
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] logical decoding of two-phase transactions

2020-12-16 Thread Ajin Cherian
On Thu, Dec 17, 2020 at 2:41 PM Amit Kapila  wrote:

> On again thinking about this, I think it is good to disable it during
> slot initialization but will it create any problem because during slot
> initialization we don't stream any xact and stop processing WAL as
> soon as we reach CONSISTENT_STATE? Did you observe any problem with
> this?
>
Yes, it did not stream any xact during initialization but I was
surprised that the DecodePrepare code was invoked even though
I hadn't created the subscription with twophase enabled. No problem
was observed.

regards,
Ajin Cherian
Fujitsu Australia




Re: [HACKERS] logical decoding of two-phase transactions

2020-12-16 Thread Ajin Cherian
Adding a test case for the scenario where a consistent snapshot is
formed after a transaction has been prepared but before it has been
COMMIT PREPARED.
This test makes sure that in this case, the entire transaction is
decoded on the COMMIT PREPARED. This patch applies on top of v31.

regards,
Ajin Cherian
Fujitsu Australia


v31-0010-Support-2PC-consistent-snapshot-isolation-tests.patch
Description: Binary data


Re: New Table Access Methods for Multi and Single Inserts

2020-12-16 Thread Justin Pryzby
Typos:

+ *  1) Specify is_multi as true, then multi insert state is allcoated.
=> allocated
+ * dropped, short-lived memory context is delted and mistate is freed up.
=> deleted
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minmal and
=> minimal
+   /* Mulit insert state if requested, otherwise NULL. */
=> multi
+ * Buffer the input slots and insert the tuples from the buffered slots at a
=> *one* at a time ?
+ * Compute the size of the tuple only if mi_max_size i.e. the total tuple size
=> I guess you mean max_size

This variable could use a better name:
+CopyMulitInsertFlushBuffers(List **mirri, ..
mirri is fine for a local variable like an element of a structure/array, or a
loop variable, but not for a function parameter which is a "List" of arbitrary
pointers.

I think this comment needs to be updated (again) for the removal of the Info
structure.
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
+ * multi insert buffer items stored in CopyMultiInsertInfo's

I think the COPY patch should be 0002 (or maybe merged into 0001).
There are some superfluous whitespace (and other) changes there which make the
patch unnecessarily long.

You made the v2 insert interface a requirement for all table AMs.
Should it be optional, and fall back to simple inserts if not implemented?
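Roughly this kind of caller-side fallback is what I have in mind; the
callback and helper names below are hypothetical, just to illustrate the
shape:

    /*
     * Hypothetical sketch: use the multi-insert machinery only when the
     * table AM provides it, otherwise fall back to one-at-a-time inserts.
     */
    if (rel->rd_tableam->multi_insert_v2 != NULL)
        table_multi_insert_v2(rel, slots, nslots, cid, options);
    else
    {
        for (int i = 0; i < nslots; i++)
            table_tuple_insert(rel, slots[i], cid, options, bistate);
    }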

For CTAS, I think we need to consider Paul's idea here.
https://www.postgresql.org/message-id/26C14A63-CCE5-4B46-975A-57C1784B3690%40vmware.com
Conceivably, tableam should support something like that for arbitrary AMs
("insert into a new table for which we have exclusive lock").  I think that AM
method should also be optional.  It should be possible to implement a minimal
AM without implementing every available optimization, which may not apply to
all AMs, anyway.

-- 
Justin




Re: [HACKERS] Custom compression methods

2020-12-16 Thread Dilip Kumar
On Wed, Dec 9, 2020 at 5:37 PM Dilip Kumar  wrote:
>
> On Sat, Nov 21, 2020 at 3:50 AM Robert Haas  wrote:
> >
> > On Wed, Nov 11, 2020 at 9:39 AM Dilip Kumar  wrote:
> > > There were a few problems in this rebased version, basically, the
> > > compression options were not passed while compressing values from the
> > > brin_form_tuple, so I have fixed this.
> >
> > Since the authorship history of this patch is complicated, it would be
> > nice if you would include authorship information and relevant
> > "Discussion" links in the patches.
>
> I have added that.
>
> > Design level considerations and overall notes:
> >
> > configure is autogenerated from configure.in, so the patch shouldn't
> > include changes only to the former.
>
> Yeah, I missed those changes. Done now.
>
> > Looking over the changes to src/include:
> >
> > +   PGLZ_COMPRESSION_ID,
> > +   LZ4_COMPRESSION_ID
> >
> > I think that it would be good to assign values to these explicitly.
>
> Done
>
> > +/* compresion handler routines */
> >
> > Spelling.
>
> Done
>
> > +   /* compression routine for the compression method */
> > +   cmcompress_function cmcompress;
> > +
> > +   /* decompression routine for the compression method */
> > +   cmcompress_function cmdecompress;
> >
> > Don't reuse cmcompress_function; that's confusing. Just have a typedef
> > per structure member, even if they end up being the same.
>
> Fixed as suggested
>
> >  #define TOAST_COMPRESS_SET_RAWSIZE(ptr, len) \
> > -   (((toast_compress_header *) (ptr))->rawsize = (len))
> > +do { \
> > +   Assert(len > 0 && len <= RAWSIZEMASK); \
> > +   ((toast_compress_header *) (ptr))->info = (len); \
> > +} while (0)
> >
> > Indentation.
>
> Done
>
> > +#define TOAST_COMPRESS_SET_COMPRESSION_METHOD(ptr, cm_method) \
> > +   ((toast_compress_header *) (ptr))->info |= ((cm_method) << 30);
> >
> > What about making TOAST_COMPRESS_SET_RAWSIZE() take another argument?
> > And possibly also rename it to TEST_COMPRESS_SET_SIZE_AND_METHOD() or
> > something? It seems not great to have separate functions each setting
> > part of a 4-byte quantity. Too much chance of failing to set both
> > parts. I guess you've got a function called
> > toast_set_compressed_datum_info() for that, but it's just a wrapper
> > around two macros that could just be combined, which would reduce
> > complexity overall.
>
> Done that way
>
> > +   T_CompressionRoutine,   /* in access/compressionapi.h */
> >
> > This looks misplaced. I guess it should go just after these:
> >
> > T_FdwRoutine,   /* in foreign/fdwapi.h */
> > T_IndexAmRoutine,   /* in access/amapi.h */
> > T_TableAmRoutine,   /* in access/tableam.h */
>
> Done
>
> > Looking over the regression test changes:
> >
> > The tests at the top of create_cm.out that just test that we can
> > create tables with various storage types seem unrelated to the purpose
> > of the patch. And the file doesn't test creating a compression method
> > either, as the file name would suggest, so either the file name needs
> > to be changed (compression, compression_method?) or the tests don't go
> > here.
>
> Changed to "compression"
>
> > +-- check data is okdd
> >
> > I guess whoever is responsible for this comment prefers vi to emacs.
>
> Fixed
>
> > I don't quite understand the purpose of all of these tests, and there
> > are some things that I feel like ought to be tested that seemingly
> > aren't. Like, you seem to test using an UPDATE to move a datum from a
> > table to another table with the same compression method, but not one
> > with a different compression method.
>
> Added test for this, and some other tests to improve overall coverage.
>
>  Testing the former is nice and
> > everything, but that's the easy case: I think we also need to test the
> > latter. I think it would be good to verify not only that the data is
> > readable but that it's compressed the way we expect. I think it would
> > be a great idea to add a pg_column_compression() function in a similar
> > spirit to pg_column_size(). Perhaps it could return NULL when
> > compression is not in use or the data type is not varlena, and the
> > name of the compression method otherwise. That would allow for better
> > testing of this feature, and it would also be useful to users who are
> > switching methods, to see what data they still have that's using the
> > old method. It could be useful for debugging problems on customer
> > systems, too.
>
> This is a really great idea, I have added this function and used in my test.
>
> > I wonder if we need a test that moves data between tables through an
> > intermediary. For instance, suppose a plpgsql function or DO block
> > fetches some data and stores it in a plpgsql variable and then uses
> > the variable to insert into another table. Hmm, maybe that would force
> > de-TOASTing. But perhaps there are other cases. Maybe a more general
> > way to approach the problem 

Re: [HACKERS] logical decoding of two-phase transactions

2020-12-16 Thread Ajin Cherian
On Tue, Dec 15, 2020 at 11:42 PM Amit Kapila  wrote:
>
> On Mon, Dec 14, 2020 at 2:59 PM Amit Kapila  wrote:
> >
>
> Today, I looked at one of the issues discussed earlier in this thread
> [1] which is that decoding can block (or deadlock can happen) when the
> user explicitly locks the catalog relation (like Lock pg_class) or
> perform Cluster on non-relmapped catalog relations (like Cluster
> pg_trigger using pg_class_oid_index; and the user_table on which we
> have performed any operation has a trigger) in the prepared xact. As
> discussed previously, we don't have a problem when user tables are
> exclusively locked because during decoding we don't acquire any lock
> on those and in fact, we have a test case for the same in the patch.
>
Yes, and as described in that mail, the current code explicitly denies
preparation of a 2PC transaction under some circumstances:

postgres=# BEGIN;
postgres=# CLUSTER pg_class using pg_class_oid_index ;
postgres=# PREPARE TRANSACTION 'test_prepared_lock';
ERROR: cannot PREPARE a transaction that modified relation mapping


> In the previous discussion, most people seem to be of opinion that we
> should document it in a category "don't do that", or prohibit to
> prepare transactions that lock system tables in the exclusive mode as
> any way that can block the entire system. The other possibility could
> be that the plugin can allow enabling lock_timeout when it wants to
> allow decoding of two-phase xacts and if the timeout occurs it tries
> to fetch by disabling two-phase option provided by the patch.
>
> I think it is better to document this as there is no realistic
> scenario where it can happen. I also think separately (not as part of
> this patch) we can investigate whether it is a good idea to prohibit
> prepare for transactions that acquire exclusive locks on catalog
> relations.
>
> Thoughts?

I agree with the documentation option. If we choose to disable
two-phase on timeout, we still need to decide what to
do with already prepared transactions.

regards,
Ajin Cherian
Fujitsu Australia




Re: Cache relation sizes?

2020-12-16 Thread Andy Fan
On Thu, Nov 19, 2020 at 1:02 PM Thomas Munro  wrote:

> On Tue, Nov 17, 2020 at 10:48 PM Amit Kapila 
> wrote:
> > Yeah, it is good to verify VACUUM stuff but I have another question
> > here. What about queries having functions that access the same
> > relation (SELECT c1 FROM t1 WHERE c1 <= func(); assuming here function
> > access the relation t1)? Now, here I think because the relation 't1'
> > is already opened, it might use the same value of blocks from the
> > shared cache even though the snapshot for relation 't1' when accessed
> > from func() might be different. Am, I missing something, or is it
> > dealt in some way?
>
> I think it should be covered by the theory about implicit memory
> barriers snapshots, but to simplify things I have now removed the
> lock-free stuff from the main patch (0001), because it was a case of
> premature optimisation and it distracted from the main concept.  The
> main patch has 128-way partitioned LWLocks for the mapping table, and
> then per-relfilenode spinlocks for modifying the size.  There are
> still concurrency considerations, which I think are probably handled
> with the dirty-update-wins algorithm you see in the patch.  In short:
> due to extension and exclusive locks, size changes AKA dirty updates
> are serialised, but clean updates can run concurrently, so we just
> have to make sure that clean updates never clobber dirty updates -- do
> you think that is on the right path?
>

Hi Thomas:

Thank you for working on it.

I spent one day studying the patch and I want to talk about one question
for now.
What is the purpose of calling smgrimmedsync to evict a DIRTY sr (what will
happen if we remove it and the SR_SYNCING and SR_JUST_DIRTIED flags)?


-- 
Best Regards
Andy Fan


Re: On login trigger: take three

2020-12-16 Thread Pavel Stehule
On Wed, Dec 16, 2020 at 20:38 Pavel Stehule
wrote:

> Attached please find new version of the patch based on
> on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch
>
>> So there is still only  "disable_client_connection_trigger" GUC? because
>> we need possibility to disable client connect triggers and there is no such
>> need for other event types.
>>
>> As you suggested I have added a "dathaslogontriggers" flag which is set when
>> client connection trigger is created.
>> This flag is never cleaned (to avoid visibility issues mentioned in my
>> previous mail).
>>
>
> This is much better - I don't see any slowdown when logon trigger is not
> defined
>
> I did some benchmarks and it looks like starting the language handler is
> relatively expensive - it is about 25%, and starting a handler like an event
> trigger is about 35%. But this problem can be solved later and elsewhere.
>
> I prefer the inverse form of disable_connection_trigger. Almost all GUCs
> are in positive form - so enable_connection_trigger is better
>
> I don't think the current handling of dathaslogontriggers is good for
> production usage. The protection against the race condition can be solved by
> a lock on pg_event_trigger
>

I thought about it, and a counter of connect triggers will probably be
better there. The implementation will be simpler and more robust.

Pavel


> Regards
>
> Pavel
>
>
>
>>
>>
>> --
>> Konstantin Knizhnik
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>


Re: row filtering for logical replication

2020-12-16 Thread Önder Kalacı
Hi all,

I'm also interested in this patch. I rebased the changes onto the current
master branch and attached them. The rebase had two issues. First, patch-8 was
conflicting, and it seems only helpful for debugging purposes during
development, so I dropped it for simplicity. Second, the changes have a
conflict with the `publish_via_partition_root` changes. I tried to fix the
issues, but ended up with a limitation for now. The limitation is that
"cannot create publication with WHERE clause on the partitioned table
without publish_via_partition_root is set to true". This restriction can be
lifted, though I left it out for the sake of focusing on some issues that
I observed in this patch.

Please see my review:

+   if (list_length(relentry->qual) > 0)
+   {
+       HeapTuple   old_tuple;
+       HeapTuple   new_tuple;
+       TupleDesc   tupdesc;
+       EState     *estate;
+       ExprContext *ecxt;
+       MemoryContext oldcxt;
+       ListCell   *lc;
+       bool        matched = true;
+
+       old_tuple = change->data.tp.oldtuple ?
+                   &change->data.tp.oldtuple->tuple : NULL;
+       new_tuple = change->data.tp.newtuple ?
+                   &change->data.tp.newtuple->tuple : NULL;
+       tupdesc = RelationGetDescr(relation);
+       estate = create_estate_for_relation(relation);
+
+       /* prepare context per tuple */
+       ecxt = GetPerTupleExprContext(estate);
+       oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+       ecxt->ecxt_scantuple = ExecInitExtraTupleSlot(estate, tupdesc,
+                                                     &TTSOpsHeapTuple);
+
+       ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple,
+                          ecxt->ecxt_scantuple, false);
+
+       foreach(lc, relentry->qual)
+       {
+           Node       *qual;
+           ExprState  *expr_state;
+           Expr       *expr;
+           Oid         expr_type;
+           Datum       res;
+           bool        isnull;
+
+           qual = (Node *) lfirst(lc);
+
+           /* evaluates row filter */
+           expr_type = exprType(qual);
+           expr = (Expr *) coerce_to_target_type(NULL, qual, expr_type,
+                                                 BOOLOID, -1,
+                                                 COERCION_ASSIGNMENT,
+                                                 COERCE_IMPLICIT_CAST, -1);
+           expr = expression_planner(expr);
+           expr_state = ExecInitExpr(expr, NULL);
+           res = ExecEvalExpr(expr_state, ecxt, &isnull);
+
+           /* if tuple does not match row filter, bail out */
+           if (!DatumGetBool(res) || isnull)
+           {
+               matched = false;
+               break;
+           }
+       }
+
+       MemoryContextSwitchTo(oldcxt);
+


The above part can be considered the core of the logic, executed per tuple.
As far as I can see, it has two downsides.

First, calling `expression_planner()` for every tuple can be quite
expensive. I created a sample table, loaded data and ran a quick benchmark
to see its effect. I attached the very simple script that I used to
reproduce the issue on my laptop. I'm pretty sure you can find nicer ways
of doing similar perf tests, just sharing as a reference.

The idea of the test is to add a WHERE clause for a table, but none of the
tuples are filtered out. They just go through this code path and are sent to
the remote node.

#rows    Patched          | Master
1M       00:00:25.067536  | 00:00:16.633988
10M      00:04:50.770791  | 00:02:40.945358


So, it seems a significant overhead to me. What do you think?
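One way to avoid most of that overhead might be to plan and initialize the
expression once per relation entry, and only evaluate the cached ExprState
for each tuple. A rough sketch (the `qual_exprstates` field and the placement
are my invention, not part of the patch):

```
/* done once, when the relation entry is (re)built */
foreach(lc, relentry->qual)
{
    Expr   *expr = (Expr *) lfirst(lc);

    expr = (Expr *) coerce_to_target_type(NULL, (Node *) expr,
                                          exprType((Node *) expr),
                                          BOOLOID, -1,
                                          COERCION_ASSIGNMENT,
                                          COERCE_IMPLICIT_CAST, -1);
    expr = expression_planner(expr);
    relentry->qual_exprstates = lappend(relentry->qual_exprstates,
                                        ExecInitExpr(expr, NULL));
}

/* done per tuple: only the evaluation stays in the hot path */
foreach(lc, relentry->qual_exprstates)
{
    ExprState  *expr_state = (ExprState *) lfirst(lc);
    bool        isnull;
    Datum       res;

    res = ExecEvalExpr(expr_state, ecxt, &isnull);
    if (!DatumGetBool(res) || isnull)
    {
        matched = false;
        break;
    }
}
```

Of course, the cached state would need a long-lived memory context and some
care around invalidation when the publication definition changes.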

Secondly, and probably more importantly, allowing any operator is as dangerous
as allowing any function, since users can create/overload operator(s). For
example, assume that a user creates an operator which modifies the table that
is being filtered:

```
CREATE OR REPLACE FUNCTION function_that_modifies_table(left_arg INTEGER,
right_arg INTEGER)
RETURNS BOOL AS
$$
BEGIN

  INSERT INTO test SELECT * FROM test;

  return left_arg > right_arg;
 END;
$$ LANGUAGE PLPGSQL VOLATILE;

CREATE OPERATOR >>= (
  PROCEDURE = function_that_modifies_table,
  LEFTARG = INTEGER,
  RIGHTARG = INTEGER
);

CREATE PUBLICATION pub FOR TABLE test WHERE (key >>= 0);
```

With the above, we seem to be in trouble. Although the above is an extreme
example, it felt useful to share it to show the extent of the problem. We
probably cannot allow any free-form SQL in the filters.

To overcome these issues, one approach could be to rely on known safe
operators and functions. I believe the btree and hash operators should
provide pretty strong coverage across many use cases. As far as I can
see, the procs that the following query returns can be our baseline:

```
select   DISTINCT amproc.amproc::regproc AS opfamily_procedure
from pg_am am,

Re: [HACKERS] logical decoding of two-phase transactions

2020-12-16 Thread Masahiko Sawada
On Wed, Dec 16, 2020 at 6:22 PM Amit Kapila  wrote:
>
> On Wed, Dec 16, 2020 at 1:04 PM Masahiko Sawada  wrote:
> >
> > Thank you for updating the patch. I have two questions:
> >
> > -
> > @@ -239,6 +239,19 @@ CREATE SUBSCRIPTION  > class="parameter">subscription_name >   
> >  
> > 
> > +   
> > +two_phase (boolean)
> > +
> > + 
> > +  Specifies whether two-phase commit is enabled for this 
> > subscription.
> > +  The default is false.
> > +  When two-phase commit is enabled then the decoded
> > transactions are sent
> > +  to the subscriber on the PREPARE TRANSACTION. When
> > two-phase commit is not
> > +  enabled then PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED 
> > are not
> > +  decoded on the publisher.
> > + 
> > +
> > +   
> >
> > The user will need to specify the 'two_phase’ option on CREATE
> > SUBSCRIPTION. It would mean the user will need to control what data is
> > streamed both on publication side for INSERT/UPDATE/DELETE/TRUNCATE
> > and on subscriber side for PREPARE. Looking at the implementation of
> > the ’two_phase’ option of CREATE SUBSCRIPTION, it ultimately just
> > passes the ‘two_phase' option to the publisher. Why don’t we set it on
> > the publisher side?
> >
>
> There could be multiple subscriptions for the same publication, some
> want to decode the transaction at prepare time and others might want
> to decode at commit time only.  Also, one subscription could subscribe
> to multiple publications, so not sure if it is even feasible to set at
> publication level (consider one txn has changes belonging to multiple
> publications). This option controls how the data is streamed from a
> publication similar to other options like 'streaming'. Why do you
> think this should be any different?

Oh, I was thinking that the option controls what data is streamed,
similar to the 'publish' option. But I agree with you. As you
mentioned, it might be a problem if a subscription subscribes to multiple
publications that set different 'two_phase' options. Also, in terms of
changing this option while streaming changes, it's better to control
it on the subscriber side.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Cache relation sizes?

2020-12-16 Thread Thomas Munro
Hi Andy,

On Thu, Dec 17, 2020 at 7:29 PM Andy Fan  wrote:
> I spent one day studying the patch and I want to talk about one question
> for now.
> What is the purpose of calling smgrimmedsync to evict a DIRTY sr (what will
> happen if we remove it and the SR_SYNCING and SR_JUST_DIRTIED flags)?

The underlying theory of the patch is that after fsync() succeeds, the
new size is permanent, but before that it could change at any time
because of asynchronous activities in the kernel*.  Therefore, if you
want to evict the size from shared memory, you have to fsync() the
file first.  I used smgrimmedsync() to do that.

*I suspect that it can really only shrink with network file systems
such as NFS and GlusterFS, where the kernel has to take liberties to
avoid network round trips for every file extension, but I don't know
the details on all 9 of our supported OSes and standards aren't very
helpful for understanding buffered I/O.
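
To spell the rule out in pseudo-code (the flag and variable names here are
only illustrative, not necessarily the ones in the patch):

    /* eviction of a cached size, per the rule above */
    if (sr->flags & SR_DIRTY)
    {
        /* make the current size durable before we forget it */
        smgrimmedsync(reln, forknum);
        sr->flags &= ~SR_DIRTY;
    }
    /* only a clean entry may now be dropped from the shared mapping */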