Re: [HACKERS] logical decoding of two-phase transactions
On Mon, Dec 14, 2020 at 6:27 PM Amit Kapila wrote: > > On Tue, Dec 8, 2020 at 2:01 PM Ajin Cherian wrote: > > > > On Tue, Dec 1, 2020 at 6:26 PM Amit Kapila wrote: > > > > > To skip it, we need to send GID in begin message and then on > > > subscriber-side, check if the prepared xact already exists, if so then > > > set a flag. The flag needs to be set in begin/start_stream and reset > > > in stop_stream/commit/abort. Using the flag, we can skip the entire > > > contents of the prepared xact. In ReorderFuffer-side also, we need to > > > get and set GID in txn even when we skip it because we need to send > > > the same at commit time. In this solution, we won't be able to send it > > > during normal start_stream because by that time we won't know GID and > > > I think that won't be required. Note that this is only required when > > > we skipped sending prepare, otherwise, we just need to send > > > Commit-Prepared at commit time. > > > > I have implemented these changes and tested the fix using the test > > setup I had shared above and it seems to be working fine. > > I have also tested restarts that simulate duplicate prepares being > > sent by the publisher and verified that it is handled correctly by the > > subscriber. > > > > This implementation has a flaw in that it has used commit_lsn for the > prepare when we are sending prepare just before commit prepared. We > can't send the commit LSN with prepare because if the subscriber > crashes after prepare then it would update > replorigin_session_origin_lsn with that commit_lsn. Now, after the > restart, because we will use that LSN to start decoding, the Commit > Prepared will get skipped. To fix this, we need to remember the > prepare LSN and other information even when we skip prepare and then > use it while sending the prepare during commit prepared. > > Now, after fixing this, I discovered another issue which is that we > allow adding a new snapshot to prepared transactions via > SnapBuildDistributeNewCatalogSnapshot. We can only allow it to get > added to in-progress transactions. If you comment out the changes > added in SnapBuildDistributeNewCatalogSnapshot then you will notice > one test failure which indicates this problem. This problem was not > evident before the bug-fix in the previous paragraph because you were > using commit-lsn even for the prepare and newly added snapshot change > appears to be before the end_lsn. > > Some other assorted changes in various patches: > v31-0001-Extend-the-output-plugin-API-to-allow-decoding-o > 1. I have changed the filter_prepare API to match the signature with > FilterByOrigin. I don't see the need for ReorderBufferTxn or xid in > the API. > 2. I have expanded the documentation of 'Begin Prepare Callback' to > explain how a user can use it to detect already prepared transactions > and in which scenarios that can happen. > 3. Added a few comments in the code atop begin_prepare_cb_wrapper to > explain why we are adding this new API. > 4. Move the check whether the filter_prepare callback is defined from > filter_prepare_cb_wrapper to caller. This is similar to how > FilterByOrigin works. > 5. Fixed various whitespace and cosmetic issues. > 6. Update commit message to include two of the newly added APIs > > v31-0002-Allow-decoding-at-prepare-time-in-ReorderBuffer > 1. Changed the variable names and comments in DecodeXactOp. > 2. A new API for FilterPrepare similar to FilterByOrigin and use that > instead of ReorderBufferPrepareNeedSkip. > 3. 
In DecodeCommit, we need to update the reorderbuffer about the > surviving subtransactions for both ReorderBufferFinishPrepared and > ReorderBufferCommit because now both can process the transaction. > 4. Because, now we need to remember the prepare info even when we skip > it, I have simplified ReorderBufferPrepare API by removing the extra > parameters as that information will be now available via > ReorderBufferTxn. > 5. Updated comments at various places. > > v31-0006-Support-2PC-txn-pgoutput > 1. Added Asserts in streaming APIs on the subscriber-side to ensure > that we should never reach there for the already prepared transaction > case. We never need to stream the changes when we are skipping prepare > either because the snapshot was not consistent by that time or we have > already sent those changes before restart. Added the same Assert in > Begin and Commit routines because while skipping prepared txn, we must > not receive the changes of any other xact. > 2. > + /* > + * Flags are determined from the state of the transaction. We know we > + * always get PREPARE first and then [COMMIT|ROLLBACK] PREPARED, so if > + * it's already marked as committed then it has to be COMMIT PREPARED (and > + * likewise for abort / ROLLBACK PREPARED). > + */ > + if (rbtxn_commit_prepared(txn)) > + flags = LOGICALREP_IS_COMMIT_PREPARED; > + else if (rbtxn_rollback_prepared(txn)) > + flags = LOGICALREP_IS_ROLLBACK_PREPARED; > + else > + flags = LOGICALREP_IS_PREPARE; > > I don't
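To make the subscriber-side skip logic described above a bit more concrete, here is a small, self-contained C model of the behavior: a flag is set at BEGIN PREPARE when the GID is already known to be prepared (for example because it was applied before a restart), the transaction's contents are then ignored, and the flag is reset when the skipped PREPARE ends. All names here are invented for illustration; this is not code from the patch set.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_GIDS 8

static char prepared_gids[MAX_GIDS][64];
static int  n_prepared = 0;
static bool skipping_contents = false;   /* the flag from the discussion */

static bool
gid_already_prepared(const char *gid)
{
    for (int i = 0; i < n_prepared; i++)
        if (strcmp(prepared_gids[i], gid) == 0)
            return true;
    return false;
}

static void
handle_begin_prepare(const char *gid)
{
    /* Duplicate PREPARE after a restart: skip the whole contents. */
    skipping_contents = gid_already_prepared(gid);
    if (!skipping_contents)
        printf("starting prepared transaction %s\n", gid);
}

static void
handle_change(const char *change)
{
    if (skipping_contents)
        return;                  /* these changes were applied already */
    printf("applying: %s\n", change);
}

static void
handle_prepare(const char *gid)
{
    if (skipping_contents)
    {
        skipping_contents = false;   /* reset at the end of the skipped txn */
        printf("skipped duplicate PREPARE of %s\n", gid);
        return;
    }
    if (n_prepared < MAX_GIDS)
        snprintf(prepared_gids[n_prepared++], sizeof(prepared_gids[0]),
                 "%s", gid);
    printf("prepared %s\n", gid);
}

int
main(void)
{
    handle_begin_prepare("gid-1");       /* first delivery */
    handle_change("INSERT 1");
    handle_prepare("gid-1");

    handle_begin_prepare("gid-1");       /* redelivered after restart */
    handle_change("INSERT 1");           /* silently skipped */
    handle_prepare("gid-1");

    printf("COMMIT PREPARED gid-1\n");   /* still sent and applied as usual */
    return 0;
}

As in the discussion above, only the contents of the already-prepared transaction are skipped; the later COMMIT PREPARED (identified by the GID) is still applied normally.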
Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao wrote in > > I pushed the following two patches. > > - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch > > - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch > > As I told in other thread [1], I'm thinking to revert this patch > because this change could have bad side-effect on the startup > process waiting for recovery conflict. > > Before applying the patch, the latch that the startup process > used to wait for recovery conflict was different from the latch > that SIGHUP signal handler or walreceiver process, etc set to > wake the startup process up. So SIGHUP or walreceiver didn't > wake the startup process waiting for recovery conflict up unnecessary. > > But the patch got rid of the dedicated latch for signaling > the startup process. This change forced us to use the same latch > to make the startup process wait or wake up. Which caused SIGHUP > signal handler or walreceiver proces to wake the startup process > waiting on the latch for recovery conflict up unnecessarily > frequently. > > While waiting for recovery conflict on buffer pin, deadlock needs > to be checked at least every deadlock_timeout. But that frequent > wakeups could prevent the deadlock timer from being triggered and > could delay that deadlock checks.

I thought that spurious wakeups don't harm, but actually ResolveRecoveryConflictWithBufferPin doesn't consider spurious wakeups. Only the timer woke up ResolveRecoveryConflictWithBufferPin before the patch came in; currently SIGHUP and XLogFlush() (on the walreceiver) also wake up the startup process. For a moment I thought that ResolveRecoveryConflictWithBufferPin should be woken up at shutdown time by the old recovery latch, but that's not the case since it wakes up after all blockers go away. It seems simpler to me to revert the patches than to make the function properly handle spurious wakeups.

> Therefore, I'm thinking to revert the commit ac22929a26 and > 113d3591b8. > > [1] > https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726...@oss.nttdata.com

As a result, I agree to revert them. But I think we need to add a comment explaining why we don't use MyLatch for recovery wakeups after reverting them. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
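Since the crux here is that ResolveRecoveryConflictWithBufferPin was written assuming only the timer could wake it, a minimal sketch of a wakeup-tolerant wait may help illustrate what "properly handle spurious wakeups" would mean. conflict_resolved() and wait_on_latch() below are invented placeholders, not the backend's actual functions.

#include <stdbool.h>

/* Invented stand-ins for "is the conflict gone?" and "wait on my latch". */
extern bool conflict_resolved(void);
extern void wait_on_latch(void);

/*
 * A wait that tolerates spurious wakeups: whatever set the latch (SIGHUP,
 * the walreceiver after XLogFlush(), ...), always re-check the desired
 * state and go back to waiting if it has not been reached yet.
 */
void
wait_until_conflict_resolved(void)
{
    while (!conflict_resolved())
        wait_on_latch();
}

Without such a loop around the buffer-pin wait, extra wakeups on the shared latch become a correctness concern rather than a mere inefficiency, which is what motivates the revert discussed above.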
Refactoring HMAC in the core code
Hi all, (Added Bruce and Daniel in CC:)

$subject has been mentioned a couple of times lately, mainly by me for the recent cryptohash refactoring that has been done. In the core code we use HMAC with SHA256 for SCRAM, but this logic should go through SSL libraries able to support it (OpenSSL and libnss allow that) so that requirements like FIPS can be pushed down to whatever lower-level library we are building with rather than to Postgres. FWIW, I have also bumped into this stuff as being a requirement for the recent thread about file-level encryption in [1], where the code makes use of HMAC with SHA512.

So, please find attached a patch set to rework all that. This provides a split similar to what I have done recently for cryptohash functions, with a fallback implementation located in src/common/hmac.c, which itself depends on the fallback implementations of the cryptohash functions. The OpenSSL part is done in hmac_openssl.c. There are five APIs to plug in HMAC implementations, to create, initialize, update, finalize and free an HMAC context, in a fashion similar to cryptohashes.

Regarding OpenSSL, upstream has lately changed the way HMACs can be controlled. 3.0.0 has introduced a new set of APIs, with compatibility macros for older versions, as mentioned here: https://www.openssl.org/docs/manmaster/man3/EVP_MAC_CTX_new.html The new APIs are named EVP_MAC_CTX_new() and such. I think that this is a bit too new to use though, as we need to support OpenSSL down to 1.0.1 on HEAD and because there are compatibility macros. So instead I have decided to rely on the older interface based on HMAC_Init_ex() and co: https://www.openssl.org/docs/manmaster/man3/HMAC.html

After that there is another point to note. In 1.1.0, any consumer of HMAC *has* to let OpenSSL allocate the HMAC context, as with cryptohashes, because the internals of the HMAC context are only known to OpenSSL. In 1.0.2 and older versions, it is possible to have access to HMAC_CTX. This requires an extra tweak in hmac_openssl.c where we need to {m,p}alloc by ourselves instead of calling HMAC_CTX_new() for 1.1.0 and 1.1.1, but some extra configure switches are able to do the trick. That means that we could use resowners only when building with OpenSSL >= 1.1.0 and not for older versions, but note that the patch uses resowners anyway, as a matter of simplicity. As the changes are local to a single file, that's easy enough to follow and update. It would be easy enough to rip out this code once support for older OpenSSL versions is removed.

Please note that I have added code that should be enough for the compilation on Windows, but I have not taken the time to check that. I have checked that things compile and that check-world passes with and without OpenSSL 1.1.1 on Linux though, so I guess that I have not messed up too badly. This stuff requires much more testing, like making sure that we are able to connect to PG correctly with SCRAM when using combinations like libpq based on OpenSSL and the backend using the fallback, or just checking the consistency of the results of computations with SQL functions or such.

An extra thing that can be done is to clean up pgcrypto's px-hmac.c, but this also requires SHA1 in cryptohash.c, something that I have submitted separately in [2]. So this could just be done later. This patch also updates the SCRAM code so that it no longer uses the dedicated SCRAM/HMAC routines but the generic HMAC routines instead.

Thoughts are welcome. I am adding that to the next CF.
[1]: https://www.postgresql.org/message-id/X9lhi1ht04I+v/r...@paquier.xyz [2]: https://commitfest.postgresql.org/31/2868/ Thanks, -- Michael From 26f9375c72a512640711953b6b52f227863e9113 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 16 Dec 2020 16:15:05 +0900 Subject: [PATCH] Refactor HMAC implementations --- src/include/common/hmac.h | 33 +++ src/include/common/md5.h | 2 + src/include/common/scram-common.h | 13 -- src/include/pg_config.h.in| 6 + src/include/utils/resowner_private.h | 7 + src/backend/libpq/auth-scram.c| 61 ++--- src/backend/utils/resowner/resowner.c | 61 + src/common/Makefile | 4 +- src/common/hmac.c | 312 ++ src/common/hmac_openssl.c | 242 src/common/scram-common.c | 158 +++-- src/interfaces/libpq/fe-auth-scram.c | 70 +++--- configure | 2 +- configure.ac | 2 +- src/tools/msvc/Mkvcbuild.pm | 2 + src/tools/msvc/Solution.pm| 4 + src/tools/pgindent/typedefs.list | 1 - 17 files changed, 782 insertions(+), 198 deletions(-) create mode 100644 src/include/common/hmac.h create mode 100644 src/common/hmac.c create mode 100644 src/common/hmac_openssl.c diff --git a/src/include/common/hmac.h b/src/include/common/hmac.h new file
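To illustrate the five-step interface described above (create, initialize, update, finalize, free), here is a hedged usage sketch of how a caller such as the SCRAM code might drive it. The type and function names below are assumptions modelled on the existing cryptohash interface, not copied from the attached patch.

#include <stdint.h>
#include <stddef.h>

typedef struct pg_hmac_ctx pg_hmac_ctx;          /* opaque, like cryptohash */

/* Assumed signatures; the real ones live in src/include/common/hmac.h. */
extern pg_hmac_ctx *pg_hmac_create(int digest_type);
extern int  pg_hmac_init(pg_hmac_ctx *ctx, const uint8_t *key, size_t len);
extern int  pg_hmac_update(pg_hmac_ctx *ctx, const uint8_t *data, size_t len);
extern int  pg_hmac_final(pg_hmac_ctx *ctx, uint8_t *dest);
extern void pg_hmac_free(pg_hmac_ctx *ctx);

#define DIGEST_SHA256  1      /* placeholder for the real enum value */
#define SHA256_LEN    32

static int
compute_hmac_sha256(const uint8_t *key, size_t keylen,
                    const uint8_t *msg, size_t msglen,
                    uint8_t out[SHA256_LEN])
{
    pg_hmac_ctx *ctx = pg_hmac_create(DIGEST_SHA256);

    if (ctx == NULL)
        return -1;
    if (pg_hmac_init(ctx, key, keylen) < 0 ||
        pg_hmac_update(ctx, msg, msglen) < 0 ||
        pg_hmac_final(ctx, out) < 0)
    {
        pg_hmac_free(ctx);
        return -1;
    }
    pg_hmac_free(ctx);
    return 0;
}

The point of the indirection is that the same calling code works whether the context is backed by OpenSSL's HMAC_* routines or by the fallback implementation in src/common/hmac.c.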
Re: Add Information during standby recovery conflicts
At Wed, 16 Dec 2020 14:49:25 +0900 (JST), Kyotaro Horiguchi wrote in > > > > Conflicting processes are 41171, 41194. > > > > Conflicting processes are: 41171, 41194. > > Or I came up with the following after scanning throught the tree. > > | Some processes are conflicting: 41171, 41194. Or Some processes are blocking recovery progress: 41171, 41194. ? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Add Information during standby recovery conflicts
At Wed, 16 Dec 2020 12:08:31 +0900, Masahiko Sawada wrote in On Wed, Dec 16, 2020 at 11:22 AM Kyotaro Horiguchi wrote: > > At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao wrote in > > > > > > On 2020/12/15 12:04, Kyotaro Horiguchi wrote: > > > [40509:startup] DETAIL: Conflicting processes: 41171, 41194. > ... > > > I'm not sure, but it seems to me at least the period is unnecessary > > > here. > > > > Since Error Message Style Guide in the docs says "Detail and hint > > messages: > > Use complete sentences, and end each with a period.", I think that a > > period > > is necessary here. No? > > In the first place it is not a complete sentence. Might be better be > something like this if we strictly follow the style guide? FWIW I borrowed the message style in errdetail from log messages in ProcSleep(): (errmsg("process %d still waiting for %s on %s after %ld.%03d ms", MyProcPid, modename, buf.data, msecs, usecs), (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.", "Processes holding the lock: %s. Wait queue: %s.", lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data; I was guessing that was the case. > > Conflicting processes are 41171, 41194. > > Conflicting processes are: 41171, 41194. Or I came up with the following after scanning throught the tree. | Some processes are conflicting: 41171, 41194. If we use the above message we might want to change other similar messages I exemplified as well. I'm not sure what should we do for other anomalies. Other errdetails of this category (incomplete sentences or the absence of a period) I found are: -- period is absent pgarch.c", auth.c
Re: Add Information during standby recovery conflicts
On Wed, Dec 16, 2020 at 11:22 AM Kyotaro Horiguchi wrote: > > At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao > wrote in > > > > > > On 2020/12/15 12:04, Kyotaro Horiguchi wrote: > > > [40509:startup] DETAIL: Conflicting processes: 41171, 41194. > ... > > > I'm not sure, but it seems to me at least the period is unnecessary > > > here. > > > > Since Error Message Style Guide in the docs says "Detail and hint > > messages: > > Use complete sentences, and end each with a period.", I think that a > > period > > is necessary here. No? > > In the first place it is not a complete sentence. Might be better be > something like this if we strictly follow the style guide? FWIW I borrowed the message style in errdetail from log messages in ProcSleep(): (errmsg("process %d still waiting for %s on %s after %ld.%03d ms", MyProcPid, modename, buf.data, msecs, usecs), (errdetail_log_plural("Process holding the lock: %s. Wait queue: %s.", "Processes holding the lock: %s. Wait queue: %s.", lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data; > > Conflicting processes are 41171, 41194. > > Conflicting processes are: 41171, 41194. If we use the above message we might want to change other similar messages I exemplified as well. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: Add Information during standby recovery conflicts
On Mon, Dec 14, 2020 at 9:31 PM Fujii Masao wrote: > > > > On 2020/12/05 12:38, Masahiko Sawada wrote: > > On Fri, Dec 4, 2020 at 7:22 PM Drouvot, Bertrand > > wrote: > >> > >> Hi, > >> > >> On 12/4/20 2:21 AM, Fujii Masao wrote: > >>> > >>> On 2020/12/04 9:28, Masahiko Sawada wrote: > On Fri, Dec 4, 2020 at 2:54 AM Fujii Masao > wrote: > > > > > > > > On 2020/12/01 17:29, Drouvot, Bertrand wrote: > >> Hi, > >> > >> On 12/1/20 12:35 AM, Masahiko Sawada wrote: > >>> CAUTION: This email originated from outside of the organization. > >>> Do not click links or open attachments unless you can confirm the > >>> sender and know the content is safe. > >>> > >>> > >>> > >>> On Tue, Dec 1, 2020 at 3:25 AM Alvaro Herrera > >>> wrote: > On 2020-Dec-01, Fujii Masao wrote: > > > + if (proc) > > + { > > + if (nprocs == 0) > > + appendStringInfo(&buf, "%d", proc->pid); > > + else > > + appendStringInfo(&buf, ", %d", proc->pid); > > + > > + nprocs++; > > > > What happens if all the backends in wait_list have gone? In > > other words, > > how should we handle the case where nprocs == 0 (i.e., nprocs > > has not been > > incrmented at all)? This would very rarely happen, but can happen. > > In this case, since buf.data is empty, at least there seems no > > need to log > > the list of conflicting processes in detail message. > Yes, I noticed this too; this can be simplified by changing the > condition in the ereport() call to be "nprocs > 0" (rather than > wait_list being null), otherwise not print the errdetail. (You > could > test buf.data or buf.len instead, but that seems uglier to me.) > >>> +1 > >>> > >>> Maybe we can also improve the comment of this function from: > >>> > >>> + * This function also reports the details about the conflicting > >>> + * process ids if *wait_list is not NULL. > >>> > >>> to " This function also reports the details about the conflicting > >>> process ids if exist" or something. > >>> > >> Thank you all for the review/remarks. > >> > >> They have been addressed in the new attached patch version. > > > > Thanks for updating the patch! I read through the patch again > > and applied the following chages to it. Attached is the updated > > version of the patch. Could you review this version? If there is > > no issue in it, I'm thinking to commit this version. > > Thank you for updating the patch! I have one question. > > > > > + timeouts[cnt].id = STANDBY_TIMEOUT; > > + timeouts[cnt].type = TMPARAM_AFTER; > > + timeouts[cnt].delay_ms = DeadlockTimeout; > > > > Maybe STANDBY_TIMEOUT should be STANDBY_DEADLOCK_TIMEOUT here? > > I changed the code that way. > > As the comment of ResolveRecoveryConflictWithLock() says the > following, a deadlock is detected by the ordinary backend process: > > * Deadlocks involving the Startup process and an ordinary backend > proces > * will be detected by the deadlock detector within the ordinary > backend. > > If we use STANDBY_DEADLOCK_TIMEOUT, > SendRecoveryConflictWithBufferPin() will be called after > DeadlockTimeout passed, but I think it's not necessary for the startup > process in this case. > >>> > >>> Thanks for pointing this! You are right. > >>> > >>> > If we want to just wake up the startup process > maybe we can use STANDBY_TIMEOUT here? > >>> > >> Thanks for the patch updates! Except what we are still discussing below, > >> it looks good to me. > >> > >>> When STANDBY_TIMEOUT happens, a request to release conflicting buffer > >>> pins is sent. Right? If so, we should not also use STANDBY_TIMEOUT there? 
> >> > >> Agree > >> > >>> > >>> Or, first of all, we don't need to enable the deadlock timer at all? > >>> Since what we'd like to do is to wake up after deadlock_timeout > >>> passes, we can do that by changing ProcWaitForSignal() so that it can > >>> accept the timeout and giving the deadlock_timeout to it. If we do > >>> this, maybe we can get rid of STANDBY_LOCK_TIMEOUT from > >>> ResolveRecoveryConflictWithLock(). Thought? > > > > Where do we enable deadlock timeout in hot standby case? You meant to > > enable it in ProcWaitForSignal() or where we set a timer for not hot > > standby case, in ProcSleep()? > > No, what I tried to say is to change ResolveRecoveryConflictWithLock() so > that it does > > 1. calculate the "minimum" timeout from deadlock_timeout and > max_standby_xxx_delay > 2. give the
Re: xid wraparound danger due to INDEX_CLEANUP false
On Sat, Nov 21, 2020 at 8:03 AM Peter Geoghegan wrote: > > On Fri, Nov 20, 2020 at 2:17 PM Robert Haas wrote: > > > Does that make sense? > > > > I *think* so. For me the point is that the index never has a right to > > insist on being vacuumed, but it can offer an opinion on how helpful > > it would be. > > Right, that might be the single most important point. It's a somewhat > more bottom-up direction for VACUUM, that is still fundamentally > top-down. Because that's still necessary. > > Opportunistic heap pruning is usually very effective, so today we > realistically have these 4 byte line pointers accumulating in heap > pages. The corresponding "bloatum" in index pages is an index tuple + > line pointer (at least 16 bytes + 4 bytes). Meaning that we accumulate > that *at least* 20 bytes for each 4 bytes in the table. And, indexes > care about *where* items go, making the problem even worse. So in the > absence of index tuple LP_DEAD setting/deletion (or bottom-up index > deletion in Postgres 14), the problem in indexes is probably at least > 5x worse. > > The precise extent to which this is true will vary. It's a mistake to > try to reason about it at a high level, because there is just too much > variation for that approach to work. We should just give index access > methods *some* say. Sometimes this allows index vacuuming to be very > lazy, other times it allows index vacuuming to be very eager. Often > this variation exists among indexes on the same table. > > Of course, vacuumlazy.c is still responsible for not letting the > accumulation of LP_DEAD heap line pointers get out of hand (without > allowing index TIDs to point to the wrong thing due to dangerous TID > recycling issues/bugs). The accumulation of LP_DEAD heap line pointers > will often take a very long time to get out of hand. But when it does > finally get out of hand, index access methods don't get to veto being > vacuumed. Because this isn't actually about their needs anymore. > > Actually, the index access methods never truly veto anything. They > merely give some abstract signal about how urgent it is to them (like > the 0.0 - 1.0 thing). This difference actually matters. One index > among many on a table saying "meh, I guess I could benefit from some > index vacuuming if it's no real trouble to you vacuumlazy.c" rather > than saying "it's absolutely unnecessary, don't waste CPU cycles > vacuumlazy.c" may actually shift how vacuumlazy.c processes the heap > (at least occasionally). Maybe the high level VACUUM operation decides > that it is worth taking care of everything all at once -- if all the > indexes together either say "meh" or "now would be a good time", and > vacuumlazy.c then notices that the accumulation of LP_DEAD line > pointers is *also* becoming a problem (it's also a "meh" situation), > then it can be *more* ambitious. It can do a traditional VACUUM early. > Which might still make sense. > > This also means that vacuumlazy.c would ideally think about this as an > optimization problem. It may be lazy or eager for the whole table, > just as it may be lazy or eager for individual indexes. (Though the > eagerness/laziness dynamic is probably much more noticeable with > indexes in practice.) > I discussed this topic off-list with Peter Geoghegan. And we think that we can lead this fix to future improvement. Let me summarize the proposals. The first proposal is the fix of this inappropriate behavior discussed on this thread. 
We pass a new flag when calling bulkdelete(), indicating whether or not the index can safely skip this bulkdelete() call. This is equivalent to whether or not lazy vacuum will do the heap clean (making LP_DEAD line pointers LP_UNUSED in lazy_vacuum_heap()). If it's true (meaning we will do the heap clean), then since dead tuples referenced by index tuples will be physically removed, the index AM would have to delete the index tuples. If it's false, we call bulkdelete() with this flag so that the index AM can safely skip bulkdelete(). Of course, an index AM can also choose not to skip it for its own reasons. An index AM such as BRIN that doesn't store heap TIDs can decide whether or not to do so regardless of this flag. The next proposal, on top of the above, is to add a new index AM API, say ambulkdeletestrategy(), which is called before bulkdelete() for each index and asks for the index's bulk-deletion strategy. In this API, lazy vacuum asks, "Hey index X, I collected garbage heap tuples during the heap scan, how urgent is vacuuming for you?", and the index answers either "it's urgent" when it wants to do bulk-deletion or "it's not urgent, I can skip it". The point of this proposal is to isolate heap vacuum and index vacuum for each index so that we can employ different strategies for each index. Lazy vacuum can decide whether or not to do the heap clean based on the answers from the indexes, and can set the flag I proposed above according to that decision. If all indexes answer 'yes' (meaning they will do bulkdelete()), lazy vacuum
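A toy model of the proposed flow may make the division of labour clearer. The enum, callback shape, and decision rule below are illustrative guesses at what ambulkdeletestrategy() could look like, not the proposed API itself; in particular it glosses over TID-less AMs such as BRIN, whose answer would not need to block heap cleaning.

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    BULKDELETE_SKIP,   /* "not urgent, I can skip this round" */
    BULKDELETE_DO      /* "urgent, please call bulkdelete()" */
} BulkDeleteStrategy;

typedef struct IndexStub
{
    const char *name;
    /* stand-in for the proposed per-AM ambulkdeletestrategy() callback */
    BulkDeleteStrategy (*strategy) (void);
} IndexStub;

static BulkDeleteStrategy btree_strategy(void) { return BULKDELETE_DO; }
static BulkDeleteStrategy gin_strategy(void)   { return BULKDELETE_SKIP; }

int
main(void)
{
    IndexStub indexes[] = {
        {"idx_btree", btree_strategy},
        {"idx_gin", gin_strategy},
    };
    int     nindexes = 2;
    bool    do_heap_clean = true;

    /*
     * Ask every index for its strategy.  Heap line pointers can only be
     * set LP_UNUSED once every TID-bearing index has removed its pointers,
     * so one "skip" answer is enough to postpone the heap clean.
     */
    for (int i = 0; i < nindexes; i++)
    {
        BulkDeleteStrategy s = indexes[i].strategy();

        printf("%s says %s\n", indexes[i].name,
               s == BULKDELETE_DO ? "do it" : "skip");
        if (s == BULKDELETE_SKIP)
            do_heap_clean = false;
    }

    /* This decision would then feed the new flag passed to bulkdelete(). */
    printf("heap clean this cycle: %s\n", do_heap_clean ? "yes" : "no");
    return 0;
}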
Re: Misleading comment in prologue of ReorderBufferQueueMessage
On Tue, Dec 15, 2020 at 11:25 AM Ashutosh Bapat wrote: > > On Mon, Dec 14, 2020 at 3:14 PM Amit Kapila wrote: >> >> On Mon, Dec 14, 2020 at 2:45 PM Ashutosh Bapat >> wrote: >> > >> > The name of the function suggests that the given message will be queued in >> > ReorderBuffer. The prologue of the function says so too >> > 776 /* >> > 777 * Queue message into a transaction so it can be processed upon >> > commit. >> > 778 */ >> > It led me to think that a non-transactional message is processed along >> > with the surrounding transaction, esp. when it has an associated xid. >> > >> > But in reality, the function queues only a transactional message and >> > decoders a non-transactional message immediately without waiting for a >> > commit. >> > >> > We should modify the prologue to say >> > "Queue a transactional message into a transaction so that it can be >> > processed upon commit. A non-transactional message is processed >> > immediately." and also change the name of the function to >> > ReorderBufferProcessMessage(), but the later may break API compatibility. >> > >> >> +1 for the comment change but I am not sure if it is a good idea to >> change the API name. >> > Can you please review wording? I will create a patch with updated wording. > How about something like below: A transactional message is queued to be processed upon commit and a non-transactional message gets processed immediately. OR A transactional message is queued so it can be processed upon commit and a non-transactional message gets processed immediately. -- With Regards, Amit Kapila.
Re: Add Information during standby recovery conflicts
At Tue, 15 Dec 2020 15:40:03 +0900, Fujii Masao wrote in > > > On 2020/12/15 12:04, Kyotaro Horiguchi wrote: > > [40509:startup] DETAIL: Conflicting processes: 41171, 41194. ... > > I'm not sure, but it seems to me at least the period is unnecessary > > here. > > Since Error Message Style Guide in the docs says "Detail and hint > messages: > Use complete sentences, and end each with a period.", I think that a > period > is necessary here. No? In the first place it is not a complete sentence. Might be better be something like this if we strictly follow the style guide? > Conflicting processes are 41171, 41194. > Conflicting processes are: 41171, 41194. > > > + boolmaybe_log_conflict = > > + (standbyWaitStart != 0 && !logged_recovery_conflict); > > odd indentation. > > This is the result of pgindent run. I'm not sure why pgindent indents > that way, but for now I'd like to follow pgindent. Agreed. > > + /* Also, set the timer if necessary */ > > + if (logging_timer) > > + { > > + timeouts[cnt].id = STANDBY_LOCK_TIMEOUT; > > + timeouts[cnt].type = TMPARAM_AFTER; > > + timeouts[cnt].delay_ms = DeadlockTimeout; > > + cnt++; > > + } > > This doesn't consider spurious wakeup. I'm not sure it actually > > happenes but we usually consider that. That is since before this > > patch, but ProcWaitForSignal()'s comment says that: > > > >> * As this uses the generic process latch the caller has to be robust > >> * against > >> * unrelated wakeups: Always check that the desired state has occurred, > >> * and > >> * wait again if not. > > If we don't care of spurious wakeups, we should add such a comment. > > If ProcWaitForSignal() wakes up because of the reason (e.g., SIGHUP) > other than deadlock_timeout, ProcSleep() will call > ResolveRecoveryConflictWithLock() and we sleep on ProcWaitForSignal() > again since the recovery conflict has not been resolved yet. So we can > say that we consider "spurious wakeups"? So, the following seems to be spurious wakeups.. > However when I read the related code again, I found another issue in > the patch. In ResolveRecoveryConflictWithLock(), if SIGHUP causes us > to > wake up out of ProcWaitForSignal() before deadlock_timeout is reached, > we will use deadlock_timeout again when sleeping on > ProcWaitForSignal(). > Instead, probably we should use the "deadlock_timeout - elapsed time" > so that we can emit a log message as soon as deadlock_timeout passes > since starting waiting on recovery conflict. Otherwise it may take at > most > "deadlock_timeout * 2" to log "still waiting ..." message. > > To fix this issue, "deadlock_timeout - elapsed time" needs to be used > as > the timeout when enabling the timer at least in > ResolveRecoveryConflictWithLock() and > ResolveRecoveryConflictWithBufferPin(). > Also similar change needs to be applied to > ResolveRecoveryConflictWithVirtualXIDs(). > > BTW, without applying the patch, *originally* > ResolveRecoveryConflictWithBufferPin() seems to have this issue. > It enables deadlock_timeout timer so as to request for hot-standbfy > backends to check themselves for deadlocks. But if we wake up out of > ProcWaitForSignal() before deadlock_timeout is reached, the subsequent > call to ResolveRecoveryConflictWithBufferPin() also uses > deadlock_timeout > again instead of "deadlock_timeout - elapsed time". So the request for > deadlock check can be delayed. 
Furthermore, > if ResolveRecoveryConflictWithBufferPin() always wakes up out of > ProcWaitForSignal() before deadlock_timeout is reached, the request > for deadlock check may also never happen infinitely. > > Maybe we should fix the original issue at first separately from the > patch. Yeah, it's not an issue of this patch. > > + boolmaybe_log_conflict; > > + boolmaybe_update_title; > > Although it should be a matter of taste and I understand that the > > "maybe" means that "that logging and changing of ps display may not > > happen in this iteration" , that variables seem expressing > > respectively "we should write log if the timeout for recovery conflict > > expires" and "we should update title if 500ms elapsed". So they seem > > *to me* better be just "log_conflict" and "update_title". > > I feel the same with "maybe_log_conflict" in ProcSleep(). > > I have no strong opinion about those names. So if other people also > think so, I'm ok to rename them. Shorter is better as far as it makes sense and not-too abbreviated. > > + for recovery conflicts. This is useful in determining if recovery > > +conflicts prevent the recovery from applying WAL. > > (I'm not confident on this) Isn't the sentence better be in past or > > present continuous tense? > > Could you tell me why you think that's better? To make sure, I mention
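A small arithmetic sketch of the "deadlock_timeout - elapsed time" idea discussed above: when a wait returns early (for example on SIGHUP), the timer should be re-armed with the time still remaining rather than the full deadlock_timeout, so the "still waiting" log line fires at most deadlock_timeout after the wait started, not up to twice that. This is plain C with millisecond integers, not the backend's timestamp types.

#include <stdio.h>

static long
remaining_timeout_ms(long deadlock_timeout_ms, long wait_start_ms, long now_ms)
{
    long elapsed = now_ms - wait_start_ms;

    if (elapsed >= deadlock_timeout_ms)
        return 0;                   /* already due: log immediately */
    return deadlock_timeout_ms - elapsed;
}

int
main(void)
{
    /* deadlock_timeout = 1000 ms; woken spuriously 400 ms into the wait */
    printf("re-arm with %ld ms\n", remaining_timeout_ms(1000, 0, 400));
    return 0;
}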
Re: archive status ".ready" files may be created too early
At Tue, 15 Dec 2020 19:32:57 +0900 (JST), Kyotaro Horiguchi wrote in > At Mon, 14 Dec 2020 18:25:23 +, "Bossart, Nathan" > wrote in > > I wonder if these are safe assumptions to make. For your example, if > > we've written record B to the WAL buffers, but neither record A nor B > > have been written to disk or flushed, aren't we still in trouble? > > You're right in that regard. There's a window where partial record is > written when write location passes F0 after insertion location passes > F1. However, remembering all spanning records seems overkilling to me. > > I modifed the previous patch so that it remembers the start LSN of the > *oldest* corss-segment continuation record in the last consecutive > bonded segments, and the end LSN of the latest cross-segmetn > continuation record. This doesn't foreget past segment boundaries. > The region is cleard when WAL-write LSN goes beyond the remembered end > LSN. So the region may contain several wal-segments that are not > connected to the next one, but that doesn't matter so much.

Mmm. Even though it's a PoC, it was too bogus. I fixed it to work in a saner way.

- Record the beginning LSN of the first cross-seg record and the end LSN of the last cross-seg record in a run of consecutive segments bonded by cross-seg records. Specifically, X and Y below.

X Z Y
[recA] [recB] [recC]
[seg A] [seg B] [seg C] [seg D] [seg E]
(1)(2.2)(2.2) (2.1) (2.1) (1)

1. If we wrote up to before X or beyond Y at a segment boundary, notify the finished segment immediately.
   1.1. If we have written beyond Y, clear the recorded region.
2. Otherwise we don't notify the segment immediately:
   2.1. If the write request was up to exactly the current segment boundary and we know the end LSN of the record there (that is, it is recC above), extend the request to that end LSN. Then notify the segment after the record is written to the end.
   2.2. Otherwise (that is, recA or recB), we don't know whether the last record of the last segment ends just at the segment boundary (Z) or a record spans between segments (recB). Anyway, even if there is such a record there, we don't know where it ends. As a result, all we can do there is refrain from notifying. It doesn't matter so much, since we have already inserted recC, so we will soon reach recC and will notify up to seg C.

There might be a case where we insert up to Y before writing up to Z; the segment region X-Y contains a non-connected segment boundary in that case. It is processed as if it were a connected segment boundary. However, as in 2.2 above, it doesn't matter since we write up to Y soon.

At Tue, 15 Dec 2020 19:32:57 +0900 (JST), Kyotaro Horiguchi wrote in me> I added an assertion that a record must be shorter than a wal segment me> to XLogRecordAssemble(). This guarantees the assumption to be true? me> (The condition is tentative, would need to be adjusted.) Changed the assertion in a more direct way. me> Also, the attached is a PoC. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 50b3b05dd0eed79cd0b97991e82090b9d569cbac Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 16 Dec 2020 10:36:14 +0900 Subject: [PATCH v4 1/2] Avoid archiving a WAL segment that continues to the next segment If the last record of a finshed segment continues to the next segment, we need to defer archiving of the segment until the record is flushed to the end. Otherwise crash recovery can overwrite the last record of a segment and history diverges between archive and pg_wal.
--- src/backend/access/transam/xlog.c | 214 +++- src/backend/replication/walsender.c | 14 +- src/include/access/xlog.h | 1 + 3 files changed, 217 insertions(+), 12 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8dd225c2e1..8705809160 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -723,6 +723,16 @@ typedef struct XLogCtlData */ XLogRecPtr lastFpwDisableRecPtr; + /* The last segment notified to be archived. Protected by WALWriteLock */ + XLogSegNo lastNotifiedSeg; + + /* + * Remember the oldest and newest known segment that ends with a + * continuation record. + */ + XLogRecPtr firstSegContRecStart; + XLogRecPtr lastSegContRecEnd; + slock_t info_lck; /* locks shared variables shown above */ } XLogCtlData; @@ -1160,6 +1170,9 @@ XLogInsertRecord(XLogRecData *rdata, */ if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ) { + XLogSegNo startseg; + XLogSegNo endseg; + SpinLockAcquire(&XLogCtl->info_lck); /* advance global request to include new block(s) */ if (XLogCtl->LogwrtRqst.Write < EndPos) @@ -1167,6 +1180,24 @@ XLogInsertRecord(XLogRecData *rdata, /* update local result copy while I have the chance */ LogwrtResult = XLogCtl->LogwrtResult; SpinLockRelease(&XLo
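To restate the notification rule above in code form, here is a small, self-contained model of the "notify or defer" decision. The variable names mirror the firstSegContRecStart/lastSegContRecEnd fields visible in the patch, but the function itself is only an illustration, not the patch's logic verbatim.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtrSim;                 /* stand-in for XLogRecPtr */

static XLogRecPtrSim first_cont_rec_start = 0;  /* X in the diagram */
static XLogRecPtrSim last_cont_rec_end = 0;     /* Y in the diagram */

/* May we mark finished segments .ready after writing up to write_pos? */
static bool
may_notify_segments(XLogRecPtrSim write_pos)
{
    /* No pending cross-segment region recorded: always safe. */
    if (first_cont_rec_start == 0)
        return true;

    /* Step 1: before X or beyond Y, notify immediately. */
    if (write_pos < first_cont_rec_start || write_pos >= last_cont_rec_end)
    {
        if (write_pos >= last_cont_rec_end)
        {
            /* Step 1.1: the whole region is written out, forget it. */
            first_cont_rec_start = 0;
            last_cont_rec_end = 0;
        }
        return true;
    }

    /* Step 2: inside the X..Y region, hold back the notification. */
    return false;
}

int
main(void)
{
    first_cont_rec_start = 100;   /* X */
    last_cont_rec_end = 500;      /* Y */
    printf("%d %d %d\n",
           may_notify_segments(50),    /* before X: notify */
           may_notify_segments(300),   /* inside:   defer  */
           may_notify_segments(600));  /* beyond Y: notify */
    return 0;
}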
Re: pg_shmem_allocations & documentation
On Tue, Dec 15, 2020 at 10:09:35AM +0900, Michael Paquier wrote: > Both of you seem to agree about having more details about that, which > is fine by me at the end. Horiguchi-san, do you have more thoughts to > offer? Benoit's version is similar to yours, just simpler. Okay, applied this one then. Thanks Benoit and Horiguchi-san. -- Michael signature.asc Description: PGP signature
Re: enable_incremental_sort changes query behavior
On 12/16/20 1:51 AM, Jaime Casanova wrote: > On Tue, Dec 1, 2020 at 4:08 AM Anastasia Lubennikova > wrote: >> >> On 01.12.2020 03:08, James Coleman wrote: >>> On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra >>> wrote: I've pushed the 0001 part, i.e. the main fix. Not sure about the other parts (comments and moving the code back to postgres_fdw) yet. >>> I noticed the CF entry [1] got moved to the next CF; I'm thinking this >>> entry should be marked as committed since the fix for the initial bug >>> reported on this thread has been pushed. We have the parallel safety >>> issue outstanding, but there's a separate thread and patch for that, >>> so I'll make a new CF entry for that. >>> >>> I can mark it as committed, but I'm not sure how to "undo" (or if >>> that's desirable) the move to the next CF. >>> >>> Thoughts? >>> >>> James >>> >>> 1: https://commitfest.postgresql.org/30/2754/ >>> >>> >> Oops... >> I must have rushed with this one, thank you for noticing. >> I don't see how to move it back either. I think it's fine to mark it as >> Committed where it is now. >> > > BTW, I still see this one as needs review > Hmm, not sure what was the plan, but I'll mark it as committed once I push the remaining patches (for parallel-safe checks, SRFs etc.) in a couple days. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Fix generate_useful_gather_paths for parallel unsafe pathkeys
Hi, I reviewed the patch series, tweaked a couple comments, added commit messages etc. Barring objections, I'll push this in a couple days. One thing that annoys me is that it breaks ABI because it adds two new parameters to find_em_expr_usable_for_sorting_rel, but I don't think we can get around that. We could assume require_parallel_safe=true, but we still need to pass the PlannerInfo pointer. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company >From 857e653607656d10586e2a0f1b936576513d4277 Mon Sep 17 00:00:00 2001 From: jcoleman Date: Fri, 20 Nov 2020 16:19:00 + Subject: [PATCH 1/5] Consider unsorted paths in generate_useful_gather_paths generate_useful_gather_paths used to skip unsorted paths (without any pathkeys), but that is unnecessary - the later code actually can handle such paths just fine by adding a Sort node. This is clearly a thinko, preventing construction of useful plans. Backpatch to 13, where this code was introduced (for Incremental Sort). Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=h5m8lschj5e5oxp...@mail.gmail.com --- src/backend/optimizer/path/allpaths.c | 8 src/test/regress/expected/incremental_sort.out | 13 + src/test/regress/sql/incremental_sort.sql | 4 3 files changed, 17 insertions(+), 8 deletions(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 84a69b064a..672a20760c 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -2914,14 +2914,6 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r Path *subpath = (Path *) lfirst(lc2); GatherMergePath *path; - /* - * If the path has no ordering at all, then we can't use either - * incremental sort or rely on implicit sorting with a gather - * merge. - */ - if (subpath->pathkeys == NIL) -continue; - is_sorted = pathkeys_count_contained_in(useful_pathkeys, subpath->pathkeys, &presorted_keys); diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out index 7cf2eee7c1..51471ae92d 100644 --- a/src/test/regress/expected/incremental_sort.out +++ b/src/test/regress/expected/incremental_sort.out @@ -1468,6 +1468,19 @@ explain (costs off) select * from t union select * from t order by 1,3; -> Parallel Seq Scan on t t_1 (13 rows) +-- Full sort, not just incremental sort can be pushed below a gather merge path +-- by generate_useful_gather_paths. +explain (costs off) select distinct a,b from t; +QUERY PLAN +-- + Unique + -> Gather Merge + Workers Planned: 2 + -> Sort + Sort Key: a, b + -> Parallel Seq Scan on t +(6 rows) + drop table t; -- Sort pushdown can't go below where expressions are part of the rel target. -- In particular this is interesting for volatile expressions which have to diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql index 3ee11c394b..cb48f200ce 100644 --- a/src/test/regress/sql/incremental_sort.sql +++ b/src/test/regress/sql/incremental_sort.sql @@ -220,6 +220,10 @@ explain (costs off) select a,b,sum(c) from t group by 1,2 order by 1,2,3 limit 1 set enable_hashagg to off; explain (costs off) select * from t union select * from t order by 1,3; +-- Full sort, not just incremental sort can be pushed below a gather merge path +-- by generate_useful_gather_paths. 
+explain (costs off) select distinct a,b from t; + drop table t; -- Sort pushdown can't go below where expressions are part of the rel target. -- 2.26.2 >From 8ed86b447c07bdce8fafe95cdfb1e68286443274 Mon Sep 17 00:00:00 2001 From: James Coleman Date: Fri, 20 Nov 2020 20:29:51 -0500 Subject: [PATCH 2/5] Check parallel safety in generate_useful_gather_paths Commit ebb7ae839d ensured we ignore pathkeys with volatile expressions when considering adding a sort below a Gather Merge. Turns out we need to care about parallel safety of the pathkeys too, otherwise we might try sorting e.g. on results of a correlated subquery (as demonstrated by a report from Luis Roberto). Initial investigation by Tom Lane, patch by James Coleman. Backpatch to 13, where the code was instroduced (as part of Incremental Sort). Reported-by: Luis Roberto Author: James Coleman Reviewed-by: Tomas Vondra Backpatch-through: 13 Discussion: https://postgr.es/m/622580997.37108180.1604080457319.JavaMail.zimbra%40siscobra.com.br Discussion: https://postgr.es/m/CAAaqYe8cK3g5CfLC4w7bs=hC0mSksZC=h5m8lschj5e5oxp...@mail.gmail.com --- src/backend/optimizer/path/allpaths.c | 13 -- src/backend/optimizer/path/equivclass.c | 9 - src/include/optimize
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 04:02:12PM -0500, Bruce Momjian wrote: > I thought this was going to be a huge job, but once I looked at it, it > was clear exactly what you were saying. Comparing cryptohash.c and > cryptohash_openssl.c I saw exactly what you wanted, and I think I have > completed it in the attached patch. cryptohash.c implemented the hash > in C code if OpenSSL is not present --- I assume you didn't want me to > do that, but rather to split the API so it was easy to add another > implementation. Hmm. I don't think that this is right. First, this still mixes ciphers and HMAC. Second, it is only possible here to use HMAC with SHA512 but we want to be able to use SHA256 as well with SCRAM (per se the scram_HMAC_* business in scram-common.c). Third, this lacks a fallback implementation. Finally, pgcrypto is not touched, but we should be able to simplify the area around int_digest_list in px-hmac.c which lists for HMAC as available options MD5, SHA1 and all four SHA2. Please note that we have MD5 and SHA2 as choices in cryptohash.h, and I have already a patch to add SHA1 to the cryptohash set pending for review: https://commitfest.postgresql.org/31/2868/. If all that is done correctly, a lot of code can be cleaned up while making things still pluggable with various SSL libraries. -- Michael signature.asc Description: PGP signature
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Tue, Dec 15, 2020 at 09:45:17PM -0300, Alvaro Herrera wrote: > I don't like this idea too much, because adding an option causes an ABI > break. I don't think we commonly add options in backbranches, but it > has happened. The bitmask is much easier to work with in that regard. ABI flexibility is a good point here. I did not consider this point of view. Thanks! -- Michael signature.asc Description: PGP signature
Re: Reloptions for table access methods
On Tue, Dec 15, 2020 at 03:59:02PM -0800, Jeff Davis wrote: > How should that work with the existing code? Should we have separate AM > hooks for each of these interaction points, and then have the AM figure > out how to handle its options? What about the toast.* options? It seems to me that we would want a callback for options that is part of TableAmRoutine to do the loading work, no? The bigger picture is more complex than that, though, because all in-core AMs rely on build_reloptions() to do the work, which itself depends on a set of pre-existing reloptions (all the static relOpts array). In the end, I'd like to think that we should remove the dependency between relopt build and initialization state, then switch all the different AMs to do something similar to create_reloptions_table() in dummy_index_am.c to define a new set of reloptions, except that we'd want to preload all the options at backend startup or something similar to that for all in-core AMs, for tables and indexes. > It feels like some common options would make sense to avoid too much > code duplication. Having a set of options that can be plugged into any AM, like a set of preset options for autovacuum or toast, makes sense. > I am not trying to push this in a specific direction, but I don't have > a lot of good answers, so I went for the simplest thing I could think > of that would allow an extension to have its own options, even if it's > a bit hacky. I'm open to alternatives. FWIW, I agree with Simon's feeling that bypassing a sanity check does not feel like a good solution in the long term. -- Michael signature.asc Description: PGP signature
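As a concrete reference for the dummy_index_am pattern mentioned above, here is a rough sketch of how an access method can register and parse its own reloptions with the existing in-core machinery. The signatures are recalled from memory and should be checked against access/reloptions.h; the option name and struct are invented, and this is not a drop-in extension.

#include "postgres.h"
#include "access/reloptions.h"

typedef struct MyAmOptions
{
    int32   vl_len_;            /* varlena header (do not touch directly) */
    int     compression_level;  /* example custom option */
} MyAmOptions;

static relopt_kind my_relopt_kind;

/* Called once at extension load time, e.g. from _PG_init(). */
void
my_am_register_options(void)
{
    my_relopt_kind = add_reloption_kind();
    add_int_reloption(my_relopt_kind, "compression_level",
                      "Compression level used by this AM",
                      3, 0, 9, AccessExclusiveLock);
}

/* The AM's relation-options callback, in the style of dummy_index_am.c. */
bytea *
my_am_options(Datum reloptions, bool validate)
{
    static const relopt_parse_elt tab[] = {
        {"compression_level", RELOPT_TYPE_INT,
         offsetof(MyAmOptions, compression_level)},
    };

    return (bytea *) build_reloptions(reloptions, validate, my_relopt_kind,
                                      sizeof(MyAmOptions), tab,
                                      lengthof(tab));
}

The open question in the thread is how to expose an equivalent hook for table AMs so that such options are loaded without bypassing the existing sanity checks on the static relOpts array.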
Re: enable_incremental_sort changes query behavior
On Tue, Dec 1, 2020 at 4:08 AM Anastasia Lubennikova wrote: > > On 01.12.2020 03:08, James Coleman wrote: > > On Tue, Nov 3, 2020 at 4:39 PM Tomas Vondra > > wrote: > >> I've pushed the 0001 part, i.e. the main fix. Not sure about the other > >> parts (comments and moving the code back to postgres_fdw) yet. > > I noticed the CF entry [1] got moved to the next CF; I'm thinking this > > entry should be marked as committed since the fix for the initial bug > > reported on this thread has been pushed. We have the parallel safety > > issue outstanding, but there's a separate thread and patch for that, > > so I'll make a new CF entry for that. > > > > I can mark it as committed, but I'm not sure how to "undo" (or if > > that's desirable) the move to the next CF. > > > > Thoughts? > > > > James > > > > 1: https://commitfest.postgresql.org/30/2754/ > > > > > Oops... > I must have rushed with this one, thank you for noticing. > I don't see how to move it back either. I think it's fine to mark it as > Committed where it is now. > BTW, I still see this one as needs review -- Jaime Casanova Professional PostgreSQL: Soporte 24x7 y capacitación
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2020-Dec-12, Peter Eisentraut wrote: > On 2020-12-11 21:27, Alvaro Herrera wrote: > > By the way-- What did you think of the idea of explictly marking the > > types used for bitmasks using types bits32 and friends, instead of plain > > int, which is harder to spot? > > If we want to make it clearer, why not turn the thing into a struct, as in > the attached patch, and avoid the bit fiddling altogether. I don't like this idea too much, because adding an option causes an ABI break. I don't think we commonly add options in backbranches, but it has happened. The bitmask is much easier to work with in that regard.
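For readers skimming the thread, a tiny standalone illustration of the trade-off Alvaro describes; the names are invented and do not match the actual ClusterOption code.

#include <stdbool.h>
#include <stdio.h>

/* As a bitmask, adding an option later is just a new flag value; the type
 * stays an int, so its size and the layout seen by extensions don't move. */
#define CLUOPT_RECHECK  (1 << 0)
#define CLUOPT_VERBOSE  (1 << 1)
/* #define CLUOPT_NEWTHING (1 << 2)      -- no ABI change */

/* As a struct, adding a member in a back-branch changes sizeof() (and
 * possibly padding), which breaks extensions compiled against the old
 * definition if they allocate or embed the struct. */
typedef struct ClusterOptionsV1
{
    bool    recheck;
    bool    verbose;
} ClusterOptionsV1;

typedef struct ClusterOptionsV2
{
    bool    recheck;
    bool    verbose;
    bool    newthing;           /* hypothetical later addition */
} ClusterOptionsV2;

int
main(void)
{
    printf("int: %zu bytes, struct v1: %zu bytes, struct v2: %zu bytes\n",
           sizeof(int), sizeof(ClusterOptionsV1), sizeof(ClusterOptionsV2));
    return 0;
}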
Re: Proposed patch for key managment
Hi Stephen, On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote: > Yeah, looking at what's been done there seems like the right approach to > me as well, ideally we'd have one set of APIs that'll support all these > use cases (not unlike what curl does..). Ooh... This is interesting. What did curl do wrong here? It is always good to hear from such past experiences. -- Michael signature.asc Description: PGP signature
Re: Minor documentation error regarding streaming replication protocol
On Tue, Dec 15, 2020 at 12:54:51PM -0800, Andres Freund wrote: > Minor nit: I'd put this into common/string.[ch], besides > pg_clean_ascii(). Probably renaming it to pg_is_ascii(). Yeah. There is already one pg_is_ascii_string() in saslprep.c, is_all_ascii() in collationcmds.c and string_is_ascii() in pgp-pgsql.c. I don't think we need a fourth copy of the same logic, even if it is dead simple. -- Michael signature.asc Description: PGP signature
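Since the point here is consolidating several copies of the same check, a minimal standalone sketch of what such a helper boils down to follows; the real one would live in src/common/string.c and follow backend conventions, with pg_is_ascii() being the name suggested above.

#include <stdbool.h>
#include <stdio.h>

static bool
pg_is_ascii(const char *str)
{
    for (; *str != '\0'; str++)
        if ((unsigned char) *str > 127)
            return false;
    return true;
}

int
main(void)
{
    printf("%d %d\n", pg_is_ascii("replication"), pg_is_ascii("réplication"));
    return 0;
}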
Re: REINDEX backend filtering
On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote: > Is this really a common enough operation that we need it in the main grammar? > > Having the functionality, definitely, but what if it was "just" a > function instead? So you'd do something like: > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) > \gexec > > Or even a function that returns the REINDEX commands directly (taking > a parameter to turn on/off concurrency for example). > > That also seems like it would be easier to make flexible, and just as > easy to plug into reindexdb? Having control in the grammar to choose which index to reindex for a table is very useful when it comes to parallel reindexing, because it is a no-brainer in terms of knowing which index to distribute to one job or another. In short, with this grammar you can just issue a set of REINDEX TABLE commands that we know will not conflict with each other. You cannot get that level of control with REINDEX INDEX, as two or more commands may conflict if they work on indexes of the same relation, because a lock must also be taken on the parent table. Of course, we could decide to implement redistribution logic in all frontend tools that need such things, like reindexdb, but that's not something I think we should leave to the client to decide. Backend-side filtering is IMO much simpler, less code, and more elegant. -- Michael signature.asc Description: PGP signature
Re: copy.sgml and partitioned tables
On Thu, Dec 3, 2020 at 03:17:23PM -0600, Justin Pryzby wrote: > https://www.postgresql.org/docs/current/sql-copy.html > |. COPY FROM can be used with plain, foreign, or partitioned tables or with > views that have INSTEAD OF INSERT triggers. > |. COPY only deals with the specific table named; IT DOES NOT COPY DATA TO OR > FROM CHILD TABLES. ... > > That language in commit 854b5eb51 was never updated since partitioning was > added, so I propose this. > > I'm not sure, but maybe it should still say that "COPY TO does not copy data > to > child tables of inheritance hierarchies." I reworded it slightly, attached, and applied it back to PG 10, where we added the partition syntax. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 369342b74d..0fca6583af 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -413,10 +413,16 @@ COPY count Notes -COPY TO can only be used with plain tables, not -with views. However, you can write COPY (SELECT * FROM -viewname) TO ... -to copy the current contents of a view. +COPY TO can be used only with plain +tables, not views, and does not copy rows from child tables +or child partitions. For example, COPY table TO copies +the same rows as SELECT * FROM ONLY table. +The syntax COPY (SELECT * FROM table) TO ... can be used to +dump all of the rows in an inheritance hierarchy, partitioned table, +or view. @@ -425,16 +431,6 @@ COPY count INSTEAD OF INSERT triggers. - -COPY only deals with the specific table named; -it does not copy data to or from child tables. Thus for example -COPY table TO -shows the same data as SELECT * FROM ONLY table. But COPY -(SELECT * FROM table) TO ... -can be used to dump all of the data in an inheritance hierarchy. - - You must have select privilege on the table whose values are read by COPY TO, and
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 11:13:12PM +, Alastair Turner wrote: > Since it's exciting stuff, I've been trying to lash together a PoC integration > with the crypto infrastructure we see at these customers. Unfortunately, in > short, the API doesn't seem to suit integration with HSM big iron, like > Thales, > Utimaco, (other options are available), or their cloudy lookalikes. > > I understand the model of wrapping the Data Encryption Key and storing the > wrapped key with the encrypted data. The thing I can't find support for in > your > patch is passing a wrapped DEK to an external system for decryption. A key > feature of these key management approaches is that the application handling > the > encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back after > unwrapping it. Google have a neat illustration of their approach, which is > very > similar to others, at https://cloud.google.com/kms/docs/envelope-encryption# > how_to_decrypt_data_using_envelope_encryption > > So instead of calling out to a passphrase command which returns input for a > PBKDF, Postgres (in the form of initdb or postmaster) should be passing the > wrapped DEK and getting back the DEK in plain. The value passed from Postgres > to the external process doesn't even have to be a wrapped DEK, it could be a > key in the key->value sense, for use in a lookup against Vault, CredHub or the > Kubernetes secret store. Making the stored keys opaque to the Postgres > processes and pushing the task of turning them into cryptographic material to > an external hepler process probably wouldn't shrink the patch overall, but it > would move a lot of code from core processes into the helper. Maybe that > helper > should live in contrib, as discussed below, where it would hopefully be joined > by a bridge for KMIP and others. Well, I think we can go two ways with this. First, if you look at my attached Yubikey script, you will see that, at boot time, since $DIR/yubipass.key does not exit, the script generates a passphrase, wraps it with the Yubikey's PIV #3 public key, and stores it in the live key directory. The next time it is called, it decrypts the wrapped passphrase using the Yubikey, and uses that as the KEK to decrypt the two DEKs. Now, you could modify the script to call the HSM at boot time to retrieve a DEK wrapped in a KEK, and store it in the key directory. On all future calls, you can pass the DEK wrapped by KEK to the HSM, and use the returned DEK as the KEK/passphrase to decrypt the real DEK. The big value of this is that the keys are validated, and it uses the existing API. Ideally we write a sample script of how to this and include it in /contrib. The second approach is to make a new API for what you want. It would be similar to the cluster_passphrase_command, but it would be simple in some ways, but complex in others since we need at least two DEKs, and key rotation might be very risky since there isn't validation in the server. I don't think this can be accomplished by a contrib module, but would actually be easy to implement, but perhaps hard to document because the user API might be tricky. I think my big question is about your sentence, "A key feature of these key management approaches is that the application handling the encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back after unwrapping it." It is less secure for the HSM to return a KEK rather than a DEK? I can't see why it would be. 
The application never sees the KEK used to wrap the DEK it has stored in the file system, though that DEK is actually used as a passphrase by Postgres. This is the same with the Yubikey --- Postgres never sees the private key used to unlock what it locked with the Yubikey public key. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee pass_yubi_nopin.sh Description: Bourne shell script
Re: Reloptions for table access methods
On Tue, 2020-12-15 at 17:37 +, Simon Riggs wrote: > > I definitely don't agree with the premise that all existing heap > options should be common across all new or extension tableAMs. There > are dozens of such options and I don't believe that they would all > have meaning in all future storage plugins. I agree in principle, but looking at the options that are present today, a lot of them are integrated into general database features, like autovacuum, toast, logical replication, and parallel query planning. We need to have some answer about how these features should interact with a custom AM if we can't assume anything about the reloptions it understands. > I think options should just work exactly the same for Indexes and > Tables. How should that work with the existing code? Should we have separate AM hooks for each of these interaction points, and then have the AM figure out how to handle its options? What about the toast.* options? It feels like some common options would make sense to avoid too much code duplication. I am not trying to push this in a specific direction, but I don't have a lot of good answers, so I went for the simplest thing I could think of that would allow an extension to have its own options, even if it's a bit hacky. I'm open to alternatives. Regards, Jeff Davis
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On Mon, Dec 14, 2020 at 06:14:17PM -0600, Justin Pryzby wrote: > On Sat, Dec 12, 2020 at 01:45:26PM -0600, Justin Pryzby wrote: > > On Sat, Dec 12, 2020 at 09:20:35AM +0100, Peter Eisentraut wrote: > > > On 2020-12-11 21:27, Alvaro Herrera wrote: > > > > By the way-- What did you think of the idea of explictly marking the > > > > types used for bitmasks using types bits32 and friends, instead of plain > > > > int, which is harder to spot? > > > > > > If we want to make it clearer, why not turn the thing into a struct, as in > > > the attached patch, and avoid the bit fiddling altogether. > > > > I like this. > > It's a lot like what I wrote as [PATCH v31 1/5] ExecReindex and > > ReindexParams > > In my v31 patch, I moved ReindexOptions to a private structure in > > indexcmds.c, > > with an "int options" bitmask which is passed to reindex_index() et al. > > Your > > patch keeps/puts ReindexOptions index.h, so it also applies to > > reindex_index, > > which I think is good. > > > > So I've rebased this branch on your patch. > > > > Some thoughts: > > > > - what about removing the REINDEXOPT_* prefix ? > > - You created local vars with initialization like "={}". But I thought it's > >needed to include at least one struct member like "={false}", or else > >they're not guaranteed to be zerod ? > > - You passed the structure across function calls. The usual convention is > > to > >pass a pointer. > > I think maybe Michael missed this message (?) > I had applied some changes on top of Peter's patch. > > I squished those commits now, and also handled ClusterOption and VacuumOption > in the same style. > > Some more thoughts: > - should the structures be named in plural ? "ReindexOptions" etc. Since > they >define *all* the options, not just a single bit. > - For vacuum, do we even need a separate structure, or should the members be >directly within VacuumParams ? It's a bit odd to write >params.options.verbose. Especially since there's also ternary options > which >are directly within params. > - Then, for cluster, I think it should be called ClusterParams, and > eventually >include the tablespaceOid, like what we're doing for Reindex. > > I am awaiting feedback on these before going further since I've done too much > rebasing with these ideas going back and forth and back. With Alexey's agreement, I propose something like this. I've merged some commits and kept separate the ones which are more likely to be disputed/amended. But it may be best to read the series as a single commit, like "git diff origin.." 
-- Justin >From ee702a6f49392e84bb51ec432d0974c7369b Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 15 Dec 2020 17:55:21 -0600 Subject: [PATCH 1/4] Convert options to struct: Reindex/Cluster/Vacuum Remove prefixes and lowercase structure member variables; Rename to plural: Vacuum/ClusterOptions; --- src/backend/access/heap/vacuumlazy.c | 8 +- src/backend/catalog/index.c | 26 +++--- src/backend/commands/analyze.c | 15 ++-- src/backend/commands/cluster.c | 29 +++--- src/backend/commands/indexcmds.c | 100 ++--- src/backend/commands/tablecmds.c | 4 +- src/backend/commands/vacuum.c| 128 +-- src/backend/postmaster/autovacuum.c | 14 +-- src/backend/tcop/utility.c | 10 +-- src/include/catalog/index.h | 16 ++-- src/include/commands/cluster.h | 10 +-- src/include/commands/defrem.h| 9 +- src/include/commands/vacuum.h| 29 +++--- 13 files changed, 198 insertions(+), 200 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 25f2d5df1b..6bfed2b2da 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -456,7 +456,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, starttime = GetCurrentTimestamp(); } - if (params->options & VACOPT_VERBOSE) + if (params->options.verbose) elevel = INFO; else elevel = DEBUG2; @@ -484,7 +484,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, xidFullScanLimit); aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid, mxactFullScanLimit); - if (params->options & VACOPT_DISABLE_PAGE_SKIPPING) + if (params->options.disable_page_skipping) aggressive = true; vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats)); @@ -902,7 +902,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, * be replayed on any hot standby, where it can be disruptive. */ next_unskippable_block = 0; - if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0) + if (!params->options.disable_page_skipping) { while (next_unskippable_block < nblocks) { @@ -960,7 +960,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, { /* Time to advance next_unskip
Re: Sorting case branches in outfuncs.c/outNode alphabetically
Fedir Panasenko writes: > outfuncs.c contains a switch statement responsible for choosing > serialization function per node type here: > https://github.com/postgres/postgres/blob/master/src/backend/nodes/outfuncs.c#L3711 Why are you concerned about outfuncs.c in particular? Its sibling files (copyfuncs, equalfuncs, etc) have much the same structure. > It spans over >650LOC and is quite unreadable, requiring using search > or code analysis tools for pretty much anything. But why exactly do you need to read it? It's boring boilerplate. > I'd like to sort these case branches alphabetically and I'd like to > get some input on that prior to submitting a patch. I'd be a strong -1 for alphabetical sort. To my mind, the entries here, and in other similar places, should match the order in which the struct types are declared in src/include/nodes/*nodes.h. And those are not sorted alphabetically, but (more or less) by functionality. I would *definitely* not favor a patch that arbitrarily re-orders those header files alphabetically. Now, IIRC the ordering in the backend/nodes/*.c files is not always a perfect match to the headers. I'd be good with a patch that makes them line up better. But TBH, that is just neatnik-ism; I still don't see how it makes any interesting difference to readability. Keep in mind also that various people have shown interest in auto-generating the backend/nodes/*.c files from the header declarations, in which case this discussion would be moot. > (readfuncs.c contains a similar construct for deserializing nodes, but > that one is if...else based as opposed to switch, so order there might > have performance implications -> I'd reserve that topic for separate > discussion). Yeah, I keep wondering when that structure is going to become a noticeable performance problem. There's little reason to think that we've ordered the node types by frequency there. At some point it might make sense to convert readfuncs' lookup logic into, say, a perfect hash table (cf src/tools/PerfectHash.pm). I'd certainly think that that would be a more useful activity than arguing over the switch order. regards, tom lane
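For what it's worth, the readfuncs lookup mentioned above could take roughly the following shape if it were generated with src/tools/PerfectHash.pm. This is only a sketch under that assumption: readfuncs_hash(), ReadFuncTable and dispatch_node_read() are hypothetical names, and none of this exists in the tree.

typedef Node *(*ReadNodeFunc) (void);

typedef struct ReadFuncEntry
{
	const char *label;			/* node tag as spelled in the string, e.g. "RANGETBLENTRY" */
	ReadNodeFunc read_fn;		/* the corresponding _readXxx() routine */
} ReadFuncEntry;

/* ReadFuncTable[] and readfuncs_hash() would be emitted by PerfectHash.pm */

static Node *
dispatch_node_read(const char *token, int length)
{
	int			h = readfuncs_hash(token, length);	/* generated perfect-hash lookup */

	/* the hash only proposes a slot; verify the label actually matches */
	if (h >= 0 && h < (int) lengthof(ReadFuncTable) &&
		strlen(ReadFuncTable[h].label) == (size_t) length &&
		memcmp(ReadFuncTable[h].label, token, length) == 0)
		return ReadFuncTable[h].read_fn();

	elog(ERROR, "badly formatted node string \"%.32s\"...", token);
	return NULL;				/* keep compiler quiet */
}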
Re: Proposed patch for key managment
Hi Bruce et al Firstly, thanks for shaping the patch, getting it down to a manageable scope of cluster file encryption. I think this is a great feature and it matters to a lot of the customers I talk to at VMware about adopting Postgres. Since it's exciting stuff, I've been trying to lash together a PoC integration with the crypto infrastructure we see at these customers. Unfortunately, in short, the API doesn't seem to suit integration with HSM big iron, like Thales, Utimaco, (other options are available), or their cloudy lookalikes. I understand the model of wrapping the Data Encryption Key and storing the wrapped key with the encrypted data. The thing I can't find support for in your patch is passing a wrapped DEK to an external system for decryption. A key feature of these key management approaches is that the application handling the encrypted data doesn't get the KEK, the HSM/KSM passes the DEK back after unwrapping it. Google have a neat illustration of their approach, which is very similar to others, at https://cloud.google.com/kms/docs/envelope-encryption#how_to_decrypt_data_using_envelope_encryption So instead of calling out to a passphrase command which returns input for a PBKDF, Postgres (in the form of initdb or postmaster) should be passing the wrapped DEK and getting back the DEK in plain. The value passed from Postgres to the external process doesn't even have to be a wrapped DEK, it could be a key in the key->value sense, for use in a lookup against Vault, CredHub or the Kubernetes secret store. Making the stored keys opaque to the Postgres processes and pushing the task of turning them into cryptographic material to an external helper process probably wouldn't shrink the patch overall, but it would move a lot of code from core processes into the helper. Maybe that helper should live in contrib, as discussed below, where it would hopefully be joined by a bridge for KMIP and others. On Tue, 15 Dec 2020 at 19:09, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote: > > > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote: > > > > There are a few shell script I should include to show how to create > > > > commands. Where should they be stored? /contrib module? > > > > > > Why are these needed. Is it a matter of documentation? > > > > I have posted some of the scripts. They involved complex shell > > scripting that I doubt the average user can do. The scripts are for > > prompting for a passphrase from the user's terminal, or using the > > Yubikey, with our without typing a pin. I can put them in the docs and > > then users can copy them into a file. Is that the preferred method? > > If we are going to include these in core anywhere, I would think a new > directory under contrib would make the most sense. I'd hope that we > could find a way to automate the testing of them though, so that we have > some idea when they break because otherwise I'd be concerned that > they'll break somewhere down the line and we won't notice for quite a > while. > > Regards Alastair
Sorting case branches in outfuncs.c/outNode alphabetically
Hi all, outfuncs.c contains a switch statement responsible for choosing serialization function per node type here: https://github.com/postgres/postgres/blob/master/src/backend/nodes/outfuncs.c#L3711 It spans over >650LOC and is quite unreadable, requiring using search or code analysis tools for pretty much anything. I'd like to sort these case branches alphabetically and I'd like to get some input on that prior to submitting a patch. Obvious benefit would be increase in readability, with the downside of somewhat messing up the git history. (readfuncs.c contains a similar construct for deserializing nodes, but that one is if...else based as opposed to switch, so order there might have performance implications -> I'd reserve that topic for separate discussion). --- Best regards, Fedir
Re: Minor documentation error regarding streaming replication protocol
On Tue, Dec 15, 2020 at 01:22:44AM -0800, Jeff Davis wrote: > On Thu, 2020-12-03 at 13:14 -0500, Bruce Momjian wrote: > > Andres objected (in a separate conversation) to forcing a binary- > > > format > > > value on a client that didn't ask for one. He suggested that we > > > mandate > > > that the data is ASCII-only (for both filename and content), > > > closing > > > the gap Michael raised[1]; and then just declare all values to be > > > text > > > format. > > > > How do we mandate that? Just mention it in the docs and C comments? > > Attached a simple patch that enforces an all-ASCII restore point name > rather than documenting the possibility of non-ASCII text. Oh, nice idea, sure. Would you like to apply it? Master only or backpatch? -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: SELECT INTO deprecation
On Wed, Dec 9, 2020 at 09:48:54PM +0100, Peter Eisentraut wrote: > On 2020-12-03 20:26, Peter Eisentraut wrote: > > On 2020-12-03 16:34, Tom Lane wrote: > > > As I recall, a whole lot of the pain we have with INTO has to do > > > with the semantics we've chosen for INTO in a set-operation nest. > > > We think you can write something like > > > > > > SELECT ... INTO foo FROM ... UNION SELECT ... FROM ... > > > > > > but we insist on the INTO being in the first component SELECT. > > > I'd like to know exactly how much of that messiness is shared > > > by SQL Server. > > > > On sqlfiddle.com, this works: > > > > select a into t3 from t1 union select a from t2; > > > > but this gets an error: > > > > select a from t1 union select a into t4 from t2; > > > > SELECT INTO must be the first query in a statement containing a UNION, > > INTERSECT or EXCEPT operator. > > So, with that in mind, here is an alternative proposal that points out that > SELECT INTO has some use for compatibility. Do we really want to carry around confusing syntax for compatibility? I doubt we would ever add INTO now, even for compatibility. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: pg_rewind copies
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested Hello The patch seems to do as described and the regression and TAP tests are passing + /* +* A local source is not expected to change while we're rewinding, so check +* that we size of the file matches our earlier expectation. +*/ Is this a typo? thanks Cary
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 02:09:09PM -0500, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (br...@momjian.us) wrote: > > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote: > > > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote: > > > > I am getting close to applying these patches, probably this week. The > > > > patches are cumulative: > > > > > > > > > > > > https://github.com/postgres/postgres/compare/master...bmomjian:key.diff > > > > > > > > https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff > > > > > > I strongly object to a commit based on the current state of the patch. > > Yeah, I agree that this could use some work though I don't think it's > all that far away from being something we can get in, to allow us to > move forward with the other work around supporting TDE. Glad to hear that. Michael Paquier was right that the common/cipher*.c API was wrong and should not be committed until fixed, which I think it has been. If there are other messy parts of this patch, I would like to fix those too. > > I have posted some of the scripts. They involved complex shell > > scripting that I doubt the average user can do. The scripts are for > > prompting for a passphrase from the user's terminal, or using the > > Yubikey, with our without typing a pin. I can put them in the docs and > > then users can copy them into a file. Is that the preferred method? > > If we are going to include these in core anywhere, I would think a new > directory under contrib would make the most sense. I'd hope that we > could find a way to automate the testing of them though, so that we have > some idea when they break because otherwise I'd be concerned that > they'll break somewhere down the line and we won't notice for quite a > while. I am doing automated testing here, but I have a Yubikey. Michael Paquier recommended TAP tests, I guess like we do for pg_upgrade, and I will look into that. > This doesn't seem like something that would make sense to only have in > the documentation, which isn't a terribly convenient way to make use of > them. Yeah, it is hard to cut-paste 60 lines. The /contrib directory would be for SSL and cluster file encryption passphrase control, so it should have more general usefulness. Would those file be copied to the install directory somewhere if you install /contrib? Do we have any cases of this? > > The point is that the command-line options will be active, but will not > > be documented. It will not do anything on its own unless you specify > > that command-line option. I can comment-out the command-line options > > from being active but the code it going to look very messy. > > I'm a bit concerned with what's being contemplated here.. Ideally, > we'll actually get everything committed for v14 but if we don't and this > doesn't serve any use-case then I'm not sure that it should be > included in the release. I also don't like committing and reverting > things either, but having command line options that aren't documented > but which exist in the code isn't great either, nor is having random > code commented out.. Agreed. Once we use this code for SSL passphrase prompting, many of the options will actually have usefulness. What we could do is to not hide anything, including the docs, and then hide them once we are ready to start beta. At that point, maybe putting in the #ifdefs will be acceptable, since we would not be working on the feature until we branch, and then we can just remove them again. 
We had similar issues with the Win32 port, but that had fewer visible knobs. > > > I think that you should attach such patches directly to the emails > > > sent to pgsql-hackers, if those github links disappear for some > > > reason, it would become impossible to refer to see what has been > > > discussed here. > > > > Well, the patches are changing frequently. I am attaching a version to > > this email. > > I agree with having the patches posted to the list. I don't really > agree with the argument of "they change frequently". Sure, they are only 35k compressed. My point is that I am modifying my git tree all day, and with github, I can easily push the changes and when someone looks at the diff, they see the most recent version. > > I think that designing a correct set of APIs that can be plugged with > > any SSL library is the correct move in the long term. I have on my > > agenda to clean up HMAC as SCRAM uses that with SHA256 and you would > > use that with SHA512. Daniel has mentioned that he has been touching > > this area, and I also got a patch halfly done though pgcrypto needs > > some extra thoughts. So this is still WIP but you could reuse that > > here. > > Yeah, looking at what's been done there seems like the right approach to > me as well, ideally we'd have one set of APIs that'll support all these > use cases (not unlike what curl does..). I think I accomplished this in a p
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 02:20:33PM +0900, Michael Paquier wrote: > On Mon, Dec 14, 2020 at 10:19:02PM -0500, Bruce Momjian wrote: > > I am going to need someone to help me make these changes. I don't feel > > I know enough about the crypto API to do it, and it will take me 1+ week > > to learn it. > > I think that designing a correct set of APIs that can be plugged with > any SSL library is the correct move in the long term. I have on my > agenda to clean up HMAC as SCRAM uses that with SHA256 and you would > use that with SHA512. Daniel has mentioned that he has been touching > this area, and I also got a patch halfly done though pgcrypto needs > some extra thoughts. So this is still WIP but you could reuse that > here. I thought this was going to be a huge job, but once I looked at it, it was clear exactly what you were saying. Comparing cryptohash.c and cryptohash_openssl.c I saw exactly what you wanted, and I think I have completed it in the attached patch. cryptohash.c implemented the hash in C code if OpenSSL is not present --- I assume you didn't want me to do that, but rather to split the API so it was easy to add another implementation. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee key-alter.diff.gz Description: application/gzip
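For readers following along, the shape of the split being discussed is roughly: one header declaring an opaque context plus a small set of entry points, an OpenSSL implementation wrapping the EVP API, and a fallback for builds without OpenSSL. The declarations below are only a sketch; the names are modeled on the pg_cipher_ctx_create() mentioned elsewhere in the thread, and the exact signatures are assumptions rather than the patch's code.

/* cipher.h -- shared declarations (sketch; signatures are assumptions) */
typedef struct PgCipherCtx PgCipherCtx;		/* opaque, defined per implementation */

extern PgCipherCtx *pg_cipher_ctx_create(int cipher,
										 const unsigned char *key,
										 int klen_bytes);
extern bool pg_cipher_encrypt(PgCipherCtx *ctx,
							  const unsigned char *plaintext, int inlen,
							  unsigned char *ciphertext, int *outlen,
							  const unsigned char *iv);
extern void pg_cipher_ctx_free(PgCipherCtx *ctx);

/*
 * cipher_openssl.c would implement these on top of the EVP_* API, while
 * cipher.c would either carry a built-in fallback (as cryptohash.c does
 * for SHA-2) or simply report that the build lacks OpenSSL support.
 */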
Re: Minor documentation error regarding streaming replication protocol
Hi, On 2020-12-15 01:22:44 -0800, Jeff Davis wrote: > Attached a simple patch that enforces an all-ASCII restore point name > rather than documenting the possibility of non-ASCII text. +1 > diff --git a/src/backend/access/transam/xlogfuncs.c > b/src/backend/access/transam/xlogfuncs.c > index 290658b22c..48daed56f6 100644 > --- a/src/backend/access/transam/xlogfuncs.c > +++ b/src/backend/access/transam/xlogfuncs.c > @@ -44,6 +44,8 @@ > static StringInfo label_file; > static StringInfo tblspc_map_file; > > +static bool is_all_ascii(const char *str); Minor nit: I'd put this into common/string.[ch], beside pg_clean_ascii(). Probably renaming it to pg_is_ascii(). Greetings, Andres Freund
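A minimal sketch of such a helper, assuming the suggested pg_is_ascii() name and the move into src/common/string.c (IS_HIGHBIT_SET comes from c.h); this is an illustration, not the actual patch:

#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#endif

/* Return true if the string contains only 7-bit ASCII characters. */
bool
pg_is_ascii(const char *str)
{
	for (const char *p = str; *p != '\0'; p++)
	{
		if (IS_HIGHBIT_SET(*p))
			return false;
	}
	return true;
}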
Re: Rethinking plpgsql's assignment implementation
I realized that the speedup patch I posted yesterday is flawed: it's too aggressive about applying the R/W param mechanism, instead of not aggressive enough. To review, the point of that logic is that if we have an assignment like arrayvar := array_append(arrayvar, some-scalar-expression); a naive implementation would have array_append construct an entire new array, which we'd then have to copy into plpgsql's variable storage. Instead, if the array variable is in expanded-array format (which plpgsql encourages it to be) then we can pass the array parameter as a "read/write expanded datum", which array_append recognizes as license to scribble right on its input and return the modified input; that takes only O(1) time not O(N). Then plpgsql's assignment code notices that the expression result datum is the same pointer already stored in the variable, so it does nothing. With the patch at hand, a subscripted assignment a[i] := x becomes, essentially, a := subscriptingref(a, i, x); and we need to make the same sort of transformation to allow array_set_element to scribble right on the original value of "a" instead of making a copy. However, we can't simply not consider the source expression "x", as I proposed yesterday. For example, if we have a := subscriptingref(a, i, f(array_append(a, x))); it's not okay for array_append() to scribble on "a". The R/W param mechanism normally disallows any additional references to the target variable, which would prevent this error, but I broke that safety check with the 0007 patch. After thinking about this awhile, I decided that plpgsql's R/W param mechanism is really misdesigned. Instead of requiring the assignment source expression to be such that *all* its references to the target variable could be passed as R/W, we really want to identify *one* reference to the target variable to be passed as R/W, allowing any other ones to be passed read/only as they would be by default. As long as the R/W reference is a direct argument to the top-level (hence last to be executed) function in the expression, there is no harm in R/O references being passed to other lower parts of the expression. Nor is there any use-case for more than one argument of the top-level function being R/W. So the attached rewrite of the 0007 patch reimplements that logic to identify one single Param that references the target variable, and make only that Param pass a read/write reference, not any other Params referencing the target variable. This is a good change even without considering the assignment-reimplementation proposal, because even before this patchset we could have cases like arrayvar := array_append(arrayvar, arrayvar[i]); The existing code would be afraid to optimize this, but it's in fact safe. I also re-attach the 0001-0006 patches, which have not changed, just to keep the cfbot happy. regards, tom lane diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 1fa9f19f08..9e7f1590b5 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -12095,7 +12095,7 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd, * parse_analyze() or the rewriter, but instead we need to pass them * through parse_utilcmd.c to make them ready for execution. 
*/ - raw_parsetree_list = raw_parser(cmd); + raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT); querytree_list = NIL; foreach(list_item, raw_parsetree_list) { diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 8f341ac006..88c76dd985 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -723,6 +723,15 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); */ %token NOT_LA NULLS_LA WITH_LA +/* + * The grammar likewise thinks these tokens are keywords, but they are never + * generated by the scanner. Rather, they can be injected by parser.c as + * the initial token of the string (using the lookahead-token mechanism + * implemented there). This provides a way to tell the grammar to parse + * something other than the usual list of SQL commands. + */ +%token MODE_TYPE_NAME + /* Precedence: lowest to highest */ %nonassoc SET/* see relation_expr_opt_alias */ @@ -787,11 +796,18 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); /* * The target production for the whole parse. + * + * Ordinarily we parse a list of statements, but if we see one of the + * special MODE_XXX symbols as first token, we parse something else. */ stmtblock: stmtmulti { pg_yyget_extra(yyscanner)->parsetree = $1; } + | MODE_TYPE_NAME Typename + { +pg_yyget_extra(yyscanner)->parsetree = list_make1($2); + } ; /* diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c index 2709f6f9c7..633774b733 100644 --- a/src/backend/parser/parse_type.c +++ b/src/backend
Re: SQL/JSON: functions
út 15. 12. 2020 v 19:56 odesílatel Oleg Bartunov napsal: > On Tue, Dec 15, 2020 at 8:37 PM Pavel Stehule > wrote: > > > > > > > > út 15. 12. 2020 v 18:00 odesílatel Simon Riggs > napsal: > >> > >> On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov > wrote: > >> > > >> > Attached 50th version of the patches. Only the documentation was > changed > >> > since the previous version. > >> > >> I can imagine the effort required to get to v50, so I salute your > efforts. > >> > >> The document for SQL Standard has now been published as CD > >> 9075-2-Foundation (Thanks Peter). > >> > >> That gives us a clearer picture of what is being voted on and should > >> allow Nikita to complete his work. > >> > >> I suggest we move forwards on this now, but if anyone objects to > >> including this in PG14 in favour of waiting for the vote, please say > >> so clearly so we can skip to PG15. > > > > > > Maybe this set of patches can be reorganized and divided. Some parts > like json generating functions are almost trivial and without > controversions with clean benefits for users. > > I agree with this, most interesting is JSON_TABLE. > It is very interesting, but it is very complex too. There is not any similarly complex function in ANSI SQL. This function defines its own language. > > > The most complexity is related to json_table function. Nikita did a very > good job and implemented this function in maximal conformance with ANSI SQL > with a maximal set of features. On second hand it is hard to do review > because this patch is really complex, and a lot of functionality was not > implemented elsewhere (so isn't possible to compare results). I think it > should be possible to reduce complexity and divide acceptance of json_table > to some steps like basic functionality (on MySQL level), enhanced > functionality (on Oracle level), and full functionality (the Postgres will > be first). This functionality is interesting and maximal conformity with > SQL/JSON is great so I am for merging it. But if it will be divided into > some chronological steps, then there can be higher probability of merging > to upstream in a good time. > > I think it's shame to look up on MySQL and Oracle, since we have much > better and complete implementation of the Standard. > Maybe I used bad words. I would not reduce json_table patch to MySQL or Oracle level. I proposed merging this patch in a few steps when any step can be functional. Standard divides JSON supports to some levels too. There is about 50% code (and features) when review is not a problem, because this functionality is very clean and natural. Another 50% can be possibly problematic because Nikita implementation is first in the world and the description in standard is complex and hard to read, and hard to test. Because these patches are not divided we have to do lot of repeated work in every cycle. I proposed to do work more in style step by step than in big bang style. I think there is a lot of code that can be commitable immediately (maybe half). This can be done quickly without any controversies. This reduces complexity for 50% so we can concentrate better on the rest of patches. The final target is full support of standard and full merge of Nikita's patches. Nikita did hard and good work and it is nonsense to throw away any part of his work - and it is a pity so the merging process is too long already . But I understand it is pretty hard to commit to this patch in complexity like this patch has. 
> > > > > > > Regards > > > > Pavel > > > >> > >> -- > >> Simon Riggshttp://www.EnterpriseDB.com/ > > > > -- > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company >
Re: SQL/JSON: functions
On Tue, Dec 15, 2020 at 8:37 PM Pavel Stehule wrote: > > > > út 15. 12. 2020 v 18:00 odesílatel Simon Riggs napsal: >> >> On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov wrote: >> > >> > Attached 50th version of the patches. Only the documentation was changed >> > since the previous version. >> >> I can imagine the effort required to get to v50, so I salute your efforts. >> >> The document for SQL Standard has now been published as CD >> 9075-2-Foundation (Thanks Peter). >> >> That gives us a clearer picture of what is being voted on and should >> allow Nikita to complete his work. >> >> I suggest we move forwards on this now, but if anyone objects to >> including this in PG14 in favour of waiting for the vote, please say >> so clearly so we can skip to PG15. > > > Maybe this set of patches can be reorganized and divided. Some parts like > json generating functions are almost trivial and without controversions with > clean benefits for users. I agree with this, most interesting is JSON_TABLE. > > The most complexity is related to json_table function. Nikita did a very good > job and implemented this function in maximal conformance with ANSI SQL with a > maximal set of features. On second hand it is hard to do review because this > patch is really complex, and a lot of functionality was not implemented > elsewhere (so isn't possible to compare results). I think it should be > possible to reduce complexity and divide acceptance of json_table to some > steps like basic functionality (on MySQL level), enhanced functionality (on > Oracle level), and full functionality (the Postgres will be first). This > functionality is interesting and maximal conformity with SQL/JSON is great so > I am for merging it. But if it will be divided into some chronological steps, > then there can be higher probability of merging to upstream in a good time. I think it's shame to look up on MySQL and Oracle, since we have much better and complete implementation of the Standard. > > Regards > > Pavel > >> >> -- >> Simon Riggshttp://www.EnterpriseDB.com/ -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: SQL/JSON: functions
On Tue, Dec 15, 2020 at 8:01 PM Simon Riggs wrote: > > On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov wrote: > > > > Attached 50th version of the patches. Only the documentation was changed > > since the previous version. > > I can imagine the effort required to get to v50, so I salute your efforts. > > The document for SQL Standard has now been published as CD > 9075-2-Foundation (Thanks Peter). SQL-2016 is already 4 years old and this is what we are using for our development. What is CD 9075-2-Foundation? > > That gives us a clearer picture of what is being voted on and should > allow Nikita to complete his work. > > I suggest we move forwards on this now, but if anyone objects to > including this in PG14 in favour of waiting for the vote, please say > so clearly so we can skip to PG15. There is an upcoming JSON v2 SQL Standard, which specifies a JSON data type; that means we have to protect our users, or SQL-compliant applications will be slow and developers will be disappointed. I discussed all of this in my talk at Postgres Build http://www.sai.msu.su/~megera/postgres/talks/json-build-2020.pdf Andrew Dunstan has volunteered to work with us on completing the SQL/JSON standard, but ... I think we need SQL/JSON functions in PG14 and a SQL/JSON compatibility mode for json/jsonb. > > -- > Simon Riggshttp://www.EnterpriseDB.com/ -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Proposed patch for key managment
Greetings, * Bruce Momjian (br...@momjian.us) wrote: > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote: > > On Mon, Dec 14, 2020 at 06:06:15PM -0500, Bruce Momjian wrote: > > > I am getting close to applying these patches, probably this week. The > > > patches are cumulative: > > > > > > https://github.com/postgres/postgres/compare/master...bmomjian:key.diff > > > > > > https://github.com/bmomjian/postgres/compare/key...bmomjian:key-alter.diff > > > > I strongly object to a commit based on the current state of the patch. Yeah, I agree that this could use some work though I don't think it's all that far away from being something we can get in, to allow us to move forward with the other work around supporting TDE. > > > There are a few shell script I should include to show how to create > > > commands. Where should they be stored? /contrib module? > > > > Why are these needed. Is it a matter of documentation? > > I have posted some of the scripts. They involved complex shell > scripting that I doubt the average user can do. The scripts are for > prompting for a passphrase from the user's terminal, or using the > Yubikey, with our without typing a pin. I can put them in the docs and > then users can copy them into a file. Is that the preferred method? If we are going to include these in core anywhere, I would think a new directory under contrib would make the most sense. I'd hope that we could find a way to automate the testing of them though, so that we have some idea when they break because otherwise I'd be concerned that they'll break somewhere down the line and we won't notice for quite a while. This doesn't seem like something that would make sense to only have in the documentation, which isn't a terribly convenient way to make use of them. > > > Are people okay with having the feature enabled, but invisible > > > since the docs and --help output are missing? When we enable > > > ssl_passphrase_command to prompt from the terminal, some of the > > > command-line options will be useful. > > > > You are suggesting to enable the feature by default, make it invisible > > to the users and leave it undocumented? Is there something I missing > > here? > > The point is that the command-line options will be active, but will not > be documented. It will not do anything on its own unless you specify > that command-line option. I can comment-out the command-line options > from being active but the code it going to look very messy. I'm a bit concerned with what's being contemplated here.. Ideally, we'll actually get everything committed for v14 but if we don't and this doesn't serve any use-case then I'm not sure that it should be included in the release. I also don't like committing and reverting things either, but having command line options that aren't documented but which exist in the code isn't great either, nor is having random code commented out.. > > > Do people like the command-letter choices? > > > > > > I called the alter passphrase utility pg_altercpass. I could > > > have called it pg_clusterpass, but I wanted to highlight it is > > > only for changing the passphrase, not for creating them. > > > > I think that you should attach such patches directly to the emails > > sent to pgsql-hackers, if those github links disappear for some > > reason, it would become impossible to refer to see what has been > > discussed here. > > Well, the patches are changing frequently. I am attaching a version to > this email. I agree with having the patches posted to the list. 
I don't really agree with the argument of "they change frequently". * Michael Paquier (mich...@paquier.xyz) wrote: > On Mon, Dec 14, 2020 at 10:19:02PM -0500, Bruce Momjian wrote: > > On Tue, Dec 15, 2020 at 10:05:49AM +0900, Michael Paquier wrote: > >> - The cipher split also should be done in its own patch, and reviewed > >> on its own. There is a mix of dependencies between non-OpenSSL and > >> OpenSSL code paths, making the pluggin of a new SSL library harder to > >> do. In short, this requires proper API design, which is not something > >> we have here. There are also no changes in pgcrypto for that stuff. > > > > I am going to need someone to help me make these changes. I don't feel > > I know enough about the crypto API to do it, and it will take me 1+ week > > to learn it. > > I think that designing a correct set of APIs that can be plugged with > any SSL library is the correct move in the long term. I have on my > agenda to clean up HMAC as SCRAM uses that with SHA256 and you would > use that with SHA512. Daniel has mentioned that he has been touching > this area, and I also got a patch halfly done though pgcrypto needs > some extra thoughts. So this is still WIP but you could reuse that > here. Yeah, looking at what's been done there seems like the right approach to me as well, ideally we'd have one set of APIs that'll support all these use cases (not unlik
Re: Minimal logical decoding on standbys
On Wed, Mar 18, 2020 at 4:50 PM Alvaro Herrera wrote: > > > well, not "forever", but: > > $ make check PROVE_TESTS=t/019_standby_logical_decoding_conflicts.pl PROVE_FLAGS=-v > ... > cd /pgsql/source/master/src/test/recovery && TESTDIR='/home/alvherre/mnt/crypt/alvherre/Code/pgsql/build/master/src/test/recovery' PATH="/pgsql/build/master/tmp_install/pgsql/install/master/bin:$PATH" LD_LIBRARY_PATH="/pgsql/build/master/tmp_install/pgsql/install/master/lib" PGPORT='655432' PG_REGRESS='/home/alvherre/mnt/crypt/alvherre/Code/pgsql/build/master/src/test/recovery/../../../src/test/regress/pg_regress' REGRESS_SHLIB='/pgsql/build/master/src/test/regress/regress.so' /usr/bin/prove -I /pgsql/source/master/src/test/perl/ -I /pgsql/source/master/src/test/recovery -v t/ 019_standby_logical_decoding_conflicts.pl > t/019_standby_logical_decoding_conflicts.pl .. > 1..24 > ok 1 - dropslot on standby created > ok 2 - activeslot on standby created > # poll_query_until timed out executing this query: > # SELECT '0/35C9190' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'standby'; > # expecting this output: > # t > # last actual query output: > # > # with stderr: > Bailout called. Further testing stopped: system pg_ctl failed > Bail out! system pg_ctl failed > FAILED--Further testing stopped: system pg_ctl failed > make: *** [Makefile:19: check] Error 255 > After rebase and did minimal tweaks (duplicated oid, TAP tests numbering) I'm facing similar problem but in other place: make -C src/test/recovery check PROVE_TESTS=t/ 023_standby_logical_decoding_conflicts.pl PROVE_FLAGS=-v ... /usr/bin/mkdir -p '/data/src/pg/main/src/test/recovery'/tmp_check cd . && TESTDIR='/data/src/pg/main/src/test/recovery' PATH="/d/src/pg/main/tmp_install/home/fabrizio/pgsql/logical-decoding-standby/bin:$PATH" LD_LIBRARY_PATH="/d/src/pg/main/tmp_install/home/fabrizio/pgsql/logical-decoding-standby/lib" PGPORT='65432' PG_REGRESS='/data/src/pg/main/src/test/recovery/../../../src/test/regress/pg_regress' REGRESS_SHLIB='/d/src/pg/main/src/test/regress/regress.so' /usr/bin/prove -I ../../../src/test/perl/ -I . -v t/ 023_standby_logical_decoding_conflicts.pl t/023_standby_logical_decoding_conflicts.pl .. 1..24 ok 1 - dropslot on standby created ok 2 - activeslot on standby created not ok 3 - dropslot on standby dropped # Failed test 'dropslot on standby dropped' # at t/023_standby_logical_decoding_conflicts.pl line 67. # got: 'logical' # expected: '' not ok 4 - activeslot on standby dropped # Failed test 'activeslot on standby dropped' # at t/023_standby_logical_decoding_conflicts.pl line 68. # got: 'logical' # expected: '' TAP tests hang forever in `check_slots_dropped` exactly here: # our client should've terminated in response to the walsender error eval { $slot_user_handle->finish; }; Regards, -- Fabrízio de Royes Mello PostgreSQL Developer at OnGres Inc. - https://ongres.com From a3e42f2fc53afd2bcbe6da9d029d91694a34562e Mon Sep 17 00:00:00 2001 From: Amit Khandekar Date: Thu, 16 Jan 2020 10:05:15 +0530 Subject: [PATCH v7 1/5] Allow logical decoding on standby. Allow a logical slot to be created on standby. Restrict its usage or its creation if wal_level on primary is less than logical. During slot creation, it's restart_lsn is set to the last replayed LSN. Effectively, a logical slot creation on standby waits for an xl_running_xact record to arrive from primary. Conflicting slots would be handled in next commits. Andres Freund and Amit Khandekar. 
--- src/backend/access/transam/xlog.c | 11 + src/backend/replication/logical/decode.c | 22 - src/backend/replication/logical/logical.c | 37 --- src/backend/replication/slot.c| 57 +++ src/backend/replication/walsender.c | 10 ++-- src/include/access/xlog.h | 1 + 6 files changed, 98 insertions(+), 40 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8dd225c2e1..609edbaca6 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5038,6 +5038,17 @@ LocalProcessControlFile(bool reset) ReadControlFile(); } +/* + * Get the wal_level from the control file. For a standby, this value should be + * considered as its active wal_level, because it may be different from what + * was originally configured on standby. + */ +WalLevel +GetActiveWalLevel(void) +{ + return ControlFile->wal_level; +} + /* * Initialization of shared memory for XLOG */ diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 3f84ee99b8..bb7c80d6cc 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -203,11 +203,31 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) * can restart from there. */ break; + case
Re: Reloptions for table access methods
On Tue, 1 Sept 2020 at 18:21, Jeff Davis wrote: > I went with the simple approach because fixing that problem seemed a > bit over-engineered. Here are some thoughts on what we could do: The simple patch is admirable, but not something we should put into core. I definitely don't agree with the premise that all existing heap options should be common across all new or extension tableAMs. There are dozens of such options and I don't believe that they would all have meaning in all future storage plugins. I think options should just work exactly the same for Indexes and Tables. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: SQL/JSON: functions
út 15. 12. 2020 v 18:00 odesílatel Simon Riggs napsal: > On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov > wrote: > > > > Attached 50th version of the patches. Only the documentation was changed > > since the previous version. > > I can imagine the effort required to get to v50, so I salute your efforts. > > The document for SQL Standard has now been published as CD > 9075-2-Foundation (Thanks Peter). > > That gives us a clearer picture of what is being voted on and should > allow Nikita to complete his work. > > I suggest we move forwards on this now, but if anyone objects to > including this in PG14 in favour of waiting for the vote, please say > so clearly so we can skip to PG15. > Maybe this set of patches can be reorganized and divided. Some parts like json generating functions are almost trivial and without controversions with clean benefits for users. The most complexity is related to json_table function. Nikita did a very good job and implemented this function in maximal conformance with ANSI SQL with a maximal set of features. On second hand it is hard to do review because this patch is really complex, and a lot of functionality was not implemented elsewhere (so isn't possible to compare results). I think it should be possible to reduce complexity and divide acceptance of json_table to some steps like basic functionality (on MySQL level), enhanced functionality (on Oracle level), and full functionality (the Postgres will be first). This functionality is interesting and maximal conformity with SQL/JSON is great so I am for merging it. But if it will be divided into some chronological steps, then there can be higher probability of merging to upstream in a good time. Regards Pavel > -- > Simon Riggshttp://www.EnterpriseDB.com/ >
Re: REINDEX backend filtering
On Tue, Dec 15, 2020 at 12:22 PM Julien Rouhaud wrote: > > On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier wrote: > > > > On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote: > > > Now that we have the infrastructure to track indexes that might be > > > corrupted > > > due to changes in collation libraries, I think it would be a good idea to > > > offer > > > an easy way for users to reindex all indexes that might be corrupted. > > > > Yes. It would be a good thing. > > > > > The filter is also implemented so that you could cumulate multiple > > > filters, so > > > it could be easy to add more filtering, for instance: > > > > > > REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb; > > > > > > to only rebuild indexes depending on outdated libc collations, or > > > > > > REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb; > > > > > > to only rebuild indexes depending on a specific version of libc. > > > > Deciding on the grammar to use depends on the use cases we would like > > to satisfy. From what I heard on this topic, the goal is to reduce > > the amount of time necessary to reindex a system so as REINDEX only > > works on indexes whose dependent collation versions are not known or > > works on indexes in need of a collation refresh (like a reindexdb > > --all --collation -j $jobs). What would be the benefit in having more > > complexity with library-dependent settings while we could take care > > of the use cases that matter the most with a simple grammar? Perhaps > > "not_current" is not the best match as a keyword, we could just use > > "collation" and handle that as a boolean. As long as we don't need > > new operators in the grammar rules.. > > I'm not sure what the DBA usual pattern here. If the reindexing > runtime is really critical, I'm assuming that at least some people > will dig into library details to see what are the collations that > actually broke in the last upgrade and will want to reindex only > those, and force the version for the rest of the indexes. And > obviously, they probably won't wait to have multiple collation > versions dependencies before taking care of that. In that case the > filters that would matters would be one to only keep indexes with an > outdated collation version, and an additional one for a specific > collation name. Or we could have the COLLATION keyword without > additional argument mean all outdated collations, and COLLATION > 'collation_name' to specify a specific one. This is maybe a bit ugly, > and would probably require a different approach for reindexdb. Is this really a common enough operation that we need it i the main grammar? Having the functionality, definitely, but what if it was "just" a function instead? So you'd do something like: SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) \gexec Or even a function that returns the REINDEX commands directly (taking a parameter to turn on/off concurrency for example). That also seems like it would be easier to make flexible, and just as easy to plug into reindexdb? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: On login trigger: take three
> Please notice that we still need GUC to disable on-login triggers: to make >> it possible for superuser who did mistake and defined incorrect on-login >> trigger to >> login to the system. >> Do we need GUC to disable all other event triggers? May be I am wrong, >> but I do not see much need in such GUC: error in any of such event triggers >> is non fatal >> and can be easily reverted. >> So the only question is whether "disable_client_connection_trigger" >> should be true by default or not... >> >> I agree with you that @2 is a little bit chaotic and @1 looks like a >> workaround. >> But from my point of view @3 is not the best solution but overkill: >> maintaining yet another shared hash just to save few milliseconds on login >> seems to be too high price. >> Actually there are many things which are loaded by new backend from the >> database on start: for example - catalog. >> This is why launch of new backend is an expensive operation. >> Certainly if we execute "select 1", then system catalog is not needed... >> But does anybody start new backend just to execute "select 1" and exit? >> >> >> > I understand so the implementation of a new shared cache can be a lot of > work. The best way is enhancing pg_database about one column with > information about the login triggers (dathaslogontriggers). In init time > these data are in syscache, and can be easily checked. Some like > pg_attribute have an atthasdef column. Same fields has pg_class - > relhasrules, relhastriggers, ... Then the overhead of this design should be > really zero. > > What do you think about it? > > I like this approach more than implementation of shared hash. > But still I have some concerns: > > 1. pg_database table format has to be changed. Certainly it is not > something completely unacceptable. But IMHO we should try to avoid > modification of such commonly used catalog tables as much as possible. > yes, I know. Unfortunately I don't see any other and correct solution. There should be more wide discussion before this work about this topic. On second hand, this change should not break anything (if this new field will be placed on as the last field). The logon trigger really looks like a database trigger - so I think so this semantic is correct. I have no idea if it is acceptable for committers :-/. I hope so. The fact that the existence of a logon trigger can be visible from a shared database object can be an interesting feature too. > 2. It is not so easy to maintain this flag. There can be multiple on-login > triggers defined. If such trigger is dropped, we can not just clear this > flag. > We should check if other triggers exist. Now assume that there are two > triggers and two concurrent transactions dropping each one. > According to their snapshots them do not see changes made by other > transaction. So them remove both triggers but didn't clear the flag. > Certainly we can use counter instead of flag. But I am not sure that their > will be not other problems with maintaining counter. > I don't think it is necessary. My opinion is not too strong, but if pg_class doesn't need to hold a number of triggers, then I think so pg_database doesn't need to hold this number too. > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
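As a rough sketch of the fast path being proposed here (assuming a new pg_database column named dathaslogontriggers; the helper name and call site are illustrative only, not code from any patch), the backend could check the flag from the syscache at connection start and skip the event-trigger machinery entirely when it is false:

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_database.h"
#include "utils/syscache.h"

/* Sketch: does the database we are connecting to have any logon triggers? */
static bool
DatabaseHasLogonTriggers(Oid dbid)
{
	HeapTuple	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
	bool		result = false;

	if (HeapTupleIsValid(tup))
	{
		/* dathaslogontriggers is the hypothetical new flag column */
		result = ((Form_pg_database) GETSTRUCT(tup))->dathaslogontriggers;
		ReleaseSysCache(tup);
	}
	return result;
}

Since the pg_database row is already fetched through the syscache during backend startup, a check like this should cost essentially nothing when no logon trigger exists, which is the zero-overhead property being discussed.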
Re: SQL/JSON: functions
On Fri, 17 Jul 2020 at 21:26, Nikita Glukhov wrote: > > Attached 50th version of the patches. Only the documentation was changed > since the previous version. I can imagine the effort required to get to v50, so I salute your efforts. The document for SQL Standard has now been published as CD 9075-2-Foundation (Thanks Peter). That gives us a clearer picture of what is being voted on and should allow Nikita to complete his work. I suggest we move forwards on this now, but if anyone objects to including this in PG14 in favour of waiting for the vote, please say so clearly so we can skip to PG15. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 10:36:56AM +0800, Neil Chen wrote: > 2. I tried to add support for AES_CTR mode, and the code for encrypting buffer > blocks. In the process I found that in pg_cipher_ctx_create, the key length is > declared as "byte". However, in the CryptoKey structure, the length is stored > as "bit", which leads me to use a form similar to Key->klen / 8 when I call > this function. Maybe we should unify the two to avoid unnecessary confusion. Yes, I would also like to get opinions on this. We certainly have to have the key length be in _bit_ units when visible by users, but I see a lot of cases where we allocate arrays based on bytes. I am unclear where the proper units should be. At a minimum, we should specify the units in the function parameter names. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
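One way to keep the units straight, whichever unit is chosen, is to encode the unit in the field and parameter names and convert in exactly one place. The snippet below only illustrates that convention; DemoCryptoKey and the helper are stand-ins, not the patch's CryptoKey or its API.

/* Stand-in structure; the real CryptoKey differs. */
typedef struct DemoCryptoKey
{
	int			klen_bits;		/* user-visible key length, in bits */
	unsigned char key[64];		/* key material */
} DemoCryptoKey;

/* Convert once, at the boundary where a byte count is required. */
static inline int
demo_key_len_bytes(const DemoCryptoKey *k)
{
	return k->klen_bits / 8;
}

/* e.g. pg_cipher_ctx_create(cipher, k->key, demo_key_len_bytes(k)); */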
Re: On login trigger: take three
On 15.12.2020 18:25, Pavel Stehule wrote:

út 15. 12. 2020 v 15:06 odesílatel Konstantin Knizhnik <k.knizh...@postgrespro.ru> napsal:

On 15.12.2020 16:18, Pavel Stehule wrote:

út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik <k.knizh...@postgrespro.ru> napsal:

On 11.12.2020 19:27, Pavel Stehule wrote:

pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik <k.knizh...@postgrespro.ru> napsal:

On 11.12.2020 18:40, Pavel Stehule wrote:

is not correct. It makes it not possible to superuser to disable triggers for all users.

pg_database_ownercheck returns true for superuser always.

Sorry, but I consider different case: when normal user is connected to the database. In this case pg_database_ownercheck returns false and trigger is not disabled, isn't it?

My idea was to reduce necessary rights to database owners. But you have a true, so only superuser can create event trigger, so this feature cannot be used in DBaaS environments, and then my original idea was wrong.

Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it.

when you cannot connect to the database, then you cannot do ALTER. In DBaaS environments lot of users has not superuser rights.

But only superusers can set login triggers, right? So only superuser can make a mistake in this trigger. But he have enough rights to recover this error. Normal users are not able to define on connection triggers and should not have rights to disable them.

yes, it is true

Pavel

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

So what's next? I see three options:

1. Do not introduce GUC for disabling all event triggers (especially taken in account that them are disabled by default). Return back to the patch on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with "disable_client_connection_trigger" and make it true by default (to eliminate any overhead for users which are not using on logintriggers).

2. Have two GUCS: "disable_client_connection_trigger" and "disable_event_triggers".

3. Implement some mechanism for caching presence of event triggers in shared memory.

@3 is the best design (the things should work well by default), @2 is a little bit chaotic and @1 looks like a workaround.

Please notice that we still need GUC to disable on-login triggers: to make it possible for superuser who did mistake and defined incorrect on-login trigger to login to the system. Do we need GUC to disable all other event triggers? May be I am wrong, but I do not see much need in such GUC: error in any of such event triggers is non fatal and can be easily reverted. So the only question is whether "disable_client_connection_trigger" should be true by default or not...

I agree with you that @2 is a little bit chaotic and @1 looks like a workaround. But from my point of view @3 is not the best solution but overkill: maintaining yet another shared hash just to save few milliseconds on login seems to be too high price. Actually there are many things which are loaded by new backend from the database on start: for example - catalog. This is why launch of new backend is an expensive operation. Certainly if we execute "select 1", then system catalog is not needed... But does anybody start new backend just to execute "select 1" and exit?

I understand so the implementation of a new shared cache can be a lot of work. The best way is enhancing pg_database about one column with information about the login triggers (dathaslogontriggers). In init time these data are in syscache, and can be easily checked. Some like pg_attribute have an atthasdef column. Same fields has pg_class - relhasrules, relhastriggers, ... Then the overhead of this design should be really zero. What do you think about it?

I like this approach more than implementation of shared hash. But still I have
Re: Proposed patch for key managment
On Mon, Dec 14, 2020 at 11:16:18PM -0500, Bruce Momjian wrote: > > 1. Previously, we added a variable bootstrap_keys_wrap that is used for > > encryption during initdb. However, since we save the "wrapped" key, we need > > to > > use a global KEK that can be accessed in boot mode to unwrap it before > > use... I > > don't know if that's good. To make it simple, I modified the > > bootstrap_keys_wrap to store the "unwrapped" key so that the encryption > > function can get it correctly. (The variable name should be changed > > accordingly). > > I see what you are saying. We store the wrapped in bootstrap mode, but > the unwrapped in normal mode. There is also the case of when we copy > the keys from an old cluster. I will work on a patch tomorrow and > report back here. I had not considered that we need the date keys available in bootstrap mode, even if we copied them from another cluster during pg_upgrade. I have updated the diff URLs and attaching a patch showing the changes I made. Basically, I had to separate BootStrapKmgr() into sections: 1. copy or create an empty live key directory 2. get the pass phrase 3. populate the live key directory if we didn't copy it 4. decrypt they keys into a file-scoped variable Thanks for showing me this missing feature. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee diff --git a/src/backend/crypto/kmgr.c b/src/backend/crypto/kmgr.c new file mode 100644 index 9143e72..24c5d7f *** a/src/backend/crypto/kmgr.c --- b/src/backend/crypto/kmgr.c *** static KmgrShmemData *KmgrShmem; *** 50,56 char *cluster_passphrase_command = NULL; int file_encryption_keylen = 0; ! CryptoKey bootstrap_keys_wrap[KMGR_MAX_INTERNAL_KEYS]; extern char *bootstrap_old_key_datadir; extern int bootstrap_file_encryption_keylen; --- 50,56 char *cluster_passphrase_command = NULL; int file_encryption_keylen = 0; ! CryptoKey bootstrap_keys[KMGR_MAX_INTERNAL_KEYS]; extern char *bootstrap_old_key_datadir; extern int bootstrap_file_encryption_keylen; *** static CryptoKey *generate_crypto_key(in *** 65,74 void BootStrapKmgr(void) { ! PgKeyWrapCtx *ctx; char passphrase[KMGR_MAX_PASSPHRASE_LEN]; - uint8 KEK_enc[KMGR_ENC_KEY_LEN]; - uint8 KEK_hmac[KMGR_MAC_KEY_LEN]; int passlen; #ifndef USE_OPENSSL --- 65,74 void BootStrapKmgr(void) { ! char live_path[MAXPGPATH]; ! CryptoKey *keys_wrap; ! int nkeys; char passphrase[KMGR_MAX_PASSPHRASE_LEN]; int passlen; #ifndef USE_OPENSSL *** BootStrapKmgr(void) *** 78,83 --- 78,85 errhint("Compile with --with-openssl to use cluster encryption."; #endif + snprintf(live_path, sizeof(live_path), "%s/%s", DataDir, LIVE_KMGR_DIR); + /* copy cluster file encryption keys from an old cluster? */ if (bootstrap_old_key_datadir != NULL) { *** BootStrapKmgr(void) *** 87,122 bootstrap_old_key_datadir, LIVE_KMGR_DIR); copydir(old_key_dir, LIVE_KMGR_DIR, true); } ! /* generate new cluster file encryption keys */ else { ! char live_path[MAXPGPATH]; ! ! 
if (mkdir(LIVE_KMGR_DIR, pg_dir_create_mode) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create cluster file encryption directory \"%s\": %m", LIVE_KMGR_DIR))); - memset(bootstrap_keys_wrap, 0, sizeof(bootstrap_keys_wrap)); - /* bzero keys on exit */ - on_proc_exit(bzeroKmgrKeys, 0); - - /* Get key encryption key from the passphrase command */ - snprintf(live_path, sizeof(live_path), "%s/%s", DataDir, LIVE_KMGR_DIR); - passlen = kmgr_run_cluster_passphrase_command(cluster_passphrase_command, - passphrase, KMGR_MAX_PASSPHRASE_LEN, - live_path); - if (passlen < KMGR_MIN_PASSPHRASE_LEN) - ereport(ERROR, - (errmsg("passphrase must be at least %d bytes", - KMGR_MIN_PASSPHRASE_LEN))); - /* Get key encryption key and HMAC key from passphrase */ kmgr_derive_keys(passphrase, passlen, KEK_enc, KEK_hmac); - explicit_bzero(passphrase, passlen); - /* Create temporary key wrap context */ ctx = pg_create_keywrap_ctx(KEK_enc, KEK_hmac); if (!ctx) --- 89,128 bootstrap_old_key_datadir, LIVE_KMGR_DIR); copydir(old_key_dir, LIVE_KMGR_DIR, true); } ! /* create empty directory */ else { ! if (mkdir(LIVE_KMGR_DIR, pg_dir_create_mode) < 0) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create cluster file encryption directory \"%s\": %m", LIVE_KMGR_DIR))); + } + + /* + * Get key encryption key from the passphrase command. The passphrase + * command might want to check for the existance of files in the + * live directory, so run this _after_ copying the directory in place. + */ + passlen =
Re: On login trigger: take three
út 15. 12. 2020 v 15:06 odesílatel Konstantin Knizhnik < k.knizh...@postgrespro.ru> napsal: > > > On 15.12.2020 16:18, Pavel Stehule wrote: > > > > út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik < > k.knizh...@postgrespro.ru> napsal: > >> >> >> On 11.12.2020 19:27, Pavel Stehule wrote: >> >> >> >> pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik < >> k.knizh...@postgrespro.ru> napsal: >> >>> >>> >>> On 11.12.2020 18:40, Pavel Stehule wrote: >>> >>> >>> is not correct. It makes it not possible to superuser to disable triggers for all users. >>> >>> pg_database_ownercheck returns true for superuser always. >>> >>> >>> Sorry, but I consider different case: when normal user is connected to >>> the database. >>> In this case pg_database_ownercheck returns false and trigger is not >>> disabled, isn't it? >>> >> >> My idea was to reduce necessary rights to database owners. But you have >> a true, so only superuser can create event trigger, so this feature cannot >> be used in DBaaS environments, and then my original idea was wrong. >> >> >>> >>> Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it. >>> >>> when you cannot connect to the database, then you cannot do ALTER. In >>> DBaaS environments lot of users has not superuser rights. >>> >>> >>> >>> But only superusers can set login triggers, right? >>> So only superuser can make a mistake in this trigger. But he have enough >>> rights to recover this error. Normal users are not able to define on >>> connection triggers and >>> should not have rights to disable them. >>> >> >> yes, it is true >> >> Pavel >> >> >>> -- >>> Konstantin Knizhnik >>> Postgres Professional: http://www.postgrespro.com >>> The Russian Postgres Company >>> >>> >> So what's next? >> I see three options: >> >> 1. Do not introduce GUC for disabling all event triggers (especially >> taken in account that them are disabled by default). >> Return back to the patch >> on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with >> "disable_client_connection_trigger" >> and make it true by default (to eliminate any overhead for users which >> are not using on logintriggers). >> >> 2. Have two GUCS: "disable_client_connection_trigger" and >> "disable_event_triggers". >> >> 3. Implement some mechanism for caching presence of event triggers in >> shared memory. >> > > @3 is the best design (the things should work well by default), @2 is a > little bit chaotic and @1 looks like a workaround. > > > Please notice that we still need GUC to disable on-login triggers: to make > it possible for superuser who did mistake and defined incorrect on-login > trigger to > login to the system. > Do we need GUC to disable all other event triggers? May be I am wrong, but > I do not see much need in such GUC: error in any of such event triggers is > non fatal > and can be easily reverted. > So the only question is whether "disable_client_connection_trigger" should > be true by default or not... > > I agree with you that @2 is a little bit chaotic and @1 looks like a > workaround. 
> But from my point of view @3 is not the best solution but overkill: > maintaining yet another shared hash just to save few milliseconds on login > seems to be too high price. > Actually there are many things which are loaded by new backend from the > database on start: for example - catalog. > This is why launch of new backend is an expensive operation. > Certainly if we execute "select 1", then system catalog is not needed... > But does anybody start new backend just to execute "select 1" and exit? > > > I understand so the implementation of a new shared cache can be a lot of work. The best way is enhancing pg_database about one column with information about the login triggers (dathaslogontriggers). In init time these data are in syscache, and can be easily checked. Some like pg_attribute have an atthasdef column. Same fields has pg_class - relhasrules, relhastriggers, ... Then the overhead of this design should be really zero. What do you think about it? Pavel > > > > > Regards > > Pavel > > >> >> -- >> Konstantin Knizhnik >> Postgres Professional: http://www.postgrespro.com >> The Russian Postgres Company >> >> > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
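To visualize the suggestion: a backend-startup check against such a flag could be as cheap as the sketch below. This is purely hypothetical -- the dathaslogontriggers column does not exist in pg_database today, and the helper name is made up -- but it shows that the flag would already be part of the DATABASEOID syscache entry that connection startup fetches anyway, so the per-login overhead would be essentially zero.

/*
 * Hypothetical sketch: "dathaslogontriggers" is the pg_database column
 * proposed above; it does not exist yet, and this helper is illustrative
 * only.
 */
#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_database.h"
#include "miscadmin.h"
#include "utils/syscache.h"

static bool
database_has_logon_triggers(void)
{
	HeapTuple	tup;
	bool		result;

	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);

	result = ((Form_pg_database) GETSTRUCT(tup))->dathaslogontriggers;
	ReleaseSysCache(tup);

	return result;
}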
Re: Proposed patch for key managment
On Tue, Dec 15, 2020 at 02:20:33PM +0900, Michael Paquier wrote: > > Uh, I got this code from pg_resetwal.c, which does something similar to > > pg_altercpass. > > Yeah, that's actually the part where it is a bad idea to just copy > this pattern. pg_resetwal should not do that in the long term in my > opinion, but nobody has come to clean up this stuff. (- -;) Glad you asked about this. It turns out pg_altercpass works fine with postgres_fe.h, so I will now use that. pg_resetwal still can't use it though. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Speeding up GIST index creation for tsvectors
On Sun, 13 Dec 2020 at 9:28 PM, Andrey Borodin wrote: > +1 > This will make all INSERTs and UPDATES for tsvector's GiSTs. Oh, I didn't realize that this code is getting used in GIST index insertion and creation too. Will check there. > Also I really like idea of taking advantage of hardware capabilities like > __builtin_* etc wherever possible. Yes. Also, the __builtin_popcount() uses SIMD vectorization (on arm64 : "cnt v0.8b, v0.8b"), hence there's all the more reason to use it. Over and above that, I had thought that if we can auto-vectorize the byte-by-byte xor operation and the popcount() call using compiler optimizations, we would benefit out of this, but didn't see any more improvement. I hoped for the benefit because that would have allowed us to process in 128-bit chunks or 256-bit chunks, since the vector registers are at least that long. Maybe gcc is not that smart to translate __builtin_popcount() to 128/256 bit vectorized instruction. But for XOR operator, it does translate to 128bit vectorized instructions (on arm64 : "eor v2.16b, v2.16b, v18.16b") > Meanwhile there are at least 4 incarnation of hemdistsign() functions that > are quite similar. I'd propose to refactor them somehow... Yes, I hope we get the benefit there also. Before that, I thought I should post the first use-case to get some early comments. Thanks for your encouraging comments :)
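For readers following along, the inner loop being discussed looks roughly like the sketch below: XOR the two signatures in 64-bit chunks and let __builtin_popcountll() count the differing bits. The function name and the siglen parameter are illustrative, not the actual hemdistsign() signature.

#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: Hamming distance between two signatures of siglen
 * bytes, processed 64 bits at a time.  memcpy() avoids alignment
 * assumptions; the compiler typically turns it into a plain load.
 */
static int
sig_hamming_distance(const unsigned char *a, const unsigned char *b, int siglen)
{
	int			dist = 0;
	int			i = 0;

	for (; i + 8 <= siglen; i += 8)
	{
		uint64_t	wa;
		uint64_t	wb;

		memcpy(&wa, a + i, sizeof(wa));
		memcpy(&wb, b + i, sizeof(wb));
		dist += __builtin_popcountll(wa ^ wb);
	}

	/* handle any remaining tail bytes one at a time */
	for (; i < siglen; i++)
		dist += __builtin_popcount((unsigned int) (a[i] ^ b[i]));

	return dist;
}

On compilers or platforms without the builtin, a per-byte lookup table would still work as a portable fallback, which is presumably what a committed version of the patch would have to provide.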
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
Noah Misch writes: > On Mon, Dec 14, 2020 at 01:59:03PM -0500, Tom Lane wrote: >> Here's a rolled-up patch that does some further documentation work >> and gets rid of the unnecessary memset's as well. > +1 on removing the memset() calls. That said, it's not a big deal if more > creep in over time; it doesn't qualify as a project policy violation. Right, that part is just neatnik-ism. Neither the calls with memset nor the ones without are buggy. >> + * *infoP and hash_flags should specify at least the entry sizes and key > s/should/must/ OK; thanks for reviewing! regards, tom lane
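As a concrete (illustrative) example of the convention the revised comment describes: a caller that passes only HASH_ELEM | HASH_BLOBS needs to fill in just keysize and entrysize and can leave the rest of the HASHCTL untouched, with no memset. The entry type and table name below are made up.

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct DemoEntry
{
	Oid			key;			/* hash key; must be the first field */
	int			counter;		/* payload */
} DemoEntry;

static HTAB *
create_demo_hash(void)
{
	HASHCTL		ctl;			/* deliberately not memset: only the fields
								 * named by the flags below are consulted */

	ctl.keysize = sizeof(Oid);
	ctl.entrysize = sizeof(DemoEntry);

	return hash_create("demo hash", 128, &ctl, HASH_ELEM | HASH_BLOBS);
}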
Re: On login trigger: take three
út 15. 12. 2020 v 15:06 odesílatel Konstantin Knizhnik < k.knizh...@postgrespro.ru> napsal: > > > On 15.12.2020 16:18, Pavel Stehule wrote: > > > > út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik < > k.knizh...@postgrespro.ru> napsal: > >> >> >> On 11.12.2020 19:27, Pavel Stehule wrote: >> >> >> >> pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik < >> k.knizh...@postgrespro.ru> napsal: >> >>> >>> >>> On 11.12.2020 18:40, Pavel Stehule wrote: >>> >>> >>> is not correct. It makes it not possible to superuser to disable triggers for all users. >>> >>> pg_database_ownercheck returns true for superuser always. >>> >>> >>> Sorry, but I consider different case: when normal user is connected to >>> the database. >>> In this case pg_database_ownercheck returns false and trigger is not >>> disabled, isn't it? >>> >> >> My idea was to reduce necessary rights to database owners. But you have >> a true, so only superuser can create event trigger, so this feature cannot >> be used in DBaaS environments, and then my original idea was wrong. >> >> >>> >>> Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it. >>> >>> when you cannot connect to the database, then you cannot do ALTER. In >>> DBaaS environments lot of users has not superuser rights. >>> >>> >>> >>> But only superusers can set login triggers, right? >>> So only superuser can make a mistake in this trigger. But he have enough >>> rights to recover this error. Normal users are not able to define on >>> connection triggers and >>> should not have rights to disable them. >>> >> >> yes, it is true >> >> Pavel >> >> >>> -- >>> Konstantin Knizhnik >>> Postgres Professional: http://www.postgrespro.com >>> The Russian Postgres Company >>> >>> >> So what's next? >> I see three options: >> >> 1. Do not introduce GUC for disabling all event triggers (especially >> taken in account that them are disabled by default). >> Return back to the patch >> on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with >> "disable_client_connection_trigger" >> and make it true by default (to eliminate any overhead for users which >> are not using on logintriggers). >> >> 2. Have two GUCS: "disable_client_connection_trigger" and >> "disable_event_triggers". >> >> 3. Implement some mechanism for caching presence of event triggers in >> shared memory. >> > > @3 is the best design (the things should work well by default), @2 is a > little bit chaotic and @1 looks like a workaround. > > > Please notice that we still need GUC to disable on-login triggers: to make > it possible for superuser who did mistake and defined incorrect on-login > trigger to > login to the system. > Do we need GUC to disable all other event triggers? May be I am wrong, but > I do not see much need in such GUC: error in any of such event triggers is > non fatal > and can be easily reverted. > So the only question is whether "disable_client_connection_trigger" should > be true by default or not... > > I agree with you that @2 is a little bit chaotic and @1 looks like a > workaround. 
> But from my point of view @3 is not the best solution but overkill: > maintaining yet another shared hash just to save few milliseconds on login > seems to be too high price. > Actually there are many things which are loaded by new backend from the > database on start: for example - catalog. > This is why launch of new backend is an expensive operation. > Certainly if we execute "select 1", then system catalog is not needed... > But does anybody start new backend just to execute "select 1" and exit? > This is the worst case but tested on a not too bad CPU (i7). In virtual environments with overloaded CPU the effect can be worse. > > > > > > > Regards > > Pavel > > >> >> -- >> Konstantin Knizhnik >> Postgres Professional: http://www.postgrespro.com >> The Russian Postgres Company >> >> > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
On 2020/11/04 18:06, Fujii Masao wrote: On 2020/10/29 15:21, Fujii Masao wrote: On 2020/10/07 11:30, Bharath Rupireddy wrote: On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy wrote: On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao wrote: +1 Or it's also ok to make each patch separately. Anyway, thanks! Thanks. +1 to have separate patches. I will post two separate patches for autoprewarm and WalRcvShutdownHandler for further review. The other two patches can be for startup.c and syslogger.c. I'm attaching all the 4 patches here together. 1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch -- This is the patch initially sent in this mail by me, I just renamed it. 2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch -- This is the patch proposed by Fujii Masao, I just renamed it. 3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch 4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch Please consider these patches for further review. Thanks for the patches! They look good to me. So barring any objections, I will commit them one by one. I pushed the following two patches. - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch

As I told in the other thread [1], I'm thinking to revert this patch because this change could have a bad side-effect on the startup process waiting for recovery conflict. Before applying the patch, the latch that the startup process used to wait for recovery conflict was different from the latch that the SIGHUP signal handler or walreceiver process, etc. set to wake the startup process up. So SIGHUP or walreceiver didn't wake the startup process waiting for recovery conflict up unnecessarily. But the patch got rid of the dedicated latch for signaling the startup process. This change forced us to use the same latch to make the startup process wait or wake up, which caused the SIGHUP signal handler or walreceiver process to wake the startup process waiting on the latch for recovery conflict unnecessarily frequently. While waiting for recovery conflict on buffer pin, deadlock needs to be checked at least every deadlock_timeout. But those frequent wakeups could prevent the deadlock timer from being triggered and could delay those deadlock checks.

Therefore, I'm thinking to revert the commits ac22929a26 and 113d3591b8.

[1] https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726...@oss.nttdata.com

Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
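For reference, the "standard" handlers discussed in this thread follow roughly the pattern sketched below (simplified; the handler names come from src/include/postmaster/interrupt.h, and the wait timeout and wait event are arbitrary illustrative choices). Both handlers end by setting MyLatch, which is exactly why sharing that latch with the recovery-conflict wait in the startup process becomes a problem.

#include "postgres.h"
#include <signal.h>
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/interrupt.h"
#include "storage/latch.h"
#include "utils/guc.h"

/* simplified main loop of an auxiliary process using the standard handlers */
static void
example_main_loop(void)
{
	pqsignal(SIGHUP, SignalHandlerForConfigReload);
	pqsignal(SIGTERM, SignalHandlerForShutdownRequest);

	for (;;)
	{
		if (ShutdownRequestPending)
			break;

		if (ConfigReloadPending)
		{
			ConfigReloadPending = false;
			ProcessConfigFile(PGC_SIGHUP);
		}

		/* both handlers call SetLatch(MyLatch), which wakes this wait */
		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 10 * 1000L,	/* 10 seconds, illustrative */
						 WAIT_EVENT_PG_SLEEP);
		ResetLatch(MyLatch);
	}
}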
Re: On login trigger: take three
On 15.12.2020 16:18, Pavel Stehule wrote: út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik mailto:k.knizh...@postgrespro.ru>> napsal: On 11.12.2020 19:27, Pavel Stehule wrote: pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik mailto:k.knizh...@postgrespro.ru>> napsal: On 11.12.2020 18:40, Pavel Stehule wrote: is not correct. It makes it not possible to superuser to disable triggers for all users. pg_database_ownercheck returns true for superuser always. Sorry, but I consider different case: when normal user is connected to the database. In this case pg_database_ownercheck returns false and trigger is not disabled, isn't it? My idea was to reduce necessary rights to database owners. But you have a true, so only superuser can create event trigger, so this feature cannot be used in DBaaS environments, and then my original idea was wrong. Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it. when you cannot connect to the database, then you cannot do ALTER. In DBaaS environments lot of users has not superuser rights. But only superusers can set login triggers, right? So only superuser can make a mistake in this trigger. But he have enough rights to recover this error. Normal users are not able to define on connection triggers and should not have rights to disable them. yes, it is true Pavel -- Konstantin Knizhnik Postgres Professional:http://www.postgrespro.com The Russian Postgres Company So what's next? I see three options: 1. Do not introduce GUC for disabling all event triggers (especially taken in account that them are disabled by default). Return back to the patch on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with "disable_client_connection_trigger" and make it true by default (to eliminate any overhead for users which are not using on logintriggers). 2. Have two GUCS: "disable_client_connection_trigger" and "disable_event_triggers". 3. Implement some mechanism for caching presence of event triggers in shared memory. @3 is the best design (the things should work well by default), @2 is a little bit chaotic and @1 looks like a workaround. Please notice that we still need GUC to disable on-login triggers: to make it possible for superuser who did mistake and defined incorrect on-login trigger to login to the system. Do we need GUC to disable all other event triggers? May be I am wrong, but I do not see much need in such GUC: error in any of such event triggers is non fatal and can be easily reverted. So the only question is whether "disable_client_connection_trigger" should be true by default or not... I agree with you that @2 is a little bit chaotic and @1 looks like a workaround. But from my point of view @3 is not the best solution but overkill: maintaining yet another shared hash just to save few milliseconds on login seems to be too high price. Actually there are many things which are loaded by new backend from the database on start: for example - catalog. This is why launch of new backend is an expensive operation. Certainly if we execute "select 1", then system catalog is not needed... But does anybody start new backend just to execute "select 1" and exit? 
Regards Pavel -- Konstantin Knizhnik Postgres Professional:http://www.postgrespro.com The Russian Postgres Company -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch
Here is a new patch for this. This now follows the implementation that Tom has suggested: Leave date_part() alone, add a new set of extract() functions, and map the SQL EXTRACT construct to those. I have basically just copied over the implementations from my previous patch and placed them next to the existing date_part() implementations. So all the behavior is still the same as in the previous patches. One thing I still need to look into is how to not lose all the test coverage for date_part(). But that should be fairly mechanical, so I'm leaving it off in this version. -- Peter Eisentraut 2ndQuadrant, an EDB company https://www.2ndquadrant.com/ >From b40213b0d31b8ac55cda4658c21d06e701b6e273 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 15 Dec 2020 14:27:58 +0100 Subject: [PATCH v4] Change return type of EXTRACT to numeric The previous implementation of EXTRACT mapped internally to date_part(), which returned type double precision (since it was implemented long before the numeric type existed). This can lead to imprecise output in some cases, so returning numeric would be preferrable. Changing the return type of an existing function is a bit risky, so instead we do the following: We implement a new set of functions, which are now called "extract", in parallel to the existing date_part functions. They work the same way internally but use numeric instead of float8. The EXTRACT construct is now mapped by the parser to these new extract functions. That way, dumps of views etc. from old versions (which would use date_part) continue to work unchanged, but new uses will map to the new extract functions. Additionally, the reverse compilation of EXTRACT now reproduces the original syntax, using the new mechanism introduced in 40c24bfef92530bd846e111c1742c2a54441c62c. The following minor changes of behavior result from the new implementation: - The column name from an isolated EXTRACT call is now "extract" instead of "date_part". - Extract from date now rejects inappropriate field names such as HOUR. It was previously mapped internally to extract from timestamp, so it would silently accept everything appropriate for timestamp. - Return values when extracting fields with possibly fractional values, such as second and epoch, now have the full scale that the value has internally (so, for example, '1.00' instead of just '1'). Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce1...@phystech.edu --- doc/src/sgml/func.sgml | 10 +- src/backend/parser/gram.y | 2 +- src/backend/utils/adt/date.c| 407 +++ src/backend/utils/adt/ruleutils.c | 21 + src/backend/utils/adt/timestamp.c | 708 +++- src/include/catalog/pg_proc.dat | 18 + src/test/regress/expected/create_view.out | 2 +- src/test/regress/expected/date.out | 482 ++--- src/test/regress/expected/interval.out | 72 +- src/test/regress/expected/psql_crosstab.out | 12 +- src/test/regress/expected/time.out | 24 +- src/test/regress/expected/timetz.out| 42 +- src/test/regress/sql/date.sql | 6 +- 13 files changed, 1445 insertions(+), 361 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index df29af6371..a42993c4a5 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -8705,7 +8705,7 @@ Date/Time Functions extract extract ( field from timestamp ) - double precision + numeric Get timestamp subfield; see @@ -8719,7 +8719,7 @@ Date/Time Functions extract ( field from interval ) - double precision + numeric Get interval subfield; see @@ -9234,7 +9234,7 @@ EXTRACT, date_part well.) 
field is an identifier or string that selects what field to extract from the source value. The extract function returns values of type -double precision. +numeric. The following are valid field names: @@ -9645,6 +9645,10 @@ EXTRACT, date_part be a string value, not a name. The valid field names for date_part are the same as for extract. +For historical reasons, the date_part function +returns values of type double precision. This can result in +a loss of precision in certain uses. Using extract +is recommended instead. diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 8f341ac006..2e8924aff8 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -13768,7 +13768,7 @@ func_expr_common_subexpr: { $$ = makeTypeCast($3, $5, @1); } | EXTRACT '(' extract_list ')' { - $$ = (Node *) makeFuncCall(
Re: TAP tests aren't using the magic words for Windows file access
On 12/15/20 12:05 AM, r.zhar...@postgrespro.ru wrote:
> Hello hackers,
>
> Are there any plans to backport the patch to earlier versions of the Postgres?
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=114541d58e5970e51b78b77b65de16210beaab43
>
> We rarely see the issue with the pg_ctl/004_logrotate test on the
> REL_12_STABLE branch. On my notebook I can easily reproduce the
> "Permission denied at src/test/perl/TestLib.pm line 259" error with the
> small change below. But the same test on the 13th version and the 12th
> version with the TestLib patch does not fail.
>
> diff --git a/src/bin/pg_ctl/t/004_logrotate.pl b/src/bin/pg_ctl/t/004_logrotate.pl
> index bc39abd23e4..e49e159bc84 100644
> --- a/src/bin/pg_ctl/t/004_logrotate.pl
> +++ b/src/bin/pg_ctl/t/004_logrotate.pl
> @@ -72,7 +72,7 @@ for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
>  {
>  	$new_current_logfiles = slurp_file($node->data_dir . '/current_logfiles');
>  	last if $new_current_logfiles ne $current_logfiles;
> -	usleep(100_000);
> +	usleep(1);
>  }
>
>  note "now current_logfiles = $new_current_logfiles";

Oops, looks like that slipped off my radar somehow, I'll see about backpatching it right away.

cheers

andrew

-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: On login trigger: take three
út 15. 12. 2020 v 14:12 odesílatel Konstantin Knizhnik < k.knizh...@postgrespro.ru> napsal: > > > On 11.12.2020 19:27, Pavel Stehule wrote: > > > > pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik < > k.knizh...@postgrespro.ru> napsal: > >> >> >> On 11.12.2020 18:40, Pavel Stehule wrote: >> >> >> is not correct. It makes it not possible to superuser to disable triggers >>> for all users. >>> >> >> pg_database_ownercheck returns true for superuser always. >> >> >> Sorry, but I consider different case: when normal user is connected to >> the database. >> In this case pg_database_ownercheck returns false and trigger is not >> disabled, isn't it? >> > > My idea was to reduce necessary rights to database owners. But you have a > true, so only superuser can create event trigger, so this feature cannot be > used in DBaaS environments, and then my original idea was wrong. > > >> >> Also GUCs are not associated with any database. So I do not understand >>> why this check of database ownership is relevant at all? >>> >>> What kind of protection violation we want to prevent? >>> >>> It seems to be obvious that normal user should not be able to prevent >>> trigger execution because this triggers may be used to enforce some >>> security policies. >>> If trigger was created by user itself, then it can drop or disable it >>> using ALTER statement. GUC is not needed for it. >>> >> >> when you cannot connect to the database, then you cannot do ALTER. In >> DBaaS environments lot of users has not superuser rights. >> >> >> >> But only superusers can set login triggers, right? >> So only superuser can make a mistake in this trigger. But he have enough >> rights to recover this error. Normal users are not able to define on >> connection triggers and >> should not have rights to disable them. >> > > yes, it is true > > Pavel > > >> -- >> Konstantin Knizhnik >> Postgres Professional: http://www.postgrespro.com >> The Russian Postgres Company >> >> > So what's next? > I see three options: > > 1. Do not introduce GUC for disabling all event triggers (especially taken > in account that them are disabled by default). > Return back to the patch > on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with > "disable_client_connection_trigger" > and make it true by default (to eliminate any overhead for users which are > not using on logintriggers). > > 2. Have two GUCS: "disable_client_connection_trigger" and > "disable_event_triggers". > > 3. Implement some mechanism for caching presence of event triggers in > shared memory. > @3 is the best design (the things should work well by default), @2 is a little bit chaotic and @1 looks like a workaround. Regards Pavel > > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
Re: On login trigger: take three
On 11.12.2020 19:27, Pavel Stehule wrote: pá 11. 12. 2020 v 17:05 odesílatel Konstantin Knizhnik mailto:k.knizh...@postgrespro.ru>> napsal: On 11.12.2020 18:40, Pavel Stehule wrote: is not correct. It makes it not possible to superuser to disable triggers for all users. pg_database_ownercheck returns true for superuser always. Sorry, but I consider different case: when normal user is connected to the database. In this case pg_database_ownercheck returns false and trigger is not disabled, isn't it? My idea was to reduce necessary rights to database owners. But you have a true, so only superuser can create event trigger, so this feature cannot be used in DBaaS environments, and then my original idea was wrong. Also GUCs are not associated with any database. So I do not understand why this check of database ownership is relevant at all? What kind of protection violation we want to prevent? It seems to be obvious that normal user should not be able to prevent trigger execution because this triggers may be used to enforce some security policies. If trigger was created by user itself, then it can drop or disable it using ALTER statement. GUC is not needed for it. when you cannot connect to the database, then you cannot do ALTER. In DBaaS environments lot of users has not superuser rights. But only superusers can set login triggers, right? So only superuser can make a mistake in this trigger. But he have enough rights to recover this error. Normal users are not able to define on connection triggers and should not have rights to disable them. yes, it is true Pavel -- Konstantin Knizhnik Postgres Professional:http://www.postgrespro.com The Russian Postgres Company So what's next? I see three options: 1. Do not introduce GUC for disabling all event triggers (especially taken in account that them are disabled by default). Return back to the patch on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch with "disable_client_connection_trigger" and make it true by default (to eliminate any overhead for users which are not using on logintriggers). 2. Have two GUCS: "disable_client_connection_trigger" and "disable_event_triggers". 3. Implement some mechanism for caching presence of event triggers in shared memory. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Movement of restart_lsn position movement of logical replication slots is very slow
On Tue, Dec 15, 2020 at 11:00 AM Jammie wrote: > > Thanks Amit for the response > > We are using pgJDBC sample program here > https://jdbc.postgresql.org/documentation/head/replication.html > > the setFlushLSN is coming from the pgJDBC only. > > git hub for APIs of pgJDBC methods available. > > https://github.com/pgjdbc/pgjdbc > > The second slot refers to "private" slot. > > So ""we are not doing reading from the stream' ==> It means that we are > having readPending call only from the shared slot then we get the > lastReceivedLSN() from stream and > send it back to stream as confirmed_flush_lsn for both private and shared > slot. We dont do readPending call to private slot. we will use private slot > only when we dont have choice. It is kind of reserver slot for us. >

I think this (not performing read/decode on the private slot) could be the reason why it is lagging behind. If you want to use it as a reserve slot then you probably want to at least perform pg_replication_slot_advance() to move it to the required position. The restart_lsn won't move unless you read/decode from that slot.

-- With Regards, Amit Kapila.
Re: Add session statistics to pg_stat_database
On Tue, 2020-12-15 at 13:53 +0100, Laurenz Albe wrote: > Attached is patch version 9. Aah, I forgot the ++. Version 10 attached. Yours, Laurenz Albe From b40e34141c80ff59c0005f430bd8c273918eb7bb Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Tue, 15 Dec 2020 13:46:44 +0100 Subject: [PATCH] Add session statistics to pg_stat_database If "track_counts" is active, track the following per database: - total number of connections - number of sessions that ended by loss of network connection, fatal errors and operator intervention - total time spent in database sessions - total time spent executing queries - total idle in transaction time This is useful to check if connection pooling is working. It also helps to estimate the size of the connection pool required to keep the database busy, which depends on the percentage of the transaction time that is spent idling. Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.ca...@cybertec.at Reviewed-By: Soumyadeep Chakraborty, Justin Pryzby, Masahiro Ikeda, Magnus Hagander (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) --- doc/src/sgml/monitoring.sgml | 69 ++ src/backend/catalog/system_views.sql | 7 ++ src/backend/postmaster/pgstat.c | 134 ++- src/backend/tcop/postgres.c | 10 +- src/backend/utils/adt/pgstatfuncs.c | 94 +++ src/backend/utils/error/elog.c | 8 ++ src/include/catalog/pg_proc.dat | 28 ++ src/include/pgstat.h | 39 src/test/regress/expected/rules.out | 7 ++ 9 files changed, 391 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 52a69a5366..6206fefec0 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3731,6 +3731,75 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + session_time double precision + + + Time spent by database sessions in this database, in milliseconds + (note that statistics are only updated when the state of a session + changes, so if sessions have been idle for a long time, this idle time + won't be included) + + + + + + active_time double precision + + + Time spent executing SQL statements in this database, in milliseconds + + + + + + idle_in_transaction_time double precision + + + Time spent idling while in a transaction in this database, in milliseconds + + + + + + sessions bigint + + + Total number of sessions established to this database + + + + + + sessions_abandoned bigint + + + Number of database sessions to this database that were terminated + because connection to the client was lost + + + + + + sessions_fatal bigint + + + Number of database sessions to this database that were terminated + by fatal errors + + + + + + sessions_killed bigint + + + Number of database sessions to this database that were terminated + by operator intervention + + + stats_reset timestamp with time zone diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index b140c210bc..3a2569b135 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -924,6 +924,13 @@ CREATE VIEW pg_stat_database AS pg_stat_get_db_checksum_last_failure(D.oid) AS checksum_last_failure, pg_stat_get_db_blk_read_time(D.oid) AS blk_read_time, pg_stat_get_db_blk_write_time(D.oid) AS blk_write_time, +pg_stat_get_db_session_time(D.oid) AS session_time, +pg_stat_get_db_active_time(D.oid) AS active_time, +pg_stat_get_db_idle_in_transaction_time(D.oid) AS idle_in_transaction_time, +pg_stat_get_db_sessions(D.oid) AS sessions, 
+pg_stat_get_db_sessions_abandoned(D.oid) AS sessions_abandoned, +pg_stat_get_db_sessions_fatal(D.oid) AS sessions_fatal, +pg_stat_get_db_sessions_killed(D.oid) AS sessions_killed, pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset FROM ( SELECT 0 AS oid, NULL::name AS datname diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 7c75a25d21..a2337b78f1 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -258,6 +258,9 @@ static int pgStatXactCommit = 0; static int pgStatXactRollback = 0; PgStat_Counter pgStatBlockReadTime = 0; PgStat_Counter pgStatBlockWriteTime = 0; +static PgStat_Counter pgStatActiveTime = 0; +static PgStat_Counter pgStatTransactionIdleTime = 0; +SessionE
Re: Add session statistics to pg_stat_database
On Sun, 2020-12-13 at 17:49 +0100, Magnus Hagander wrote: > > > I am considering the cases > > > > > > 1) client just went away (currently "aborted") > > > 2) death by FATAL error > > > 3) killed by the administrator (or shutdown) > > > > I named the three counters "sessions_client_eof", "sessions_fatal" and > > "sessions_killed", but I am not wedded to these bike shed colors. > > In true bikeshedding mode, I'm not entirely happy with sessions_client_eof, > but I'm also not sure I have a better suggestion. Maybe just "sessions_lost" > or "sessions_connlost", which is basically the terminology that the > documentation uses? > Maybe it's just me, but I don't really like the eof terminology here. > > What do you think about that? Or does somebody else have an opinion here? I slept over it, and came up with "sessions_abandoned". > In today's dept of small things I noticed: > > + if (disconnect) > + msg.m_disconnect = pgStatSessionEndCause; > > in the non-disconnect state, that variable is left uninitialized, isn't it? > It does end up getting ignored later, but to be more future proof the enum > should probably > have a value specifically for "not disconnected yet"? Yes. I named it DISCONNECT_NOT_YET. > + case DISCONNECT_CLIENT_EOF: > + ++(dbentry->n_sessions_client_eof); > + break; > > The normal syntax we'd use for that would be > dbentry->n_sessions_client_eof++; Ok, changed. > + typedef enum sessionEndType { > > To be consistent with the other enums in the same place, seems this should be > SessionEndType. True. I have renamed the type. Attached is patch version 9. Added goodie: I ran pgindent on it. Yours, Laurenz Albe From b40e34141c80ff59c0005f430bd8c273918eb7bb Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Tue, 15 Dec 2020 13:46:44 +0100 Subject: [PATCH] Add session statistics to pg_stat_database If "track_counts" is active, track the following per database: - total number of connections - number of sessions that ended by loss of network connection, fatal errors and operator intervention - total time spent in database sessions - total time spent executing queries - total idle in transaction time This is useful to check if connection pooling is working. It also helps to estimate the size of the connection pool required to keep the database busy, which depends on the percentage of the transaction time that is spent idling. 
Discussion: https://postgr.es/m/b07e1f9953701b90c66ed368656f2aef40cac4fb.ca...@cybertec.at Reviewed-By: Soumyadeep Chakraborty, Justin Pryzby, Masahiro Ikeda, Magnus Hagander (This requires a catversion bump, as well as an update to PGSTAT_FILE_FORMAT_ID) --- doc/src/sgml/monitoring.sgml | 69 ++ src/backend/catalog/system_views.sql | 7 ++ src/backend/postmaster/pgstat.c | 134 ++- src/backend/tcop/postgres.c | 10 +- src/backend/utils/adt/pgstatfuncs.c | 94 +++ src/backend/utils/error/elog.c | 8 ++ src/include/catalog/pg_proc.dat | 28 ++ src/include/pgstat.h | 39 src/test/regress/expected/rules.out | 7 ++ 9 files changed, 391 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 52a69a5366..6206fefec0 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3731,6 +3731,75 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i + + + session_time double precision + + + Time spent by database sessions in this database, in milliseconds + (note that statistics are only updated when the state of a session + changes, so if sessions have been idle for a long time, this idle time + won't be included) + + + + + + active_time double precision + + + Time spent executing SQL statements in this database, in milliseconds + + + + + + idle_in_transaction_time double precision + + + Time spent idling while in a transaction in this database, in milliseconds + + + + + + sessions bigint + + + Total number of sessions established to this database + + + + + + sessions_abandoned bigint + + + Number of database sessions to this database that were terminated + because connection to the client was lost + + + + + + sessions_fatal bigint + + + Number of database sessions to this database that were terminated + by fatal errors + + + + + + sessions_killed bigint + + + Number of database sessions to this database that were terminated + by operator intervention + + + stats_reset timestamp with time zone diff --git a
Re: [HACKERS] logical decoding of two-phase transactions
On Mon, Dec 14, 2020 at 2:59 PM Amit Kapila wrote: > Today, I looked at one of the issues discussed earlier in this thread [1] which is that decoding can block (or deadlock can happen) when the user explicitly locks the catalog relation (like Lock pg_class) or perform Cluster on non-relmapped catalog relations (like Cluster pg_trigger using pg_class_oid_index; and the user_table on which we have performed any operation has a trigger) in the prepared xact. As discussed previously, we don't have a problem when user tables are exclusively locked because during decoding we don't acquire any lock on those and in fact, we have a test case for the same in the patch. In the previous discussion, most people seem to be of opinion that we should document it in a category "don't do that", or prohibit to prepare transactions that lock system tables in the exclusive mode as any way that can block the entire system. The other possibility could be that the plugin can allow enabling lock_timeout when it wants to allow decoding of two-phase xacts and if the timeout occurs it tries to fetch by disabling two-phase option provided by the patch. I think it is better to document this as there is no realistic scenario where it can happen. I also think separately (not as part of this patch) we can investigate whether it is a good idea to prohibit prepare for transactions that acquire exclusive locks on catalog relations. Thoughts? [1] - https://www.postgresql.org/message-id/CAMGcDxf83P5SGnGH52%3D_0wRP9pO6uRWCMRwAA0nxKtZvir2_vQ%40mail.gmail.com -- With Regards, Amit Kapila.
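To make the lock_timeout idea concrete, one speculative shape it could take is for an output plugin that opts into two-phase decoding to bound lock waits in its startup callback, and fall back to non-two-phase decoding of that transaction if the timeout fires. The callback below is a made-up sketch, not part of the patch; "5s" is an arbitrary illustrative value.

#include "postgres.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "utils/guc.h"

/* hypothetical output-plugin startup callback */
static void
demo_startup_cb(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
				bool is_init)
{
	opt->output_type = OUTPUT_PLUGIN_TEXTUAL_OUTPUT;

	/*
	 * Bound the time decoding may spend waiting on a lock held by a
	 * prepared transaction (e.g. an explicit LOCK on pg_class); if the
	 * timeout fires, the plugin could retry that transaction with the
	 * two-phase option disabled, as suggested above.
	 */
	SetConfigOption("lock_timeout", "5s", PGC_USERSET, PGC_S_SESSION);
}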
Re: Parallel Inserts in CREATE TABLE AS
On Tue, Dec 15, 2020 at 5:48 PM Hou, Zhijie wrote: > > A little explanation about how to push down the ctas info in append. > > > > 1. about how to ignore tuple cost in this case. > > IMO, it create gather path under append like the following: > > query_planner > > -make_one_rel > > --set_base_rel_sizes > > ---set_rel_size > > set_append_rel_size (*) > > -set_rel_size > > --set_subquery_pathlist > > ---subquery_planner > > grouping_planner > > -apply_scanjoin_target_to_paths > > --generate_useful_gather_paths > > > > set_append_rel_size seems the right place where we can check and set a flag > > to ignore tuple cost later. > > We can set the flag for two cases when there is no parent path will be > > created(such as : limit,sort,distinct...): > > i) query_level is 1 > > ii) query_level > 1 and we have set the flag in the parent_root. > > > > The case ii) is to check append under append: > > Append > >->Append > >->Gather > >->Other plan > > > > 2.about how to push ctas info down. > > > > We traversing the whole plans tree, and we only care Append and Gather type. > > Gather: It set the ctas dest info and returned true at once if the > > gathernode > > does not have projection. > > Append: It will recursively traversing the subplan of Appendnode and will > > reture true if one of the subplan can be parallel. > > > > +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) { > > + bool parallel = false; > > + > > + if(ps == NULL) > > + return parallel; > > + > > + if(IsA(ps, AppendState)) > > + { > > + AppendState *aps = (AppendState *) ps; > > + for(int i = 0; i < aps->as_nplans; i++) > > + { > > + parallel |= > > PushDownCTASParallelInsertState(dest, aps->appendplans[i]); > > + } > > + } > > + else if(IsA(ps, GatherState) && !ps->ps_ProjInfo) > > + { > > + GatherState *gstate = (GatherState *) ps; > > + parallel = true; > > + > > + ((DR_intorel *) dest)->is_parallel = true; > > + gstate->dest = dest; > > + ps->plan->plan_rows = 0; > > + } > > + > > + return parallel; > > +} > > So sorry for my miss, my last patch has some mistakes. > Attatch the new one. Thanks for the append patches. Basically your changes look good to me. I'm merging them to the original patch set and adding the test cases to cover these cases. I will post the updated patch set soon. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
RE: Parallel Inserts in CREATE TABLE AS
> From: Hou, Zhijie [mailto:houzj.f...@cn.fujitsu.com] > Sent: Tuesday, December 15, 2020 7:30 PM > To: Bharath Rupireddy > Cc: Amit Kapila ; Luc Vlaming ; > PostgreSQL-development ; Zhihong Yu > ; Dilip Kumar > Subject: RE: Parallel Inserts in CREATE TABLE AS > > > Thanks for the append use case. > > > > Here's my analysis on pushing parallel inserts down even in case the > > top node is Append. > > > > For union cases which need to remove duplicate tuples, we can't push > > the inserts or CTAS dest receiver down. If I'm not wrong, Append node > > is not doing duplicate removal(??), I saw that it's the HashAggregate > > node (which is the top node that removes the duplicate tuples). And > > also for except/except all/intersect/intersect all cases we receive > > HashSetOp nodes on top of Append. So for both cases, our check for > > Gather or Append at the top node is enough to detect this to not allow > parallel inserts. > > > > For union all: > > case 1: We can push the CTAS dest receiver to each Gather node Append > > ->Gather > > ->Parallel Seq Scan > > ->Gather > > ->Parallel Seq Scan > > ->Gather > > ->Parallel Seq Scan > > > > case 2: We can still push the CTAS dest receiver to each Gather node. > > Non-Gather nodes will do inserts as they do now i.e. by sending tuples > > to Append and from there to CTAS dest receiver. > > Append > > ->Gather > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > ->Gather > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > > > case 3: We can push the CTAS dest receiver to Gather Gather > > ->Parallel Append > > ->Parallel Seq Scan > > ->Parallel Seq Scan > > > > case 4: We can push the CTAS dest receiver to Gather Gather > > ->Parallel Append > > ->Parallel Seq Scan > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > > > Please let me know if I'm missing any other possible use case. > > > > Thoughts? > > > Yes, The analysis looks right to me. > > > > As suggested by Amit earlier, I kept the 0001 patch(so far) such that > > it doesn't have the code to influence the planner to consider parallel > > tuple cost as 0. It works on the plan whatever gets generated and > > decides to allow parallel inserts or not. And in the 0002 patch, I > > added the code for influencing the planner to consider parallel tuple > > cost as 0. Maybe we can have a 0003 patch for tests alone. > > > > Once we are okay with the above analysis and use cases, we can > > incorporate the Append changes to respective patches. > > > > Hope that's okay. > > A little explanation about how to push down the ctas info in append. > > 1. about how to ignore tuple cost in this case. > IMO, it create gather path under append like the following: > query_planner > -make_one_rel > --set_base_rel_sizes > ---set_rel_size > set_append_rel_size (*) > -set_rel_size > --set_subquery_pathlist > ---subquery_planner > grouping_planner > -apply_scanjoin_target_to_paths > --generate_useful_gather_paths > > set_append_rel_size seems the right place where we can check and set a flag > to ignore tuple cost later. > We can set the flag for two cases when there is no parent path will be > created(such as : limit,sort,distinct...): > i) query_level is 1 > ii) query_level > 1 and we have set the flag in the parent_root. > > The case ii) is to check append under append: > Append >->Append >->Gather >->Other plan > > 2.about how to push ctas info down. > > We traversing the whole plans tree, and we only care Append and Gather type. 
> Gather: It set the ctas dest info and returned true at once if the gathernode > does not have projection. > Append: It will recursively traversing the subplan of Appendnode and will > reture true if one of the subplan can be parallel. > > +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) { > + bool parallel = false; > + > + if(ps == NULL) > + return parallel; > + > + if(IsA(ps, AppendState)) > + { > + AppendState *aps = (AppendState *) ps; > + for(int i = 0; i < aps->as_nplans; i++) > + { > + parallel |= > PushDownCTASParallelInsertState(dest, aps->appendplans[i]); > + } > + } > + else if(IsA(ps, GatherState) && !ps->ps_ProjInfo) > + { > + GatherState *gstate = (GatherState *) ps; > + parallel = true; > + > + ((DR_intorel *) dest)->is_parallel = true; > + gstate->dest = dest; > + ps->plan->plan_rows = 0; > + } > + > + return parallel; > +} So sorry for my miss, my last patch has some mistakes. Attatch the new one. Best regards, houzj 0001-support-pctas-in-append-parallel-inserts.patch Description: 0001-support-pctas-in-append-
RE: libpq debug log
Hi Alvaro san, > There are some things still to do: I worked on some to do. > 1. Is the handling of protocol 2 correct? Protocol 3.0 is implemented in PostgreSQL v7.4 or later. Therefore, most servers and clients today want to connect using 3.0. Also, wishful thinking, I think Protocol 2.0 will no longer be supported. So I didn't develop it aggressively. Another reason I'm concerned about implementing it would make the code less maintainable. > 5. Error messages are still printing the terminating zero byte. I > suggest that it should be suppressed. I suppressed to print terminating zero byte like this; 2020-12-15 00:54:09 UTC < ErrorResponse 100 S "ERROR" V "ERROR" C "42P01" M "relation "a" does not exist" P "15" F "parse_relation.c" L "1379" R "parserOpenTable" 0 I thought about not outputting it, but the document said that if zero, it was the last message, so I made it output "0". > 6. Let's redefine pqTraceMaybeBreakLine() (I renamed from > pqLogLineBreak): Sure. I redefined this function. > 7. Why does it make sense to call pqTraceMaybeBreakLine when > commsource=FROM_FRONTEND? Backend messages include break line. However frontend messages don’t have it. So I call this functions. Pending lists. > 0. XXX comments There were 4 XXX comments in Alvaro san's patch. 3 XXX comments are left in current patch. I will answer this in next e-mail. > 2. We need a mode to suppress print of time; > 3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good. > 4. Even in text format, COPY output is not very useful. How can we improve that? Regards, Aya Iwata Fujitsu v10-0001-libpq-trace.patch Description: v10-0001-libpq-trace.patch
RE: Parallel Inserts in CREATE TABLE AS
> Thanks for the append use case. > > Here's my analysis on pushing parallel inserts down even in case the top > node is Append. > > For union cases which need to remove duplicate tuples, we can't push the > inserts or CTAS dest receiver down. If I'm not wrong, Append node is not > doing duplicate removal(??), I saw that it's the HashAggregate node (which > is the top node that removes the duplicate tuples). And also for > except/except all/intersect/intersect all cases we receive HashSetOp nodes > on top of Append. So for both cases, our check for Gather or Append at the > top node is enough to detect this to not allow parallel inserts. > > For union all: > case 1: We can push the CTAS dest receiver to each Gather node Append > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > > case 2: We can still push the CTAS dest receiver to each Gather node. > Non-Gather nodes will do inserts as they do now i.e. by sending tuples to > Append and from there to CTAS dest receiver. > Append > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > case 3: We can push the CTAS dest receiver to Gather Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > > case 4: We can push the CTAS dest receiver to Gather Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > Please let me know if I'm missing any other possible use case. > > Thoughts? Yes, The analysis looks right to me. > As suggested by Amit earlier, I kept the 0001 patch(so far) such that it > doesn't have the code to influence the planner to consider parallel tuple > cost as 0. It works on the plan whatever gets generated and decides to allow > parallel inserts or not. And in the 0002 patch, I added the code for > influencing the planner to consider parallel tuple cost as 0. Maybe we can > have a 0003 patch for tests alone. > > Once we are okay with the above analysis and use cases, we can incorporate > the Append changes to respective patches. > > Hope that's okay. A little explanation about how to push down the ctas info in append. 1. about how to ignore tuple cost in this case. IMO, it create gather path under append like the following: query_planner -make_one_rel --set_base_rel_sizes ---set_rel_size set_append_rel_size (*) -set_rel_size --set_subquery_pathlist ---subquery_planner grouping_planner -apply_scanjoin_target_to_paths --generate_useful_gather_paths set_append_rel_size seems the right place where we can check and set a flag to ignore tuple cost later. We can set the flag for two cases when there is no parent path will be created(such as : limit,sort,distinct...): i) query_level is 1 ii) query_level > 1 and we have set the flag in the parent_root. The case ii) is to check append under append: Append ->Append ->Gather ->Other plan 2.about how to push ctas info down. We traversing the whole plans tree, and we only care Append and Gather type. Gather: It set the ctas dest info and returned true at once if the gathernode does not have projection. Append: It will recursively traversing the subplan of Appendnode and will reture true if one of the subplan can be parallel. 
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+	bool parallel = false;
+
+	if(ps == NULL)
+		return parallel;
+
+	if(IsA(ps, AppendState))
+	{
+		AppendState *aps = (AppendState *) ps;
+		for(int i = 0; i < aps->as_nplans; i++)
+		{
+			parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+		}
+	}
+	else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
+	{
+		GatherState *gstate = (GatherState *) ps;
+		parallel = true;
+
+		((DR_intorel *) dest)->is_parallel = true;
+		gstate->dest = dest;
+		ps->plan->plan_rows = 0;
+	}
+
+	return parallel;
+}

Best regards, houzj

0001-support-pctas-in-append-parallel-inserts.patch
Description: 0001-support-pctas-in-append-parallel-inserts.patch
0002-support-pctas-in-append-tuple-cost-adjustment.patch
Description: 0002-support-pctas-in-append-tuple-cost-adjustment.patch
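As a side note on point 1 above (ignoring the parallel tuple cost when the Gather is created under an Append), here is a rough sketch of the check described there. It is an illustration only and not code from the attached patches: query_level and parent_root are existing PlannerInfo fields, but the ignore_ctas_tuple_cost flag is invented for the example, and the extra condition about no parent path (limit, sort, distinct and so on) is only noted in a comment.

/*
 * Illustration only.  Could set_append_rel_size() mark this level as one
 * where the parallel tuple cost may be treated as zero for CTAS?  The
 * additional requirement that no parent path (limit/sort/distinct ...)
 * will be created above is omitted here for brevity.
 */
static bool
can_ignore_ctas_tuple_cost(PlannerInfo *root)
{
	/* case i): top-level query */
	if (root->query_level == 1)
		return true;

	/* case ii): Append under Append, where the parent already set the flag */
	if (root->query_level > 1 &&
		root->parent_root != NULL &&
		root->parent_root->ignore_ctas_tuple_cost)	/* hypothetical field */
		return true;

	return false;
}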
Re: REINDEX backend filtering
On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier wrote: > > On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote: > > Now that we have the infrastructure to track indexes that might be corrupted > > due to changes in collation libraries, I think it would be a good idea to > > offer > > an easy way for users to reindex all indexes that might be corrupted. > > Yes. It would be a good thing. > > > The filter is also implemented so that you could cumulate multiple filters, > > so > > it could be easy to add more filtering, for instance:
> >
> > REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;
> >
> > to only rebuild indexes depending on outdated libc collations, or
> >
> > REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;
> >
> > to only rebuild indexes depending on a specific version of libc.
> Deciding on the grammar to use depends on the use cases we would like > to satisfy. From what I heard on this topic, the goal is to reduce > the amount of time necessary to reindex a system so as REINDEX only > works on indexes whose dependent collation versions are not known or > works on indexes in need of a collation refresh (like a reindexdb > --all --collation -j $jobs). What would be the benefit in having more > complexity with library-dependent settings while we could take care > of the use cases that matter the most with a simple grammar? Perhaps > "not_current" is not the best match as a keyword, we could just use > "collation" and handle that as a boolean. As long as we don't need > new operators in the grammar rules..

I'm not sure what the usual DBA pattern is here. If the reindexing runtime is really critical, I'm assuming that at least some people will dig into the library details to see which collations actually broke in the last upgrade, and will want to reindex only those and force the version for the rest of the indexes. And obviously, they probably won't wait until they have multiple collation version dependencies before taking care of that. In that case the filters that would matter would be one to keep only indexes with an outdated collation version, and an additional one for a specific collation name. Or we could have the COLLATION keyword without an additional argument mean all outdated collations, and COLLATION 'collation_name' specify a particular one. This is maybe a bit ugly, and would probably require a different approach for reindexdb.
Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
On 2020-12-15 03:14, Justin Pryzby wrote: On Sat, Dec 12, 2020 at 01:45:26PM -0600, Justin Pryzby wrote: On Sat, Dec 12, 2020 at 09:20:35AM +0100, Peter Eisentraut wrote: > On 2020-12-11 21:27, Alvaro Herrera wrote: > > By the way-- What did you think of the idea of explicitly marking the > > types used for bitmasks using types bits32 and friends, instead of plain > > int, which is harder to spot? > > If we want to make it clearer, why not turn the thing into a struct, as in > the attached patch, and avoid the bit fiddling altogether.

I like this. It's a lot like what I wrote as [PATCH v31 1/5] ExecReindex and ReindexParams. In my v31 patch, I moved ReindexOptions to a private structure in indexcmds.c, with an "int options" bitmask which is passed to reindex_index() et al. Your patch keeps/puts ReindexOptions in index.h, so it also applies to reindex_index, which I think is good.

So I've rebased this branch on your patch. Some thoughts:
- what about removing the REINDEXOPT_* prefix?
- You created local vars with initialization like "={}". But I thought it's needed to include at least one struct member like "={false}", or else they're not guaranteed to be zeroed?
- You passed the structure across function calls. The usual convention is to pass a pointer.

I think maybe Michael missed this message (?)

I had applied some changes on top of Peter's patch. I squashed those commits now, and also handled ClusterOption and VacuumOption in the same style. Some more thoughts:
- should the structures be named in the plural? "ReindexOptions" etc., since they define *all* the options, not just a single bit.
- For vacuum, do we even need a separate structure, or should the members be directly within VacuumParams? It's a bit odd to write params.options.verbose, especially since there are also ternary options which are directly within params.

This is exactly what I thought after looking at Peter's patch. Writing 'params.options.some_option' looks just too verbose. I even started moving all vacuum options into VacuumParams on my own and was in the middle of the process when I realized that there are some places that do not fit well, like:

if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
                              onerel->rd_rel,
                              params->options & VACOPT_ANALYZE))

Here we do not want to set the option permanently, but rather to trigger some additional code path in vacuum_is_relation_owner(), IIUC. With a unified VacuumParams we would have to do:

bool opt_analyze = params->analyze;
...
params->analyze = true;
if (!vacuum_is_relation_owner(RelationGetRelid(onerel),
                              onerel->rd_rel,
                              params))
...
params->analyze = opt_analyze;

to achieve the same, which does not look good to me, short of changing the whole interface. I have found at least one other place like that so far --- vacuum_open_relation() in analyze_rel(). Not sure how we could better cope with such logic.

- Then, for cluster, I think it should be called ClusterParams, and eventually include the tablespaceOid, like what we're doing for Reindex.

I am awaiting feedback on these before going further, since I've done too much rebasing with these ideas going back and forth.

Yes, we have moved a bit from my original patch set in the thread with all this refactoring. However, all the subsequent patches depend heavily on this interface, so let us decide first on the proposed refactoring.

Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
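As a neutral illustration of the bitmask-versus-struct question above (the type, macro, and field names below are invented and are not the definitions from the patches), the two styles look roughly like this. On the "={}" question: an empty initializer is a GNU/C++ extension rather than standard C, whereas writing at least one member, e.g. "= {false}" or "= {0}", is standard and zero-fills the remaining members.

/* Bit-mask style: options packed into an int. */
#define DEMOOPT_VERBOSE			(1 << 0)
#define DEMOOPT_CONCURRENTLY	(1 << 1)

static bool
demo_is_verbose_bits(int options)
{
	return (options & DEMOOPT_VERBOSE) != 0;
}

/* Struct style, in the spirit of the patch under discussion. */
typedef struct DemoReindexOptions
{
	bool		verbose;
	bool		concurrently;
	Oid			tablespaceOid;	/* the field the tablespace work would add */
} DemoReindexOptions;

static bool
demo_is_verbose_struct(const DemoReindexOptions *options)	/* passed by pointer, per the review comment */
{
	return options->verbose;
}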
Re: archive status ".ready" files may be created too early
At Mon, 14 Dec 2020 18:25:23 +, "Bossart, Nathan" wrote in > Apologies for the long delay. > > I've spent a good amount of time thinking about this bug and trying > out a few different approaches for fixing it. I've attached a work- > in-progress patch for my latest attempt. > > On 10/13/20, 5:07 PM, "Kyotaro Horiguchi" wrote:
> > F0F1
> > A F B
> > |-|-|-|
> >seg Xseg X+1 seg X+2
> >
> > Matsumura-san has a concern about the case where there are two (or > > more) partially-flushed segment-spanning records at the same time. > > > > This patch remembers only the last cross-segment record. If we were > > going to flush up to F0 after Record-B had been written, we would fail > > to hold-off archiving seg-X. This patch is based on a assumption that > > that case cannot happen because we don't leave a pending page at the > > time of segment switch and no records don't span over three or more > > segments.
>
> I wonder if these are safe assumptions to make. For your example, if > we've written record B to the WAL buffers, but neither record A nor B > have been written to disk or flushed, aren't we still in trouble?

You're right in that regard. There's a window where a partial record is written once the write location passes F0 after the insertion location has passed F1. However, remembering all spanning records seems like overkill to me. I modified the previous patch so that it remembers the start LSN of the *oldest* cross-segment continuation record in the last run of consecutive bonded segments, and the end LSN of the latest cross-segment continuation record. This doesn't forget past segment boundaries. The region is cleared when the WAL-write LSN goes beyond the remembered end LSN. So the region may contain several WAL segments that are not connected to the next one, but that doesn't matter much.

> Also, is there actually any limit on WAL record length that means that > it is impossible for a record to span over three or more segments?

Even though it is not a hard limit, AFAICS, as mentioned before, the longest possible record is what log_newpages() emits, which is up to about 500 kB for now. I don't think we want to make that any longer. If we set wal_segment_size to 1MB and the block size to 16kB or more, we could have a record spanning three or more segments, but I don't think that is a sane configuration, and that kind of issue could happen elsewhere too.

> Perhaps these assumptions are true, but it doesn't seem obvious to me > that they are, and they might be pretty fragile.

I added an assertion to XLogRecordAssemble() that a record must be shorter than a WAL segment. Does this guarantee that the assumption holds? (The condition is tentative and would need to be adjusted.)

> The attached patch doesn't make use of these assumptions. Instead, we > track the positions of the records that cross segment boundaries in a > small hash map, and we use that to determine when it is safe to mark a > segment as ready for archival. I think this approach resembles > Matsumura-san's patch from June. > > As before, I'm not handling replication, archive_timeout, and > persisting latest-marked-ready through crashes yet. For persisting > the latest-marked-ready segment through crashes, I was thinking of > using a new file that stores the segment number.

Also, the attached patch is a PoC.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 5714bd064d61135c41a64ecc39aeff74c25a0e74 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 15 Dec 2020 16:24:13 +0900
Subject: [PATCH v3] PoC: Avoid archiving a WAL segment that continues to the
 next segment

If the last record of a finished segment continues to the next segment,
we need to defer archiving of the segment until the record is flushed
to the end. Otherwise crash recovery can overwrite the last record of a
segment and history diverges between archive and pg_wal.
---
 src/backend/access/transam/xlog.c       | 187 +++-
 src/backend/access/transam/xloginsert.c |   3 +
 src/backend/replication/walsender.c     |  14 +-
 src/include/access/xlog.h               |   1 +
 4 files changed, 193 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8dd225c2e1..98da521601 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -723,6 +723,16 @@ typedef struct XLogCtlData
 	 */
 	XLogRecPtr	lastFpwDisableRecPtr;
 
+	/* The last segment notified to be archived. Protected by WALWriteLock */
+	XLogSegNo	lastNotifiedSeg;
+
+	/*
+	 * Remember the oldest and newest known segment that ends with a
+	 * continuation record.
+	 */
+	XLogRecPtr	firstSegContRecStart;
+	XLogRecPtr	lastSegContRecEnd;
+
 	slock_t		info_lck;		/* locks shared variables shown above */
 } XLogCtlData;
 
@@ -1160,6 +1170,9 @@ XLogInsertRecord(XLogRecData *rdata,
 	 */
 	if (StartPos / XLOG_BLCKSZ != EndPos / X
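The deferral rule described in the message above can be summarised in a short standalone sketch. This is an illustration only, not the PoC's code: the type and function names are invented, and DemoRecPtr stands in for XLogRecPtr.

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoRecPtr;		/* stand-in for XLogRecPtr */

typedef struct DemoContRecRegion
{
	DemoRecPtr	first_cont_rec_start;	/* start of the oldest boundary-crossing record */
	DemoRecPtr	last_cont_rec_end;		/* end of the newest boundary-crossing record */
} DemoContRecRegion;

/*
 * May a segment ending at seg_end be marked .ready, given how far the WAL
 * has been written out?  Once write_pos passes the end of the newest
 * cross-segment record, the remembered region no longer blocks anything;
 * before that, only segments ending at or before the oldest such record
 * are safe to hand to the archiver.
 */
static bool
demo_may_notify_segment(const DemoContRecRegion *region,
						DemoRecPtr seg_end, DemoRecPtr write_pos)
{
	if (write_pos >= region->last_cont_rec_end)
		return true;
	return seg_end <= region->first_cont_rec_start;
}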
Re: Single transaction in the tablesync worker?
On Thu, Dec 10, 2020 at 8:49 PM Peter Smith wrote: > So I will try to write a patch for the proposed Solution-1.

Hi Amit. FYI, here is my v3 WIP patch for Solution-1. This patch applies on top of the v30 patch set [1] from the other 2PC thread:
[1] https://www.postgresql.org/message-id/CAFPTHDYA8yE6tEmQ2USYS68kNt%2BkM%3DSwKgj%3Djy4AvFD5e9-UTQ%40mail.gmail.com

Although incomplete, it continues to pass all of the make check and src/test/subscription TAP tests.

Coded / WIP:
* the tablesync slot is now permanent instead of temporary
* tablesync slot cleanup (drop) code is added to the DropSubscription and finish_sync_worker functions
* the tablesync worker now allows multiple tx instead of a single tx
* a new state (SUBREL_STATE_COPYDONE) is persisted after a successful copy_table in LogicalRepSyncTableStart
* if a relaunched tablesync finds the state is SUBREL_STATE_COPYDONE then it will bypass the initial copy_table phase

TODO / Known Issues:
* The tablesync replication origin/lsn logic all needs to be updated so that tablesync knows where to restart based on information held by the now-permanent slot.
* The current implementation of tablesync slot dropping (e.g. from DROP SUBSCRIPTION) or finish_sync_worker regenerates the tablesync slot name so it knows which slot to drop. That may be OK for normal use cases, but if there is an ALTER SUBSCRIPTION ... SET (slot_name = newname) it would fail to find the tablesync slot. Some redesign may be needed for this part.
* help / comments / cleanup
* There is temporary, excessive "!!>>" logging of mine scattered around, which I added to help my testing during development.

---
Kind Regards, Peter Smith. Fujitsu Australia

v3-0002-2PC-Solution1-WIP-20201215.patch
Description: Binary data
v3-0001-2PC-change-tablesync-slot-to-use-same-two_phase-m.patch
Description: Binary data
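To illustrate the COPYDONE bypass described in the "Coded / WIP" list above, here is a simplified sketch. It is not the WIP patch's code: SUBREL_STATE_COPYDONE is the new state it proposes (its character value is assumed here), and the helper functions are stand-ins.

#include <stdio.h>

#define SUBREL_STATE_COPYDONE	'C'	/* value assumed for the example */

static void demo_copy_table(void)       { printf("performing initial copy\n"); }
static void demo_set_state(char state)  { printf("persisting state '%c'\n", state); }

/* Simplified relaunch logic: skip the copy if a previous worker finished it. */
static void
demo_tablesync_start(char relstate)
{
	if (relstate == SUBREL_STATE_COPYDONE)
	{
		printf("copy already done, proceeding to catch-up\n");
		return;
	}
	demo_copy_table();
	demo_set_state(SUBREL_STATE_COPYDONE);
}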
Re: Minor documentation error regarding streaming replication protocol
On Thu, 2020-12-03 at 13:14 -0500, Bruce Momjian wrote: > Andres objected (in a separate conversation) to forcing a binary- > > format > > value on a client that didn't ask for one. He suggested that we > > mandate > > that the data is ASCII-only (for both filename and content), > > closing > > the gap Michael raised[1]; and then just declare all values to be > > text > > format. > > How do we mandate that? Just mention it in the docs and C comments?

Attached a simple patch that enforces an all-ASCII restore point name rather than documenting the possibility of non-ASCII text.

Regards, Jeff Davis

[1] https://www.postgresql.org/message-id/3679218.1602770626%40sss.pgh.pa.us

From b9381a62ab9616da3062827f84b72844748fd3fc Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Tue, 15 Dec 2020 01:06:21 -0800
Subject: [PATCH] Enforce ASCII restore point names.

The TIMELINE_HISTORY replication command assumes that the file name
and contents are valid ASCII, but the timeline history file may
contain restore point names.

Discussion: https://postgr.es/m/20201203181456.gk20...@momjian.us
---
 doc/src/sgml/protocol.sgml             |  4 +---
 src/backend/access/transam/xlogfuncs.c | 27 ++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4899bacda7..b18a4bceac 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1861,9 +1861,7 @@ The commands accepted in replication mode are:
       Requests the server to send over the timeline history file for
       timeline tli. Server replies with a
-      result set of a single row, containing two fields. While the fields
-      are labeled as text, they effectively return raw bytes,
-      with no encoding conversion:
+      result set of a single row, containing two fields:

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..48daed56f6 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -44,6 +44,8 @@
 static StringInfo label_file;
 static StringInfo tblspc_map_file;
 
+static bool is_all_ascii(const char *str);
+
 /*
  * pg_start_backup: set up for taking an on-line backup dump
  *
@@ -309,6 +311,16 @@ pg_create_restore_point(PG_FUNCTION_ARGS)
 
 	restore_name_str = text_to_cstring(restore_name);
 
+	/*
+	 * The restore point name may be included in a timeline history file. The
+	 * system assumes that the contents are ASCII, so enforce an all-ASCII
+	 * restore point name.
+	 */
+	if (!is_all_ascii(restore_name_str))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("restore point name must be valid ASCII")));
+
 	if (strlen(restore_name_str) >= MAXFNAMELEN)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -784,3 +796,18 @@ pg_promote(PG_FUNCTION_ARGS)
 				 (errmsg("server did not promote within %d seconds",
 						 wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Check a string to see if it is pure ASCII
+ */
+static bool
+is_all_ascii(const char *str)
+{
+	while (*str)
+	{
+		if (IS_HIGHBIT_SET(*str))
+			return false;
+		str++;
+	}
+	return true;
+}
-- 
2.17.1
Re: Parallel Inserts in CREATE TABLE AS
On Tue, Dec 15, 2020 at 2:06 PM Bharath Rupireddy wrote: > > On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie wrote: > > Currently with the patch, we can allow parallel CTAS when topnode is Gather. > > When top-node is Append and Gather is the sub-node of Append, I think we > > can still enable > > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as: > > > > Append > > -->Gather > > ->Create table > > ->Seqscan > > -->Gather > > ->create table > > ->Seqscan > > > > And the use case seems common to me, such as: > > select * from A where xxx union all select * from B where xxx; > > Thanks for the append use case. > > Here's my analysis on pushing parallel inserts down even in case the > top node is Append. > > For union cases which need to remove duplicate tuples, we can't push > the inserts or CTAS dest receiver down. If I'm not wrong, Append node > is not doing duplicate removal(??), I saw that it's the HashAggregate > node (which is the top node that removes the duplicate tuples). And > also for except/except all/intersect/intersect all cases we receive > HashSetOp nodes on top of Append. So for both cases, our check for > Gather or Append at the top node is enough to detect this to not allow > parallel inserts. > > For union all: > case 1: We can push the CTAS dest receiver to each Gather node > Append > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > > case 2: We can still push the CTAS dest receiver to each Gather node. > Non-Gather nodes will do inserts as they do now i.e. by sending tuples > to Append and from there to CTAS dest receiver. > Append > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > case 3: We can push the CTAS dest receiver to Gather > Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > > case 4: We can push the CTAS dest receiver to Gather > Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > Please let me know if I'm missing any other possible use case. > > Thoughts? Your analysis looks right to me. > > I attach a WIP patch which just show the possibility of this feature. > > The patch is based on the latest v11-patch. > > > > What do you think? > > As suggested by Amit earlier, I kept the 0001 patch(so far) such that > it doesn't have the code to influence the planner to consider parallel > tuple cost as 0. It works on the plan whatever gets generated and > decides to allow parallel inserts or not. And in the 0002 patch, I > added the code for influencing the planner to consider parallel tuple > cost as 0. Maybe we can have a 0003 patch for tests alone. Yeah, that makes sense and it will be easy for the review. > Once we are okay with the above analysis and use cases, we can > incorporate the Append changes to respective patches. > > Hope that's okay. Make sense to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: libpq debug log
On Tue, Oct 27, 2020 at 3:23 AM Alvaro Herrera wrote: > > I've been hacking at this patch again. There were a few things I wasn't > too clear about, so I reordered the code and renamed the routines to try > to make it easier to follow. >

Hi, hopefully Iwata-san will return to looking at this soon. I have tried the latest patch a little (using "psql" as my client), and have a few small comments so far:

- In pqTraceInit(), in the (admittedly rare) case that the fe_msg malloc() fails, I'd NULL out be_msg too after free(), rather than leave it dangling (because if pqTraceInit() were ever invoked again, as the comment says it could be, it would result in previously freed memory being accessed ...):

conn->fe_msg = malloc(MAX_FRONTEND_MSGS * sizeof(pqFrontendMessage));
if (conn->fe_msg == NULL)
{
    free(conn->be_msg);
    conn->be_msg = NULL;
    return false;
}

- >3. COPY ... (FORMAT BINARY) emits "invalid protocol" ... not good.

That seemed to happen for me only when COPYing binary format to stdout:

UnknownCommand :::Invalid Protocol

- >5. Error messages are still printing the terminating zero byte. I >suggest that it should be suppressed.

Perhaps there's a more elegant way of doing it, but I got rid of the logging of the zero byte using the following change to pqLogMsgByte1(), though there still seems to be a trailing space issue:

case LOG_CONTENTS:
-   fprintf(conn->Pfdebug, "%c ", v);
+   if (v != '\0')
+       fprintf(conn->Pfdebug, "%c ", v);
    pqTraceMaybeBreakLine(sizeof(v), conn);
    break;

Regards, Greg Nancarrow Fujitsu Australia
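On the trailing-space issue mentioned above, one possible approach (shown here as a standalone illustration rather than as a change to the actual patch) is to emit the separating space before each content byte instead of after it, so that suppressing the terminating zero byte leaves no dangling separator either:

#include <stdbool.h>
#include <stdio.h>

/*
 * Illustration only: print content bytes separated by single spaces,
 * skipping the terminating zero byte, so the traced line ends cleanly.
 */
static void
demo_print_contents(FILE *out, const char *buf, int len)
{
	bool		first = true;

	for (int i = 0; i < len; i++)
	{
		if (buf[i] == '\0')
			continue;			/* suppress the terminating zero byte */
		if (!first)
			fputc(' ', out);
		fputc(buf[i], out);
		first = false;
	}
	fputc('\n', out);
}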
Re: Parallel Inserts in CREATE TABLE AS
On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie wrote: > Currently with the patch, we can allow parallel CTAS when topnode is Gather. > When top-node is Append and Gather is the sub-node of Append, I think we can > still enable > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
>
> Append
> -->Gather
> ->Create table
> ->Seqscan
> -->Gather
> ->create table
> ->Seqscan
>
> And the use case seems common to me, such as:
> select * from A where xxx union all select * from B where xxx;

Thanks for the append use case.

Here's my analysis on pushing parallel inserts down even when the top node is Append.

For union cases which need to remove duplicate tuples, we can't push the inserts or the CTAS dest receiver down. If I'm not wrong, the Append node is not doing the duplicate removal(??); I saw that it's the HashAggregate node (which is the top node) that removes the duplicate tuples. And also for except/except all/intersect/intersect all cases we receive HashSetOp nodes on top of Append. So for both cases, our check for Gather or Append at the top node is enough to detect this and not allow parallel inserts.

For union all:

case 1: We can push the CTAS dest receiver to each Gather node
Append
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan
->Gather
->Parallel Seq Scan

case 2: We can still push the CTAS dest receiver to each Gather node. Non-Gather nodes will do inserts as they do now, i.e. by sending tuples to Append and from there to the CTAS dest receiver.
Append
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node
->Gather
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node

case 3: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan

case 4: We can push the CTAS dest receiver to Gather
Gather
->Parallel Append
->Parallel Seq Scan
->Parallel Seq Scan
->Seq Scan / Join / any other non-Gather node

Please let me know if I'm missing any other possible use case.

Thoughts?

> I attach a WIP patch which just show the possibility of this feature. > The patch is based on the latest v11-patch. > > What do you think?

As suggested by Amit earlier, I kept the 0001 patch (so far) such that it doesn't have the code to influence the planner to consider the parallel tuple cost as 0. It works on the plan whatever gets generated and decides to allow parallel inserts or not. And in the 0002 patch, I added the code for influencing the planner to consider the parallel tuple cost as 0. Maybe we can have a 0003 patch for tests alone.

Once we are okay with the above analysis and use cases, we can incorporate the Append changes into the respective patches.

Hope that's okay.

With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
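To make the top-node test above concrete, here is a rough sketch (an illustration only, not the patch's code; it assumes the executor-side PlanState tree is what gets inspected): a Gather or Append at the top keeps the push-down possibility open, while shapes such as HashAggregate or HashSetOp on top, i.e. the UNION/INTERSECT/EXCEPT cases, fall back to the normal path.

/*
 * Illustration only.  Whether the top of the plan even qualifies for pushing
 * the CTAS dest receiver down; the final decision still depends on the
 * Gather nodes found underneath, as discussed in this thread.
 */
static bool
demo_top_node_qualifies(PlanState *ps)
{
	if (ps == NULL)
		return false;

	if (IsA(ps, GatherState))
		return true;			/* Gather, possibly over a Parallel Append */

	if (IsA(ps, AppendState))
		return true;			/* UNION ALL style Append over Gathers */

	/* HashAggregate, HashSetOp, etc. on top: no push-down */
	return false;
}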
Re: TAP tests aren't using the magic words for Windows file access
Hello hackers,

Are there any plans to backport the patch to earlier versions of Postgres?
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=114541d58e5970e51b78b77b65de16210beaab43

We rarely see the issue with the pg_ctl/004_logrotate test on the REL_12_STABLE branch. On my notebook I can easily reproduce the "Permission denied at src/test/perl/TestLib.pm line 259" error with the small change below, but the same test does not fail on version 13, or on version 12 with the TestLib patch.

diff --git a/src/bin/pg_ctl/t/004_logrotate.pl b/src/bin/pg_ctl/t/004_logrotate.pl
index bc39abd23e4..e49e159bc84 100644
--- a/src/bin/pg_ctl/t/004_logrotate.pl
+++ b/src/bin/pg_ctl/t/004_logrotate.pl
@@ -72,7 +72,7 @@ for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
 {
 	$new_current_logfiles = slurp_file($node->data_dir . '/current_logfiles');
 	last if $new_current_logfiles ne $current_logfiles;
-	usleep(100_000);
+	usleep(1);
 }
 note "now current_logfiles = $new_current_logfiles";

On 2019-11-22 20:22, Andrew Dunstan wrote: On 11/22/19 3:55 AM, Juan José Santamaría Flecha wrote: On Fri, Nov 22, 2019 at 9:00 AM Michael Paquier wrote: On Thu, Nov 21, 2019 at 08:09:38PM +0100, Juan José Santamaría Flecha wrote: > I think Perl's open() is a bad candidate for an overload, so I will update > the previous patch that only touches slurp_file(). FWIW, I don't like much the approach of patching only slurp_file(). What gives us the guarantee that we won't have this discussion again in a couple of months or years once a new caller of open() is added for some new TAP tests, and that it has the same problems with multi-process concurrency? I agree on that, from a technical stand point, overloading open() is probably the best solution for the reasons above mentioned. My doubts come from the effort such a solution will take and its maintainability, also taking into account that there are not that many calls to open() in "src/test/perl".

I think the best course is for us to give your latest patch an outing on the buildfarm and verify that the issues seen with slurp_file disappear. That shouldn't take us more than a week or two to see - drongo has had 6 such failures in the last 11 days on master. After that we can discuss how much further we might want to take it.

cheers

andrew

--
regards,
Roman Zharkov
Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)
On Mon, Dec 14, 2020 at 01:59:03PM -0500, Tom Lane wrote: > * Should we just have a blanket insistence that all callers supply > HASH_ELEM? The default sizes that dynahash.c uses without that are > undocumented and basically useless.

+1

> we should just rip out all those memsets as pointless, since there's > basically no case where you'd use the memset to fill a field that > you meant to pass as zero. The fact that hash_create() doesn't > read fields it's not told to by a flag means we should not need > the memsets to avoid uninitialized-memory reads.

On Mon, Dec 14, 2020 at 06:55:20PM -0500, Tom Lane wrote: > Here's a rolled-up patch that does some further documentation work > and gets rid of the unnecessary memset's as well.

+1 on removing the memset() calls. That said, it's not a big deal if more creep in over time; it doesn't qualify as a project policy violation.

> @@ -329,6 +328,11 @@ InitShmemIndex(void)
>  * whose maximum size is certain, this should be equal to max_size; that
>  * ensures that no run-time out-of-shared-memory failures can occur.
>  *
> + * *infoP and hash_flags should specify at least the entry sizes and key

s/should/must/
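As an illustration of the calling convention being settled on above (not code from the rolled-up patch; the table and entry names are invented), a caller fills in only the HASHCTL fields that correspond to the flags it passes, with no memset and with HASH_ELEM always supplied:

typedef struct DemoEntry
{
	Oid			key;			/* hash key; must be first */
	int			count;
} DemoEntry;

static HTAB *
demo_create_table(void)
{
	HASHCTL		ctl;

	/* no memset: hash_create() only reads fields named by the flags */
	ctl.keysize = sizeof(Oid);
	ctl.entrysize = sizeof(DemoEntry);

	return hash_create("demo table",	/* name used in error messages */
					   128,				/* initial size hint */
					   &ctl,
					   HASH_ELEM | HASH_BLOBS);
}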